Cancer Classification Based on the Features of Itemset Sequence Pattern of TP53 Protein Code Using Deep Miden - KNN

Authors

  • Marji Marji, Computer Science Faculty, Brawijaya University
  • Imam Cholissodin, Brawijaya University
  • Dian Eka Ratnawati, Brawijaya University
  • Edy Santoso, Brawijaya University
  • Nurul Hidayat, Brawijaya University

DOI:

https://doi.org/10.25126/jitecs.202271401

Abstract

Cancer remains a difficult disease to identify. One cause of cancer is genetic alteration resulting from mutations in the p53 gene. Healthy cells carry the wild-type (normal) p53 protein, which is able to manage DNA separation. If the DNA mutates, cancer becomes difficult to detect because the composition of the protein has changed. Bioinformatics combines biology and information engineering to manage data. One application of data mining in bioinformatics is in the development of the pharmaceutical and medical industries. Classification in data mining can use a variety of methods, including K-Nearest Neighbor (KNN), C4.5, and ID3. KNN is one of the most reliable of these classification methods. This study developed two algorithms. The first is a modification of the k-fold method that divides the data into training data and test data, with the test data formed into slices (test-1 and test-2). The second, Deep Miden, selects the itemset sequence patterns with the largest information gain, whether 2-itemsets, 3-itemsets, and so on. The best accuracy, 96.00%, was obtained in computational tests run on a server, varying the number of Deep Miden itemset sequence patterns and the value of k in the KNN classifier.
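The general idea of ranking sequence patterns by information gain and then classifying with KNN can be illustrated with a minimal sketch. This is not the authors' Deep Miden implementation, which the abstract does not specify. It assumes that an "n-itemset sequence pattern" is a contiguous run of n amino-acid codes, that each pattern becomes a binary presence/absence feature, and that KNN uses Hamming distance with a majority vote. The sequences, labels, and all function names are hypothetical toy values, not TP53 data.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(sequences, labels, pattern):
    """Gain from splitting the samples on presence/absence of `pattern`."""
    present = [y for s, y in zip(sequences, labels) if pattern in s]
    absent = [y for s, y in zip(sequences, labels) if pattern not in s]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (present, absent) if part)
    return entropy(labels) - remainder

def candidate_patterns(sequences, length):
    """All contiguous substrings of `length` (assumed form of a pattern)."""
    return {s[i:i + length] for s in sequences for i in range(len(s) - length + 1)}

def select_patterns(sequences, labels, lengths=(2, 3), top_n=4):
    """Keep the top_n patterns by information gain across the given lengths."""
    scored = [(information_gain(sequences, labels, p), p)
              for n in lengths for p in candidate_patterns(sequences, n)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest gain first, ties by name
    return [p for _, p in scored[:top_n]]

def featurize(sequence, patterns):
    """Binary vector: 1 if the pattern occurs in the sequence, else 0."""
    return [int(p in sequence) for p in patterns]

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training vectors (Hamming distance)."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: sum(a != b for a, b in zip(pair[0], x)))
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]

# Hypothetical toy data: six short protein-code strings with made-up labels.
train_seqs = ["MEEPQSDP", "MEEPQSDL", "MEEPQSDV",
              "MKKPQSDP", "MKKPQSDL", "MKKPQSDV"]
train_labels = ["normal"] * 3 + ["cancer"] * 3

patterns = select_patterns(train_seqs, train_labels)
train_X = [featurize(s, patterns) for s in train_seqs]
prediction = knn_predict(train_X, train_labels, featurize("MKKPQSDA", patterns))
print(patterns, prediction)
```

In this sketch the number of selected patterns (`top_n`) and the neighbor count (`k`) are the two knobs that correspond to the variations described in the abstract. The modified k-fold split is not sketched because the abstract does not give enough detail to reconstruct it.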

Published

2022-09-29

How to Cite

Marji, M., Cholissodin, I., Eka Ratnawati, D., Santoso, E., & Hidayat, N. (2022). Cancer Classification Based on the Features of Itemset Sequence Pattern of TP53 Protein Code Using Deep Miden - KNN. Journal of Information Technology and Computer Science, 7(1), 110–116. https://doi.org/10.25126/jitecs.202271401
