Classification Tuberculosis DNA using LDA-SVM

Authors

Mochammad Anshori, Wayan Firdaus Mahmudy, Ahmad Afif Supianto

Abstract

Tuberculosis is a disease caused by the bacterium Mycobacterium tuberculosis. It is very dangerous and is among the top 10 causes of death in the world. Detection errors are common because tuberculosis resembles other diffuse lung diseases. The challenge is to improve detection using DNA sequence data from Mycobacterium tuberculosis, which makes data preprocessing necessary. Features are extracted with k-mers, and the k-mer features are then weighted with TF-IDF. Because the resulting data are very large, dimensionality reduction is needed; the method used is linear discriminant analysis (LDA), and the reduced features are classified with a support vector machine (SVM). Experiments show that the best value of k is k = 4. Using TF-IDF extraction and LDA dimensionality reduction, the study obtains accuracy = 0.927, precision = 0.930, recall = 0.927, F-score = 0.924, and Matthews correlation coefficient (MCC) = 0.875.
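The preprocessing described above (k-mer extraction followed by TF-IDF weighting) can be sketched in plain Python as below. This is a minimal illustration, not the authors' implementation: the function names `kmers` and `tf_idf` are hypothetical, and the TF-IDF variant (term frequency normalised by the number of k-mers in a sequence, idf = ln(N / df)) is an assumption, since the abstract does not state which weighting formula was used.

```python
import math
from collections import Counter


def kmers(sequence, k=4):
    """Split a DNA sequence into overlapping k-mers (sliding window, step 1)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def tf_idf(sequences, k=4):
    """Return the sorted k-mer vocabulary and one TF-IDF vector per sequence.

    Assumed weighting (hypothetical; the paper's exact variant is not given):
      tf(t, d) = count of k-mer t in d / number of k-mers in d
      idf(t)   = ln(N / df(t)), N = number of sequences
    """
    # Each sequence is treated as a "document" whose "words" are its k-mers.
    docs = [Counter(kmers(s, k)) for s in sequences]
    vocab = sorted(set().union(*docs))
    n_docs = len(docs)
    # Document frequency: number of sequences that contain each k-mer.
    df = Counter(t for d in docs for t in d)
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for d in docs:
        total = sum(d.values())
        # Sequences shorter than k have no k-mers and get an all-zero vector.
        vectors.append([d[t] / total * idf[t] if total else 0.0 for t in vocab])
    return vocab, vectors


if __name__ == "__main__":
    vocab, X = tf_idf(["ACGTACGT", "ACGTTTTT"], k=4)
    print(vocab)
    print(X)
```

With an alphabet of four bases the vocabulary can grow to 4^k distinct k-mers (256 for k = 4), and a real data set adds one row per sequence, which is one reason a dimensionality-reduction step such as LDA is applied before classification. Under the same assumptions, the resulting matrix could be passed to scikit-learn's `LinearDiscriminantAnalysis` and then to `SVC` to reproduce the LDA-SVM stages, though the paper's exact settings are not stated in the abstract.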

DOI: http://dx.doi.org/10.25126/jitecs.201943113