Evaluation of TF-IDF Algorithm Weighting Scheme in The Qur'an Translation Clustering with K-Means Algorithm

Authors

  • M Didik R Wahyudi, UIN Sunan Kalijaga Yogyakarta

DOI:

https://doi.org/10.25126/jitecs.202162295

Abstract

The index of the Al-Quran translation issued by the Ministry of Religion can be used in text mining to search for similar patterns in the translation. This study groups sentences using the K-Means clustering algorithm under three TF-IDF weighting schemes in order to identify the scheme that performs best. Of the three schemes, the traditional TF-IDF weighting achieved the highest percentage, 62.16%, with an average of 36.12% and a standard deviation of 12.77%. The TF-IDF normalization 1 scheme gave the lowest results: a highest percentage of 48.65%, an average of 25.65%, and a standard deviation of 10.16%. The TF-IDF normalization 2 scheme had the smallest standard deviation, 8.27%, with an average of 28.15% and a highest percentage of 48.65%, the same as under normalization 1.
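
The pipeline summarized in the abstract (weight each sentence with TF-IDF, then group the sentences with K-Means) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the study's implementation. The abstract does not define the three weighting schemes, so the mapping used here (traditional = unnormalized weights, normalization 1 = L1-normalized, normalization 2 = L2-normalized) is an assumption. The function names `tfidf`, `sqdist` and `kmeans`, the IDF formula log(N/df), the whitespace tokenizer, and the toy sentences are likewise hypothetical and are not taken from the paper or its dataset.

```python
import math
import random
from collections import Counter


def tfidf(docs, norm=None):
    """Return (vocab, vectors). norm is None, "l1" or "l2" (assumed schemes)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        if norm == "l1":
            scale = sum(abs(w) for w in vec)
        elif norm == "l2":
            scale = math.sqrt(sum(w * w for w in vec))
        else:
            scale = 1.0
        # Leave an all-zero vector untouched instead of dividing by zero.
        vectors.append([w / scale for w in vec] if scale else vec)
    return vocab, vectors


def sqdist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd-style K-Means; returns one cluster label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda c: sqdist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up toy sentences, used only to exercise the code.
docs = [
    "charity for the poor and the needy",
    "charity for the poor and the needy",
    "fasting during the month",
    "fasting during the holy month",
]

results = {}
for scheme in (None, "l1", "l2"):
    _, vectors = tfidf(docs, norm=scheme)
    results[scheme] = kmeans(vectors, k=2)
    print(scheme, results[scheme])
```

Comparing the label lists stored in `results` shows how the choice of normalization alone can change the grouping. Scoring each scheme against a reference grouping, as the study does, would require labelled data that this sketch does not include.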

Published

2021-09-03

Section

Articles

How to Cite

Evaluation of TF-IDF Algorithm Weighting Scheme in The Qur’an Translation Clustering with K-Means Algorithm. (2021). Journal of Information Technology and Computer Science, 6(2), 117-129. https://doi.org/10.25126/jitecs.202162295