Evaluation of TF-IDF Algorithm Weighting Scheme in The Qur'an Translation Clustering with K-Means Algorithm
DOI: https://doi.org/10.25126/jitecs.202162295
Abstract
The Al-Quran translation index issued by the Ministry of Religion can be used in text mining to find similar patterns in Al-Quran translations. This study groups sentences using the K-Means clustering algorithm combined with three weighting scheme models of the TF-IDF algorithm, in order to determine which TF-IDF scheme performs best. Of the three weighting schemes, the traditional TF-IDF scheme obtained the highest percentage, 62.16%, with an average percentage of 36.12% and a standard deviation of 12.77%. The lowest results came from the normalized TF-IDF 1 scheme, with a highest percentage of 48.65%, an average percentage of 25.65%, and a standard deviation of 10.16%. The normalized TF-IDF 2 scheme had the smallest standard deviation, 8.27%, with an average percentage of 28.15% and a highest percentage of 48.65%, the same as the normalized TF-IDF 1 scheme.
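The abstract names the ingredients of the pipeline (TF-IDF weighting, K-Means, a percentage score) but not their formulas. The following is a minimal, dependency-free Python sketch of such a pipeline under stated assumptions: whitespace tokenization, traditional weights tf(t, d) · ln(N / df(t)), K-Means seeded with the first k vectors, and purity as the evaluation percentage. The abstract does not define its "normalized TF-IDF 1" and "normalized TF-IDF 2" schemes, so maximum-weight scaling (`"max"`) and L2 length normalization (`"l2"`) are swapped in as hypothetical stand-ins. The function names `tfidf_vectors`, `kmeans`, and `purity` and the toy sentences are illustrative only, not the authors' code or data.

```python
import math
from collections import Counter


def tfidf_vectors(docs, norm=None):
    """Build one weight vector per document.

    Weights use the traditional form w(t, d) = tf(t, d) * ln(N / df(t)).
    norm=None keeps raw weights; "max" divides by the largest weight in the
    document; "l2" scales the vector to unit length. Both normalizations are
    ASSUMED stand-ins, not the paper's normalized TF-IDF 1 and 2 schemes.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}

    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[tok] * idf[tok] for tok in vocab]
        if norm == "max":
            scale = max(vec)
        elif norm == "l2":
            scale = math.sqrt(sum(w * w for w in vec))
        else:
            scale = 1.0
        # Guard against all-zero vectors (e.g. a document made only of terms
        # that occur in every document, whose idf is 0).
        vectors.append([w / scale for w in vec] if scale > 0 else vec)
    return vectors


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's K-Means, seeded deterministically with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(labels, truth):
    """ASSUMED metric: share of items whose cluster's majority label is their own."""
    hits = 0
    for c in set(labels):
        members = [t for lab, t in zip(labels, truth) if lab == c]
        hits += Counter(members).most_common(1)[0][1]
    return hits / len(truth)


# Toy English sentences with two obvious topics (not Qur'an translation data).
DOCS = [
    "patience in prayer",
    "charity to the poor",
    "prayer at dawn with patience",
    "give charity to orphans and the poor",
]
TRUTH = [0, 1, 0, 1]

if __name__ == "__main__":
    for scheme in (None, "max", "l2"):
        labels = kmeans(tfidf_vectors(DOCS, norm=scheme), k=2)
        print(scheme, labels, f"{purity(labels, TRUTH):.2%}")
```

On this toy input all three weighting variants separate the two topics into clusters [0, 1, 0, 1]. Real translation text would likely call for preprocessing (case folding, stop-word removal, stemming) and several randomly seeded K-Means runs, which is one way a mean and standard deviation of the percentage, like those reported above, could be obtained.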
License
Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access).