The Influence of Word Vectorization for Kawi Language to Indonesian Language Neural Machine Translation
DOI: https://doi.org/10.25126/jitecs.202271387

Abstract
People commonly use machine translation to access textual knowledge beyond their native language. Robust systems such as Google Translate already exist, but their language coverage is limited to high-resource languages such as English and French, and does not include Kawi, a local language used in Bali's old works of literature. It is therefore worthwhile to develop machine translation from Kawi into a language with more active users, such as Indonesian, to give young learners easier access to this literature. This research developed neural machine translation (NMT) models based on recurrent neural networks (RNNs) and analyzed the influence of word vectorization with Word2Vec on translation performance, measured by BLEU scores. The results show that word vectorization significantly improves the NMT models' performance, and that Long Short-Term Memory (LSTM) with an attention mechanism achieves the highest BLEU score, 20.86. The NMT models still fall short of the BLEU scores reached by human experts and by high-resource-language machine translation. Nevertheless, this initial study can serve as a reference for future development of Kawi-to-Indonesian NMT.
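The BLEU metric used to evaluate the models above can be illustrated with a small sketch. This is a simplified single-reference, sentence-level variant (modified n-gram precision with add-one smoothing plus a brevity penalty), written here only to show how the score behaves; the paper's exact evaluation tooling and smoothing settings are not specified in this abstract, and the example sentences are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of n-grams in a token list
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference:
    geometric mean of modified n-gram precisions (orders 1..max_n,
    add-one smoothed) multiplied by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        ref_ngrams = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        # Add-one smoothing so an empty n-gram order does not zero the score
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    # Brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)
```

A perfect match scores 1.0 (often reported as 100, so the paper's 20.86 corresponds to roughly 0.21 on this scale), while a shortened or partly wrong candidate is penalized by both the n-gram precisions and the brevity penalty.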