Abstract

Inductive transfer learning has made a huge impact on the field of computer vision. In particular, computer vision applications such as object detection, classification, and segmentation are rarely trained from scratch; instead, they are fine-tuned from pretrained models, which are the products of learning from huge datasets. In contrast, state-of-the-art natural language processing models are still generally trained from the ground up. Accordingly, this research investigates the adoption of transfer learning for natural language processing. Specifically, we utilize a transfer learning technique called Universal Language Model Fine-tuning (ULMFiT) for an Indonesian news text classification task. The dataset for constructing the language model is collected from several news providers from January to December 2017, whereas the dataset employed for the text classification task consists of news articles provided by the Agency for the Assessment and Application of Technology (BPPT). To examine the impact of ULMFiT, we provide a baseline: a vanilla neural network with two hidden layers. Although the performance of ULMFiT on the validation set is lower than that of our baseline, we find that ULMFiT significantly reduces overfitting on the classification task, shrinking the gap between training and validation accuracies from 4% to nearly zero.
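As a concrete illustration of the pipeline the abstract describes, the sketch below outlines the three ULMFiT stages (training a language model on the news corpus, then fine-tuning a classifier on the target task with gradual unfreezing) using the fastai v1 API. This is a minimal sketch under stated assumptions, not the author's exact code: the file names news_corpus.csv and bppt_articles.csv, the column names, learning rates, and epoch counts are illustrative.

```python
# Minimal ULMFiT sketch with the fastai v1 API (Howard & Ruder, 2018).
# File names, column names, and hyperparameters are illustrative assumptions.
from fastai.text import (TextLMDataBunch, TextClasDataBunch, AWD_LSTM,
                         language_model_learner, text_classifier_learner)

# Stage 1: train an Indonesian language model from scratch on the news corpus
# (pretrained=False since no pretrained Indonesian weights are assumed).
data_lm = TextLMDataBunch.from_csv('.', 'news_corpus.csv', text_cols='text')
lm_learn = language_model_learner(data_lm, AWD_LSTM,
                                  pretrained=False, drop_mult=0.3)
lm_learn.fit_one_cycle(10, 1e-2)        # train the AWD-LSTM language model
lm_learn.save_encoder('id_news_enc')    # keep the encoder for the classifier

# Stage 2: build the classification data, sharing the language model's vocabulary.
data_clas = TextClasDataBunch.from_csv('.', 'bppt_articles.csv',
                                       text_cols='text', label_cols='label',
                                       vocab=data_lm.train_ds.vocab)

# Stage 3: fine-tune the classifier with gradual unfreezing and
# discriminative learning rates, as prescribed by ULMFiT.
clf_learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
clf_learn.load_encoder('id_news_enc')
clf_learn.fit_one_cycle(1, 2e-2)                     # train the head only
clf_learn.freeze_to(-2)                              # unfreeze one more layer group
clf_learn.fit_one_cycle(1, slice(1e-2 / 2.6, 1e-2))
clf_learn.unfreeze()                                 # unfreeze everything
clf_learn.fit_one_cycle(2, slice(1e-3 / 100, 1e-3))
```

The gradual unfreezing schedule (head first, then progressively deeper layers, with smaller learning rates for lower layers) is the mechanism the abstract credits with reducing overfitting: the lower layers, which encode general language knowledge, are updated last and most gently.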

Article Details

Author Biography

Hendra Bunyamin, Informatics Engineering, Maranatha Christian University, Jl. Prof. drg. Surya Sumantri, M.P.H. No. 65, Bandung, West Java, Indonesia

Hendra Bunyamin is a senior lecturer in Informatics Engineering at Maranatha Christian University. He mainly teaches mathematics and programming. His research interests are machine learning and its applications.

How to Cite
Bunyamin, H. (2021). Utilizing Indonesian Universal Language Model Fine-tuning for Text Classification. Journal of Information Technology and Computer Science, 5(3), 325–337. https://doi.org/10.25126/jitecs.202053215

References

  1. Altınel, B., Ganiz, M.C., Diri, B.: A corpus-based semantic kernel for text classification by using meaning values of terms. Engineering Applications of Artificial Intelligence 43, 54 – 66 (2015). https://doi.org/10.1016/j.engappai.2015.03.015, http://www.sciencedirect.com/science/article/pii/S0952197615000809
  2. Anshori, M., Mahmudy, W.F., Supianto, A.A.: Classification tuberculosis dna using lda-svm. Journal of Information Technology and Computer Science 4(3), 233–240 (2019). https://doi.org/10.25126/jitecs.201943113
  3. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv e-prints abs/1409.0473 (Sep 2014), https://arxiv. org/abs/1409.0473
  4. Baldi, P., Sadowski, P., Whiteson, D.: Searching for exotic particles in high-energy physics with deep learning. Nature communications 5, 4308 (2014)
  5. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5(2), 157–166 (1994)
  6. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1800–1807 (2017)
  7. Chollet, F.: Deep Learning with Python 1st Edition. Manning Publications (2017)
  8. CireşAn, D., Meier, U., Masci, J., Schmidhuber, J.: Multi-column deep neural network for traffic sign classification. Neural networks 32, 333–338 (2012)
  9. Couprie, C., Farabet, C., Najman, L., LeCun, Y.: Indoor semantic segmentation using depth information. arXiv preprint arXiv:1301.3572 (2013)
  10. Dahl, G.E., Jaitly, N., Salakhutdinov, R.: Multi-task neural networks for QSAR predictions. CoRR abs/1406.1231 (2014), http://arxiv.org/abs/1406.1231
  11. Eisenstein, J.: Introduction to Natural Language Processing. Adaptive Computation and Machine Learning series, MIT Press (2019), https://books.google.co.id/books?id=72yuDwAAQBAJ
  12. Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features for scene labeling. IEEE transactions on pattern analysis and machine intelligence 35(8), 1915–1929 (2013)
  13. Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. CoRR abs/1509.06113 (2015), http://arxiv.org/abs/1509.06113
  14. Gershgorn, D.: How to develop lstm models for time series forecast-
  15. ing. https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-
  16. research-and-possibly-the-world/ (2017), accessed: 2019-01-15
  17. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
  18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (2016)
  19. Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal processing magazine 29(6), 82–97 (2012)
  20. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997)
  21. Howard, J., Ruder, S.: Introducing state of the art text classification with
  22. universal language models. http://nlp.fast.ai/classification/2018/05/15/
  23. introducting-ulmfit.html (2018), accessed: 2019-01-15
  24. Howard, J., Ruder, S.: Universal language model fine-tuning for text classification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). vol. 1, pp. 328–339 (2018)
  25. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F., Blei, D. (eds.) Proceedings of the 32nd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 37, pp. 448–456. PMLR, Lille, France (07–09 Jul 2015), http://proceedings.mlr.press/v37/ioffe15.html
  26. Knowles-Barley, S., Jones, T.R., Morgan, J., Lee, D., Kasthuri, N., Lichtman, J.W., Pfister, H.: Deep learning for the connectome. In: GPU Technology Conference. vol. 26 (2014)
  27. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. pp. 1097–1105 (2012)
  28. Lewis-Kraus, G.: The great a.i. awakening. https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html (2016), accessed: 2019-01-15
  29. Liu, B.: Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications). Springer (2013)
  30. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to information retrieval (2008)
  31. Merity, S., Keskar, N.S., Socher, R.: Regularizing and optimizing LSTM language models. CoRR abs/1708.02182 (2017), http://arxiv.org/abs/1708.02182
  32. Merity, S., McCann, B., Socher, R.: Revisiting activation regularization for language rnns. CoRR abs/1708.01009 (2017), http://arxiv.org/abs/1708.01009
  33. Mirończuk, M.M., Protasiewicz, J.: A recent overview of the state-of-the-art elements of text classification. Expert Systems with Applications 106, 36 – 54 (2018). https://doi.org/10.1016/j.eswa.2018.03.058, http://www.sciencedirect.com/science/article/pii/S095741741830215X
  34. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529 (2015)
  35. Ng, A.Y.: Machine Learning Yearning: Technical Strategy for AI Engineers, In the Era of Deep Learning. deeplearning.ai (2018)
  36. Nielsen, M.A.: Neural networks and deep learning, vol. 25. Determination press San Francisco, CA, USA: (2015)
  37. Oktaria, A.S., Prakasa, E., Suhartono, E.: Wood species identification using convolutional neural network (cnn) architectures on macroscopic images. Journal of Information Technology and Computer Science 4(3), 274–283 (2019). https://doi.org/10.25126/jitecs.201943155
  38. Pinheiro, R.H., Cavalcanti, G.D., Tsang, I.R.: Combining dissimilarity
  39. spaces for text categorization. Information Sciences 406-407, 87 – 101
  40. (2017). https://doi.org/10.1016/j.ins.2017.04.025, http://www.sciencedirect.
  41. com/science/article/pii/S0020025517306722
  42. Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation. CoRR abs/1801.04381 (2018), http://arxiv.org/abs/1801.04381
  43. Sermanet, P., Kavukcuoglu, K., Chintala, S., LeCun, Y.: Pedestrian detection with unsupervised multi-stage feature learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3626–3633 (2013)
  44. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015), http://arxiv.org/abs/1409.1556
  45. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.:
  46. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15(1), 1929–1958 (2014)
  47. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in neural information processing systems. pp. 3104–3112 (2014)
  48. Szegedy, C., Wei Liu, Yangqing Jia, Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1–9 (2015)
  49. Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., Liu, C.: A Survey on Deep
  50. Transfer Learning: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4–7, 2018, Proceedings, Part III, pp. 270–279 (10 2018)
  51. Wan, L., Zeiler, M., Zhang, S., Le Cun, Y., Fergus, R.: Regularization of neural networks using dropconnect. In: International Conference on Machine Learning. pp. 1058–1066 (2013)
  52. Wang, D., Wu, J., Zhang, H., Xu, K., Lin, M.: Towards enhancing centroid classifier for text classification—a border-instance approach. Neurocomputing 101, 299 – 308 (2013). https://doi.org/10.1016/j.neucom.2012.08.019, http://www.sciencedirect.com/science/article/pii/S0925231212006595
  53. Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 3320–3328. Curran Associates, Inc. (2014), http://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.pdf