Utilizing Indonesian Universal Language Model Fine-tuning for Text Classification


Hendra Bunyamin


Inductive transfer learning has made a huge impact on the field of computer vision. In particular, computer vision applications such as object detection, classification, and segmentation are rarely trained from scratch; instead, they are fine-tuned from pretrained models, which are the products of learning from huge datasets. In contrast, state-of-the-art natural language processing models are still generally trained from the ground up. Accordingly, this research investigates the adoption of transfer learning for natural language processing. Specifically, we utilize a transfer learning technique called Universal Language Model Fine-tuning (ULMFiT) to perform an Indonesian news text classification task. The dataset for constructing the language model is collected from several news providers from January to December 2017, whereas the dataset employed for the text classification task comes from news articles provided by the Agency for the Assessment and Application of Technology (BPPT). To examine the impact of ULMFiT, we provide a baseline: a vanilla neural network with two hidden layers. Although the performance of ULMFiT on the validation set is lower than that of our baseline, we find that ULMFiT significantly reduces overfitting on the classification task, shrinking the gap between training and validation accuracies from 4% to nearly zero.
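The baseline and the overfitting measure described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual model or data: the toy dataset, layer sizes, learning rate, and training loop are all assumptions; only the structure (a vanilla feed-forward network with two hidden layers, and the gap between training and validation accuracies as the overfitting measure) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data standing in for the news-article features:
# two well-separated Gaussian blobs, split into train and validation.
X0 = rng.normal(loc=-2.0, scale=0.5, size=(100, 2))
X1 = rng.normal(loc=+2.0, scale=0.5, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)
perm = rng.permutation(200)
X, y = X[perm], y[perm]
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Parameters for a 2 -> 16 -> 16 -> 2 network (two hidden layers;
# all sizes are illustrative, not the paper's configuration).
W1 = rng.normal(0, 0.1, (2, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, (16, 16)); b2 = np.zeros(16)
W3 = rng.normal(0, 0.1, (16, 2)); b3 = np.zeros(2)

lr = 0.5
onehot = np.eye(2)[y_tr]
for _ in range(300):  # full-batch gradient descent on cross-entropy
    h1 = relu(X_tr @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    p = softmax(h2 @ W3 + b3)
    # Gradient of mean cross-entropy w.r.t. the logits is (p - y) / N.
    g3 = (p - onehot) / len(X_tr)
    gW3 = h2.T @ g3; gb3 = g3.sum(0)
    g2 = (g3 @ W3.T) * (h2 > 0)
    gW2 = h1.T @ g2; gb2 = g2.sum(0)
    g1 = (g2 @ W2.T) * (h1 > 0)
    gW1 = X_tr.T @ g1; gb1 = g1.sum(0)
    W3 -= lr * gW3; b3 -= lr * gb3
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

def accuracy(X, y):
    logits = relu(relu(X @ W1 + b1) @ W2 + b2) @ W3 + b3
    return float((logits.argmax(axis=1) == y).mean())

train_acc = accuracy(X_tr, y_tr)
val_acc = accuracy(X_va, y_va)
# The overfitting measure the abstract reports: the gap between
# training and validation accuracies (the paper's 4% vs. near zero).
gap = train_acc - val_acc
```

In the paper's comparison it is this gap, rather than raw validation accuracy, on which ULMFiT improves over the baseline.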





DOI: http://dx.doi.org/10.25126/jitecs.202053215