
Abstract

The wealth of opinions expressed by users on micro-blogging sites can be beneficial for product manufacturers or service providers, as they can gain insights about specific aspects of their products or services. The most common approach to analyzing opinion text is machine learning. However, opinion data are often imbalanced, e.g. positive sentiments heavily outnumber negative sentiments. Ensemble techniques, which combine multiple classifiers to make a decision, can tackle imbalanced data by learning from multiple balanced datasets. The decision of the ensemble is obtained by combining the decisions of the individual classifiers using a certain rule. Therefore, rule selection is an important factor in ensemble design. This research aims to investigate the best decision combination rule for imbalanced text data. Multinomial Naïve Bayes, Complement Naïve Bayes, Support Vector Machine, and Softmax Regression are used as base classifiers, and the max, min, product, sum, vote, and meta-classifier rules are considered for decision combination. The experiment is conducted on several Twitter datasets. The experimental results show that the Softmax Regression ensemble with the meta-classifier combination rule performs best on all but one dataset. However, training the Softmax Regression ensemble requires intensive computational resources.
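
The fixed combination rules named in the abstract can be illustrated with a minimal sketch. It assumes each base classifier outputs one probability per class; the function name `combine`, the tie-breaking choice, and the example numbers are illustrative assumptions, not taken from the paper. The meta-classifier rule is left out because it requires training a second-stage model on the base classifiers' outputs.

```python
from collections import Counter
from math import prod


def combine(probs, rule):
    """Combine per-classifier class probabilities with a fixed rule.

    probs: list of rows, one row per base classifier; each row holds
           one probability per class (assumed input format).
    rule:  one of "max", "min", "sum", "product", "vote".
    Returns the index of the winning class; ties go to the lowest index.
    """
    n_classes = len(probs[0])
    if rule == "vote":
        # Each classifier votes for its own most probable class.
        votes = Counter(
            max(range(n_classes), key=row.__getitem__) for row in probs
        )
        return max(range(n_classes), key=lambda c: votes[c])
    reducers = {"max": max, "min": min, "sum": sum, "product": prod}
    reduce_fn = reducers[rule]
    # Reduce each class's column of probabilities to a single score.
    scores = [reduce_fn(row[c] for row in probs) for c in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)


# Hypothetical outputs of three classifiers for classes (negative, positive).
example = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
for rule in ("max", "min", "sum", "product", "vote"):
    print(rule, combine(example, rule))
```

In this made-up example, one very confident classifier decides the outcome under the max, min, sum, and product rules (class 0), while the vote rule follows the two less confident classifiers (class 1). This is why the choice of rule can change the ensemble's decision.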

Article Details

How to Cite
Cahya, R. A., Bachtiar, F. A., & Mahmudy, W. F. (2021). Comparison of Bagging Ensemble Combination Rules for Imbalanced Text Sentiment Analysis. Journal of Information Technology and Computer Science, 6(1), 33–49. https://doi.org/10.25126/jitecs.202161206
