Development of Big Data App for Classification based on Map Reduce of Naive Bayes with or without Web and Mobile Interface by RESTful API Using Hadoop and Spark


Imam Cholissodin, Diajeng Sekar Seruni, Junda Alfiah Zulqornain, Audi Nuermey Hanafi, Afwan Ghofur, Mikhael Alexander, Muhammad Ismail Hasan


Big Data App is a framework that we developed from our previous research project and published on GitHub. That project builds a lightweight serverless environment for both Windows and Linux under the name EdUBig, an open source Hadoop distribution. This study addresses the difficulty of building frontend and backend models for a Big Data application, which by default only runs scripts from a terminal console. That limitation becomes a serious obstacle once the application is released to general end users, who also need a way to test the performance of the MapReduce Naive Bayes algorithm on several datasets. To address these problems, we created the Big Data App framework to make it easier for end users, especially developers, to build a Big Data application. The frontend integrates a web app built with the Django framework and a native mobile app. The backend uses the Django framework, which can communicate directly with scripts for Hadoop batch processing, streaming processing, or Spark streaming, and can also run scripts for Pig, Hive, WebHDFS, Sqoop, Oozie, and similar tools; such applications can be built quickly and produce reliable results. The test results show that the framework makes data computation significantly easier for end users, and the highest classification accuracy obtained was 88.3576%.

Keywords: big data, map reduce of naive bayes, serverless, web and mobile app, restful api, django framework
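
The idea of training Naive Bayes through MapReduce, as described in the abstract, can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: the function names (`mapper`, `reducer`, `map_reduce`, `classify`), the toy records in `TRAIN`, and the use of Laplace smoothing are all assumptions made for illustration, and a plain dictionary stands in for the shuffle phase that Hadoop or Spark would perform across a cluster.

```python
from collections import defaultdict
import math

# Hypothetical toy training set (not from the paper): (feature_0, feature_1, label)
TRAIN = [
    ("sunny", "hot", "no"),
    ("sunny", "mild", "no"),
    ("rainy", "mild", "yes"),
    ("rainy", "cool", "yes"),
    ("overcast", "hot", "yes"),
]

def mapper(record):
    """Map step: emit (key, 1) pairs for the class and each feature value."""
    *features, label = record
    yield ("class", label), 1
    for index, value in enumerate(features):
        yield ("feature", label, index, value), 1

def reducer(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return key, sum(values)

def map_reduce(records):
    """Simulate map, shuffle (group by key), and reduce in memory."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())

def classify(counts, features, alpha=1.0):
    """Pick the label with the highest log posterior (Laplace-smoothed)."""
    class_counts = {k[1]: v for k, v in counts.items() if k[0] == "class"}
    total = sum(class_counts.values())
    # Distinct values observed per feature index, needed for smoothing.
    vocab = defaultdict(set)
    for key in counts:
        if key[0] == "feature":
            vocab[key[2]].add(key[3])
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for index, value in enumerate(features):
            seen = counts.get(("feature", label, index, value), 0)
            score += math.log((seen + alpha) / (n + alpha * len(vocab[index])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

if __name__ == "__main__":
    model = map_reduce(TRAIN)
    print(classify(model, ("sunny", "hot")))
```

In a real deployment the `mapper` and `reducer` would run as separate Hadoop Streaming or Spark tasks, and a web backend would only trigger the job and read back the resulting counts.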
