Voice Recognition to Classify “Buka” and “Tutup” Sound to Open and Closes Door Using Mel Frequency Cepstral Coefficients (MFCC) and Convolutional Neural Network (CNN)

Authors

  • Blessius Sheldo Putra Laksono, Brawijaya University, Malang
  • Tio Syaifuddin, Brawijaya University, Malang
  • Fitri Utaminingrum, Brawijaya University, Malang

DOI:

https://doi.org/10.25126/jitecs.202491579

Abstract

The COVID-19 pandemic has had a major impact on society, and many habits have had to change in order to cope with it. People are advised to avoid physical contact to reduce the probability of being infected by others who carry the virus. Doorknobs are a likely medium for spreading the virus because the same surface is touched by many people. Speech recognition can help address this problem. In this study, Mel Frequency Cepstral Coefficients (MFCC) and a Convolutional Neural Network (CNN) are used as the feature extraction and classification methods, respectively. Sound signals are classified into two classes, “buka” (open) and “tutup” (close), so that people who want to open or close the door only need to say the corresponding command. This can help minimize the risk of COVID-19 transmission. A CNN model is developed, then trained and tested on audio files from a curated dataset. The trained model reaches an accuracy of 89% using 50 epochs, a batch size of 32, and an 8:2 split between training and validation data. We expect this study to contribute to the development of automated door systems based on speech recognition, especially for the Indonesian language.
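The abstract names MFCC as the feature extraction step that feeds the CNN. The sketch below is a minimal, dependency-free illustration of a standard MFCC pipeline (pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, log, DCT-II); it is not the authors' code. The parameter values (16 kHz sampling, 25 ms frames, 10 ms hop, 512-point FFT, 26 filters, 13 coefficients, pre-emphasis 0.97) are common defaults assumed here, not values reported in the abstract, and the naive DFT is used only to avoid external libraries.

```python
import cmath
import math


def hz_to_mel(f):
    # Common mel-scale formula: m = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters evenly spaced on the mel scale;
    # returns n_filters rows of length n_fft // 2 + 1 (one weight per FFT bin).
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, center):  # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge, peak 1.0 at center
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def power_spectrum(frame, n_fft):
    # Zero-pad to n_fft, then |X[k]|^2 / n_fft for k = 0..n_fft/2 (naive O(n^2) DFT).
    if len(frame) > n_fft:
        raise ValueError("frame is longer than n_fft")
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    twiddle = [cmath.exp(-2j * math.pi * t / n_fft) for t in range(n_fft)]
    spec = []
    for k in range(n_fft // 2 + 1):
        acc = sum(padded[n] * twiddle[(k * n) % n_fft] for n in range(n_fft))
        spec.append(abs(acc) ** 2 / n_fft)
    return spec


def dct_ii(values, n_coeffs):
    # Orthonormal DCT-II, keeping only the first n_coeffs coefficients.
    n = len(values)
    out = []
    for k in range(n_coeffs):
        s = sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(values))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def mfcc(signal, sample_rate=16000, frame_ms=25, hop_ms=10, n_fft=512,
         n_filters=26, n_coeffs=13, preemph=0.97):
    # Returns a list of frames, each a list of n_coeffs cepstral coefficients.
    if not signal:
        raise ValueError("empty signal")
    # 1. Pre-emphasis: y[n] = x[n] - a * x[n-1]
    emphasized = [signal[0]] + [signal[i] - preemph * signal[i - 1] for i in range(1, len(signal))]
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    if len(emphasized) < frame_len:
        emphasized += [0.0] * (frame_len - len(emphasized))
    # 2. Hamming window applied to every frame
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = [emphasized[start + n] * window[n] for n in range(frame_len)]
        power = power_spectrum(frame, n_fft)                       # 3. power spectrum
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]  # 4. mel energies
        log_energies = [math.log(max(e, 1e-10)) for e in energies]  # 5. log compression
        features.append(dct_ii(log_energies, n_coeffs))            # 6. DCT -> cepstrum
    return features


if __name__ == "__main__":
    sr = 16000
    # Hypothetical input: a 0.1 s, 440 Hz tone standing in for a recorded "buka" clip
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 10)]
    feats = mfcc(tone, sample_rate=sr)
    print(len(feats), len(feats[0]))  # prints "8 13": 8 frames x 13 coefficients
```

The resulting frames-by-coefficients matrix is the kind of 2-D input a CNN can treat like a single-channel image. In practice an optimized library implementation (for example librosa's MFCC routine) would replace the naive DFT used here.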

Published

2024-04-03

How to Cite

Laksono, B. S. P., Syaifuddin, T., & Utaminingrum, F. (2024). Voice Recognition to Classify “Buka” and “Tutup” Sound to Open and Closes Door Using Mel Frequency Cepstral Coefficients (MFCC) and Convolutional Neural Network (CNN). Journal of Information Technology and Computer Science, 9(1), 58–66. https://doi.org/10.25126/jitecs.202491579

Section

Articles