Development and analysis of Punjabi ASR system for mobile phones under different acoustic models
Speech technology is gaining importance in daily life. Speech-based mobile phone applications are becoming popular among the masses because of their usability and ease of access. Speech technology helps people with disabilities, such as blindness and physical impairments, to access and control mobile phone applications by voice, without using a keypad or touchpad. Punjabi is one of the most widely spoken languages in various parts of the world. In this paper, an automatic speech recognition (ASR) system for mobile phone applications in Punjabi is proposed and implemented for four different acoustic models: context independent, context dependent untied, context dependent tied, and context dependent deleted interpolation models. The proposed ASR system is evaluated at 4, 16, 32 and 64 GMMs, and its performance is analysed in terms of accuracy, word error rate and required storage space. It is observed that context dependent untied models outperform the others, with better accuracy and lower word error rate, while context independent models require less storage space than the others. The choice of a suitable acoustic model therefore depends on the available storage space as well as the desired recognition accuracy. Mobile phones with limited resources may use context independent models, while context dependent untied models can be used to develop ASR systems for high-end mobile phones.
Keywords: Acoustic model, ASR, Context dependent, Context independent, HMM, Speech recognition
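The abstract reports results in terms of word error rate and accuracy without giving the formulas. As a minimal sketch, word error rate is commonly computed as the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The function names below are hypothetical, and defining accuracy as 1 minus the word error rate is an assumption; the paper may use a different definition.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: accuracy = 1 - WER (can be negative with many insertions)."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    # Placeholder tokens stand in for a transcript and a recognizer output.
    print(word_error_rate("a b c d", "a x c"))
    print(word_accuracy("a b c d", "a x c"))
```

Scoring the same test set under each acoustic model and mixture count with a function like this gives one way to set the accuracy figures against the storage cost of each model.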