
Acoustic Modeling for Development of Accented Indian English ASR

  • Partho Mandal
  • Gaurav Ojha
  • Anupam Shukla
  • S. S. Agrawal
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 394)

Abstract

This paper investigates Indian English as a speech recognition problem. It proposes a novel approach to building an automatic speech recognition (ASR) system for Indian English using PocketSphinx. The system was trained on a database of English words spoken by Indians in three different accents, using both continuous and semi-continuous models. We compare the performance in each case; the best case comes close to 98% accuracy. Based on this study, we modified the original PocketSphinx Android application to incorporate our results and present it as an Indian English-based SMS sending application. We are extending this approach to identify ways of training a speech recognition system to recognize a much wider variety of Indian accents with higher accuracy.
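
The abstract does not say how accuracy was measured. As a rough illustration only, the sketch below assumes a word-level accuracy of the form 1 - (word errors / reference words), which is one common way to compare recognizers such as the continuous and semi-continuous models mentioned above. The function names and the sample transcripts are hypothetical and are not taken from the paper.

```python
def word_errors(reference, hypothesis):
    """Word-level edit distance (substitutions + insertions + deletions)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the distance between the words consumed so far in ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(pairs):
    """Accuracy over (reference, hypothesis) pairs: 1 - errors / reference words."""
    total_words = sum(len(ref.split()) for ref, _ in pairs)
    total_errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    return 1.0 - total_errors / total_words


if __name__ == "__main__":
    # Hypothetical transcripts, invented purely for illustration.
    sample = [
        ("send a message", "send the message"),
        ("call home", "call home"),
    ]
    print(f"word accuracy: {word_accuracy(sample):.2f}")
```

Running the same function over the decoder output of each trained model, on the same test set, would give directly comparable figures under this assumed metric.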

Keywords

Automatic speech recognition, Indian English, Discrete HMMs

Copyright information

© Springer India 2016

Authors and Affiliations

  • Partho Mandal (1)
  • Gaurav Ojha (1), corresponding author
  • Anupam Shukla (1)
  • S. S. Agrawal (2)

  1. Department of Information Technology, ABV-IIITM, Gwalior, India
  2. KIIT Group of Colleges, Gurgaon, India
