Accoustic Modeling for Development of Accented Indian English ASR

Mandal, Partho; Ojha, Gaurav; Shukla, Anupam; Agrawal, S. S.

doi:10.1007/978-81-322-2656-7_16

Conference paper
First Online: 06 February 2016

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 394)

Abstract

This paper investigates Indian English as a speech recognition problem. A novel approach to building an automatic speech recognition (ASR) system for Indian English using PocketSphinx is proposed. The system was trained on a database of English words spoken by Indians in three different accents, using both continuous and semi-continuous models. We compared the performance in each case, and the best configuration achieves close to 98% accuracy. Based on this study, we modified the original PocketSphinx Android application to incorporate our results and present it as an Indian English-based SMS sending application. We are working further on this approach to identify ways of training a speech recognition system to recognize a much wider variety of Indian accents with substantially higher accuracy.
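
The comparison described above can be illustrated with a minimal sketch. It assumes that accuracy means the fraction of isolated test words recognized correctly, which the abstract does not state explicitly. The function names and the word lists are hypothetical, and the toy data is made up for illustration; it says nothing about which model type performed better in the paper.

```python
def word_accuracy(references, hypotheses):
    """Fraction of utterances whose recognized word matches the reference.

    Assumption: each utterance is a single word and matching is
    case-insensitive. This is an illustrative metric, not necessarily
    the one used by the authors.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    if not references:
        raise ValueError("at least one utterance is required")
    correct = sum(
        ref.strip().lower() == hyp.strip().lower()
        for ref, hyp in zip(references, hypotheses)
    )
    return correct / len(references)


def compare_models(references, outputs_by_model):
    """Return {model_name: accuracy} for each model's recognized words."""
    return {
        name: word_accuracy(references, hyps)
        for name, hyps in outputs_by_model.items()
    }


# Toy, made-up data purely for illustration (not taken from the paper).
references = ["send", "message", "hello", "cancel"]
outputs_by_model = {
    "continuous": ["send", "message", "hello", "cancel"],
    "semi_continuous": ["send", "massage", "hello", "cancel"],
}

scores = compare_models(references, outputs_by_model)
best = max(scores, key=scores.get)  # model name with the highest accuracy
```

In practice the hypothesis lists would come from decoding the same held-out recordings with each trained acoustic model, so that the accuracies are directly comparable.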

Author information

Authors and Affiliations

Department of Information Technology, ABV-IIITM, Gwalior, India
Partho Mandal, Gaurav Ojha & Anupam Shukla
KIIT Group of Colleges, Gurgaon, India
S. S. Agrawal

Corresponding author

Correspondence to Gaurav Ojha.

Editor information

Editors and Affiliations

Electrical and Electronics Engineer, SRM Engineering College, Kattankulathur, Tamil Nadu, India
Subhransu Sekhar Dash
Electrical & Electronics Engineering, Velammal Engineering College, Chennai, Tamil Nadu, India
M. Arun Bhaskar
Dept Electrical & Electronics Engg, IIT Delhi, New Delhi, India
Bijaya Ketan Panigrahi
Indian Statistical Institute, Kolkata, India
Swagatam Das

About this paper

Cite this paper

Mandal, P., Ojha, G., Shukla, A., Agrawal, S.S. (2016). Accoustic Modeling for Development of Accented Indian English ASR. In: Dash, S., Bhaskar, M., Panigrahi, B., Das, S. (eds) Artificial Intelligence and Evolutionary Computations in Engineering Systems. Advances in Intelligent Systems and Computing, vol 394. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2656-7_16

DOI: https://doi.org/10.1007/978-81-322-2656-7_16
Published: 06 February 2016
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2654-3
Online ISBN: 978-81-322-2656-7
eBook Packages: Engineering, Engineering (R0)
