Journal of Psycholinguistic Research, Volume 48, Issue 5, pp. 947–960

Language Learnability Analysis of Hindi: A Comparison with Ideal and Constrained Learning Approaches

  • Sandeep Saini
  • Vineet Sahula


Native language acquisition is one of the first processes undertaken by the human brain in infancy. The linguistics community has long been interested in how the brain acquires its native language. Word segmentation is one of the most important tasks in acquiring a language, and statistical learning has been proposed as one of the earliest strategies that model how infants learn to segment words. Ideally, theories of language learnability should be universal, working for most, if not all, languages. In the present work, we analyze the learnability of Hindi, the most widely spoken Indian language, using ideal (universal) and constrained Bayesian learner models. We analyze learnability with unigram and bigram approaches, taking words, syllables, or phonemes as the smallest unit of the language. We demonstrate that Bayesian inference is indeed a viable cross-linguistic strategy and that it works well for Hindi too.
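The abstract does not state the model equations. As an illustration only, the sketch below assumes the Dirichlet-process unigram model of Goldwater, Griffiths and Johnson (2009), one common formulation of Bayesian word segmentation; it is not necessarily the exact learner used in this work. A word is represented as a tuple of units, so the same code applies whether syllables or phonemes are the unit. The concentration parameter `alpha`, the boundary probability `p_boundary`, the uniform unit distribution, and the function names are all assumptions. Only the score of a given segmentation is computed. The utterance-boundary term and the search over candidate segmentations (for example, by Gibbs sampling) are omitted.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

# A word is a tuple of units (syllables or phonemes); this representation is an assumption.
Word = Tuple[str, ...]


def base_prob(word: Word, n_unit_types: int, p_boundary: float = 0.5) -> float:
    """Hypothetical base distribution P0(w): a geometric distribution over word
    length times a uniform probability for each unit in the word."""
    m = len(word)
    return p_boundary * (1.0 - p_boundary) ** (m - 1) * (1.0 / n_unit_types) ** m


def unigram_log_prob(words: Sequence[Word], alpha: float, n_unit_types: int) -> float:
    """Log joint probability of a word sequence under a Dirichlet-process unigram model.

    Each word is scored with the predictive probability
    (n_w + alpha * P0(w)) / (n + alpha), where n_w counts earlier tokens of w
    and n counts all earlier tokens. Multiplying these predictive probabilities
    in order gives the joint probability of the whole sequence.
    """
    counts: Counter = Counter()
    log_p = 0.0
    for n, w in enumerate(words):
        p = (counts[w] + alpha * base_prob(w, n_unit_types)) / (n + alpha)
        log_p += math.log(p)
        counts[w] += 1
    return log_p


# Toy syllable sequence "ka ma la ka ma la" under three candidate segmentations.
reuse = [("ka", "ma", "la"), ("ka", "ma", "la")]
split = [("ka", "ma"), ("la", "ka"), ("ma", "la")]
whole = [("ka", "ma", "la", "ka", "ma", "la")]
for name, seg in [("reuse", reuse), ("split", split), ("whole", whole)]:
    print(name, round(unigram_log_prob(seg, alpha=1.0, n_unit_types=3), 3))
```

With `alpha = 1.0` and three syllable types, the segmentation that reuses the word `ka ma la` scores higher than either splitting the syllables into three different two-syllable words or treating the whole sequence as one long word, because the count term `n_w` rewards words that have already been seen. A bigram variant would instead condition each word's probability on the preceding word.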


Keywords: Language acquisition · Language learnability · Bayesian learners · Hindi language



This research is partially supported by project SMDP-C2SD-ERP-1000110086 of the Department of Electronics and Information Technology, Ministry of Communication & IT, Government of India, at Malaviya National Institute of Technology (MNIT), Jaipur. We thank the MNIT computer labs for setting up the experiments and LNMIIT for the GPU services used in the simulations.

Compliance with Ethical Standards


Funding

Apart from LNMIIT Jaipur and MNIT Jaipur, India, the research is not funded by any external project or agency. It is partially supported by project SMDP-C2SD-ERP-1000110086 of the Department of Electronics and Information Technology, Ministry of Communication & IT, Government of India, at Malaviya National Institute of Technology (MNIT), Jaipur.

Conflict of Interest

The work is supported only by the LNM Institute of Information Technology, Jaipur, and Malaviya National Institute of Technology (MNIT), Jaipur, India; details are given in the Funding section. The authors have no conflict of interest to disclose.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electronics and Communication Engineering, The LNM Institute of Information Technology, Jaipur, India
  2. Department of Electronics and Communication Engineering, Malaviya National Institute of Technology, Jaipur, India
