Language Learnability Analysis of Hindi: A Comparison with Ideal and Constrained Learning Approaches
Native language acquisition is one of the earliest processes undertaken by the human brain in infancy. Linguists have long been interested in identifying the method by which the human brain acquires the native language. Word segmentation is one of the most important tasks in acquiring a language. Statistical learning has been proposed as one of the earliest strategies that infants use to segment words. It is desirable that language learnability theories be universal and work for most, if not all, languages. In the present work, we analyze the learnability of Hindi, the most popular Indian language, using ideal (universal) and constrained Bayesian learner models. We analyze the learnability of the language using unigram and bigram approaches, treating words, syllables, and phonemes as the smallest unit of the language. We demonstrate that Bayesian inference is indeed a viable cross-linguistic strategy and that it works well for Hindi too.
Keywords: Language acquisition, Language learnability, Bayesian learners, Hindi Language
This research is partially supported by the project under SMDP-C2SD-ERP-1000110086, Department of Electronics and Information Technology, Ministry of Communication & IT, Government of India at Malaviya National Institute of Technology (MNIT), Jaipur. We thank MNIT's computer labs for help in setting up the experiment and LNMIIT's GPU services for the simulations used to obtain the results.
Compliance with Ethical Standards
The research is not funded by any external project/agency other than the LNMIIT Jaipur and MNIT Jaipur, India. This research is partially supported by the project under SMDP-C2SD-ERP-1000110086, Department of Electronics and Information Technology, Ministry of Communication & IT, Government of India at Malaviya National Institute of Technology (MNIT), Jaipur.
Conflict of Interest
The work is supported only by the LNM Institute of Information Technology, Jaipur, and Malaviya National Institute of Technology (MNIT), Jaipur, India. The details are given in the funding section. We have no conflict of interest to disclose.