Abstract
This study compares four lexical diversity measures. A computational experiment combining corpus processing and statistical testing was conducted to determine which measure is most effective for evaluating a small corpus of 350 to 550 words. The results show that the D-estimate is the most appropriate of the four measures compared. The D-estimate also gave more stable results than the other measures when the number of words varied between texts. The D-estimate was then applied to measure the morphological and grammatical diversity of writing by L2 learners of Korean, and a statistical test was conducted to examine whether the learners' mother tongues affect the degree to which they acquire grammatical morphemes. The test indicates that the native languages of L2 learners of Korean did not appear to have a significant effect.
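As an illustration of what a D-type measure computes, the following is a minimal sketch, assuming that the D-estimate refers to the D measure usually attributed to Malvern and Richards, in which the curve TTR = (D/N)(sqrt(1 + 2N/D) - 1) is fitted to mean type-token ratios of random samples. The abstract does not specify the procedure, so the sample sizes (35 to 50 tokens), the 100 trials per size, the grid search, and all function names here are assumptions rather than the authors' implementation.

```python
import math
import random


def ttr(tokens):
    """Type-token ratio: distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def ttr_model(n, d):
    """Model TTR for a sample of n tokens given diversity parameter d."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)


def estimate_d(tokens, sizes=range(35, 51), trials=100, seed=0):
    """Estimate D by least squares over a coarse grid (assumed settings)."""
    if len(tokens) < max(sizes):
        raise ValueError("text is shorter than the largest sample size")
    rng = random.Random(seed)

    # Mean TTR of random samples (without replacement) at each sample size.
    observed = []
    for n in sizes:
        mean = sum(ttr(rng.sample(tokens, n)) for _ in range(trials)) / trials
        observed.append((n, mean))

    def squared_error(d):
        return sum((ttr_model(n, d) - t) ** 2 for n, t in observed)

    # Hypothetical search grid: D from 0.1 to 200.0 in steps of 0.1.
    candidates = [k / 10 for k in range(1, 2001)]
    return min(candidates, key=squared_error)


if __name__ == "__main__":
    repetitive = ["a", "b", "c", "d", "e"] * 80
    varied = ["w%d" % (i % 200) for i in range(400)]
    print(estimate_d(repetitive), estimate_d(varied))
```

Because every sample has a fixed size regardless of the length of the text, an estimate of this kind depends less on text length than a raw type-token ratio does, which is in line with the stability reported above. A production version would replace the grid search with a proper curve-fitting routine.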
Cite this article
Choi, W., Jeong, H. Finding an appropriate lexical diversity measurement for a small-sized corpus and its application to a comparative study of L2 learners’ writings. Multimed Tools Appl 75, 13015–13022 (2016). https://doi.org/10.1007/s11042-015-2529-1
DOI: https://doi.org/10.1007/s11042-015-2529-1