Using Corpus Statistics to Evaluate Nonce Words

Kılıç, Özkan

doi:10.1007/978-3-662-44116-9_3

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8607)

Abstract

Nonce words are widely used in linguistic research to study phenomena such as the acquisition of vowel harmony and consonant voicing, naturalness judgments of loanwords, and children’s acquisition of morphemes. Researchers usually create lists of nonce words intuitively, guided by the phonotactic features of the target language. In this study, a corpus of Turkish orthographic representations is used to propose a measure of nonce-word appropriateness for linearly concatenative languages. The conditional probabilities of orthographic co-occurrences and of pairwise vowel collocations within word boundaries are used to evaluate a list of nonce words in terms of whether they would be rejected, moderately accepted, or fully accepted as novel words. A group of 50 native speakers of Turkish was asked to judge how native-like the same nonce words sound. The model and the participants produced similar results.
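To make the idea in the abstract concrete, the following is a minimal sketch of scoring a nonce word with conditional probabilities estimated from a word list. It is not the paper's actual model: the toy corpus, the restriction to adjacent letters and adjacent vowels, the use of the weakest transition as the score, and the `moderate_cutoff` threshold are all assumptions made for illustration.

```python
from collections import Counter

TURKISH_VOWELS = set("aeıioöuü")

def train(words):
    """Count adjacent-letter pairs and adjacent-vowel pairs inside each word."""
    letter_pairs, letter_firsts = Counter(), Counter()
    vowel_pairs, vowel_firsts = Counter(), Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            letter_pairs[a, b] += 1
            letter_firsts[a] += 1
        # Vowel tier: consonants are skipped, so vowels across a consonant pair up.
        vowels = [c for c in w if c in TURKISH_VOWELS]
        for a, b in zip(vowels, vowels[1:]):
            vowel_pairs[a, b] += 1
            vowel_firsts[a] += 1
    return letter_pairs, letter_firsts, vowel_pairs, vowel_firsts

def cond_prob(pairs, firsts, a, b):
    """Estimate P(b | a) from pair counts; 0.0 if `a` never starts a pair."""
    return pairs[a, b] / firsts[a] if firsts[a] else 0.0

def score(word, model):
    """Score a word by its least probable letter or vowel transition (an assumption)."""
    lp, lf, vp, vf = model
    probs = [cond_prob(lp, lf, a, b) for a, b in zip(word, word[1:])]
    vowels = [c for c in word if c in TURKISH_VOWELS]
    probs += [cond_prob(vp, vf, a, b) for a, b in zip(vowels, vowels[1:])]
    return min(probs) if probs else 0.0

def classify(word, model, moderate_cutoff=0.1):
    """Map a score to three bands; the cutoff is hypothetical, not from the paper."""
    s = score(word, model)
    if s == 0.0:
        return "rejected"
    return "moderately accepted" if s < moderate_cutoff else "fully accepted"

# Toy word list standing in for a real corpus (illustrative only).
TOY_CORPUS = ["kitap", "kalem", "kapı", "kedi", "elma", "masa",
              "araba", "evler", "kitaplar", "kalemler"]
MODEL = train(TOY_CORPUS)

if __name__ == "__main__":
    for nonce in ["kala", "kzöp"]:
        print(nonce, classify(nonce, MODEL))
```

With a realistic corpus, unseen transitions would normally be smoothed rather than assigned probability zero, and any band thresholds would need to be calibrated against speaker judgments.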

Author information

Authors and Affiliations

Department of Psychology, Lehigh University, Bethlehem, PA, USA
Özkan Kılıç

Editor information

Editors and Affiliations

Laboratoire de Linguistique Formelle, Université Paris Diderot-Paris 7, 13 passage Menténégro, 75019, Paris, France
Margot Colinet
Passeerdersstraat 28A, 1016XC, Amsterdam, The Netherlands
Sophia Katrenko
Department of Philosophy, Lund University, Kungshuset, Lundagård, 222 22, Lund, Sweden
Rasmus K. Rendsvig

About this paper

Cite this paper

Kılıç, Ö. (2014). Using Corpus Statistics to Evaluate Nonce Words. In: Colinet, M., Katrenko, S., Rendsvig, R.K. (eds) Pristine Perspectives on Logic, Language, and Computation. ESSLLI 2012, ESSLLI 2013. Lecture Notes in Computer Science, vol 8607. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44116-9_3

DOI: https://doi.org/10.1007/978-3-662-44116-9_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44115-2
Online ISBN: 978-3-662-44116-9
eBook Packages: Computer Science, Computer Science (R0)
