Using Corpus Statistics to Evaluate Nonce Words

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8607)

Abstract

Nonce words are widely used in linguistic research to study topics such as the acquisition of vowel harmony and consonant voicing, naturalness judgments of loanwords, and children’s acquisition of morphemes. Researchers usually create lists of nonce words intuitively, by considering the phonotactic features of the target language. In this study, a corpus of Turkish orthographic representations is used to propose a measure of nonce word appropriateness for linearly concatenative languages. The conditional probabilities of orthographic co-occurrences and pairwise vowel collocations within the same word boundaries are used to evaluate a list of nonce words in terms of whether they would be rejected, moderately accepted, or fully accepted as novel words. A group of 50 Turkish native speakers was asked to judge how native-like the same nonce words sound. The model and the participants produced similar results.
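
The abstract does not spell out how the probabilities are combined, so the following is only a rough sketch of the general idea, not the paper's actual model. It estimates conditional probabilities of adjacent letters and of consecutive vowels within a word from a word list, then maps a nonce word to one of the three categories. The function names, the `#` boundary marker, the use of the minimum probability as the score, the `low` and `high` thresholds, and the toy word list are all assumptions made for illustration; input is assumed to be already lowercased.

```python
from collections import Counter

VOWELS = set("aeıioöuü")  # the eight Turkish vowel letters

def train(corpus):
    """Count letter bigrams and consecutive-vowel pairs within each word."""
    bigrams, firsts = Counter(), Counter()
    vpairs, vfirsts = Counter(), Counter()
    for word in corpus:
        padded = "#" + word + "#"  # '#' is an assumed word-boundary marker
        for a, b in zip(padded, padded[1:]):
            bigrams[a, b] += 1
            firsts[a] += 1
        vowels = [c for c in word if c in VOWELS]
        for a, b in zip(vowels, vowels[1:]):
            vpairs[a, b] += 1
            vfirsts[a] += 1
    return bigrams, firsts, vpairs, vfirsts

def cond_prob(pair_counts, first_counts, a, b):
    """Maximum-likelihood estimate of P(b | a); 0.0 if `a` was never seen."""
    return pair_counts[a, b] / first_counts[a] if first_counts[a] else 0.0

def score(word, model):
    """Return the smallest conditional probability found in the word
    (an assumed scoring rule, chosen here for simplicity)."""
    bigrams, firsts, vpairs, vfirsts = model
    padded = "#" + word + "#"
    probs = [cond_prob(bigrams, firsts, a, b)
             for a, b in zip(padded, padded[1:])]
    vowels = [c for c in word if c in VOWELS]
    probs += [cond_prob(vpairs, vfirsts, a, b)
              for a, b in zip(vowels, vowels[1:])]
    return min(probs)

def classify(word, model, low=0.0, high=0.1):
    """Map a score to a category; `low` and `high` are hypothetical thresholds."""
    s = score(word, model)
    if s <= low:
        return "rejected"
    if s < high:
        return "moderately accepted"
    return "fully accepted"

# Toy word list standing in for a real corpus (illustration only).
toy_corpus = ["kalem", "kitap", "kapı", "elma", "masa", "kedi"]
model = train(toy_corpus)
```

With a real corpus, the thresholds would need to be calibrated, for example against native-speaker judgments like those collected in the study. Taking the minimum means a single unattested letter or vowel pair is enough to reject a word; a product or average of probabilities would be a softer alternative.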



Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kılıç, Ö. (2014). Using Corpus Statistics to Evaluate Nonce Words. In: Colinet, M., Katrenko, S., Rendsvig, R.K. (eds) Pristine Perspectives on Logic, Language, and Computation. ESSLLI 2012, ESSLLI 2013. Lecture Notes in Computer Science, vol 8607. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44116-9_3

  • DOI: https://doi.org/10.1007/978-3-662-44116-9_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-44115-2

  • Online ISBN: 978-3-662-44116-9

  • eBook Packages: Computer Science, Computer Science (R0)
