Predicting Speech Errors in Mandarin Based on Word Frequency

From Minimal Contrast to Meaning Construct

Part of the book series: Frontiers in Chinese Linguistics (FiCL, volume 9)

Abstract

This paper investigates the effect of word frequency on the occurrence of speech errors in Mandarin. We gathered a corpus of 390 speech errors together with their surrounding linguistic context and extracted word frequency information from the Academia Sinica Corpus. An analysis with a computational classifier based on conditional inference trees shows that intended words with a lower frequency than the words in their surrounding context are more likely to generate speech errors.
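The frequency comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `frequency_features`, the choice of features, and the counts are all assumptions made for illustration, and the conditional inference tree classifier itself is not reproduced here.

```python
from statistics import mean


def frequency_features(target_freq, context_freqs):
    """Compare an intended word's corpus frequency with its context words.

    Hypothetical helper: the feature names are illustrative and are not
    taken from the chapter.
    """
    if not context_freqs:
        raise ValueError("need at least one context word frequency")
    context_mean = mean(context_freqs)
    return {
        "target_freq": target_freq,
        "context_mean": context_mean,
        # True when the intended word is rarer than every context word.
        "lower_than_all": all(target_freq < f for f in context_freqs),
        # True when the intended word is rarer than the context average.
        "lower_than_mean": target_freq < context_mean,
    }


# Invented counts, for illustration only (not values from the corpus).
features = frequency_features(12, [340, 95, 1280])
print(features["lower_than_all"], features["lower_than_mean"])
```

Features of this kind could then be passed to a tree-based classifier; which predictors the authors actually used is not stated in the abstract.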

Notes

  1. The audio files were automatically transcribed by speech-to-text systems from AI Labs in Taiwan or the Google API, depending on the recording quality.

  2. The average error rate of the speech-to-text systems was 40%. The main influencing factors were the recording environment and the voice quality of the speakers.

  3. The phonetic alignment, marked down to the phoneme level, is still under construction and is being developed with DNNs by Dr. Chain-Wu Lee.

  4. The total number of semantic lexical errors in the entire corpus is much higher. However, since transcription and phonetic alignment are still ongoing, we selected only a sample from the semantic lexical errors that had already been transcribed, annotated, and cross-checked.
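Note 2 reports an average error rate for the speech-to-text output without saying how it was measured. One common way to compute such a rate is the token-level edit distance between a reference transcript and the system output, divided by the reference length. The sketch below assumes that definition; the function names and the placeholder tokens are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by the length of the reference."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Placeholder tokens, not corpus data: one substitution and one deletion.
reference = ["w1", "w2", "w3", "w4", "w5"]
hypothesis = ["w1", "x", "w3", "w5"]
print(error_rate(reference, hypothesis))  # 0.4
```

For Mandarin, the tokens may be characters or segmented words; the choice changes the resulting figure.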

References

  • Arnaud, Pierre J. 1999. Target–error resemblance in French word substitution speech errors and the mental lexicon. Applied Psycholinguistics 20 (2): 269–287.

  • Bastiaanse, Roelien, Martijn Wieling, and Nienke Wolthuis. 2015. The role of frequency in the retrieval of nouns and verbs in aphasia. Aphasiology 30: 1221–1239.

  • Berg, Thomas. 1987. A cross-linguistic comparison of slips of the tongue. Bloomington: Indiana University Press.

  • Breiman, Leo. 2001. Random forests. Machine Learning 45 (1): 5–32.

  • Breiman, Leo, Jerome Friedman, Charles J. Stone, and Richard Olshen. 1984. Classification and regression trees. New York: Taylor & Francis.

  • CKIP (Chinese Knowledge Information Processing Group). 1998. The content and illustration of Academia Sinica Corpus. Taipei: Academia Sinica.

  • Cutler, Anne. 1982. The reliability of speech error data. In Slips of the tongue and language production, ed. Anne Cutler, 7–28. Amsterdam: Mouton.

  • Fay, David, and Anne Cutler. 1977. Malapropisms and the structure of the mental lexicon. Linguistic Inquiry 8: 505–520.

  • Fromkin, Victoria. 1980. Errors in linguistic performance: Slips of the tongue, ear, pen, and hand. New York: Academic Press.

  • Harley, Trevor, and Siobhan MacAndrew. 2001. Constraints upon word substitution speech errors. Journal of Psycholinguistic Research 30: 395–418.

  • Huang, Chu-Ren, Lung-Hao Lee, Wei-guang Qu, Jia-Fei Hong, and Shiwen Yu. 2008. Quality assurance of automatic annotation of very large corpora: A study based on heterogeneous tagging systems. LREC 2008: 2725–2729.

  • Jaeger, Jeri J. 2005. Kids’ slips: What young children’s slips of the tongue reveal about language development. Mahwah: Lawrence Erlbaum Associates.

  • Kittredge, Audrey K., Gary S. Dell, Jay Verkuilen, and Myrna F. Schwartz. 2008. Where is the effect of frequency in word production? Insights from aphasic picture-naming errors. Cognitive Neuropsychology 25: 463–492.

  • Levelt, Willem J. 1989. Speaking: From intention to articulation. Cambridge, MA: MIT Press.

  • Levshina, Natalia. 2015. How to do linguistics with R: Data exploration and statistical analysis. Amsterdam: John Benjamins.

  • Ma, Wei-Yun, and Chu-Ren Huang. 2006. Uniform and effective tagging of a heterogeneous Giga-word corpus. In Proceedings of the 5th international conference on language resources and evaluation (LREC-5).

  • Martin, Nadine, and Eleanor M. Saffran. 1997. Language and auditory-verbal short-term memory impairments: Evidence for common underlying processes. Cognitive Neuropsychology 14: 641–682.

  • Minkina, Irene, Nadine Martin, Kristie A. Spencer, and Diane L. Kendall. 2018. Links between short-term memory and word retrieval in aphasia. American Journal of Speech-Language Pathology 27 (1): 379–391.

  • Nickels, Lyndsey, and David Howard. 1994. A frequent occurrence? Factors affecting the production of semantic errors in aphasic naming. Cognitive Neuropsychology 11: 289–320.

  • Ting, Kai Ming. 2010. Precision and recall. In Encyclopedia of machine learning, ed. Claude Sammut and Geoffrey I. Webb, 781. Boston, MA: Springer US. https://doi.org/10.1007/978-0-387-30164-8_652.

  • Wan, I-Ping, and Marc Tang. 2018. A corpus study of lexical speech errors in Mandarin. Manuscript.

  • Wijnen, Frank. 1992. Incidental word and sound errors in young speakers. Journal of Memory and Language 31: 734–755.

  • Wan, I-Ping, and Jen Ting. To appear. Semantic relationships in Mandarin speech errors. Taiwan Journal of Linguistics.

Acknowledgements

We thank the two anonymous reviewers for their constructive comments, which led to significant improvements to the paper. The second author would like to thank Dr. Chain-Wu Lee for his continuous programming support in constructing all the corpora in the Phonetics and Psycholinguistics Lab at National Chengchi University. All remaining errors are our own. The research reported in this paper was funded by a three-year MOST grant in Taiwan (MOST 98-2410-H-004-103-MY2) awarded to the second author.

Author information

Correspondence to Marc Tang.

Appendix: Sample Output from Praat After the Phonetic Alignment

[Figure: Praat TextGrid listing shown in four panels. The first panel defines the file type and object class, followed by the item specifications and their intervals. The second panel lists intervals with xmin, xmax, and text values, where the text values are pause, n a b, d b b, v c 33, n c b, and n c d a. The third and fourth panels list further intervals with xmin, xmax, and text values, where each text value is a single letter.]
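A listing of this kind can be read back into a program with a few lines of Python. The sketch below assumes Praat's long TextGrid text format, in which each interval is written as consecutive `xmin`, `xmax`, and `text` fields. The sample string, its tier name, and its time values are invented for illustration and are not the appendix data; `read_intervals` is a hypothetical helper name.

```python
import re

# One interval: xmin, xmax, and a quoted text label on consecutive lines.
# Praat escapes a double quote inside a label by doubling it.
INTERVAL_RE = re.compile(
    r'xmin\s*=\s*([\d.]+)\s*'
    r'xmax\s*=\s*([\d.]+)\s*'
    r'text\s*=\s*"((?:[^"]|"")*)"'
)


def read_intervals(textgrid_text):
    """Return (xmin, xmax, label) tuples for every interval in the text."""
    return [
        (float(start), float(end), label.replace('""', '"'))
        for start, end, label in INTERVAL_RE.findall(textgrid_text)
    ]


# Invented sample in the long TextGrid format (not the appendix data).
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 1.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = "pause"
        intervals [2]:
            xmin = 0.5
            xmax = 1.5
            text = "a"
'''

for start, end, label in read_intervals(SAMPLE):
    print(start, end, label)
```

The pattern skips the file-level and tier-level `xmin`/`xmax` pairs because they are not followed by a `text` field. It does not keep track of which tier an interval belongs to, so a fuller reader would also parse the `item [n]:` blocks.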

Copyright information

© 2020 Peking University Press

About this chapter

Cite this chapter

Tang, M., Wan, IP. (2020). Predicting Speech Errors in Mandarin Based on Word Frequency. In: Su, Q., Zhan, W. (eds) From Minimal Contrast to Meaning Construct. Frontiers in Chinese Linguistics, vol 9. Springer, Singapore. https://doi.org/10.1007/978-981-32-9240-6_20

  • DOI: https://doi.org/10.1007/978-981-32-9240-6_20

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-32-9239-0

  • Online ISBN: 978-981-32-9240-6

  • eBook Packages: Social Sciences (R0)