Abstract
This paper investigates the effect of word frequency on the occurrence of speech errors in Mandarin. A corpus of 390 speech errors, together with their surrounding linguistic context, was gathered. Word frequency information was extracted from the Academia Sinica Corpus. Our analysis, using a computational classifier based on conditional inference trees, shows that speech errors are more likely when the intended word has a lower frequency than the words in its surrounding context.
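The comparison described in the abstract, between the frequency of the intended word and that of its surrounding context, can be illustrated with a minimal sketch. This is not the authors' feature coding, which is not given here: the log transform, the add-one smoothing, the use of the context mean, and the function names `log_freq`, `relative_frequency`, and `is_lower_than_context` are all assumptions made for illustration, and the counts are invented rather than taken from the Academia Sinica Corpus. The classification step itself (conditional inference trees) is not reproduced.

```python
import math
from statistics import mean


def log_freq(word, freq_table):
    """Log corpus frequency, with add-one smoothing so unseen words map to 0."""
    return math.log(freq_table.get(word, 0) + 1)


def relative_frequency(intended, context, freq_table):
    """Log frequency of the intended word minus the mean log frequency
    of its context words. Negative values mean the intended word is
    rarer than its neighbours."""
    context_mean = mean(log_freq(w, freq_table) for w in context)
    return log_freq(intended, freq_table) - context_mean


def is_lower_than_context(intended, context, freq_table):
    """True when the intended word is less frequent than its context on average."""
    return relative_frequency(intended, context, freq_table) < 0


# Hypothetical counts, invented for illustration only (pinyin keys).
toy_freq = {"wo": 50000, "xiang": 12000, "he": 3000, "kafei": 400}

score = relative_frequency("kafei", ["wo", "xiang", "he"], toy_freq)
flag = is_lower_than_context("kafei", ["wo", "xiang", "he"], toy_freq)
```

A feature of this kind could then serve as one predictor for a tree-based classifier; a binary flag and a continuous score are both shown because the abstract does not say which form was used.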
Notes
1. The audio files were automatically transcribed with Speech-to-Text systems powered by AI Labs in Taiwan or the Google API, depending on the recording quality.
2. The average error rate of the Speech-to-Text systems was 40%. The main influencing factors were the recording environment and the voice quality of the speakers.
3. The phoneme-level phonetic alignment is still under construction and is being developed with DNNs by Dr. Chain-Wu Lee.
4. The total number of semantic lexical errors in the entire corpus is much higher. However, since transcription and phonetic alignment are still ongoing, we selected only a sample of the semantic lexical errors that had already been transcribed, annotated, and cross-checked.
Acknowledgements
We thank the two anonymous reviewers for their constructive comments, which led to significant improvements to the paper. The second author would like to thank Dr. Chain-Wu Lee for his continuous, cutting-edge programming support in constructing all the corpora in the Phonetics and Psycholinguistics Lab at National Chengchi University. All remaining errors are our own. The research reported in this paper was funded by a three-year MOST grant (MOST 98-2410-H-004-103-MY2) in Taiwan, awarded to the second author.
Appendix: Sample Output from Praat After the Phonetic Alignment
Copyright information
© 2020 Peking University Press
About this chapter
Cite this chapter
Tang, M., Wan, IP. (2020). Predicting Speech Errors in Mandarin Based on Word Frequency. In: Su, Q., Zhan, W. (eds) From Minimal Contrast to Meaning Construct. Frontiers in Chinese Linguistics, vol 9. Springer, Singapore. https://doi.org/10.1007/978-981-32-9240-6_20
DOI: https://doi.org/10.1007/978-981-32-9240-6_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9239-0
Online ISBN: 978-981-32-9240-6
eBook Packages: Social Sciences (R0)