Resources and Evaluations of Automated Chinese Error Diagnosis for Language Learners

Computational and Corpus Approaches to Chinese Language Learning

Part of the book series: Chinese Language Learning Sciences (CLLS)

Abstract

Learners of Chinese as a foreign language (CFL) may produce inappropriate linguistic usages, including character-level confusions (commonly known as spelling errors) and grammatical errors at the word, sentence, and discourse levels. Chinese spelling errors frequently arise from confusions among multiple-character words that are phonologically and visually similar but semantically distinct. Chinese grammatical errors can be described at a coarse-grained level by their surface form: missing components, redundant components, incorrect selection, and word ordering errors. Fine-grained error types further capture the morphological and syntactic categories involved, such as verb, noun, preposition, conjunction, and adverb. Annotated learner corpora are important language resources for understanding these error patterns and for developing error diagnosis systems. In this chapter, we describe two representative Chinese learner corpora: the HSK Dynamic Composition Corpus constructed by Beijing Language and Culture University and the TOCFL Learner Corpus built by National Taiwan Normal University. We also introduce two series of evaluations, based on both learner corpora, designed for computer-assisted Chinese learning: the SIGHAN bakeoffs for Chinese spelling checkers and the NLPTEA workshop shared tasks for Chinese grammatical error identification. The purpose of this chapter is to summarize these resources and evaluations in order to clarify the current research developments and challenges of automated Chinese error diagnosis for CFL learners.

Acknowledgements

This study was partially supported by the Ministry of Science and Technology under grants MOST 103-2221-E-003-013-MY3, MOST 105-2221-E-003-020-MY2, and MOST 106-2221-E-003-030-MY2, and by the “Aim for the Top University Project” and the “Center of Language Technology for Chinese” of National Taiwan Normal University, sponsored by the Ministry of Education, Taiwan.

Author information

Corresponding author

Correspondence to Yuen-Hsien Tseng.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Lee, LH., Tseng, YH., Chang, LP. (2019). Resources and Evaluations of Automated Chinese Error Diagnosis for Language Learners. In: Lu, X., Chen, B. (eds) Computational and Corpus Approaches to Chinese Language Learning. Chinese Language Learning Sciences. Springer, Singapore. https://doi.org/10.1007/978-981-13-3570-9_12

  • DOI: https://doi.org/10.1007/978-981-13-3570-9_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-3569-3

  • Online ISBN: 978-981-13-3570-9

  • eBook Packages: Education, Education (R0)
