Academic Chinese: From Corpora to Language Teaching

  • Howard Ho-Jan Chen
  • Hongyin Tao
Part of the Chinese Language Learning Sciences book series (CLLS)


The past several decades of research in Chinese applied linguistics have seen rapid development in building and exploiting corpus infrastructure. However, one area where systematic research is still lacking is academic Chinese. In this chapter, we describe the construction of written academic Chinese corpora at National Taiwan Normal University and the University of California, Los Angeles (UCLA). We report preliminary results of research based on these corpora, as well as their pedagogical applications in developing teaching materials for advanced Chinese language learning. We also discuss theoretical and practical issues in academic Chinese and the role of corpora in academic language pedagogy.



Howard Chen would like to thank the Ministry of Science and Technology, Taiwan, for supporting the related corpus research. He is also grateful that this research was partially supported by the "Aim for the Top University Project" of National Taiwan Normal University, sponsored by the Ministry of Education, Taiwan.

Hongyin Tao wishes to acknowledge the following individual and institutional support: Weidong Zhan and his team at Peking University, Ying Yang, and Yan Zhou for assistance with the UCLA CWAC corpus; Danjie Su, Xiaoxin Sun, and Yu-Wen Yao for participating in the development of the teaching materials; Jiajin Xu for constructive comments on an earlier version of the paper; Elizabeth Carter for translating the teaching materials into English and for editorial assistance with this article; National Taiwan Normal University for a Distinguished Chair Professorship (2015–17); the UCLA Academic Senate for a faculty research grant (AY2014-16); and the U.S. Department of Education for grant #P229A140026 to the Center for Advanced Language Proficiency Education and Research (CALPER), The Pennsylvania State University. However, the contents developed under the grant do not necessarily represent the policy of the U.S. Department of Education, and you should not assume endorsement by the Federal Government.

Finally, we are grateful to the editors and anonymous referees for their constructive comments and suggestions on an earlier version of the paper. Any remaining shortcomings are, of course, entirely our own.



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. National Taiwan Normal University, Taipei, Taiwan
  2. University of California, Los Angeles, USA
