Text Analysis of Corpus Linguistics in a Post-concordancer Era

  • Simon Ho Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10108)


In this methodological paper, I review a number of studies in corpus linguistics that rely heavily on off-the-shelf computer programs known as concordancers. While acknowledging the fruitful research findings generated with concordancers, I argue that natural language processing (NLP) tools such as the Stanford Parser and SyntaxNet should be used to automate certain analytical procedures that corpus linguistics researchers often perform manually with concordancers. Closer collaboration between NLP researchers and corpus linguists is called for to help advance the field of corpus linguistics into a post-concordancer era.
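For readers unfamiliar with the term, the core operation of a concordancer is the keyword-in-context (KWIC) display. The following is a minimal sketch of that operation in plain Python, not the implementation of any particular concordancer and not code from the paper; the function name `kwic`, the `width` parameter, and the sample sentences are assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left context, match, right context) for each hit of keyword.

    Hypothetical helper: matches whole words, case-insensitively, and
    takes `width` characters of context on each side.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines

# Illustrative sample text (invented for this sketch).
sample = ("It is suggested that learners consult a corpus. "
          "Researchers argue that parsing helps.")

for left, hit, right in kwic(sample, "that"):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>30} [{hit}] {right}")
```

A display like this only lines up the hits. Deciding what grammatical role each hit plays is still left to the researcher reading the lines, which is the kind of manual step the abstract proposes handing to a parser.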


Natural Language Processing, Language Teaching, Syntactic Complexity, Sentence Boundary, Corpus Linguistic



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Language Center, Hong Kong Baptist University, Kowloon Tong, Kowloon, Hong Kong
