Performance Analysis of a Serial Natural Language Processing Pipeline for Scaling Analytics of Academic Writing Process

  • David Boulanger
  • Clayton Clemens
  • Jeremie Seanosky
  • Shawn Fraser
  • Vivekanandan Kumar


Capturing and analyzing only the final submission of a writing assignment ignores a substantial amount of information and provides only a partial view of the writer’s effort and intent. Such a partial view of writing abilities limits opportunities to generate feedback that improves the final writing product and aids the development of effective writing techniques. Over-the-shoulder monitoring of the writing process is a challenge even for a few individuals, and scaling specialized tutoring to as many writers as possible is simply impossible without leveraging technology. This research analyzes the computational requirements of a single-threaded writing analytics system for real-time monitoring of, and instructional intervention in, writing processes. This chapter reports on the performance of this analytics system using the simulated writing processes of 391 compositions in higher education, a subset of the British Academic Written English (BAWE) corpus. It elaborates on the computational requirements of analytics elements involving Natural Language Processing (NLP) and offers recommendations for building scalable big data NLP pipelines adapted to the analysis of learners’ academic writing processes.


Keywords: Writing analytics, Writing process, Scalability, Natural language processing, Big data, Performance



The authors gratefully acknowledge NSERC funding for this research.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • David Boulanger (1)
  • Clayton Clemens (1)
  • Jeremie Seanosky (1)
  • Shawn Fraser (1)
  • Vivekanandan Kumar (1), corresponding author
  1. Athabasca University, Edmonton, Canada
