Performance Analysis of a Serial Natural Language Processing Pipeline for Scaling Analytics of Academic Writing Process

  • David Boulanger
  • Clayton Clemens
  • Jeremie Seanosky
  • Shawn Fraser
  • Vivekanandan Kumar (corresponding author)
Chapter

Abstract

Capturing and analyzing only the final submission of a writing assignment discards a substantial amount of information and provides only a partial view of the writer's effort and intent. Such a partial view of writing ability limits opportunities to generate feedback that improves the final product and to help writers develop effective writing techniques. Over-the-shoulder monitoring of the writing process is challenging even for a few individuals, and scaling specialized tutoring to as many writers as possible is impossible without leveraging technology. This research analyzes the computational requirements of a single-threaded writing analytics system for real-time monitoring of, and instructional intervention in, writing processes. This chapter reports on the performance of this analytics system using the simulated writing processes of 391 higher-education compositions, a subset of the British Academic Written English (BAWE) corpus. It details the computational requirements of analytics elements involving Natural Language Processing (NLP) and offers recommendations for building scalable big data NLP pipelines adapted to analyzing learners' academic writing processes.
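The abstract describes replaying simulated writing processes through a single-threaded pipeline and measuring its computational cost. The Python sketch below illustrates that kind of measurement under stated assumptions: the three stages are toy stand-ins, not the NLP components the chapter evaluates, and approximating a writing process as successive prefixes of the final text is a simplification chosen for illustration, not necessarily how the authors simulated the BAWE compositions. The names `simulate_writing_process`, `run_serial`, and `STAGES` are hypothetical.

```python
import time
from typing import Callable, Dict, List

# Hypothetical toy stages standing in for real NLP components.
def tokenize(text: str) -> List[str]:
    return text.split()

def split_sentences(text: str) -> List[str]:
    # Naive splitter: treat '.', '!' and '?' as sentence boundaries.
    normalized = text.replace("!", ".").replace("?", ".")
    return [s for s in normalized.split(".") if s.strip()]

def count_long_words(text: str) -> int:
    return sum(1 for w in text.split() if len(w) > 6)

# Stages run in this fixed order, one after another (serial pipeline).
STAGES: Dict[str, Callable[[str], object]] = {
    "tokenize": tokenize,
    "sentences": split_sentences,
    "long_words": count_long_words,
}

def simulate_writing_process(final_text: str, step_words: int = 5) -> List[str]:
    """Assumption: approximate a writing process as growing prefixes of the
    final text, adding `step_words` words per snapshot."""
    words = final_text.split()
    return [" ".join(words[:i])
            for i in range(step_words, len(words) + step_words, step_words)]

def run_serial(snapshots: List[str]) -> Dict[str, float]:
    """Run every stage on every snapshot in a single thread and
    accumulate wall-clock seconds per stage."""
    totals = {name: 0.0 for name in STAGES}
    for snapshot in snapshots:
        for name, stage in STAGES.items():
            start = time.perf_counter()
            stage(snapshot)
            totals[name] += time.perf_counter() - start
    return totals

if __name__ == "__main__":
    essay = "Writing analytics can observe drafts. Each revision reveals intent! " * 20
    snapshots = simulate_writing_process(essay)
    for name, seconds in run_serial(snapshots).items():
        print(f"{name}: {seconds * 1000:.3f} ms over {len(snapshots)} snapshots")
```

Because each snapshot re-analyzes the entire text written so far, the total amount of text processed grows roughly quadratically with the length of a composition under this replay scheme. Per-stage totals like these are one way to locate bottlenecks before considering parallel or distributed execution.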

Keywords

Writing analytics, Writing process, Scalability, Natural language processing, Big data, Performance

Notes

Acknowledgements

The authors gratefully acknowledge NSERC funding for this research.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Athabasca University, Edmonton, Canada: David Boulanger, Clayton Clemens, Jeremie Seanosky, Shawn Fraser, Vivekanandan Kumar (corresponding author)

Personalised recommendations