Performance Analysis of a Serial Natural Language Processing Pipeline for Scaling Analytics of Academic Writing Process

Boulanger, David; Clemens, Clayton; Seanosky, Jeremie; Fraser, Shawn; Kumar, Vivekanandan

doi:10.1007/978-3-030-15130-0_8



Abstract

Capturing and analyzing only the final submission of a writing assignment ignores a substantial amount of information and provides only a partial view of the writer’s effort and intent. This partial view of writing ability limits opportunities to generate feedback that improves the final writing product and to support the development of effective writing techniques. Over-the-shoulder monitoring of the writing process is challenging even for a few individuals, and scaling specialized tutoring to as many writers as possible is simply impossible without leveraging technology. This research analyzes the computational requirements of a single-threaded writing analytics system for real-time monitoring of, and instructional intervention in, writing processes. This chapter reports on the performance of this analytics system using the simulated writing processes of 391 higher-education compositions, a subset of the British Academic Written English (BAWE) corpus. It elaborates on the computational requirements of analytics elements involving Natural Language Processing (NLP) and offers recommendations for building scalable big data NLP pipelines adapted to analyzing learners’ academic writing processes.
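The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: replaying a finished composition as a series of growing snapshots and timing a single-threaded analysis of each one. The function names, the fixed 50-character step, and the trivial token and sentence counts are all hypothetical stand-ins, not the chapter's actual simulation method or NLP components.

```python
import time


def simulate_writing_process(final_text, step=50):
    """Yield growing prefixes of a finished text.

    Hypothetical stand-in for captured writing snapshots; the fixed
    character step is an assumption, not the chapter's method.
    """
    for end in range(step, len(final_text) + step, step):
        yield final_text[:end]


def analyze(snapshot):
    """Placeholder analytics step (assumed): count tokens and sentence marks."""
    tokens = snapshot.split()
    sentences = sum(snapshot.count(mark) for mark in ".!?")
    return {"tokens": len(tokens), "sentences": sentences}


def time_serial_pipeline(compositions, step=50):
    """Analyze every snapshot of every composition one after another.

    Returns a list of (snapshot_length, token_count, elapsed_seconds).
    """
    timings = []
    for text in compositions:
        for snapshot in simulate_writing_process(text, step):
            start = time.perf_counter()
            result = analyze(snapshot)
            elapsed = time.perf_counter() - start
            timings.append((len(snapshot), result["tokens"], elapsed))
    return timings


if __name__ == "__main__":
    sample = ["One sentence here. Another one follows!" * 3]
    for length, tokens, seconds in time_serial_pipeline(sample):
        print(f"{length} chars, {tokens} tokens, {seconds:.6f} s")
```

Because every snapshot re-analyzes the whole text written so far, the total work in a sketch like this grows faster than the length of the composition, which is one reason a serial design may become a bottleneck as the number of writers increases.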

Notes

1. PubMed. Retrieved August 17, 2017, from https://www.ncbi.nlm.nih.gov/pubmed/
2. Sharpling, G. (2016). BAWE (British Academic Written English) and BAWE Plus Collections. Retrieved August 21, 2017, from http://www2.warwick.ac.uk/fac/soc/al/research/collections/bawe/

Acknowledgements

The authors gratefully acknowledge NSERC funding for this research.

Author information

Authors and Affiliations

Athabasca University, Edmonton, AB, Canada
David Boulanger, Clayton Clemens, Jeremie Seanosky, Shawn Fraser & Vivekanandan Kumar


Corresponding author

Correspondence to Vivekanandan Kumar.

Editor information

Editors and Affiliations

Department of Digital Systems, University of Piraeus, Piraeus, Greece
Demetrios Sampson & Stylianos Sergis
School of Education, Curtin University, Perth, WA, Australia
Demetrios Sampson
Department of Learning Technologies, University of North Texas, Denton, TX, USA
J. Michael Spector
Curtin University, Perth, WA, Australia
Dirk Ifenthaler
Economic and Business Education Learning, Design and Technology, University of Mannheim, Mannheim, Germany
Dirk Ifenthaler
Institute for Teaching & Learning Innovation (ITaLI), The University of Queensland, St. Lucia, QLD, Australia
Pedro Isaías

About this chapter

Cite this chapter

Boulanger, D., Clemens, C., Seanosky, J., Fraser, S., Kumar, V. (2019). Performance Analysis of a Serial Natural Language Processing Pipeline for Scaling Analytics of Academic Writing Process. In: Sampson, D., Spector, J.M., Ifenthaler, D., Isaías, P., Sergis, S. (eds) Learning Technologies for Transforming Large-Scale Teaching, Learning, and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-030-15130-0_8

DOI: https://doi.org/10.1007/978-3-030-15130-0_8
Published: 25 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15129-4
Online ISBN: 978-3-030-15130-0
eBook Packages: Education; Education (R0)
