Abstract
Capturing and analyzing only the final submission of a writing assignment discards a substantial amount of information and gives only a partial view of the writer's effort and intent. Such a partial view of writing ability limits opportunities to generate feedback that improves the final product and to help writers develop effective writing techniques. Monitoring the writing process over the shoulder is challenging even for a few individuals, and scaling such specialized tutoring to as many writers as possible is impossible without leveraging technology. This research analyzes the computational requirements of a single-threaded writing analytics system for real-time monitoring of, and instructional intervention in, writing processes. This chapter reports on the performance of this analytics system using simulated writing processes for 391 higher-education compositions, a subset of the British Academic Written English (BAWE) corpus. It details the computational requirements of analytics elements involving Natural Language Processing (NLP) and offers recommendations for building scalable big data NLP pipelines adapted to analyzing learners' academic writing processes.
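The abstract does not state how the writing processes were simulated or which NLP components the serial pipeline contained. The sketch below assumes one plausible setup purely for illustration: a finished essay is replayed as a series of growing drafts, and each draft passes through a chain of analysis stages on a single thread while the latency of each snapshot is recorded. The function names (`simulate_writing_process`, `run_serial_pipeline`, `profile`), the fixed words-per-step granularity, and the regex-based tokenizer and sentence splitter are hypothetical placeholders, not the authors' system.

```python
import re
import time
from statistics import mean
from typing import Dict, Iterator, List


# Hypothetical stand-ins for real NLP annotators (tokenizer, sentence splitter).
def tokenize(text: str) -> List[str]:
    # Words and individual punctuation marks become separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def split_sentences(text: str) -> List[str]:
    # Naive split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def simulate_writing_process(final_text: str, words_per_step: int = 5) -> Iterator[str]:
    """Replay a finished essay as growing drafts (assumed fixed granularity)."""
    words = final_text.split()
    for end in range(words_per_step, len(words) + words_per_step, words_per_step):
        yield " ".join(words[:end])


def run_serial_pipeline(snapshot: str) -> Dict[str, int]:
    """Run every analysis stage one after another on a single thread."""
    tokens = tokenize(snapshot)
    sentences = split_sentences(snapshot)
    return {"tokens": len(tokens), "sentences": len(sentences)}


def profile(final_text: str, words_per_step: int = 5) -> dict:
    """Time the serial pipeline on every simulated draft of one essay."""
    latencies: List[float] = []
    last: Dict[str, int] = {}
    for snapshot in simulate_writing_process(final_text, words_per_step):
        start = time.perf_counter()
        last = run_serial_pipeline(snapshot)
        latencies.append(time.perf_counter() - start)
    if not latencies:
        raise ValueError("final_text contains no words to replay")
    return {
        "snapshots": len(latencies),
        "total_s": sum(latencies),
        "mean_s": mean(latencies),
        "max_s": max(latencies),
        "final_tokens": last["tokens"],
        "final_sentences": last["sentences"],
    }


essay = "Writing is a process. Drafts evolve over time. Feedback helps writers improve."
print(profile(essay, words_per_step=4))
```

Because this sketch reanalyzes each draft from scratch, snapshot sizes grow as k, 2k, ..., n words, so the total work for linear-time stages grows roughly quadratically with essay length. That cost pattern is one reason incremental or parallel processing becomes attractive when a serial pipeline must serve many writers in real time.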
Acknowledgements
The authors gratefully acknowledge NSERC funding for this research.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Boulanger, D., Clemens, C., Seanosky, J., Fraser, S., Kumar, V. (2019). Performance Analysis of a Serial Natural Language Processing Pipeline for Scaling Analytics of Academic Writing Process. In: Sampson, D., Spector, J.M., Ifenthaler, D., Isaías, P., Sergis, S. (eds) Learning Technologies for Transforming Large-Scale Teaching, Learning, and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-030-15130-0_8
DOI: https://doi.org/10.1007/978-3-030-15130-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15129-4
Online ISBN: 978-3-030-15130-0
eBook Packages: Education, Education (R0)