Abstract
This chapter introduces parallel processing of education data. As the volume of education data continues to grow, new methods for processing it efficiently are required. The chapter gives a history of popular parallel computing frameworks and discusses problem types that map easily onto these frameworks. It then describes an example machine-learning problem and compares a single-threaded pipeline with a parallel pipeline built on Apache Spark. We hope this information helps other practitioners who want to use Apache Spark to extend their models to more students and more data.
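The contrast between a single-threaded and a parallel pipeline rests on problems that split into independent pieces (a map step) whose partial results can be merged (a reduce step). The sketch below is a minimal illustration of that pattern, not the chapter's Spark pipeline. It assumes hypothetical clickstream records and helper names, and it uses Python's standard-library thread pool as a stand-in for a cluster's workers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical clickstream records (student_id, event_type); not data from the chapter.
EVENTS = [
    ("s1", "video"), ("s2", "quiz"), ("s1", "quiz"),
    ("s3", "video"), ("s2", "video"), ("s1", "video"),
]

def count_partition(partition):
    """Map step: count events per student within one partition."""
    return Counter(student for student, _ in partition)

def merge_counts(a, b):
    """Reduce step: combine two partial counts (associative and commutative)."""
    return a + b

def split(records, n):
    """Split records into n interleaved partitions of roughly equal size."""
    return [records[i::n] for i in range(n)]

def single_threaded(records):
    """Process every record in one pass on one thread."""
    return count_partition(records)

def parallel(records, workers=3):
    """Map over partitions concurrently, then reduce the partial counts."""
    partitions = split(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_partition, partitions))
    return reduce(merge_counts, partials, Counter())

if __name__ == "__main__":
    # Both pipelines must agree: s1 -> 3, s2 -> 2, s3 -> 1.
    print(single_threaded(EVENTS) == parallel(EVENTS))
```

Because the merge step is associative and commutative, the partitions can be processed in any order on any worker, which is what makes this kind of problem a good fit for a framework such as Spark. A thread pool only demonstrates the decomposition; under CPython's global interpreter lock it should not be expected to speed up CPU-bound work.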
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Lewkow, N., Feild, J. (2018). Using Apache Spark for Modeling Student Behavior at Scale. In: Spector, J., et al. Frontiers of Cyberlearning. Lecture Notes in Educational Technology. Springer, Singapore. https://doi.org/10.1007/978-981-13-0650-1_9
DOI: https://doi.org/10.1007/978-981-13-0650-1_9
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0649-5
Online ISBN: 978-981-13-0650-1
eBook Packages: Education, Education (R0)