A Sliding-Window Algorithm Implementation in MapReduce

  • Emad A. Mohammed
  • Christopher T. Naugler
  • Behrouz H. Far
Chapter
Part of the Lecture Notes in Social Networks book series (LNSN)

Abstract

A processing platform with limited resources may not be suited to processing a large volume of data. Distributed processing platforms solve this problem by incorporating commodity hardware that works collaboratively to process large data volumes. The MapReduce programming framework is one candidate for large-scale processing, and Hadoop is its open-source implementation, consisting of the Hadoop Distributed File System for storage and the MapReduce engine for computation. However, the MapReduce framework does not allow data sharing for computation among the computing nodes. In this paper, we present an implementation of a sliding-window algorithm that enables data sharing for computations with data dependencies in MapReduce. The algorithm is designed to facilitate data processing in a sequential order, e.g., a moving average. It utilizes MapReduce job metadata, e.g., the input split size, to prepare the data shared between computing nodes without violating the MapReduce fault-tolerance mechanism.

Keywords

MapReduce · Hadoop · Data sharing · Moving average · Sequential algorithm

Acknowledgments

This work is supported and funded by Alberta Innovates Technology Futures (AITF), Calgary, AB, Canada. The authors would like to thank Alberta Health Services (AHS) and Calgary Laboratory Services (CLS), Calgary, Alberta, Canada, for endless logistics support.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Emad A. Mohammed (1)
  • Christopher T. Naugler (2)
  • Behrouz H. Far (3)
  1. Department of Software Engineering, Faculty of Engineering, Lakehead University, Thunder Bay, Canada
  2. Departments of Pathology and Laboratory Medicine and Family Medicine, University of Calgary and Calgary Laboratory Services, Calgary, Canada
  3. Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, Calgary, Canada
