MapReduce-Based Data Stream Processing over Large History Data

Qi, Kaiyuan; Zhao, Zhuofeng; Fang, Jun; Han, Yanbo

doi:10.1007/978-3-642-34321-6_57

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7636)

Included in the following conference series:

International Conference on Service-Oriented Computing

Abstract

With the growth of Internet of Things applications built on sensor data, processing high-speed data streams over large-scale history data poses a new challenge. This paper proposes a new programming model, RTMR, which improves the real-time capability of traditional batch-oriented MapReduce through preprocessing and caching, along with pipelining and localizing. Furthermore, to adapt cluster topologies to application characteristics and cluster environments, an RTMR cluster construction method based on model analysis is proposed. A benchmark built on an urban vehicle monitoring system shows that RTMR provides real-time capability and scalability for data stream processing over large-scale data.
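
The abstract does not give RTMR's implementation, so the following is only a minimal sketch of the general idea behind "preprocessing and caching": history data is reduced once in a batch step, and each arriving stream record is merged into the cached intermediate result instead of rescanning the history. The function names, the `(plate, timestamp)` record layout, and the per-plate sighting count are all assumptions chosen for illustration, not the authors' API or benchmark.

```python
from collections import defaultdict

def preprocess_history(history):
    """Batch step (assumed): reduce history records once into cached per-key counts."""
    cache = defaultdict(int)
    for plate, _timestamp in history:
        cache[plate] += 1
    return cache

def process_stream_record(cache, record):
    """Stream step (assumed): merge one new record into the cached intermediate result."""
    plate, _timestamp = record
    cache[plate] += 1
    return cache[plate]

# Hypothetical vehicle sightings: (plate, timestamp)
history = [("A123", 1), ("B456", 2), ("A123", 3)]
cache = preprocess_history(history)

# A new sighting arrives on the stream; only the cache is touched.
total = process_stream_record(cache, ("A123", 4))
```

In this sketch the cost of handling a stream record is independent of the history size, which is the property that caching intermediate results is meant to provide; pipelining and localizing are not modeled here.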

Author information

Authors and Affiliations

Cloud Computing Research Center, North China University of Technology, No.5 Jinyuanzhuang Road, 100144, Beijing, China
Kaiyuan Qi, Zhuofeng Zhao, Jun Fang & Yanbo Han
Institute of Computing Technology, Chinese Academy of Sciences, No.6 Academy South Road, 100144, Beijing, China
Kaiyuan Qi

Editor information

Editors and Affiliations

Faculty of ICT, Swinburne University of Technology, John Street, 3122, Hawthorn, VIC, Australia
Chengfei Liu
IBM Almaden Research Center, 650 Harry Road, 95120, San Jose, CA, USA
Heiko Ludwig
LIMOS - UMR 6158, Blaise Pascal University, Complexe scientifique des Cézeaux, 63177, Aubiere, France
Farouk Toumani
College of Computing and Information Sciences, Rochester Institute of Technology, 1 Lomb Memorial Drive, 14623, Rochester, NY, USA
Qi Yu

Cite this paper

Qi, K., Zhao, Z., Fang, J., Han, Y. (2012). MapReduce-Based Data Stream Processing over Large History Data. In: Liu, C., Ludwig, H., Toumani, F., Yu, Q. (eds) Service-Oriented Computing. ICSOC 2012. Lecture Notes in Computer Science, vol 7636. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34321-6_57

DOI: https://doi.org/10.1007/978-3-642-34321-6_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34320-9
Online ISBN: 978-3-642-34321-6
eBook Packages: Computer Science, Computer Science (R0)
