Real-time intelligent big data processing: technology, platform, and applications

  • Tongya Zheng
  • Gang Chen
  • Xinyu Wang
  • Chun Chen
  • Xingen Wang
  • Sihui Luo
Research Paper

Abstract

Humans have long used information technology to explore the physical world. Only recently, with the rapid development of information technologies and the growing accumulation of data, have data-driven methods made it possible to learn more about the unknown world. Because data is time-sensitive, awareness of the importance of real-time data is growing. Data processing technologies fall into two categories, batch big data processing and stream processing, which have not been well integrated. We therefore propose an incremental processing technology named Stream Cube that handles both batch big data and stream data. We also implement a real-time intelligent data processing system built on real-time acquisition, real-time processing, real-time analysis, and real-time decision-making. The system is equipped with a batch big data platform, data analysis tools, and machine learning models. Our applications and analysis indicate that the real-time intelligent data processing system is a crucial solution to problems facing the national society and economy.
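
The abstract does not describe how Stream Cube works internally, so the following is only a generic sketch of incremental computation, the idea named in the keywords, and not the authors' implementation. It assumes a simple running aggregate: the class `RunningStats`, the helper `batch_mean`, and the sample values are all hypothetical names and data introduced for illustration. The point it shows is that folding each new event into stored state gives the same answer as recomputing over the full history, without rescanning that history.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Incrementally maintained count, sum, and mean (hypothetical example)."""
    count: int = 0
    total: float = 0.0

    def update(self, value: float) -> None:
        # Fold one new event into the stored state in O(1) time.
        self.count += 1
        self.total += value

    @property
    def mean(self) -> float:
        # Derive the mean from the stored state; 0.0 for an empty stream.
        return self.total / self.count if self.count else 0.0


def batch_mean(values):
    # Batch-style recomputation over the full history, for comparison only.
    return sum(values) / len(values) if values else 0.0


history = [12.0, 7.5, 3.0, 9.5]  # assumed historical (batch) records
stream = [4.0, 6.0]              # assumed newly arriving events

stats = RunningStats()
for value in history:            # bootstrap the state from the batch data
    stats.update(value)
for value in stream:             # then keep it current as events arrive
    stats.update(value)

# The incremental state agrees with a full recomputation.
print(stats.mean == batch_mean(history + stream))
```

In this sketch the same `update` path serves both the historical records and the live events, which is one simple way to treat batch and stream data uniformly; a production system would also need windowing, persistence, and fault tolerance, none of which is covered here.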

Keywords

batching big data, streaming processing technology, real-time data processing, incremental computation, intelligent data processing system

Copyright information

© Science China Press and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Tongya Zheng (1)
  • Gang Chen (1)
  • Xinyu Wang (1)
  • Chun Chen (1)
  • Xingen Wang (1), corresponding author
  • Sihui Luo (1)
  1. College of Computer Science and Technology, Zhejiang University, Hangzhou, China
