Abstract
Massive amounts of data are continuously generated by a wide variety of digital services and infrastructures. Examples include machine and system logs, retail transaction logs, traffic tracing data, and diverse social data coming from social networks and mobile interactions. To mention a few cases, the New York Stock Exchange currently produces 1 TB of data per day, Google processes 700 PB of data per month, and Facebook hosts 10 billion photos that take up 1 PB of storage. Turning these streaming data flows into actionable real-time insights is not a trivial task. Using data in real time can change many aspects of a corporation's business logic, including real-time decision making and resource optimization, among others. In this paper, we present an analysis of different aspects of real-time data analytics from an algorithmic perspective. One of the goals of this paper is to identify new problems in this domain and to gain new insights, so as to share the outcomes of our efforts and these challenges with the research community working on real-time data analytics algorithms.
Notes
1. Moore's law is the observation that, over the history of computing hardware, the number of transistors in an integrated circuit has doubled approximately every two years.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Morshed, S.J., Rana, J., Milrad, M. (2016). Real-Time Data Analytics: An Algorithmic Perspective. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2016. Lecture Notes in Computer Science, vol 9714. Springer, Cham. https://doi.org/10.1007/978-3-319-40973-3_31
DOI: https://doi.org/10.1007/978-3-319-40973-3_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40972-6
Online ISBN: 978-3-319-40973-3
eBook Packages: Computer Science (R0)