Advertisement

A Scalable Platform for Low-Latency Real-Time Analytics of Streaming Data

  • Paolo CappellariEmail author
  • Mark Roantree
  • Soon Ae Chun
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 737)

Abstract

The ability to process high-volume high-speed streaming data from different data sources is critical for modern organizations to gain insights for business decisions. In this research, we present the streaming analytics platform (SDAP), which provides a set of operators to specify the process of stream data transformations and analytics. SDAP adopts a declarative approach to model and design, delivering analytics capabilities through the combination of a set of primitive operators in a simple manner. The model includes a topology to design streaming analytics specifications using a set of atomic data manipulation operators. Our evaluation demonstrates that SDAP is capable of maintaining low-latency while scaling to a cloud of distributed computing nodes, and providing easier process design and execution of streaming analytics.

Keywords

Data stream processing High-performance computing Low-latency Distributed systems 

Notes

Acknowledgements

This research was supported, in part, from Collective[i] Grant RF-7M617-00-01, the National Science Foundation Grants CNS-0958379,CNS-0855217, ACI-1126113 and the City University of New York High Performance Computing Center at the College of Staten Island.

References

  1. 1.
    Akidau, T., Balikov, A., Bekiroglu, K., Chernyak, S., Haberman, J., Lax, R., McVeety, S., Mills, D., Nordstrom, P., Whittle, S.: Millwheel: fault-tolerant stream processing at internet scale. PVLDB 6(11), 1033–1044 (2013). http://www.vldb.org/pvldb/vol6/p1033-akidau.pdf Google Scholar
  2. 2.
    Balazinska, M., Balakrishnan, H., Madden, S., Stonebraker, M.: Fault-tolerance in the borealis distributed stream processing system. ACM Trans. Database Syst. 33(1), 1–3 (2008). http://doi.acm.org/10.1145/1331904.1331907 CrossRefGoogle Scholar
  3. 3.
    Cappellari, P., Chun, S.A., Roantree, M.: Ise: a high performance system for processing data streams. In: Proceedings of 5th International Conference on Data Science, Technology and Applications, DATA 2016, Lisbon, Portugal, pp. 13–24, 24–26 July 2016Google Scholar
  4. 4.
    Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Lee, S., Seidman, G., Stonebraker, M., Tatbul, N., Zdonik, S.B.: Monitoring streams - a new class of data management applications. In: Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002, Hong Kong, China, pp. 215–226, 20–23 August 2002. http://www.vldb.org/conf/2002/S07P02.pdf
  5. 5.
    Chandrasekaran, S., Franklin, M.J.: Streaming queries over streaming data. In: Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002, Hong Kong, China, pp. 203–214, 20–23 August 2002. http://www.vldb.org/conf/2002/S07P01.pdf
  6. 6.
    Chen, X., Beschastnikh, I., Zhuang, L., Yang, F., Qian, Z., Zhou, L., Shen, G., Shen, J.: Sonora: a platform for continuous mobile-cloud computing. Technical report (2012). https://www.microsoft.com/en-us/research/publication/sonora-a-platform-for-continuous-mobile-cloud-computing/
  7. 7.
    Condie, T., Conway, N., Alvaro, P., Hellerstein, J.M., Gerth, J., Talbot, J., Elmeleegy, K., Sears, R.: Online aggregation and continuous query support in mapreduce. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, pp. 1115–1118, 6–10 June 2010. http://doi.acm.org/10.1145/1807167.1807295
  8. 8.
    Ganglia (2015). http://ganglia.sourceforge.net/. Accessed 15 Nov 2016
  9. 9.
    Gedik, B., Yu, P.S., Bordawekar, R.: Executing stream joins on the cell processor. In: Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, pp. 363–374, 23–27 September 2007. http://www.vldb.org/conf/2007/papers/research/p363-gedik.pdf
  10. 10.
    Gehrke, J., Korn, F., Srivastava, D.: On computing correlated aggregates over continual data streams. In: Mehrotra, S., Sellis, T.K. (eds.) Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, Santa Barbara, CA, USA, pp. 13–24. ACM, 21–24 May 2001. http://doi.acm.org/10.1145/375663.375665
  11. 11.
    Grinev, M., Grineva, M.P., Hentschel, M., Kossmann, D.: Analytics for the realtime web. PVLDB 4(12), 1391–1394 (2011). http://www.vldb.org/pvldb/vol4/p1391-grinev.pdf Google Scholar
  12. 12.
    Gui, H., Roantree, M.: Topological XML data cube construction. Int. J. Web Eng. Technol. 8(4), 347–368 (2013)CrossRefGoogle Scholar
  13. 13.
    Gui, H., Roantree, M.: Using a pipeline approach to build data cube for large XML data streams. In: Hong, B., Meng, X., Chen, L., Winiwarter, W., Song, W. (eds.) DASFAA 2013. LNCS, vol. 7827, pp. 59–73. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-40270-8_5 CrossRefGoogle Scholar
  14. 14.
    Infiniband (2015). http://www.infinibandta.org/. Accessed 15 Nov 2016
  15. 15.
    InfoSphere streams (2015). http://www-03.ibm.com/software/products/en/infosphere-streams. Accessed 15 Nov 2016
  16. 16.
    Kang, J., Naughton, J.F., Viglas, S.: Evaluating window joins over unbounded streams. In: Proceedings of the 19th International Conference on Data Engineering, Bangalore, India, pp. 341–352, 5–8 March 2003. doi: 10.1109/ICDE.2003.1260804
  17. 17.
    Li, J., Maier, D., Tufte, K., Papadimos, V., Tucker, P.A.: Semantics and evaluation techniques for window aggregates in data streams. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland, USA, pp. 311–322, 14–16 June 2005. http://doi.acm.org/10.1145/1066157.1066193
  18. 18.
    Madden, S., Shah, M.A., Hellerstein, J.M., Raman, V.: Continuously adaptive continuous queries over streams. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, pp. 49–60, 3–6 June 2002. http://doi.acm.org/10.1145/564691.564698
  19. 19.
    Maier, D., Li, J., Tucker, P., Tufte, K., Papadimos, V.: Semantics of data streams and operators. In: Eiter, T., Libkin, L. (eds.) ICDT 2005. LNCS, vol. 3363, pp. 37–52. Springer, Heidelberg (2004). doi: 10.1007/978-3-540-30570-5_3 CrossRefGoogle Scholar
  20. 20.
    Motwani, R., Widom, J., Arasu, A., Babcock, B., Babu, S., Datar, M., Manku, G.S., Olston, C., Rosenstein, J., Varma, R.: Query processing, approximation, and resource management in a data stream management system. In: CIDR (2003). http://www-db.cs.wisc.edu/cidr/cidr2003/program/p22.pdf
  21. 21.
    MVAPICH2, The Ohio State University (2015). http://mvapich.cse.ohio-state.edu/. Accessed 15 Nov 2016
  22. 22.
    Neumeyer, L., Robbins, B., Nair, A., Kesari, A.: S4: distributed stream computing platform. In: Proceedings of the 2010 IEEE International Conference on Data Mining Workshops, ICDMW 2010, Washington, DC, USA, pp. 170–177 (2010). IEEE Computer Society. doi: 10.1109/ICDMW.2010.172
  23. 23.
    Peng, D., Dabek, F.: Large-scale incremental processing using distributed transactions and notifications. In: Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2010, Vancouver, BC, Canada, pp. 251–264, 4–6 October 2010. http://www.usenix.org/events/osdi10/tech/full_papers/Peng.pdf
  24. 24.
    Plimpton, S.J., Shead, T.M.: Streaming data analytics via message passing with application to graph algorithms. J. Parallel Distrib. Comput. 74(8), 2687–2698 (2014). doi: 10.1016/j.jpdc.2014.04.001 CrossRefGoogle Scholar
  25. 25.
    Slurm (2015). http://slurm.schedmd.com/. Accessed 15 Nov 2016
  26. 26.
    Teubner, J., Müller, R.: How soccer players would do stream joins. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, pp. 625–636, 12–16 June 2011. http://doi.acm.org/10.1145/1989323.1989389
  27. 27.
    Toshniwal, A., Taneja, S., Shukla, A., Ramasamy, K., Patel, J.M., Kulkarni, S., Jackson, J., Gade, K., Fu, M., Donham, J., Bhagat, N., Mittal, S., Ryaboy, D.V.: Storm@twitter. In: International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, pp. 147–156, 22–27 June 2014. http://doi.acm.org/10.1145/2588555.2595641
  28. 28.
  29. 29.
    Zaharia, M., Das, T., Li, H., Shenker, S., Stoica, I.: Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters. In: 4th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2012, Boston, MA, USA, 12–13 June 2012. https://www.usenix.org/conference/hotcloud12/workshop-program/presentation/zaharia

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Paolo Cappellari
    • 1
    Email author
  • Mark Roantree
    • 2
  • Soon Ae Chun
    • 1
  1. 1.City University of New YorkNew YorkUSA
  2. 2.School of Computing, Insight Centre for Data AnalyticsDublin City UniversityDublinIreland

Personalised recommendations