
Dynamic Pattern Detection for Big Data Stream Analytics

Part of the book series: Lecture Notes in Social Networks (LNSN)

Abstract

The last two decades have witnessed tremendous and astonishing developments in technology. These drove a visible revolution in communication and electronics design, leading to the production of computing devices of various sizes and capabilities, ranging from tiny sensors with limited specifications to mobile devices with considerable power and rich functionality. These developments have stimulated researchers and practitioners to work hard to derive the greatest possible benefit from such novel devices in the service of humanity. Gathering huge amounts of data is far easier and more affordable than ever before. Indeed, there is a clear shift from paper-based manual data collection to fully automated data collection, even under severe conditions that were never feasible to consider before. Data is captured as a stream that may encapsulate trends revealing certain aspects essential to our daily life. Identifying such trends in data streams is the main theme of the study described in this chapter. We mainly concentrate on real-time stream data analysis to better serve time-critical applications where instant decision making is crucial. This study builds on our methodology described in (Xylogiannopoulos et al. Frequent and non-frequent pattern detection in big data streams: an experimental simulation in 1 trillion data points. In: Advances in social networks analysis and mining (ASONAM), pp. 931–938, 2016), which considers detecting all repeated patterns in a big data stream. In the new dynamic approach, a sliding window is employed with the LERP Reduced Suffix Array and the ARPaD algorithm to analyze one trillion digits composed of one million subsequences of one million digits each. The achieved performance is equivalent to analyzing a stream that generates one data point every 300 ns.
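The abstract names the LERP Reduced Suffix Array (LERP-RSA) and the ARPaD algorithm but does not describe their construction. As a rough illustration of the general idea only (suffixes sorted and truncated to a maximum expected pattern length, then scanned for shared prefixes, inside a sliding window over arriving subsequences), a minimal Python sketch is given here. It is a naive simplification, not the authors' method: the function names, the `lerp` bound and the `window_size` parameter are hypothetical, and the window is rebuilt from scratch at every step rather than updated incrementally.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, List, Tuple


def reduced_suffix_array(text: str, lerp: int) -> List[Tuple[str, int]]:
    """Sort all suffixes of `text`, each truncated to at most `lerp` symbols.

    Hypothetical simplification: truncation stands in for a "longest
    expected repeated pattern" bound, so longer patterns are never reported.
    """
    suffixes = [(text[i:i + lerp], i) for i in range(len(text))]
    suffixes.sort()  # lexicographic order of truncated suffixes, ties by position
    return suffixes


def repeated_patterns(text: str, lerp: int) -> Dict[str, int]:
    """Map every substring of length <= lerp occurring at least twice
    to its number of (possibly overlapping) occurrences."""
    sa = reduced_suffix_array(text, lerp)
    result: Dict[str, int] = {}
    for k in range(1, lerp + 1):
        # In sorted order, suffixes sharing a length-k prefix are contiguous,
        # so each run of equal prefixes is one pattern and its run length
        # is the number of positions where that pattern starts.
        run_prefix = None
        run_len = 0
        for suffix, _pos in sa:
            prefix = suffix[:k] if len(suffix) >= k else None
            if prefix is not None and prefix == run_prefix:
                run_len += 1
            else:
                if run_prefix is not None and run_len >= 2:
                    result[run_prefix] = run_len
                run_prefix, run_len = prefix, 1
        if run_prefix is not None and run_len >= 2:
            result[run_prefix] = run_len
    return result


def sliding_window_analysis(
    chunks: Iterable[str], window_size: int, lerp: int
) -> Iterator[Dict[str, int]]:
    """For each arriving subsequence, report the repeated patterns in the
    concatenation of the most recent `window_size` subsequences."""
    window: Deque[str] = deque(maxlen=window_size)
    for chunk in chunks:
        window.append(chunk)  # the deque discards the oldest chunk when full
        yield repeated_patterns("".join(window), lerp)
```

For instance, `repeated_patterns("abab", 3)` returns `{"a": 2, "b": 2, "ab": 2}`. Rebuilding and re-sorting the whole window for every new subsequence is far too slow for the trillion-digit setting described above; the sketch is meant only to make concrete what "all repeated patterns within a window" means.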

References

  1. Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Repeated patterns detection in big data using classification and parallelism on LERP reduced suffix arrays. Appl. Intell. 45(3), 567–597 (2016). https://doi.org/10.1007/s10489-016-0766-2

  2. Xylogiannopoulos, K.F.: Data structures, algorithms and applications for big data analytics: single, multiple and all repeated patterns detection in discrete sequences. Unpublished PhD thesis, University of Calgary (2017)

  3. Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Analyzing very large time series using suffix arrays. Appl. Intell. 41(3), 941–955 (2014). https://doi.org/10.1007/s10489-014-0553-x

  4. Apostolico, A., Preparata, F.P.: Optimal off-line detection of repetitions in a string. Theor. Comput. Sci. 22, 297–315 (1983)

  5. Weiner, P.: Linear pattern matching algorithms. In: SWAT ‘73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (Swat 1973), pp. 1–11 (1973)

  6. Guo, D., Hu, X., Xie, F., Wu, X.: Pattern matching with wildcards and gap-length constraints based on a centrality-degree graph. Appl. Intell. 39, 57–74 (2013)

  7. Wu, Y., Wang, L., Ren, J., Ding, W., Wu, X.: Mining sequential patterns with periodic wildcards. Appl. Intell. 41, 99–116 (2014)

  8. Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. In: Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 319–327 (1990)

  9. Franek, F., Smyth, W.F., Tang, Y.: Computing all repeats using suffix arrays. JALC. 8(4), 579–591 (2003)

  10. Puglisi, S.J., Smyth, W.F., Yusufu, M.: Fast optimal algorithms for computing all the repeats in a string. In: Proceedings of PSC, pp. 161–169 (2008)

  11. Cormode, G., Hadjieleftheriou, M.: Methods for finding frequent items in data streams. VLDB J. 19(1), 3–20 (2009). https://doi.org/10.1007/s00778-009-0172-z

  12. Boyer, R.S., Moore, J.: A fast majority vote algorithm. Technical Report ICSCA-CMP-32, Institute for Computer Science, University of Texas (1981)

  13. Demaine, E., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: European Symposium on Algorithms (ESA) (2002)

  14. Karp, R., Papadimitriou, C., Shenker, S.: A simple algorithm for finding frequent elements in sets and bags. ACM Trans. Database Syst. 28, 51–55 (2003)

  15. Manku, G., Motwani, R.: Approximate frequency counts over data streams. In: International Conference on Very Large Data Bases, pp. 346–357 (2002)

  16. Metwally, A., Agrawal, D., Abbadi, A.E.: Efficient computation of frequent and top-k elements in data streams. In: International Conference on Database Theory (2005)

  17. Greenwald, M., Khanna, S.: Space-efficient online computation of quantile summaries. In: ACM SIGMOD International Conference on Management of Data (2001)

  18. Shrivastava, N., Buragohain, C., Agrawal, D., Suri, S.: Medians and beyond: new aggregation techniques for sensor networks. In: Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems, pp. 239–249. ACM (2004)

  19. Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58, 137–147 (1999)

  20. Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithm. 55(1), 58–75 (2005)

  21. Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Sequential all frequent itemsets detection – a method to detect all frequent sequential itemsets using LERP–reduced suffix array data structure and ARPaD algorithm. In: Proceedings of International Conference on Advances in Social Networks Analysis and Mining, pp. 1141–1148 (2015). https://doi.org/10.1145/2808797.2809301

  22. Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Real time early warning DDoS attack detection. In: Proceedings of the 11th International Conference on Cyber Warfare and Security, pp. 344–351 (2016)

  23. Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Pattern detection and analysis in financial time series using suffix arrays. In: Doumpos, M., Zopounidis, C., Pardalos, P.M. (eds.) Financial Decision Making Using Computational Intelligence, pp. 129–157 (2012). https://doi.org/10.1007/978-1-4614-3773-4_5

  24. Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Frequent and non-frequent pattern detection in big data streams: an experimental simulation in 1 trillion data points. In: Advances in Social Networks Analysis and Mining (ASONAM), pp. 931–938 (2016). https://doi.org/10.1109/ASONAM.2016.7752351

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R. (2018). Dynamic Pattern Detection for Big Data Stream Analytics. In: Kaya, M., Kawash, J., Khoury, S., Day, MY. (eds) Social Network Based Big Data Analysis and Applications. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-78196-9_9

  • DOI: https://doi.org/10.1007/978-3-319-78196-9_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78195-2

  • Online ISBN: 978-3-319-78196-9

  • eBook Packages: Social Sciences (R0)