A Proposal: High-Throughput Robust Architecture for Log Analysis and Data Stream Mining

  • Conference paper
Innovations in Computer Science and Engineering

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 413)

Abstract

Various data mining approaches are now available that help handle large static data sets despite limited computational resources. However, these approaches fall short when mining high-speed, endless streams because their learning procedures, though simple, require the entire training process to be repeated for each newly arriving instance. Continuous data streams pose three main challenges: they are many times larger than the available memory, they arrive in real time, and each new instance should be inspected at most once while predictions must still be made. A further issue with continuous real-time data is that the underlying concepts change over time, a phenomenon often called concept drift. This paper addresses these problems by proposing a real-time, scalable, and robust architecture. It is a general-purpose architecture, based on online machine learning, that efficiently logs and mines stream data in a fault-tolerant manner. It consists of two frameworks: (1) an event aggregation framework, which reliably collects events and messages from multiple sources and ships them to a destination for processing; and (2) a real-time computation framework, which processes streams online to extract information patterns. It guarantees reliable processing of billions of messages per second. Furthermore, it facilitates the evaluation of stream learning algorithms and offers change detection strategies to detect concept drift.
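The one-pass constraint and the idea of change detection described in the abstract can be illustrated with a minimal sketch. It is not the architecture proposed in the paper: the class and function names (`MajorityClassLearner`, `WindowDriftDetector`, `prequential`) are hypothetical, the learner is a toy, and the detector is a simplified fixed-window comparison of error rates rather than any published drift detection method. Each instance is used first to test the current model and then to train it, so it is inspected only once.

```python
from collections import deque


class MajorityClassLearner:
    """Toy online learner (hypothetical): predicts the label seen most often so far."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        # No prediction is possible before the first training instance.
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.get)

    def learn(self, x, y):
        # Constant-time update; no earlier instance is revisited.
        self.counts[y] = self.counts.get(y, 0) + 1


class WindowDriftDetector:
    """Simplified detector (an assumption, not the paper's method): signals drift
    when the error rate of the newest `window` predictions exceeds the error
    rate of the `window` predictions before them by more than `threshold`."""

    def __init__(self, window=50, threshold=0.3):
        self.window = window
        self.threshold = threshold
        self.errors = deque(maxlen=2 * window)  # bounded memory

    def update(self, error):
        self.errors.append(1 if error else 0)
        if len(self.errors) < 2 * self.window:
            return False
        items = list(self.errors)
        old_rate = sum(items[:self.window]) / self.window
        new_rate = sum(items[self.window:]) / self.window
        return new_rate - old_rate > self.threshold

    def reset(self):
        self.errors.clear()


def prequential(stream, make_learner, detector):
    """Test-then-train loop over (x, y) pairs.

    Returns the overall accuracy and the indices at which drift was signalled.
    When drift is signalled, the learner is replaced by a fresh one.
    """
    learner = make_learner()
    correct = total = 0
    drift_points = []
    for i, (x, y) in enumerate(stream):
        hit = learner.predict(x) == y      # test first ...
        correct += hit
        total += 1
        if detector.update(not hit):
            drift_points.append(i)
            learner = make_learner()
            detector.reset()
        learner.learn(x, y)                # ... then train on the same instance
    accuracy = correct / total if total else 0.0
    return accuracy, drift_points
```

A caller would pass any iterable of `(x, y)` pairs, for example `prequential(stream, MajorityClassLearner, WindowDriftDetector())`. Because the detector keeps only the last `2 * window` error flags and the learner keeps only label counts, memory use stays bounded regardless of stream length.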

Author information

Corresponding author

Correspondence to Adnan Rashid Hussain.

Copyright information

© 2016 Springer Science+Business Media Singapore

About this paper

Cite this paper

Hussain, A.R., Hameed, M.A., Fatima, S. (2016). A Proposal: High-Throughput Robust Architecture for Log Analysis and Data Stream Mining. In: Saini, H., Sayal, R., Rawat, S. (eds) Innovations in Computer Science and Engineering. Advances in Intelligent Systems and Computing, vol 413. Springer, Singapore. https://doi.org/10.1007/978-981-10-0419-3_36

  • DOI: https://doi.org/10.1007/978-981-10-0419-3_36

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-0417-9

  • Online ISBN: 978-981-10-0419-3

  • eBook Packages: Engineering, Engineering (R0)
