A Big Data Architecture for Log Data Storage and Analysis

Part of the book series: Studies in Computational Intelligence (SCI, volume 771)

Abstract

We propose an architecture for analysing database connection logs across multiple database instances within an intranet of over 10,000 users and associated devices. Our system uses Flume agents to send notifications to the Hadoop Distributed File System (HDFS) for long-term storage and to Elasticsearch and Kibana for short-term visualisation, effectively creating a data lake from which log data can be extracted. We apply an ensemble of machine learning models to filter and process the indicators within the data, and aim to predict anomalies or outliers using feature vectors built from this log data.
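
The idea of building feature vectors from connection logs and flagging outliers can be illustrated with a minimal sketch. It is not the authors' pipeline: the record fields (`client_host`, `db_user`, `success`), the three per-host features, and the use of a median/MAD robust z-score in place of the ensemble of models are all assumptions made purely for illustration.

```python
from collections import defaultdict
from statistics import median


def make_sample_records():
    """Generate hypothetical connection-log records (field names are assumed)."""
    counts = {"host-a": 3, "host-b": 4, "host-c": 3, "host-d": 5, "host-e": 40}
    records = []
    for host, n in counts.items():
        for i in range(n):
            records.append(
                {"client_host": host, "db_user": f"user{i % 2}", "success": i % 5 != 0}
            )
    return records


def build_feature_vectors(records):
    """Aggregate records per client host into
    (connection_count, distinct_users, failure_ratio)."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["client_host"]].append(rec)
    features = {}
    for host, recs in grouped.items():
        total = len(recs)
        users = {r["db_user"] for r in recs}
        failures = sum(1 for r in recs if not r["success"])
        features[host] = (total, len(users), failures / total)
    return features


def robust_z_scores(values):
    """Median/MAD based z-scores; returns zeros when the MAD is zero."""
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(features, index=0, threshold=3.5):
    """Return hosts whose chosen feature has |robust z-score| above threshold."""
    hosts = sorted(features)
    scores = robust_z_scores([features[h][index] for h in hosts])
    return [h for h, s in zip(hosts, scores) if abs(s) > threshold]


if __name__ == "__main__":
    vectors = build_feature_vectors(make_sample_records())
    print(flag_outliers(vectors))
```

On the synthetic records above, only `host-e`, with 40 connections against a median of 4, exceeds the threshold on connection count. A production system would more plausibly read such records from the HDFS store and combine several detectors, as the abstract suggests.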

Acknowledgments

The authors would like to acknowledge the contributions of Mr. Eric Grancher, Mr. Luca Canali, Mr. Michael Davis, Dr. Jean-Roch Vlimant, Mr. Adrian Alan Pol, and other members of the CERN IT-DB Group. They are grateful to the staff and management of the CERN Openlab Team, including Mr. Alberto Di Meglio, for their support in undertaking this project.

Author information

Correspondence to Swapneel Mehta.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

Cite this chapter

Mehta, S., Kothuri, P., Garcia, D.L. (2019). A Big Data Architecture for Log Data Storage and Analysis. In: Krishna, A., Srikantaiah, K., Naveena, C. (eds) Integrated Intelligent Computing, Communication and Security. Studies in Computational Intelligence, vol 771. Springer, Singapore. https://doi.org/10.1007/978-981-10-8797-4_22
