Analyzing Big Security Logs in Cluster with Apache Spark

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 529)

Abstract

Cyber security is a major concern in today’s highly networked environment, and logging is the primary way of tracking compliance with security policies. However, analyzing the massive volume of logs has become a “Big Data” problem. Apache Spark is one of the latest and most notable incarnations of data flow models in cluster computing. For security log analysis, it provides an exceptional environment for both batch and interactive work. In this study, Apache Spark is briefly introduced along with its distinctive features, the challenges of security log analysis are discussed, and some of Spark’s security log analysis capabilities are demonstrated through a problem involving big security logs. Finally, a sample Spark application is presented that extracts statistics relevant to the problem.

Notes

  1. http://spark.apache.org

  2. http://hadoop.apache.org

  3. http://www.mathworks.com

  4. http://www.r-project.org

  5. Spark uses HDFS as the distributed file system environment amongst cluster nodes.

  6. http://www.ubuntu.com

  7. http://www.ansible.com

Correspondence to Talha Oktay.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Oktay, T., Sayar, A. (2017). Analyzing Big Security Logs in Cluster with Apache Spark. In: Angelov, P., Manolopoulos, Y., Iliadis, L., Roy, A., Vellasco, M. (eds) Advances in Big Data. INNS 2016. Advances in Intelligent Systems and Computing, vol 529. Springer, Cham. https://doi.org/10.1007/978-3-319-47898-2_14

  • DOI: https://doi.org/10.1007/978-3-319-47898-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47897-5

  • Online ISBN: 978-3-319-47898-2

  • eBook Packages: Engineering, Engineering (R0)
