Data Cleaning Technique for Security Logs Based on Fellegi-Sunter Theory

  • Conference paper
  • In: Information Systems: Research, Development, Applications, Education (SIGSAND/PLAIS 2017)

Abstract

Information security is one of the most important aspects an organization should consider. Because of this, and because of the variety of existing vulnerabilities, there are specialized groups known as Computer Security Incident Response Teams (CSIRTs), which are responsible for event monitoring and for providing proactive and reactive support related to incidents. Using as a case study the CSIRT of a university with 10,000 users, and considering the high volume of events to be analyzed on a daily basis, we propose implementing a Big Data ecosystem. One of the most important activities in information processing is the data cleaning phase: it removes useless data and helps to overcome storage limitations, since the CSIRT is currently limited to a small time frame, usually a few days, and cannot analyze historical security events. Focusing on this cleaning phase, this article analyzes an intuitive technique and proposes a comparative technique based on the Fellegi-Sunter theory. The main conclusion of our research is that some data can be safely ignored, which helps to reduce storage requirements. Moreover, increasing the data retention period will make it possible to detect some events from historical data.
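The abstract does not spell out how the Fellegi-Sunter theory is applied to log records, so the following is only a minimal sketch of the general record-linkage scoring idea, not the authors' technique. It assumes that two log records are compared field by field, that each field has an agreement probability among true matches (m) and among non-matches (u), and that the summed log-ratio weight is compared against two thresholds. The field names, the m and u values, and the thresholds are all hypothetical.

```python
import math

# Hypothetical per-field probabilities (assumed values, not from the paper):
#   m = P(field agrees | records describe the same event)
#   u = P(field agrees | records describe different events)
FIELDS = {
    "src_ip": (0.95, 0.01),
    "dst_port": (0.90, 0.10),
    "signature": (0.98, 0.05),
}


def match_weight(a, b, fields=FIELDS):
    """Sum the Fellegi-Sunter agreement/disagreement weights over all fields."""
    total = 0.0
    for name, (m, u) in fields.items():
        if a.get(name) == b.get(name):
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Three-way decision; the two thresholds are illustrative assumptions."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible"


if __name__ == "__main__":
    seen = {"src_ip": "10.0.0.1", "dst_port": 22, "signature": "ssh-bruteforce"}
    new = {"src_ip": "10.0.0.1", "dst_port": 22, "signature": "port-scan"}
    w = match_weight(seen, new)
    print(round(w, 2), classify(w))
```

In a cleaning pipeline, one plausible use (again an assumption) would be to treat records classified as a match with an already stored event as redundant, and to keep the "possible" cases for manual review.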

Acknowledgements

We thank the National Polytechnic School CSIRT for their collaboration and for the facilities needed to test this data cleaning technique.

Author information

Corresponding author

Correspondence to Diana Martinez-Mosquera.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Martinez-Mosquera, D., Luján-Mora, S., López, G., Santos, L. (2017). Data Cleaning Technique for Security Logs Based on Fellegi-Sunter Theory. In: Wrycza, S., Maślankowski, J. (eds) Information Systems: Research, Development, Applications, Education. SIGSAND/PLAIS 2017. Lecture Notes in Business Information Processing, vol 300. Springer, Cham. https://doi.org/10.1007/978-3-319-66996-0_1
