Data Cleaning Technique for Security Logs Based on Fellegi-Sunter Theory

Martinez-Mosquera, Diana; Luján-Mora, Sergio; López, Gabriel; Santos, Lauro

doi:10.1007/978-3-319-66996-0_1

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 300)

Included in the following conference series:

EuroSymposium on Systems Analysis and Design


Abstract

Information security is one of the most important aspects an organization should consider. Because of this, and because of the variety of existing vulnerabilities, there are specialized groups known as Computer Security Incident Response Teams (CSIRTs), which are responsible for event monitoring and for providing proactive and reactive support related to incidents. Using the CSIRT of a university with 10,000 users as a case study, and considering the high volume of events to be analyzed on a daily basis, we propose implementing a Big Data ecosystem. One of the most important activities in information processing is the data cleaning phase: it removes useless data and helps to overcome storage limitations, since the CSIRT is currently limited to a small time frame, usually a few days, and cannot analyze historical security events. Focusing on this cleaning phase, this article analyzes an intuitive technique and proposes a comparative technique based on the Fellegi-Sunter theory. The main conclusion of our research is that some data can be safely ignored, which helps to reduce storage size requirements. Moreover, increasing data retention will make it possible to detect some events from historical data.
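
The abstract does not describe how the comparative technique applies the Fellegi-Sunter theory, so the following is only a minimal sketch of the general record-linkage idea, not the authors' method. It assumes each log record is a dictionary, and it sums per-field log-likelihood weights: log2(m/u) when a field agrees and log2((1-m)/(1-u)) when it disagrees, where m and u are the probabilities of agreement among true matches and among non-matches. The field names, the m and u values, and the two thresholds are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not from the paper):
#   m = P(field agrees | both records describe the same event)
#   u = P(field agrees | the records describe different events)
FIELDS = {
    "src_ip": (0.95, 0.05),
    "dst_port": (0.90, 0.20),
    "signature": (0.98, 0.10),
}


def match_weight(rec_a, rec_b, fields=FIELDS):
    """Sum the Fellegi-Sunter agreement/disagreement weights over all fields."""
    total = 0.0
    for name, (m, u) in fields.items():
        if rec_a.get(name) == rec_b.get(name):
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, lower=0.0, upper=5.0):
    """Map a composite weight to a decision using two assumed thresholds."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


if __name__ == "__main__":
    a = {"src_ip": "10.0.0.1", "dst_port": 22, "signature": "ssh-bruteforce"}
    b = dict(a, dst_port=2222)  # same event seen on a different port
    w = match_weight(a, b)
    print(round(w, 2), classify(w))
```

In a cleaning pipeline of this kind, pairs classified as matches would be candidates for deduplication, while possible matches would normally be left for manual review. In practice, m and u are estimated from data rather than fixed by hand.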



Acknowledgements

We thank the National Polytechnic School CSIRT for their collaboration and for providing the facilities needed to test this data cleaning technique.

Author information

Authors and Affiliations

Departamento de Ciencias de la Ingeniería, Universidad Israel, Quito, Ecuador
Diana Martinez-Mosquera
Department of Software and Computing Systems, University of Alicante, Alicante, Spain
Sergio Luján-Mora
Departamento de Electrónica, Telecomunicaciones y Redes de Información, Escuela Politécnica Nacional, Quito, Ecuador
Gabriel López
Performance Testing and Continuous Integration, Nokia Solutions and Networks, Amadora, Portugal
Lauro Santos


Corresponding author

Correspondence to Diana Martinez-Mosquera.

Editor information

Editors and Affiliations

Faculty of Management, University of Gdansk, Sopot, Poland
Stanisław Wrycza
University of Gdansk, Sopot, Poland
Jacek Maślankowski


Cite this paper

Martinez-Mosquera, D., Luján-Mora, S., López, G., Santos, L. (2017). Data Cleaning Technique for Security Logs Based on Fellegi-Sunter Theory. In: Wrycza, S., Maślankowski, J. (eds) Information Systems: Research, Development, Applications, Education. SIGSAND/PLAIS 2017. Lecture Notes in Business Information Processing, vol 300. Springer, Cham. https://doi.org/10.1007/978-3-319-66996-0_1


DOI: https://doi.org/10.1007/978-3-319-66996-0_1
Published: 29 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66995-3
Online ISBN: 978-3-319-66996-0
eBook Packages: Business and Management, Business and Management (R0)
