Abstract
Information security is one of the most important aspects an organization should consider. Due to this matter and the variety of existing vulnerabilities, there are specialized groups known as Computer Security Incident Response Team (CSIRT), that are responsible for event monitoring and for providing proactive and reactive support related to incidents. Using as a case study a CSIRT of a university with 10,000 users, and considering the high volume of events to be analyzed on a daily basis, it is proposed to implement a Big Data ecosystem. One of the most important activities for the information processing is the data cleaning phase, it will remove useless data and help to overcome storage limitations, since CSIRT is actually limited to a small time-frame, usually a few days and cannot analyze historical security events. Focusing on this cleaning phase, this article analyzes an intuitive technique and proposes a comparative technique based on the Fellegi-Sunter theory. The main conclusion of our research is that some data could be safely ignored helping to reduce storage size requirements. Moreover, increasing the data retention will enable to detect some events from historical data.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Qaiyum, S., Aziz, I.A., Jaafar, J.B.: Analysis of Big Data and quality-of-experience in high-density wireless network. In: 2016 3rd International Conference on Computer and Information Sciences (ICCOINS), pp. 287–292 (2016). doi:10.1109/ICCOINS.2016.7783229
Arputhamary, B., Arockiam, L.: Data integration in Big Data environment. Bonfring Int. J. Data Mining 5(1), 1–5 (2015). doi:10.9756/BIJDM.8001
Cárdenas, A., Manadhata, P., Rajan, S.: Big Data analytics for security. IEEE Secur. Priv. 11(6), 74–76 (2015). doi:10.1109/MSP.2013.138
Martínez-Mosquera, D., Luján-Mora, S.: Data cleaning technique for security Big Data ecosystem. In: Proceedings of the 2nd International Conference on Internet of Things, Big Data and Security, vol. 1, pp. 380–385 (2017). doi:10.5220/0006360603800385
Fellegi, I.P., Sunter, A.B.: A theory for record linkage. J. Am. Stat. Assoc. 64(328), 1183–1210 (1969). doi:10.1080/01621459.1969.10501049
Luján Mora, S., Palomar Sanz, M.: Reducing inconsistency in integrating data from different sources. In: Proceedings 2001 International Database Engineering and Applications Symposium (IDEAS 2001), pp. 209–218 (2001). doi:10.1109/IDEAS.2001.938087
Luján Mora, S., Palomar Sanz, M.: Comparing string similarity measures for reducing inconsistency in integrating data from different sources. In: Proceedings of the Second International Conference in Advances in Web-Age Information Management (WAIM 2001), pp. 191–202 (2001). doi:10.1007/3-540-47714-4_18
Aye, T.T.: Web log cleaning for mining of web usage patterns. In: 2011 3rd International Conference Computer Research and Development (ICCRD), vol. 2, pp. 490–494 (2011). doi:10.1109/ICCRD.2011.5764181
Maletic, J.I., Marcus, A.: Data cleansing: a prelude to knowledge discovery. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 19–32. Springer, USA (2009). doi:10.1007/978-0-387-09823-4_2
Khayyat, Z., Ilyas, I.F., Jindal, A., Madden, S., Ouzzani, M., Papotti, P., Yin, S.: Bigdansing: a system for Big Data cleansing. In: ACM SIGMOD International Conference on Management of Data, pp. 1215–1230 (2015). doi:10.1145/2723372.2747646
Krishnan, S., Haas, D., Franklin, M., Wu, E.: Towards reliable interactive data cleaning: a user survey and recommendations. In: ACM SIGMOD/PODS Conference Workshop on Human. In the Loop Data Analytics (2016), p. 9. doi:10.1145/2939502.2939511
Winkler, W.E.: Using the EM algorithm for weight computation in the fellegi-sunter model of record linkage. In: Proceedings of the Section on Survey Research Methods, American Statistical Association, vol. 667, p. 671 (1988)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10(8), 707–710 (1966)
Acknowledgements
We thank to the National Polytechnic School CSIRT for their collaboration and facilities needed to test this data cleaning technique.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Martinez-Mosquera, D., Luján-Mora, S., López, G., Santos, L. (2017). Data Cleaning Technique for Security Logs Based on Fellegi-Sunter Theory. In: Wrycza, S., Maślankowski, J. (eds) Information Systems: Research, Development, Applications, Education. SIGSAND/PLAIS 2017. Lecture Notes in Business Information Processing, vol 300. Springer, Cham. https://doi.org/10.1007/978-3-319-66996-0_1
Download citation
DOI: https://doi.org/10.1007/978-3-319-66996-0_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66995-3
Online ISBN: 978-3-319-66996-0
eBook Packages: Business and ManagementBusiness and Management (R0)