Abstract
Modern machine learning techniques have been applied to many aspects of network analytics to discover patterns that clarify or better demonstrate the behavior of users and systems within a given network. Often the information to be processed has to be converted to a different type before machine learning algorithms can process it. To accurately process the information generated by systems within a network, the true intention and meaning behind the information must be respected. In this paper we propose different approaches for mapping network information such as IP addresses to integer values that attempt to keep the relations present in the original format of the information intact. With one exception, all of the proposed mappings produce outputs of at most 64 bits, which allows atomic operations on CPUs with 64-bit registers; the output size is restricted in the interest of performance. Additionally, we demonstrate the benefits of the new mappings for one specific machine learning algorithm (k-means) and compare the algorithm's results for datasets with and without the proposed transformations.
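The abstract does not spell out the mappings themselves, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the standard big-endian integer interpretation of an IPv4 address and, for IPv6, keeps only the upper 64 bits (the routing prefix) so that the result fits a 64-bit register. The function names `naive_digits`, `ipv4_to_int`, and `ipv6_prefix_to_int` are hypothetical and chosen for illustration.

```python
import ipaddress


def naive_digits(addr: str) -> int:
    """Structure-destroying conversion: drop the dots and read the digits.

    Shown only as a counter-example; distinct addresses can collide.
    """
    return int(addr.replace(".", ""))


def ipv4_to_int(addr: str) -> int:
    """Map an IPv4 address to its 32-bit big-endian integer value.

    Addresses that share a network prefix end up numerically close.
    """
    return int(ipaddress.IPv4Address(addr))


def ipv6_prefix_to_int(addr: str) -> int:
    """Map an IPv6 address to the integer value of its upper 64 bits.

    Assumption for this sketch: the interface identifier (lower 64 bits)
    is discarded so that the output fits into a 64-bit register.
    """
    return int(ipaddress.IPv6Address(addr)) >> 64


if __name__ == "__main__":
    same_subnet_a = ipv4_to_int("192.168.1.10")
    same_subnet_b = ipv4_to_int("192.168.1.20")
    other_network = ipv4_to_int("10.0.0.1")

    # Hosts in the same /24 are close; a host in another network is far away.
    print(abs(same_subnet_a - same_subnet_b))
    print(abs(same_subnet_a - other_network))

    # The naive conversion maps two different addresses to the same value.
    print(naive_digits("192.168.1.10") == naive_digits("192.168.11.0"))

    # The IPv6 prefix mapping stays below 2**64.
    print(ipv6_prefix_to_int("2001:db8::1") < 2**64)
```

With a mapping of this kind, a distance-based algorithm such as k-means sees addresses from the same subnet as neighbors, whereas the naive digit conversion can place unrelated addresses at the same point.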
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Azodi, A., Gawron, M., Sapegin, A., Cheng, F., Meinel, C. (2015). Leveraging Event Structure for Adaptive Machine Learning on Big Data Landscapes. In: Boumerdassi, S., Bouzefrane, S., Renault, É. (eds) Mobile, Secure, and Programmable Networking. MSPN 2015. Lecture Notes in Computer Science, vol 9395. Springer, Cham. https://doi.org/10.1007/978-3-319-25744-0_3
DOI: https://doi.org/10.1007/978-3-319-25744-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25743-3
Online ISBN: 978-3-319-25744-0
eBook Packages: Computer Science (R0)