Abstract
Company data are a precious asset that must remain authentic and must not be disclosed to unauthorized parties. In this contribution, we report on ongoing work that aims to support human IT security experts by pinpointing the significant alerts that need closer inspection. We developed an experimental tool environment to support the analysis of IT infrastructure data with data mining methods. In particular, various clustering algorithms are used to differentiate normal behavior from activities that call for intervention by IT security experts. Before being subjected to clustering, data can be pre-processed in various ways. For example, categorical values can be mapped to numerical values while preserving the semantics of the data as far as possible. The resulting clusters can be inspected visually using techniques such as parallel coordinates or pixel-based techniques, e.g. circle segments or recursive patterns.
Preliminary results indicate that clustering is well suited to structuring monitoring data. Fairly large data volumes can also be clustered effectively and efficiently. Currently, the main focus is on more elaborate visualization and classification techniques.
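The pipeline summarized above (map categorical values to numbers, cluster, then inspect the clusters that stand out) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the chapter: the `events` records are hypothetical, frequency encoding stands in for whatever semantics-preserving mapping the tool environment uses, plain k-means with `k = 2` stands in for the "various clustering algorithms", and treating the smallest cluster as the one worth inspecting is only a simple heuristic.

```python
from collections import Counter
from math import dist

def frequency_encode(values):
    """Map each categorical value to its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]

def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def k_means(points, k, iterations=20):
    """Plain Lloyd-style k-means with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

# Hypothetical monitoring records: (protocol, bytes transferred).
events = [
    ("http", 500), ("http", 520), ("http", 480), ("http", 510),
    ("dns", 60), ("dns", 70), ("dns", 65),
    ("irc", 9000),
]

# Pre-processing: the categorical column becomes numeric, both columns share a scale.
protocol_num = frequency_encode([protocol for protocol, _ in events])
bytes_num = min_max_scale([size for _, size in events])
points = list(zip(protocol_num, bytes_num))

labels = k_means(points, k=2)

# Heuristic (an assumption): the smallest cluster is the candidate for inspection.
rare_label = min(set(labels), key=labels.count)
suspicious = [event for event, lab in zip(events, labels) if lab == rare_label]
print(suspicious)
```

Frequency encoding is used here because it keeps rare categories numerically distinct from common ones, which is one way to retain some of the data's meaning. A real deployment would need a mapping and a cluster-selection rule chosen for the data at hand.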
Acknowledgements
The SecMine project is supported under grant no. 17049X10 by the Bundesministerium für Bildung und Forschung (BMBF). We thank Christian Bergmann, Toni Böhnlein, Sebastian Detsch, Thomas Geus, Steffen Hammer, Johannes Henninger, Matthias Herrmann, Sebastian Jakob, Daniel Klett, Adrian Köhlein, Evelyn Krüger, Benjamin Krull, Andreas Kühntopf, Hannes Müller, Marc Pieruschek, Markus Pütz, Markus Ring, Martin Rosenbaum, Manuel Schnapp, Tobias Schmidtlein, Christopher Schramm, Elena Tereshko, Melanie Westendorf, Thomas Worch, and Bernhard Sick for their contributions.
Copyright information
© 2013 Springer-Verlag London
About this chapter
Cite this chapter
Landes, D., Otto, F., Schumann, S., Schlottke, F. (2013). Identifying Suspicious Activities in Company Networks Through Data Mining and Visualization. In: Rausch, P., Sheta, A., Ayesh, A. (eds) Business Intelligence and Performance Management. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-4471-4866-1_6
DOI: https://doi.org/10.1007/978-1-4471-4866-1_6
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4865-4
Online ISBN: 978-1-4471-4866-1
eBook Packages: Computer Science, Computer Science (R0)