Identifying Suspicious Activities in Company Networks Through Data Mining and Visualization

Chapter in: Business Intelligence and Performance Management

Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Abstract

Company data are a precious asset: they must be authentic and must not be disclosed to unauthorized parties. In this contribution, we report on ongoing work that aims to support human IT security experts by pinpointing the significant alerts that genuinely require closer inspection. We developed an experimental tool environment that supports the analysis of IT infrastructure data with data mining methods. In particular, various clustering algorithms are used to distinguish normal behavior from activities that call for intervention by IT security experts. Before clustering, the data can be pre-processed in various ways; in particular, categorical values can be mapped to numerical values in a way that preserves the semantics of the data as far as possible. The resulting clusters can be inspected visually using techniques such as parallel coordinates or pixel-based techniques, e.g. circle segments or recursive patterns.
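The abstract does not name the concrete algorithms or the exact categorical mapping. As an illustration only, the following Python sketch chains the steps it describes: a categorical attribute is frequency-encoded (one simple option that keeps the distinction between common and rare values; this is an assumption, not necessarily the SecMine mapping), all columns are min-max scaled, plain k-means (Lloyd's algorithm) groups the records, and records in unusually small clusters are flagged for an expert. The record layout `(protocol, bytes, duration)`, the function names and the `min_share` threshold are hypothetical.

```python
from collections import Counter

# Hypothetical record layout: (protocol, bytes_transferred, duration_seconds)

def frequency_encode(values):
    """Map each categorical value to its relative frequency in the column.
    Rare values (e.g. an unusual protocol) end up close to 0."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in values]

def min_max_scale(column):
    """Scale a numeric column to [0, 1] so no attribute dominates the distance."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            for p in points]

def kmeans(points, k, iterations=100):
    """Plain Lloyd-style k-means with deterministic farthest-first seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # keep the old centroid if a cluster runs empty
            new_centroids.append(
                tuple(sum(dim) / len(members) for dim in zip(*members))
                if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign(points, centroids), centroids

def suspicious_records(records, k=2, min_share=0.2):
    """Flag records that fall into clusters holding less than min_share of the data."""
    protocols, sizes, durations = zip(*records)
    columns = [frequency_encode(protocols), list(sizes), list(durations)]
    points = list(zip(*(min_max_scale(col) for col in columns)))
    labels, _ = kmeans(points, k)
    cluster_sizes = Counter(labels)
    return [cluster_sizes[label] / len(records) < min_share for label in labels]
```

For instance, with nine ordinary TCP records and one large ICMP transfer, `suspicious_records(records, k=2, min_share=0.2)` flags only the ICMP record. Treating small clusters as suspicious is a common heuristic, chosen here for brevity; a real tool would also let the expert inspect the clusters themselves.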

Preliminary results indicate that clustering is well suited to structuring monitoring data. Moreover, fairly large data volumes can be clustered both effectively and efficiently. Current work focuses on more elaborate visualization and classification techniques.
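Among the pixel-based techniques named in the abstract, the recursive pattern technique gives every record one pixel and arranges the pixels in a back-and-forth order that is applied recursively at several levels. The Python sketch below computes only that layout; the function names and the `levels` parameter (a list of `(width, height)` pairs, innermost level first) are illustrative assumptions rather than part of the authors' tool, and the mapping of values to colours is left out.

```python
from math import prod

def recursive_pattern_position(index, levels):
    """Map a 0-based record index to (x, y) pixel coordinates.

    levels: list of (width, height) pairs, innermost level first.
    At each level the sub-blocks are laid out back and forth:
    left to right on even rows, right to left on odd rows.
    """
    x = y = 0
    block_w = block_h = 1  # pixel size of one element at the current level
    for w, h in levels:
        pos = index % (w * h)
        index //= w * h
        row, col = divmod(pos, w)
        if row % 2 == 1:
            col = w - 1 - col  # reverse direction on odd rows
        x += col * block_w
        y += row * block_h
        block_w *= w
        block_h *= h
    if index:
        raise ValueError("index exceeds the capacity of the pattern")
    return x, y

def render(values, levels):
    """Place one attribute's values into a 2-D grid (one cell per pixel).
    A real view would map each value to a colour; unused cells stay None."""
    width = prod(w for w, _ in levels)
    height = prod(h for _, h in levels)
    grid = [[None] * width for _ in range(height)]
    for i, value in enumerate(values):
        x, y = recursive_pattern_position(i, levels)
        grid[y][x] = value
    return grid
```

With `levels=[(3, 2), (2, 2)]`, records 0 to 5 fill the top-left 3×2 block in serpentine order, records 6 to 11 fill the block to its right, and the next two blocks continue on the row below from right to left. Each attribute would get its own such window, so that records with similar values form visible regions across windows.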

Acknowledgements

The SecMine project is supported under grant no. 17049X10 by the Bundesministerium für Bildung und Forschung (BMBF, German Federal Ministry of Education and Research). We thank Christian Bergmann, Toni Böhnlein, Sebastian Detsch, Thomas Geus, Steffen Hammer, Johannes Henninger, Matthias Herrmann, Sebastian Jakob, Daniel Klett, Adrian Köhlein, Evelyn Krüger, Benjamin Krull, Andreas Kühntopf, Hannes Müller, Marc Pieruschek, Markus Pütz, Markus Ring, Martin Rosenbaum, Manuel Schnapp, Tobias Schmidtlein, Christopher Schramm, Elena Tereshko, Melanie Westendorf, Thomas Worch, and Bernhard Sick for their contributions.

Author information

Correspondence to Dieter Landes.

Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Landes, D., Otto, F., Schumann, S., Schlottke, F. (2013). Identifying Suspicious Activities in Company Networks Through Data Mining and Visualization. In: Rausch, P., Sheta, A., Ayesh, A. (eds) Business Intelligence and Performance Management. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-4471-4866-1_6

  • DOI: https://doi.org/10.1007/978-1-4471-4866-1_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4865-4

  • Online ISBN: 978-1-4471-4866-1

  • eBook Packages: Computer Science, Computer Science (R0)
