
Identifying Threats Using Graph-based Anomaly Detection

Machine Learning in Cyber Trust

Much of the data collected while monitoring cyber and other infrastructures is structural in nature, consisting of various types of entities and the relationships between them. Detecting threatening anomalies in such data is crucial to protecting these infrastructures. We present an approach to detecting anomalies in a graph-based representation of such data that explicitly represents these entities and relationships. The approach first finds normative patterns in the data using graph-based data mining and then searches for small, unexpected deviations from these normative patterns, on the assumption that illicit behavior tries to mimic legitimate, normative behavior. The approach is evaluated on several synthetic and real-world datasets. Results show that it achieves high true-positive rates and low false-positive rates, and that it can detect complex structural anomalies in real-world domains including email communications, cellphone calls, and network traffic.
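The two-stage idea described in the abstract (first find a normative pattern, then look for small, rare deviations from it) can be illustrated with a deliberately simplified Python sketch. This is not the authors' graph-based anomaly detection method. It assumes that each observation is a small graph given as a list of labeled edges, treats the most frequent whole graph as the normative pattern instead of mining substructures, and measures deviation as the number of edge insertions and deletions. The names `canonical`, `edit_cost`, `find_anomalies`, the `max_cost` threshold, and the scoring rule (deviation cost multiplied by how often that variant occurs, lower meaning more anomalous) are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List, Tuple

# A labeled edge: (source vertex label, edge label, target vertex label).
Edge = Tuple[str, str, str]
# Canonical form of a small graph: its labeled edges as a sorted tuple (a multiset).
Canon = Tuple[Edge, ...]


def canonical(edges: List[Edge]) -> Canon:
    """Order-independent key for a graph given as a list of labeled edges."""
    return tuple(sorted(edges))


def edit_cost(a: Canon, b: Canon) -> int:
    """Number of edge deletions plus insertions needed to turn graph a into graph b."""
    ca, cb = Counter(a), Counter(b)
    # Counter subtraction keeps only positive counts, i.e. edges unique to one side.
    return sum(((ca - cb) + (cb - ca)).values())


def find_anomalies(graphs: List[List[Edge]], max_cost: int = 2):
    """Return (normative pattern, ranked anomalies).

    Each anomaly is (score, cost, frequency, graph). Hypothetical scoring:
    score = cost * frequency, so small and rare deviations rank first.
    """
    if not graphs:
        raise ValueError("need at least one graph")
    freq = Counter(canonical(g) for g in graphs)
    # Simplification: the normative pattern is the most frequent whole graph.
    normative, _ = freq.most_common(1)[0]
    anomalies = []
    for variant, count in freq.items():
        if variant == normative:
            continue
        cost = edit_cost(normative, variant)
        # Keep only small deviations, i.e. behavior that closely mimics the norm.
        if cost <= max_cost:
            anomalies.append((cost * count, cost, count, variant))
    anomalies.sort()
    return normative, anomalies


if __name__ == "__main__":
    # Hypothetical toy data: 50 normal "person sends email to person" graphs.
    normal = [("person", "sends", "email"), ("email", "to", "person")]
    data = [normal] * 50
    data.append([("person", "sends", "email"), ("email", "cc", "person")])
    data.append(normal + [("email", "bcc", "person")])
    data += [[("person", "visits", "website")]] * 3
    pattern, ranked = find_anomalies(data, max_cost=2)
    print("normative:", pattern)
    for score, cost, count, graph in ranked:
        print(score, cost, count, graph)
```

On this toy data the sketch ranks the graph with an extra `bcc` edge first (one insertion, seen once), then the graph whose `to` edge was relabeled `cc` (one deletion plus one insertion), and ignores the three unrelated `visits` graphs because they differ from the normative pattern by more than `max_cost` edges.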



Author information

Correspondence to William Eberle.


Copyright information

© 2009 Springer-Verlag US

About this chapter

Cite this chapter

Eberle, W., Holder, L., Cook, D. (2009). Identifying Threats Using Graph-based Anomaly Detection. In: Machine Learning in Cyber Trust. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-88735-7_4


  • DOI: https://doi.org/10.1007/978-0-387-88735-7_4

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-88734-0

  • Online ISBN: 978-0-387-88735-7

  • eBook Packages: Computer Science, Computer Science (R0)
