Much of the data collected during the monitoring of cyber and other infrastructures is structural in nature, consisting of various types of entities and the relationships between them. The detection of threatening anomalies in such data is crucial to protecting these infrastructures. We present an approach to detecting anomalies in a graph-based representation of such data that explicitly represents these entities and relationships. The approach consists of first finding normative patterns in the data using graph-based data mining and then searching for small, unexpected deviations from these normative patterns, on the assumption that illicit behavior tries to mimic legitimate, normative behavior. The approach is evaluated using several synthetic and real-world datasets. Results show that the approach has high true-positive rates and low false-positive rates, and that it is capable of detecting complex structural anomalies in real-world domains including email communications, cellphone calls, and network traffic.
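The two-step idea in the abstract (find a normative pattern, then look for small deviations from it) can be illustrated with a toy sketch. This is not the authors' algorithm or implementation: it assumes each observation is already a small graph given as a set of labeled edges, takes the single most frequent edge set as the "normative pattern", and measures deviation by the number of edges that differ. The function names, the `max_changes` threshold, and the example data are all hypothetical.

```python
from collections import Counter

# Assumption: an edge is (source label, relationship label, target label),
# and each observed instance is a frozenset of such edges.
Edge = tuple[str, str, str]


def normative_pattern(instances):
    """Return the most frequent edge set, used here as a stand-in for a mined pattern."""
    counts = Counter(instances)
    pattern, _ = counts.most_common(1)[0]
    return pattern


def small_deviations(instances, pattern, max_changes=1):
    """Return (index, changed_edges) for instances close to, but not equal to, the pattern."""
    flagged = []
    for index, instance in enumerate(instances):
        # Symmetric difference counts edges added to or missing from the pattern.
        changed = len(instance ^ pattern)
        if 0 < changed <= max_changes:
            flagged.append((index, changed))
    return flagged


# Hypothetical data: four normal email events, one with an extra edge,
# and one unrelated structure that is too different to count as a small deviation.
normal = frozenset({("user", "sends", "email"), ("email", "to", "colleague")})
extra = normal | {("email", "bcc", "external")}
unrelated = frozenset({
    ("server", "pings", "router"),
    ("router", "logs", "event"),
    ("event", "to", "archive"),
})
instances = [normal, normal, normal, extra, normal, unrelated]

pattern = normative_pattern(instances)
flagged = small_deviations(instances, pattern, max_changes=1)
print(flagged)
```

Only the instance with one extra edge is flagged. The unrelated structure differs in every edge, so under the mimicry assumption it is treated as a different kind of activity rather than a near-normal anomaly.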
© 2009 Springer-Verlag US
About this chapter
Cite this chapter
Eberle, W., Holder, L., Cook, D. (2009). Identifying Threats Using Graph-based Anomaly Detection. In: Machine Learning in Cyber Trust. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-88735-7_4
DOI: https://doi.org/10.1007/978-0-387-88735-7_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-88734-0
Online ISBN: 978-0-387-88735-7
eBook Packages: Computer Science, Computer Science (R0)