Identifying Threats Using Graph-based Anomaly Detection

Eberle, William; Holder, Lawrence; Cook, Diane

doi:10.1007/978-0-387-88735-7_4

Much of the data collected during the monitoring of cyber and other infrastructures is structural in nature, consisting of various types of entities and the relationships between them. Detecting threatening anomalies in such data is crucial to protecting these infrastructures. We present an approach to detecting anomalies in a graph-based representation of such data that explicitly represents these entities and relationships. The approach first finds normative patterns in the data using graph-based data mining and then searches for small, unexpected deviations from these normative patterns, on the assumption that illicit behavior tries to mimic legitimate, normative behavior. The approach is evaluated using several synthetic and real-world datasets. Results show that the approach has high true-positive rates and low false-positive rates, and that it is capable of detecting complex structural anomalies in real-world domains including email communications, cell phone calls, and network traffic.
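
The two-step idea in the abstract (find a normative pattern, then look for small deviations from it) can be illustrated with a toy sketch. This is not the chapter's algorithm or implementation. It assumes each observation is a small graph stored as a set of labeled edges, takes the most frequent graph as the "normative pattern" in place of real graph-based data mining, and uses the number of differing edges as a stand-in for structural deviation. The function names, the `max_cost` threshold, and the sample data are all hypothetical.

```python
from collections import Counter

# Each graph is a frozenset of labeled edges: (source, relation, target).
# ASSUMPTION: whole-graph frequency stands in for normative-pattern mining.


def normative_pattern(graphs):
    """Return the most frequent graph among the observations."""
    pattern, _count = Counter(graphs).most_common(1)[0]
    return pattern


def edit_cost(graph, pattern):
    """Count the labeled edges that must be added or removed to match."""
    return len(graph ^ pattern)  # symmetric difference of the edge sets


def find_anomalies(graphs, max_cost=2):
    """Flag graphs that deviate slightly, but not greatly, from the pattern.

    A small non-zero cost models behavior that mimics the norm; graphs that
    differ by more than max_cost are treated as unrelated, not anomalous.
    """
    pattern = normative_pattern(graphs)
    flagged = []
    for index, graph in enumerate(graphs):
        cost = edit_cost(graph, pattern)
        if 0 < cost <= max_cost:
            flagged.append((index, cost))
    return flagged


# Hypothetical email-like observations.
normal = frozenset({
    ("employee", "sends", "email"),
    ("email", "to", "manager"),
})
suspicious = normal | {("email", "cc", "external")}  # one extra edge
unrelated = frozenset({
    ("server", "runs", "backup"),
    ("backup", "writes", "disk"),
})

transactions = [normal] * 5 + [suspicious, unrelated]
anomalies = find_anomalies(transactions)
print(anomalies)
```

In this sketch only the near-copy of the normal graph is flagged. The unrelated graph differs by four edges, so it falls outside the threshold, which mirrors the abstract's assumption that illicit behavior stays close to legitimate behavior.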

Author information

Authors and Affiliations

Department of Computer Science, Tennessee Technological University, Box 5101, Cookeville, TN, 38505
William Eberle
School of Electrical Engineering and Computer Science, Washington State University, Box 642752, Pullman, WA, 99164
Lawrence Holder & Diane Cook

Corresponding author

Correspondence to William Eberle.

Cite this chapter

Eberle, W., Holder, L., Cook, D. (2009). Identifying Threats Using Graph-based Anomaly Detection. In: Machine Learning in Cyber Trust. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-88735-7_4

DOI: https://doi.org/10.1007/978-0-387-88735-7_4
Published: 14 March 2009
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-88734-0
Online ISBN: 978-0-387-88735-7
eBook Packages: Computer Science, Computer Science (R0)
