Abstract
Most current intrusion detection systems are either signature-based or rely on machine learning methods. Although many machine learning algorithms have been applied to the KDD Cup 99 data, none of them introduces a preliminary model to reduce the large quantity of information contained in the various KDD 99 datasets. We introduce a method that is applied to these datasets before running any of the machine learning algorithms used in the KDD Cup 99 intrusion detection task. This method significantly reduces the quantity of information in the datasets without loss of information. It is based on Principal Component Analysis (PCA) and works by projecting data elements onto a feature space, a vector space R^d, that spans the significant variations among known data elements. We consider two well-known algorithms, decision trees and nearest neighbor, and show how our approach lightens the decision process. We support this with experiments on network records from the KDD 99 dataset, first applying the two algorithms directly to the raw data and then applying them after projecting the datasets onto the new feature space.
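The projection step described above can be illustrated with a minimal NumPy sketch. It assumes purely numeric features (the symbolic fields of KDD 99 records, such as protocol type, would have to be encoded first), takes the eigenvectors of the sample covariance matrix as the principal axes, and pairs the projection with a plain 1-nearest-neighbor classifier. The function names, the choice k = 2 and the toy data are hypothetical illustrations, not the authors' implementation or their KDD 99 experiments.

```python
import numpy as np

def fit_pca(X, k):
    """Sketch: return the mean vector and the top-k principal axes (as columns) of X."""
    mean = X.mean(axis=0)
    # Sample covariance of the features, shape (d, d).
    cov = np.cov(X - mean, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:k]
    return mean, eigvecs[:, top]

def project(X, mean, axes):
    """Map each row of X to its coordinates in the k-dimensional feature space."""
    return (X - mean) @ axes

def nearest_neighbor(train_Z, train_y, test_Z):
    """1-nearest-neighbor labels using Euclidean distance in the reduced space."""
    d2 = ((test_Z[:, None, :] - train_Z[None, :, :]) ** 2).sum(axis=2)
    return train_y[np.argmin(d2, axis=1)]

# Hypothetical toy data standing in for connection records: 5 numeric features,
# with the two classes separated mainly along a single direction.
rng = np.random.default_rng(0)
shift = np.array([3.0, 3.0, 0.0, 0.0, 0.0])
X_train = np.vstack([rng.normal(0, 0.1, (50, 5)), rng.normal(0, 0.1, (50, 5)) + shift])
y_train = np.array(["normal"] * 50 + ["attack"] * 50)
X_test = np.vstack([rng.normal(0, 0.1, (5, 5)), rng.normal(0, 0.1, (5, 5)) + shift])
y_test = np.array(["normal"] * 5 + ["attack"] * 5)

mean, axes = fit_pca(X_train, k=2)        # learn the feature space from training data only
Z_train = project(X_train, mean, axes)    # 100 x 2 instead of 100 x 5
Z_test = project(X_test, mean, axes)      # test records use the same mean and axes
pred = nearest_neighbor(Z_train, y_train, Z_test)
print(pred)
```

Only the training set determines the axes. Test records are projected with the same mean and axes, so the classifier works entirely in the reduced space. The same `Z_train` could equally be passed to a decision-tree learner in place of the nearest-neighbor rule.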
Copyright information
© 2004 IFIP International Federation for Information Processing
About this paper
Cite this paper
Bouzida, Y., Gombault, S. (2004). Eigenconnections to Intrusion Detection. In: Deswarte, Y., Cuppens, F., Jajodia, S., Wang, L. (eds) Security and Protection in Information Processing Systems. SEC 2004. IFIP — The International Federation for Information Processing, vol 147. Springer, Boston, MA. https://doi.org/10.1007/1-4020-8143-X_16
DOI: https://doi.org/10.1007/1-4020-8143-X_16
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4757-8016-1
Online ISBN: 978-1-4020-8143-9
eBook Packages: Springer Book Archive