Abstract
Over the last decades, frequent itemset mining has become a major area of research, with applications including indexing and similarity search, as well as mining of data streams, web data, and software bugs. Although several efficient techniques for generating frequent itemsets with a minimum frequency have been proposed, the number of itemsets produced is in many cases too large for effective use in real-life applications. Indeed, the problem of deriving frequent itemsets that are both compact and of high quality remains to a large degree open.
In this paper, we address this problem by posing frequent itemset mining as a collection of interrelated two-armed bandit problems. We seek itemsets that frequently appear as subsets in a stream of itemsets, with the frequency constrained to support granularity requirements. Starting from a randomly or manually selected exemplar itemset, a collective of Tsetlin-automata-based two-armed bandit players (one automaton for each item in the exemplar) learns which items should be included in the mined frequent itemset. A novel reinforcement scheme allows the bandit players to learn this in a decentralized, on-line manner by observing one itemset at a time. By invoking this procedure recursively, a progressively finer-grained summary of the itemset stream is produced, represented as a hierarchy of frequent itemsets.
The proposed scheme is extensively evaluated using both artificial data and data from a real-world network intrusion detection application. The results are conclusive, demonstrating an excellent ability to find frequent itemsets. Moreover, computational complexity grows only linearly with the cardinality of the exemplar itemset. Finally, the hierarchical collections of frequent itemsets produced for network intrusion detection are compact, yet accurately describe the different types of network traffic present.
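To make the idea of one two-armed bandit player per item concrete, the following is a minimal Python sketch of a collective of Tsetlin automata that each choose between including and excluding a single item. The abstract does not specify the reinforcement scheme, so the feedback rule used here (push an automaton toward "include" whenever its item occurs in the observed itemset, and toward "exclude" otherwise) is only an illustrative placeholder and not the scheme proposed in the chapter. The names `TsetlinAutomaton`, `mine`, and the parameter `n` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TsetlinAutomaton:
    """Two-action Tsetlin automaton with 2 * n states.

    States 1..n select EXCLUDE; states n+1..2n select INCLUDE.
    """
    n: int = 10
    state: int = 0

    def __post_init__(self):
        if self.state == 0:
            # Start at the decision boundary, on the EXCLUDE side.
            self.state = self.n

    def include(self) -> bool:
        return self.state > self.n

    def reward(self) -> None:
        # Strengthen the current action: move away from the boundary.
        if self.include():
            self.state = min(self.state + 1, 2 * self.n)
        else:
            self.state = max(self.state - 1, 1)

    def penalize(self) -> None:
        # Weaken the current action: move toward, and possibly across,
        # the boundary between EXCLUDE and INCLUDE.
        if self.include():
            self.state -= 1
        else:
            self.state += 1


def mine(exemplar, stream, n=10):
    """Process the stream one itemset at a time; return the included items.

    ASSUMPTION: the per-item feedback below is a stand-in rule chosen for
    illustration only; it is not the chapter's reinforcement scheme.
    """
    automata = {item: TsetlinAutomaton(n) for item in exemplar}
    for itemset in stream:
        for item, automaton in automata.items():
            present = item in itemset
            if automaton.include() == present:
                automaton.reward()
            else:
                automaton.penalize()
    return {item for item, automaton in automata.items() if automaton.include()}


if __name__ == "__main__":
    stream = [{"a", "b"}, {"a", "b", "d"}, {"a", "b", "c"}, {"a", "b"}] * 25
    print(sorted(mine({"a", "b", "c"}, stream)))  # ['a', 'b']
```

Each observed itemset triggers one state update per exemplar item, so the work per observation in this sketch is proportional to the size of the exemplar, which matches the linear scaling mentioned in the abstract.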
Notes
1. Note that, in contrast to NETAD, we analyze both incoming and outgoing network packets for greater accuracy.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Haugland, V., Kjølleberg, M., Larsen, SE., Granmo, OC. (2014). A Two-Armed Bandit Collective for Hierarchical Examplar Based Mining of Frequent Itemsets with Applications to Intrusion Detection. In: Nguyen, N. (ed.) Transactions on Computational Collective Intelligence XIV. Lecture Notes in Computer Science, vol 8615. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44509-9_1
DOI: https://doi.org/10.1007/978-3-662-44509-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44508-2
Online ISBN: 978-3-662-44509-9
eBook Packages: Computer Science, Computer Science (R0)