Abstract
Tree structures are a natural way of describing occurrence relationships between attributes in a dataset. We define a new class of tree patterns for unordered 0–1 data and consider the problem of discovering frequently occurring members of this pattern class. Intuitively, a tree T occurs in a row u of the data if the attributes of T that occur in u form a subtree of T containing the root. We show that this definition has two advantageous properties: only shallow trees have a significant probability of occurring in random data, and the definition allows a simple levelwise algorithm for mining all frequently occurring trees. We demonstrate with empirical results that the method is feasible and that it discovers interesting trees in real data.
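The occurrence condition stated in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a tree is stored as a mapping from each attribute to its parent (the root maps to None) and a row is the set of attributes whose value is 1. The names Tree, occurs, and frequency are hypothetical, and the paper itself may impose further conditions that the abstract does not mention.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical representation: each attribute maps to its parent attribute;
# the single root maps to None.
Tree = Dict[str, Optional[str]]


def occurs(tree: Tree, row: Set[str]) -> bool:
    """Return True if the attributes of `tree` present in `row` form a
    subtree of `tree` that contains the root."""
    roots = [a for a, parent in tree.items() if parent is None]
    if len(roots) != 1:
        raise ValueError("tree must have exactly one root")
    # Attributes of the tree that are set to 1 in this row.
    present = {a for a in tree if a in row}
    if roots[0] not in present:
        return False
    # Every present non-root attribute needs its parent present as well;
    # together with the root check this makes the present nodes connected.
    return all(tree[a] in present for a in present if tree[a] is not None)


def frequency(tree: Tree, rows: Iterable[Set[str]]) -> float:
    """Fraction of rows in which `tree` occurs (0.0 for empty data)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(occurs(tree, r) for r in rows) / len(rows)


# Illustrative data only: root a with children b and c; d is a child of b.
example_tree: Tree = {"a": None, "b": "a", "c": "a", "d": "b"}
example_rows = [
    {"a", "b"},            # occurs: root plus one child
    {"a", "d"},            # does not occur: d is present without its parent b
    {"b", "c"},            # does not occur: root a is missing
    {"a", "x"},            # occurs: only the root; x is not part of the tree
    {"a", "b", "c", "d"},  # occurs: the whole tree
]
```

Under this reading, an attribute deep in the tree can only contribute to an occurrence when its whole path to the root is present in the row, which is consistent with the abstract's remark that deep trees are unlikely to occur in random data.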
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Heikinheimo, H., Mannila, H., Seppänen, J.K. (2006). Finding Trees from Unordered 0–1 Data. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Knowledge Discovery in Databases: PKDD 2006. Lecture Notes in Computer Science, vol 4213. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871637_20
DOI: https://doi.org/10.1007/11871637_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45374-1
Online ISBN: 978-3-540-46048-0
eBook Packages: Computer Science (R0)