Finding Trees from Unordered 0–1 Data

Heikinheimo, Hannes; Mannila, Heikki; Seppänen, Jouni K.

doi:10.1007/11871637_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4213)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Tree structures are a natural way of describing occurrence relationships between attributes in a dataset. We define a new class of tree patterns for unordered 0–1 data and consider the problem of discovering frequently occurring members of this pattern class. Intuitively, a tree T occurs in a row u of the data, if the attributes of T that occur in u form a subtree of T containing the root. We show that this definition has advantageous properties: only shallow trees have a significant probability of occurring in random data, and the definition allows a simple levelwise algorithm for mining all frequently occurring trees. We demonstrate with empirical results that the method is feasible and that it discovers interesting trees in real data.
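
The occurrence definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the parent-map representation of a tree, the set-of-attributes representation of a row, and the names occurs and frequency are all hypothetical. The sketch reads "the attributes of T that occur in u form a subtree of T containing the root" as: the root is present in the row, and every other attribute of T that is present in the row has its parent present as well.

```python
from typing import Dict, Iterable, Optional, Set

# Assumed representation (not from the paper): a tree maps each attribute
# to its parent attribute, and the root maps to None.
Tree = Dict[str, Optional[str]]


def occurs(tree: Tree, row: Set[str]) -> bool:
    """Return True if the tree occurs in the row under the abstract's definition."""
    root = next(attr for attr, parent in tree.items() if parent is None)
    if root not in row:
        return False
    # Every tree attribute present in the row must be attached to a present parent,
    # so the present attributes form a connected subtree that contains the root.
    return all(
        parent is None or parent in row
        for attr, parent in tree.items()
        if attr in row
    )


def frequency(tree: Tree, rows: Iterable[Set[str]]) -> float:
    """Fraction of rows in which the tree occurs (0.0 for empty data)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(occurs(tree, row) for row in rows) / len(rows)


if __name__ == "__main__":
    # A chain a -> b -> c, with a as the root.
    chain = {"a": None, "b": "a", "c": "b"}
    data = [{"a"}, {"a", "b"}, {"a", "c"}, {"b"}, {"a", "b", "c", "x"}]
    for row in data:
        print(sorted(row), occurs(chain, row))
    print(frequency(chain, data))
```

In this toy data the row {a, c} does not count as an occurrence because c is present without its parent b, and the row {b} does not count because the root is missing. Attributes outside the tree, such as x, are ignored.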

Author information

Authors and Affiliations

HIIT Basic Research Unit, Lab. Computer and Information Science, Helsinki University of Technology, FI-02015, Finland
Hannes Heikinheimo, Heikki Mannila & Jouni K. Seppänen

Editor information

Editors and Affiliations

Knowledge Engineering Group, Technische Universität Darmstadt
Johannes Fürnkranz
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou

Cite this paper

Heikinheimo, H., Mannila, H., Seppänen, J.K. (2006). Finding Trees from Unordered 0–1 Data. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Knowledge Discovery in Databases: PKDD 2006. Lecture Notes in Computer Science, vol 4213. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871637_20

DOI: https://doi.org/10.1007/11871637_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45374-1
Online ISBN: 978-3-540-46048-0
eBook Packages: Computer Science, Computer Science (R0)
