Mining Tree-Based Frequent Patterns from XML

Mazuran, Mirjana; Quintarelli, Elisa; Tanca, Letizia

doi:10.1007/978-3-642-04957-6_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5822)

Included in the following conference series:

International Conference on Flexible Query Answering Systems

Abstract

The increasing number of very large XML datasets available to casual users poses a challenging problem for our community and calls for appropriate support to gather knowledge from these data efficiently. Data mining, already widely applied to extract frequent correlations of values from both structured and semi-structured datasets, is the appropriate field for knowledge elicitation. In this work we describe an approach to extracting tree-based association rules from XML documents. Such rules provide approximate, intensional information on both the structure and the content of XML documents, and can be stored in XML format to be queried later on. A prototype system demonstrates the effectiveness of the approach.
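
The abstract does not say how the quality of a tree-based rule is measured. Purely as an illustration, the Python sketch below assumes the classic support/confidence measures of association-rule mining carried over to trees: a rule body → head, where the body is a subtree of the head. A pattern is assumed to occur in a document if its root can be mapped onto some element so that tags, text values (where the pattern gives one) and parent-child edges are preserved, ignoring sibling order; support is the fraction of documents containing the pattern. The matching semantics, the helper names (node_matches, occurs_in, support, confidence, rule_to_xml) and the <rule>/<body>/<head> storage layout are hypothetical and are not taken from the authors' prototype.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List, Set


def node_matches(p: ET.Element, d: ET.Element) -> bool:
    """Map pattern node p onto document node d: same tag, same text when p
    carries a text value, and every child of p mapped onto a distinct child
    of d (parent-child edges only; sibling order is ignored)."""
    if p.tag != d.tag:
        return False
    p_text = (p.text or "").strip()
    if p_text and p_text != (d.text or "").strip():
        return False
    return _match_children(list(p), list(d), set())


def _match_children(pats: List[ET.Element], kids: List[ET.Element],
                    used: Set[int]) -> bool:
    # Backtracking: try each unused document child for the first pattern child.
    if not pats:
        return True
    first, rest = pats[0], pats[1:]
    for i, kid in enumerate(kids):
        if i not in used and node_matches(first, kid):
            if _match_children(rest, kids, used | {i}):
                return True
    return False


def occurs_in(pattern: ET.Element, doc: ET.Element) -> bool:
    # The pattern root may be mapped onto any element of the document.
    return any(node_matches(pattern, node) for node in doc.iter())


def support(pattern: ET.Element, docs: List[ET.Element]) -> float:
    # Fraction of documents in which the pattern occurs at least once.
    if not docs:
        return 0.0
    return sum(occurs_in(pattern, doc) for doc in docs) / len(docs)


def confidence(body: ET.Element, head: ET.Element,
               docs: List[ET.Element]) -> float:
    # Assumes head contains body, so support(head) plays the role of
    # support(body and head together) in the usual definition.
    s_body = support(body, docs)
    return support(head, docs) / s_body if s_body else 0.0


def rule_to_xml(body: ET.Element, head: ET.Element,
                docs: List[ET.Element]) -> ET.Element:
    # Hypothetical storage layout, so mined rules can be queried as XML later.
    rule = ET.Element("rule", support=f"{support(head, docs):.3f}",
                      confidence=f"{confidence(body, head, docs):.3f}")
    ET.SubElement(rule, "body").append(copy.deepcopy(body))
    ET.SubElement(rule, "head").append(copy.deepcopy(head))
    return rule


docs = [ET.fromstring(x) for x in (
    "<book><author>Tanca</author><title>A</title><year>2009</year></book>",
    "<book><author>Tanca</author><title>B</title></book>",
    "<book><author>Rossi</author><title>C</title><year>2008</year></book>",
)]
body = ET.fromstring("<book><author>Tanca</author></book>")
head = ET.fromstring("<book><author>Tanca</author><year/></book>")
print(ET.tostring(rule_to_xml(body, head, docs), encoding="unicode"))
```

With these three sample documents the body occurs in two documents and the head in one, so the stored rule carries support 0.333 and confidence 0.500. Counting documents rather than individual occurrences is one possible choice; a real miner would also need candidate generation instead of hand-written patterns.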

This research is partially supported by the Italian MIUR project ARTDECO and by the European Commission, Programme IDEAS-ERC, Project 227977-SMScom.

Author information

Authors and Affiliations

Dipartimento di Elettronica e Informazione, Politecnico di Milano
Mirjana Mazuran, Elisa Quintarelli & Letizia Tanca

Editor information

Editors and Affiliations

Department of Computer Science, Roskilde University, Universitetsvej 1, 4000 Roskilde, Denmark
Troels Andreasen & Henrik Bulskov
Iona College, Machine Intelligence Institute, New Rochelle, NY 10801, USA
Ronald R. Yager
Computer Science Dept., Research group PLIS: Programming, Roskilde University, Universitetsvej 1, 4000 Roskilde, Denmark
Henning Christiansen
Department of Computer Science and Engineering, Aalborg University Esbjerg, Niels Bohrs Vej 8, 6700 Esbjerg, Denmark
Henrik Legind Larsen

About this paper

Cite this paper

Mazuran, M., Quintarelli, E., Tanca, L. (2009). Mining Tree-Based Frequent Patterns from XML. In: Andreasen, T., Yager, R.R., Bulskov, H., Christiansen, H., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 2009. Lecture Notes in Computer Science, vol 5822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04957-6_25

DOI: https://doi.org/10.1007/978-3-642-04957-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04956-9
Online ISBN: 978-3-642-04957-6
eBook Packages: Computer Science; Computer Science (R0)
