On Efficient and Effective Association Rule Mining from XML Data

Zhang, Ji; Ling, Tok Wang; Bruckner, Robert M.; Tjoa, A Min; Liu, Han

doi:10.1007/978-3-540-30075-5_48

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3180)

Abstract

In this paper, we propose a framework, called XAR-Miner, for mining association rules (ARs) from XML documents efficiently and effectively. In XAR-Miner, raw XML data are first transformed to either an Indexed Content Tree (IX-tree) or Multi-relational databases (Multi-DB), depending on the size of the XML document and the memory constraints of the system, for efficient data selection in the AR mining. Concepts that are relevant to the AR mining task are generalized to produce generalized meta-patterns. A suitable metric is devised for measuring the degree of concept generalization in order to prevent under-generalization or over-generalization. The resulting generalized meta-patterns are used to generate large ARs that meet the support and confidence levels. An efficient AR mining algorithm is also presented, based on candidate AR generation in the hierarchy of generalized meta-patterns. The experiments show that XAR-Miner is more efficient in performing a large number of AR mining tasks from XML documents than the state-of-the-art method of repetitively scanning through XML documents in order to perform each of the mining tasks.
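
For readers unfamiliar with the support and confidence thresholds mentioned in the abstract, the following is a minimal sketch of how those two measures can be computed over item sets extracted from an XML document. It is not XAR-Miner: it builds no IX-tree or Multi-DB, performs no concept generalization, and only considers single-item rules of the form {a} -> {b}. The toy document, its element names (`orders`, `order`, `item`), and the function names are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import permutations

# Hypothetical toy document; element names are illustrative only.
XML_DOC = """
<orders>
  <order><item>bread</item><item>milk</item></order>
  <order><item>bread</item><item>butter</item></order>
  <order><item>bread</item><item>milk</item><item>butter</item></order>
  <order><item>milk</item></order>
</orders>
"""


def load_transactions(xml_text):
    """Turn each <order> element into a frozenset of its <item> texts."""
    root = ET.fromstring(xml_text)
    return [
        frozenset(item.text for item in order.findall("item"))
        for order in root.findall("order")
    ]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def mine_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples for
    single-item rules {a} -> {b} that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support(frozenset({a, b}), transactions)
        if s < min_support:
            continue
        # confidence(a -> b) = support({a, b}) / support({a})
        conf = s / support(frozenset({a}), transactions)
        if conf >= min_confidence:
            rules.append((a, b, s, conf))
    return rules


transactions = load_transactions(XML_DOC)
rules = mine_rules(transactions, min_support=0.5, min_confidence=0.6)
for a, b, s, conf in rules:
    print(f"{a} -> {b}: support={s:.2f}, confidence={conf:.2f}")
```

Note that this sketch rescans the transaction list for every candidate rule, which is broadly the kind of repeated scanning the abstract contrasts XAR-Miner against.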

Author information

Authors and Affiliations

Department of Computer Science, University of Toronto, Toronto, Ontario, M5S 3G4, Canada
Ji Zhang & Han Liu
Department of Computer Science, National University of Singapore, 117543, Singapore
Tok Wang Ling
Microsoft Research, Redmond, WA, USA
Robert M. Bruckner
Institute of Software Technology, Vienna University of Technology, Vienna, Austria
A Min Tjoa

Editor information

Editors and Affiliations

University of Zaragoza, Ciudad Universitaria, Plaza San Francisco, 50009, Zaragoza
Fernando Galindo
Seikei University, Japan
Makoto Takizawa
Institute of Informatics in Business and Government, University of Linz, Altenbergerstr. 69, 4040, Linz, Austria
Roland Traunmüller

About this paper

Cite this paper

Zhang, J., Ling, T.W., Bruckner, R.M., Tjoa, A.M., Liu, H. (2004). On Efficient and Effective Association Rule Mining from XML Data. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2004. Lecture Notes in Computer Science, vol 3180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30075-5_48

DOI: https://doi.org/10.1007/978-3-540-30075-5_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22936-0
Online ISBN: 978-3-540-30075-5
eBook Packages: Springer Book Archive
