
On Efficient and Effective Association Rule Mining from XML Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3180)

Abstract

In this paper, we propose a framework, called XAR-Miner, for mining association rules (ARs) from XML documents efficiently and effectively. In XAR-Miner, raw XML data are first transformed into either an Indexed Content Tree (IX-tree) or multi-relational databases (Multi-DB), depending on the size of the XML document and the memory constraints of the system, to enable efficient data selection during AR mining. Concepts that are relevant to the AR mining task are generalized to produce generalized meta-patterns. A suitable metric is devised for measuring the degree of concept generalization in order to prevent under-generalization or over-generalization. The resulting generalized meta-patterns are used to generate large ARs that meet the support and confidence thresholds. An efficient AR mining algorithm is also presented, based on candidate AR generation in the hierarchy of generalized meta-patterns. The experiments show that XAR-Miner is more efficient at performing a large number of AR mining tasks on XML documents than the state-of-the-art method of repeatedly scanning the XML documents to perform each mining task.
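
To make the support and confidence thresholds mentioned in the abstract concrete, the following minimal sketch computes both measures for pairwise rules over a tiny XML document. It is not the XAR-Miner algorithm: it builds no IX-tree or Multi-DB and performs no concept generalization. The sample document, its element names (`orders`, `order`, `item`), and the function names are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical sample data; the element names are assumptions, not from the paper.
SAMPLE_XML = """
<orders>
  <order><item>bread</item><item>milk</item></order>
  <order><item>bread</item><item>butter</item></order>
  <order><item>bread</item><item>milk</item><item>butter</item></order>
  <order><item>milk</item></order>
</orders>
"""


def load_transactions(xml_text):
    """Turn each <order> element into a set of its <item> values."""
    root = ET.fromstring(xml_text)
    return [
        frozenset(item.text for item in order.findall("item"))
        for order in root.findall("order")
    ]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset((a, b)), transactions)
        if pair_support < min_support:
            continue  # the pair is not frequent enough to yield a rule
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support(frozenset((lhs,)), transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


transactions = load_transactions(SAMPLE_XML)
rules = mine_pair_rules(transactions, min_support=0.5, min_confidence=0.9)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

Note that this sketch re-reads the transaction list for every support count. That per-task rescanning is the kind of cost the abstract says XAR-Miner avoids by transforming the XML data once before mining.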

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, J., Ling, T.W., Bruckner, R.M., Tjoa, A.M., Liu, H. (2004). On Efficient and Effective Association Rule Mining from XML Data. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2004. Lecture Notes in Computer Science, vol 3180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30075-5_48

  • DOI: https://doi.org/10.1007/978-3-540-30075-5_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22936-0

  • Online ISBN: 978-3-540-30075-5

  • eBook Packages: Springer Book Archive
