Mining Interesting XML-Enabled Association Rules with Templates

  • Conference paper
Knowledge Discovery in Inductive Databases (KDID 2004)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3377)

Abstract

The XML-enabled association rule framework [FDWC03] extends the notion of associated items to XML fragments, so that associations hold among trees rather than among simple-structured items of atomic values. Such rules are more flexible and powerful in representing both simple and complex structured association relationships inherent in XML data. Compared with traditional association mining in the well-structured world, however, mining from XML data faces more challenges because of XML's inherent flexibility in both structure and semantics. The primary challenges are 1) a more complicated hierarchical data structure; 2) an ordered data context; and 3) a much larger data size. To make XML-enabled association rule mining practical and computationally tractable, we present a template model that helps users specify the interesting XML-enabled associations to be mined. We also describe techniques for template-guided mining of association rules from large XML data, and demonstrate the effectiveness of these techniques through a set of experiments on both synthetic and real-life data.

References

  1. Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Washington D.C., USA, May 1993, pp. 207–216 (1993)

  2. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. of the 20th Intl. Conf. on Very Large Data Bases, Santiago, Chile, September 1994, pp. 478–499 (1994)

  3. Agrawal, R., Shafer, J.C.: Parallel mining of association rules. IEEE Transactions on Knowledge and Data Engineering 8(6), 962–969 (1996)

  4. Agrawal, R., Shim, K.: Developing tightly-coupled data mining applications on a relational database system. In: Proc. of the 2nd International Conference on Knowledge Discovery and Data Mining (1996)

  5. Bonifati, A., Ceri, S.: Comparative analysis of five XML query languages. SIGMOD Record 29(1), 68–79 (2000)

  6. Brin, S., Motwani, R., Silverstein, C.: Beyond market basket: generalizing association rules to correlations. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Tucson, Arizona, USA, June 1997, pp. 265–276 (1997)

  7. Baralis, E., Psaila, G.: Designing templates for mining association rules. Journal of Intelligent Information Systems 9(1), 7–32 (1997)

  8. Cohen, E., Datar, M., Fujiwara, S., Gionis, A., Indyk, P., Motwani, R., Ullman, J.D., Yang, C.: Finding interesting associations without support pruning. In: Proc. Intl. Conf. Data Engineering, California, USA, March 2000, pp. 489–499 (2000)

  9. Cheung, D., Han, J., Ng, V., Wong, C.Y.: Maintenance of discovered association rules in large databases: an incremental updating technique. In: Proc. of the Intl. Conf. on Data Engineering, New Orleans, Louisiana, USA, February 1996, pp. 106–114 (1996)

  10. World Wide Web Consortium. The XML Data Model (January 2000), http://www.w3.org/XML/Datamodel.html/

  11. World Wide Web Consortium. Document Object Model (DOM) (April 2001), http://www.w3.org/DOM/

  12. World Wide Web Consortium. XQuery 1.0: An XML Query Language (April 2002), http://www.w3.org/TR/xquery/

  13. World Wide Web Consortium. XQuery 1.0 and XPath 2.0 Functions and Operators (April 2002), http://www.w3.org/TR/xquery-operators/

  14. Cheung, D.W., Ng, V.T., Fu, A.W., Fu, Y.J.: Efficient mining of association rules in distributed databases. IEEE Transactions on Knowledge and Data Engineering 8(6), 911–922 (1996)

  15. Chakrabarti, S., Sarawagi, S., Dom, B.: Mining surprising patterns using temporal description length. In: Proc. of the 24th Intl. Conf. on Very Large Data Bases, New York, USA, August 1998, pp. 606–617 (1998)

  16. Dehaspe, L., Toivonen, H.: Discovery of frequent DATALOG patterns. Data Mining and Knowledge Discovery 3(1), 7–36 (1999)

  17. Feng, L., Dillon, T., Weigand, H., Chang, E.: An XML-enabled association rule framework. In: Proc. of the 14th Intl. Conf. on Database and Expert Systems Applications, Prague, Czech Republic, September 2003, pp. 88–97 (2003)

  18. Fukuda, T., Morimoto, Y., Morishita, S., Tokuyama, T.: Data mining using two-dimensional optimized association rules: Schema, algorithms, and visualization. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Montreal, Canada, June 1996, pp. 13–23 (1996)

  19. Fukuda, T., Morimoto, Y., Morishita, S., Tokuyama, T.: Mining optimized association rules for numeric attributes. In: Proc. of the 15th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Montreal, Canada, June 1996, pp. 182–191 (1996)

  20. Shen, W., Ong, K., Mitbander, B., Zaniolo, C.: Meta-queries for data mining. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.) Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press (1995)

  21. Fang, M., Shivakumar, N., Garcia-Molina, H., Motwani, R., Ullman, J.D.: Computing iceberg queries efficiently. In: Proc. of the 24th Intl. Conf. on Very Large Data Bases, New York, USA, August 1998, pp. 299–310 (1998)

  22. Han, J., Fu, Y.: Discovery of multiple-level association rules from large databases. In: Proc. of the 21st Intl. Conf. on Very Large Data Bases, Zurich, Switzerland, September 1995, pp. 420–431 (1995)

  23. Han, J., Fu, Y.: Meta-rule-guided mining of association rules in relational databases. In: Proc. of the 1st Intl. Workshop on Integration of Knowledge Discovery with Deductive and Object-Oriented Databases, Singapore, December 1995, pp. 39–46 (1995)

  24. Han, J., Fu, Y., Koperski, K., Wang, W., Zaiane, O.: DMQL: a data mining query language for relational databases. In: Proc. of the ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Montreal, Canada (June 1996)

  25. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco (2001)

  26. Han, E.-H., Karypis, G., Kumar, V.: Scalable parallel data mining for association rules. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Tucson, Arizona, USA, June 1997, pp. 277–288 (1997)

  27. Houtsma, M., Swami, A.: Set-oriented mining of association rules. In: Proc. of the International Conference on Data Engineering, Taipei, Taiwan (March 1995)

  28. Kamber, M., Han, J., Chiang, J.Y.: Metarule-guided mining of multi-dimensional association rules using data cubes. In: Proc. of the International Conference on Knowledge Discovery and Data Mining, California, USA, August 1997, pp. 207–210 (1997)

  29. Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., Verkamo, A.I.: Finding interesting rules from large sets of discovered association rules. In: Proc. of the 3rd Intl. Conf. on Information and Knowledge Management, Gaithersburg, Maryland, November 1994, pp. 401–408 (1994)

  30. Liu, B., Hsu, W., Ma, Y.: Mining association rules with multiple minimum supports. In: Proc. ACM SIGKDD Intl. Conf. Knowledge Discovery and Data Mining, California, USA, August 1999, pp. 125–134 (1999)

  31. Lakshmanan, L.V.S., Ng, R., Han, J., Pang, A.: Optimization of constrained frequent set queries with 2-variable constraints. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, USA, June 1999, pp. 157–168 (1999)

  32. Wang, K., Liu, H.: Schema discovery for semi-structured data. In: Proc. of the 3rd International Conference on Knowledge Discovery and Data Mining, California, USA, August 1997, pp. 271–274 (1997)

  33. Morgado, E.J.M.: Semantic networks as abstract data types. Ph.D. thesis, Technical Report 86-1, Department of Computer Science, SUNY at Buffalo, NY (1986)

  34. Meo, R., Psaila, G., Ceri, S.: A new SQL-like operator for mining association rules. In: Proc. of the 22nd Intl. Conf. on Very Large Data Bases, Mumbai, India, September 1996, pp. 122–133 (1996)

  35. Maruyama, K., Uehara, K.: Mining association rules from semi-structured data. In: Proc. of the ICDCS Workshop of Knowledge Discovery and Data Mining in the World-Wide Web, Taiwan (April 2000)

  36. Miller, R.J., Yang, Y.: Association rules over interval data. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Tucson, Arizona, USA, June 1997, pp. 452–461 (1997)

  37. Ng, R., Lakshmanan, L.V.S., Han, J., Pang, A.: Exploratory mining and pruning optimizations of constrained association rules. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Seattle, Washington, June 1998, pp. 13–24 (1998)

  38. Ozden, B., Ramaswamy, A., Silberschatz, A.: Cyclic association rules. In: Proc. of the Intl. Conf. on Data Engineering, Florida, USA, February 1998, pp. 412–421 (1998)

  39. Park, J.-S., Chen, M.-S., Yu, P.S.: An effective hash based algorithm for mining association rules. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, San Jose, CA, May 1995, pp. 175–186 (1995)

  40. Park, J.-S., Chen, M.-S., Yu, P.S.: Mining association rules with adjustable accuracy. Technical Report IBM Research Report (1995)

  41. Park, J.-S., Chen, M.-S., Yu, P.S.: Data mining for path traversal patterns in a web environment. In: Proc. of the 16th Conference on Distributed Computing Systems, Hong Kong, May 1996, pp. 385–392 (1996)

  42. Ramaswamy, S., Mahajan, S., Silberschatz, A.: On the discovery of interesting patterns in association rules. In: Proc. of the 24th Intl. Conf. on Very Large Data Bases, New York, USA, August 1998, pp. 368–379 (1998)

  43. Rastogi, R., Shim, K.: Mining optimized association rules with categorical and numerical attributes. In: Proc. of the Intl. Conf. on Data Engineering, Florida, USA, February 1998, pp. 503–512 (1998)

  44. Srikant, R., Agrawal, R.: Mining generalized association rules. In: Proc. of the 21st Intl. Conf. on Very Large Data Bases, Zurich, Switzerland, September 1995, pp. 409–419 (1995)

  45. Srikant, R., Agrawal, R.: Mining quantitative association rules in large relational tables. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Montreal, Canada, June 1996, pp. 1–12 (1996)

  46. Silverstein, C., Brin, S., Motwani, R., Ullman, J.D.: Scalable techniques for mining causal structures. In: Proc. of the 24th Intl. Conf. on Very Large Data Bases, New York, USA, August 1998, pp. 594–605 (1998)

  47. Silverstein, C., Brin, S., Motwani, R., Ullman, J.D.: Scalable techniques for mining causal structures. Data Mining and Knowledge Discovery 4(2/3), 163–192 (2000)

  48. Singh, L., Chen, B., Haight, R., Scheuermann, P., Aoki, K.: A robust system architecture for mining semi-structured data. In: Proc. of the 4th International Conference on Knowledge Discovery and Data Mining, New York, USA, August 1998, pp. 329–333 (1998)

  49. Singh, L., Chen, B., Haight, R., Scheuermann, P.: An algorithm for constrained association rule mining in semi-structured data. In: Proc. of the 3rd. Pacific-Asia Conference on Knowledge Discovery and Data Mining, Beijing, China, April 1999, pp. 148–158 (1999)

  50. Shapiro, S.C.: Cables, paths, and subconscious reasoning in propositional semantic networks. In: Sowa, J.F. (ed.) Principles of Semantic Networks - Explorations in the Representation of Knowledge (1991)

  51. Savasere, A., Omiecinski, E., Navathe, S.: An efficient algorithm for mining association rules in large databases. In: Proc. of the 21st Intl. Conf. on Very Large Data Bases, Zurich, Switzerland, September 1995, pp. 432–443 (1995)

  52. Singh, L., Scheuermann, P., Chen, B.: Generating association rules from semi-structured documents using an extended concept hierarchy. In: Proc. of the 6th. International Conference on Information and Knowledge Management, Las Vegas, USA, November 1997, pp. 193–200 (1997)

  53. Sarawagi, S., Thomas, S., Agrawal, R.: Integrating association rule mining with databases: Alternatives and implications. In: Proc. of the International Conference on Management of Data, USA (1998)

  54. Srikant, R., Vu, Q., Agrawal, R.: Mining association rules with item constraints. In: Proc. of the 3rd Intl. Conf. on Knowledge Discovery and Data Mining, Newport Beach, California, August 1997, pp. 67–73 (1997)

  55. Schmidt, A., Waas, F., Kersten, M.L., Florescu, D., Carey, M.J., Manolescu, I., Busse, R.: Why and how to benchmark XML database. SIGMOD Record 30(3), 27–32 (2001)

  56. Toivonen, H.: Sampling large databases for association rules. In: Proc. of the 22nd Conference on Very Large Data Bases, Mumbai, India, September 1996, pp. 134–145 (1996)

  57. Thomas, S., Sarawagi, S.: Mining generalized association rules and sequential patterns using SQL queries. In: Proc. of the 4th International Conference on Knowledge Discovery and Data Mining, New York, USA (August 1998)

  58. Tsur, D., Ullman, J.D., Abiteboul, S., Clifton, C., Motwani, R., Nestorov, S.: Query flocks: a generalization of association-rule mining. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Seattle, Washington, June 1998, pp. 1–12 (1998)

  59. Wang, K., He, Y., Han, J.: Mining frequent itemsets using support constraints. In: Proc. 26th Intl. Conf. Very Large Data Bases, Cairo, Egypt, September 2000, pp. 43–52 (2000)

  60. Wang, H., Park, S., Fan, W., Yu, P.: ViST: A dynamic index method for querying XML data by tree structures. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, California, USA, June 2003, pp. 110–121 (2003)

  61. Wang, K., Liu, H.: Schema discovery for semi-structured data. In: Proc. of the 3rd International Conference on Knowledge Discovery and Data Mining, California, USA, August 1997, pp. 271–274 (1997)

  62. Wang, K., Liu, H.: Discovering typical structures of documents: a road map approach. In: Proc. of the ACM SIGIR International Conference on Research and Development in information Retrieval, Melbourne, Australia, August 1998, pp. 146–154 (1998)

  63. Wang, K., Liu, H.: Discovering structural association of semistructured data. IEEE Transactions on Knowledge and Data Engineering 12(2), 353–371 (2000)

  64. Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W.: New algorithms for fast discovery of association rules. In: Proc. of the 3rd International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA, USA, August 1997, pp. 283–286 (1997)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Feng, L., Dillon, T. (2005). Mining Interesting XML-Enabled Association Rules with Templates. In: Goethals, B., Siebes, A. (eds) Knowledge Discovery in Inductive Databases. KDID 2004. Lecture Notes in Computer Science, vol 3377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31841-5_5

  • DOI: https://doi.org/10.1007/978-3-540-31841-5_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25082-1

  • Online ISBN: 978-3-540-31841-5

  • eBook Packages: Computer Science, Computer Science (R0)
