Mining Interesting XML-Enabled Association Rules with Templates

Feng, Ling; Dillon, Tharam

doi:10.1007/978-3-540-31841-5_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3377)

Included in the following conference series:

International Workshop on Knowledge Discovery in Inductive Databases

Abstract

The XML-enabled association rule framework [FDWC03] extends the notion of associated items to XML fragments, so that associations are expressed among trees rather than among simple-structured items of atomic values. Such rules are more flexible and powerful in representing both simple and complex structured association relationships inherent in XML data. Compared with traditional association mining in the well-structured world, however, mining from XML data faces more challenges because of the inherent flexibility of XML in both structure and semantics. The primary challenges are 1) a more complicated hierarchical data structure; 2) an ordered data context; and 3) a much larger data size. To make XML-enabled association rule mining truly practical and computationally tractable, this study presents a template model that helps users specify the interesting XML-enabled associations to be mined. Techniques for template-guided mining of association rules from large XML data are also described. We demonstrate the effectiveness of these techniques through a set of experiments on both synthetic and real-life data.

References

Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Washington D.C., USA, May 1993, pp. 207–216 (1993)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. of the 20th Intl. Conf. on Very Large Data Bases, Santiago, Chile, September 1994, pp. 478–499 (1994)
Agrawal, R., Shafer, J.C.: Parallel mining of association rules. IEEE Transactions on Knowledge and Data Engineering 8(6), 962–969 (1996)
Agrawal, R., Shim, K.: Developing tightly-coupled data mining applications on a relational database system. In: Proc. of the 2nd International Conference on Knowledge Discovery and Data Mining (1996)
Bonifati, A., Ceri, S.: Comparative analysis of five XML query languages. SIGMOD Record 29(1), 68–79 (2000)
Brin, S., Motwani, R., Silverstein, C.: Beyond market basket: generalizing association rules to correlations. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Tucson, Arizona, USA, June 1997, pp. 265–276 (1997)
Baralis, E., Psaila, G.: Designing templates for mining association rules. Journal of Intelligent Information Systems 9(1), 7–32 (1997)
Cohen, E., Datar, M., Fujiwara, S., Gionis, A., Indyk, P., Motwani, R., Ullman, J.D., Yang, C.: Finding interesting associations without support pruning. In: Proc. Intl. Conf. Data Engineering, California, USA, March 2000, pp. 489–499 (2000)
Cheung, D., Han, J., Ng, V., Wong, C.Y.: Maintenance of discovered association rules in large databases: an incremental updating technique. In: Proc. of the Intl. Conf. on Data Engineering, New Orleans, Louisiana, USA, February 1996, pp. 106–114 (1996)
World Wide Web Consortium. The XML Data Model (January 2000), http://www.w3.org/XML/Datamodel.html/
World Wide Web Consortium. Document Object Model (DOM) (April 2001), http://www.w3.org/DOM/
World Wide Web Consortium. XQuery 1.0: An XML Query Language (April 2002), http://www.w3.org/TR/xquery/
World Wide Web Consortium. XQuery 1.0 and XPath 2.0 Functions and Operators (April 2002), http://www.w3.org/TR/xquery-operators/
Cheung, D.W., Ng, V.T., Fu, A.W., Fu, Y.J.: Efficient mining of association rules in distributed databases. IEEE Transactions on Knowledge and Data Engineering 8(6), 911–922 (1996)
Chakrabarti, S., Sarawagi, S., Dom, B.: Mining surprising patterns using temporal description length. In: Proc. of the 24th Intl. Conf. on Very Large Data Bases, New York, USA, August 1998, pp. 606–617 (1998)
Dehaspe, L., Toivonen, H.: Discovery of frequent DATALOG patterns. Data Mining and Knowledge Discovery 3(1), 7–36 (1999)
Feng, L., Dillon, T., Weigand, H., Chang, E.: An XML-enabled association rule framework. In: Proc. of the 14th Intl. Conf. on Database and Expert Systems Applications, Prague, Czech Republic, September 2003, pp. 88–97 (2003)
Fukuda, T., Morimoto, Y., Morishita, S., Tokuyama, T.: Data mining using two-dimensional optimized association rules: Schema, algorithms, and visualization. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Montreal, Canada, June 1996, pp. 13–23 (1996)
Fukuda, T., Morimoto, Y., Morishita, S., Tokuyama, T.: Mining optimized association rules for numeric attributes. In: Proc. of the 15th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Montreal, Canada, June 1996, pp. 182–191 (1996)
Shen, W., Ong, K., Mitbander, B., Zaniolo, C.: Meta-queries for data mining. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.) Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press (1995)
Fang, M., Shivakumar, N., Garcia-Molina, H., Motwani, R., Ullman, J.D.: Computing iceberg queries efficiently. In: Proc. of the 24th Intl. Conf. on Very Large Data Bases, New York, USA, August 1998, pp. 299–310 (1998)
Han, J., Fu, Y.: Discovery of multiple-level association rules from large databases. In: Proc. of the 21st Intl. Conf. on Very Large Data Bases, Zurich, Switzerland, September 1995, pp. 420–431 (1995)
Han, J., Fu, Y.: Meta-rule-guided mining of association rules in relational databases. In: Proc. of the 1st Intl. Workshop on Integration of Knowledge Discovery with Deductive and Object-Oriented Databases, Singapore, December 1995, pp. 39–46 (1995)
Han, J., Fu, Y., Koperski, K., Wang, W., Zaiane, O.: DMQL: a data mining query language for relational databases. In: Proc. of the ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Montreal, Canada (June 1996)
Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco (2001)
Han, E.-H., Karypis, G., Kumar, V.: Scalable parallel data mining for association rules. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Tucson, Arizona, USA, June 1997, pp. 277–288 (1997)
Houtsma, M., Swami, A.: Set-oriented mining of association rules. In: Proc. of the International Conference on Data Engineering, Taipei, Taiwan (March 1995)
Kamber, M., Han, J., Chiang, J.Y.: Metarule-guided mining of multi-dimensional association rules using data cubes. In: Proc. of the International Conference on Knowledge Discovery and Data Mining, California, USA, August 1997, pp. 207–210 (1997)
Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., Verkamo, A.I.: Finding interesting rules from large sets of discovered association rules. In: Proc. of the 3rd Intl. Conf. on Information and Knowledge Management, Gaithersburg, Maryland, November 1994, pp. 401–408 (1994)
Liu, B., Hsu, W., Ma, Y.: Mining association rules with multiple minimum supports. In: Proc. ACM SIGKDD Intl. Conf. Knowledge Discovery and Data Mining, California, USA, August 1999, pp. 125–134 (1999)
Lakshmanan, L.V.S., Ng, R., Han, J., Pang, A.: Optimization of constrained frequent set queries with 2-variable constraints. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, USA, June 1999, pp. 157–168 (1999)
Wang, K., Liu, H.: Schema discovery for semi-structured data. In: Proc. of the 3rd International Conference on Knowledge Discovery and Data Mining, California, USA, August 1997, pp. 271–274 (1997)
Morgado, E.J.M.: Semantic networks as abstract data types. Ph.D. thesis, Technical Report 86-1, Department of Computer Science, SUNY at Buffalo, NY (1986)
Meo, R., Psaila, G., Ceri, S.: A new SQL-like operator for mining association rules. In: Proc. of the 22nd Intl. Conf. on Very Large Data Bases, Mumbai, India, September 1996, pp. 122–133 (1996)
Maruyama, K., Uehara, K.: Mining association rules from semi-structured data. In: Proc. of the ICDCS Workshop of Knowledge Discovery and Data Mining in the World-Wide Web, Taiwan (April 2000)
Miller, R.J., Yang, Y.: Association rules over interval data. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Tucson, Arizona, USA, June 1997, pp. 452–461 (1997)
Ng, R., Lakshmanan, L.V.S., Han, J., Pang, A.: Exploratory mining and pruning optimizations of constrained association rules. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Seattle, Washington, June 1998, pp. 13–24 (1998)
Ozden, B., Ramaswamy, S., Silberschatz, A.: Cyclic association rules. In: Proc. of the Intl. Conf. on Data Engineering, Florida, USA, February 1998, pp. 412–421 (1998)
Park, J.-S., Chen, M.-S., Yu, P.S.: An effective hash based algorithm for mining association rules. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, San Jose, CA, May 1995, pp. 175–186 (1995)
Park, J.-S., Chen, M.-S., Yu, P.S.: Mining association rules with adjustable accuracy. Technical Report IBM Research Report (1995)
Park, J.-S., Chen, M.-S., Yu, P.S.: Data mining for path traversal patterns in a web environment. In: Proc. of the 16th Conference on Distributed Computing Systems, Hong Kong, May 1996, pp. 385–392 (1996)
Ramaswamy, S., Mahajan, S., Silberschatz, A.: On the discovery of interesting patterns in association rules. In: Proc. of the 24th Intl. Conf. on Very Large Data Bases, New York, USA, August 1998, pp. 368–379 (1998)
Rastogi, R., Shim, K.: Mining optimized association rules with categorical and numerical attributes. In: Proc. of the Intl. Conf. on Data Engineering, Florida, USA, February 1998, pp. 503–512 (1998)
Srikant, R., Agrawal, R.: Mining generalized association rules. In: Proc. of the 21st Intl. Conf. on Very Large Data Bases, Zurich, Switzerland, September 1995, pp. 409–419 (1995)
Srikant, R., Agrawal, R.: Mining quantitative association rules in large relational tables. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Montreal, Canada, June 1996, pp. 1–12 (1996)
Silverstein, C., Brin, S., Motwani, R., Ullman, J.D.: Scalable techniques for mining causal structures. In: Proc. of the 24th Intl. Conf. on Very Large Data Bases, New York, USA, August 1998, pp. 594–605 (1998)
Silverstein, C., Brin, S., Motwani, R., Ullman, J.D.: Scalable techniques for mining causal structures. Data Mining and Knowledge Discovery 4(2/3), 163–192 (2000)
Singh, L., Chen, B., Haight, R., Scheuermann, P., Aoki, K.: A robust system architecture for mining semi-structured data. In: Proc. of the 4th International Conference on Knowledge Discovery and Data Mining, New York, USA, August 1998, pp. 329–333 (1998)
Singh, L., Chen, B., Haight, R., Scheuermann, P.: An algorithm for constrained association rule mining in semi-structured data. In: Proc. of the 3rd Pacific-Asia Conference on Knowledge Discovery and Data Mining, Beijing, China, April 1999, pp. 148–158 (1999)
Shapiro, S.C.: Cables, paths, and subconscious reasoning in propositional semantic networks. In: Sowa, J.F. (ed.) Principles of Semantic Networks - Explorations in the Representation of Knowledge (1991)
Savasere, A., Omiecinski, E., Navathe, S.: An efficient algorithm for mining association rules in large databases. In: Proc. of the 21st Intl. Conf. on Very Large Data Bases, Zurich, Switzerland, September 1995, pp. 432–443 (1995)
Singh, L., Scheuermann, P., Chen, B.: Generating association rules from semi-structured documents using an extended concept hierarchy. In: Proc. of the 6th International Conference on Information and Knowledge Management, Las Vegas, USA, November 1997, pp. 193–200 (1997)
Sarawagi, S., Thomas, S., Agrawal, R.: Integrating association rule mining with databases: Alternatives and implications. In: Proc. of the International Conference on Management of Data, USA (1998)
Srikant, R., Vu, Q., Agrawal, R.: Mining association rules with item constraints. In: Proc. of the 3rd Intl. Conf. on Knowledge Discovery and Data Mining, Newport Beach, California, August 1997, pp. 67–73 (1997)
Schmidt, A., Waas, F., Kersten, M.L., Florescu, D., Carey, M.J., Manolescu, I., Busse, R.: Why and how to benchmark XML databases. SIGMOD Record 30(3), 27–32 (2001)
Toivonen, H.: Sampling large databases for association rules. In: Proc. of the 22nd Conference on Very Large Data Bases, Mumbai, India, September 1996, pp. 134–145 (1996)
Thomas, S., Sarawagi, S.: Mining generalized association rules and sequential patterns using SQL queries. In: Proc. of the 4th International Conference on Knowledge Discovery and Data Mining, New York, USA (August 1998)
Tsur, D., Ullman, J.D., Abiteboul, S., Clifton, C., Motwani, R., Nestorov, S.: Query flocks: a generalization of association-rule mining. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Seattle, Washington, June 1998, pp. 1–12 (1998)
Wang, K., He, Y., Han, J.: Mining frequent itemsets using support constraints. In: Proc. 26th Intl. Conf. Very Large Data Bases, Cairo, Egypt, September 2000, pp. 43–52 (2000)
Wang, H., Park, S., Fan, W., Yu, P.: ViST: A dynamic index method for querying XML data by tree structures. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, California, USA, June 2003, pp. 110–121 (2003)
Wang, K., Liu, H.: Discovering typical structures of documents: a road map approach. In: Proc. of the ACM SIGIR International Conference on Research and Development in Information Retrieval, Melbourne, Australia, August 1998, pp. 146–154 (1998)
Wang, K., Liu, H.: Discovering structural association of semistructured data. IEEE Transactions on Knowledge and Data Engineering 12(2), 353–371 (2000)
Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W.: New algorithms for fast discovery of association rules. In: Proc. of the 3rd International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA, USA, August 1997, pp. 283–286 (1997)

Author information

Authors and Affiliations

Dept. of Computer Science, University of Twente, PO Box 217, 7500 AE, Enschede, The Netherlands
Ling Feng
Faculty of Information Technology, University of Technology, Sydney, Australia
Tharam Dillon

Editor information

Editors and Affiliations

Mathematics and Computer Science Department, University of Antwerp, Middelheimlaan 1, 2020, Antwerp, Belgium
Bart Goethals
Department of Computer Science, Universiteit Utrecht
Arno Siebes

About this paper

Cite this paper

Feng, L., Dillon, T. (2005). Mining Interesting XML-Enabled Association Rules with Templates. In: Goethals, B., Siebes, A. (eds) Knowledge Discovery in Inductive Databases. KDID 2004. Lecture Notes in Computer Science, vol 3377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31841-5_5

DOI: https://doi.org/10.1007/978-3-540-31841-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25082-1
Online ISBN: 978-3-540-31841-5
eBook Packages: Computer Science, Computer Science (R0)
