Abstract
The XML-enabled association rule framework [FDWC03] extends the notion of associated items to XML fragments, so that associations hold among trees rather than among simple-structured items of atomic values. Such rules are more flexible and powerful in representing both the simple and the complex structured association relationships inherent in XML data. Compared with traditional association mining in the well-structured world, however, mining from XML data faces more challenges because of the inherent flexibility of XML in both structure and semantics. The primary challenges are 1) a more complicated hierarchical data structure; 2) an ordered data context; and 3) a much larger data size. To make XML-enabled association rule mining practical and computationally tractable, we present a template model that helps users specify the interesting XML-enabled associations to be mined. We also describe techniques for template-guided mining of association rules from large XML data, and demonstrate their effectiveness through a set of experiments on both synthetic and real-life data.
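The abstract does not spell out the template model, so the following is only a rough illustration of the general idea, not the authors' method. It assumes a much simpler setting than the paper: each child of the root element is one transaction, a fragment is a leaf element identified by its path and text value (not a general tree), and a "template" is just a pair of element paths that fix where the rule body and the rule head may come from. The sample data, the function names `leaf_fragments` and `mine`, and the thresholds are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical toy data: each <order> is treated as one transaction.
DATA = """<orders>
  <order><customer><job>student</job></customer><book><topic>XML</topic></book></order>
  <order><customer><job>student</job></customer><book><topic>XML</topic></book></order>
  <order><customer><job>student</job></customer><book><topic>Java</topic></book></order>
  <order><customer><job>teacher</job></customer><book><topic>XML</topic></book></order>
</orders>"""


def leaf_fragments(elem, prefix=""):
    """Yield (path, text) for every leaf element at or below elem."""
    path = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        yield (path, (elem.text or "").strip())
    for child in children:
        yield from leaf_fragments(child, path)


def mine(xml_text, body_path, head_path, min_support, min_confidence):
    """Return rules body => head whose fragments match the given path template."""
    transactions = [set(leaf_fragments(t)) for t in ET.fromstring(xml_text)]
    n = len(transactions)
    body_count, pair_count = Counter(), Counter()
    for frags in transactions:
        # The template prunes the search space: only fragments on the two
        # requested paths are ever counted.
        bodies = {f for f in frags if f[0] == body_path}
        heads = {f for f in frags if f[0] == head_path}
        body_count.update(bodies)
        pair_count.update((b, h) for b in bodies for h in heads)
    rules = []
    for (b, h), count in pair_count.items():
        support = count / n                # fraction of transactions with both
        confidence = count / body_count[b]  # fraction of body matches with head
        if support >= min_support and confidence >= min_confidence:
            rules.append((b, h, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    for rule in mine(DATA, "/order/customer/job", "/order/book/topic", 0.5, 0.6):
        print(rule)
```

Restricting candidates to the paths named in the template is what keeps the counting step small in this sketch; without it, every pair of fragments in every transaction would have to be considered.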
References
Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Washington D.C., USA, May 1993, pp. 207–216 (1993)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. of the 20th Intl. Conf. on Very Large Data Bases, Santiago, Chile, September 1994, pp. 487–499 (1994)
Agrawal, R., Shafer, J.C.: Parallel mining of association rules. IEEE Transactions on Knowledge and Data Engineering 8(6), 962–969 (1996)
Agrawal, R., Shim, K.: Developing tightly-coupled data mining applications on a relational database system. In: Proc. of the 2nd International Conference on Knowledge Discovery and Data Mining (1996)
Bonifati, A., Ceri, S.: Comparative analysis of five XML query languages. SIGMOD Record 29(1), 68–79 (2000)
Brin, S., Motwani, R., Silverstein, C.: Beyond market basket: generalizing association rules to correlations. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Tucson, Arizona, USA, June 1997, pp. 265–276 (1997)
Baralis, E., Psaila, G.: Designing templates for mining association rules. Journal of Intelligent Information Systems 9(1), 7–32 (1997)
Cohen, E., Datar, M., Fujiwara, S., Gionis, A., Indyk, P., Motwani, R., Ullman, J.D., Yang, C.: Finding interesting associations without support pruning. In: Proc. Intl. Conf. Data Engineering, California, USA, March 2000, pp. 489–499 (2000)
Cheung, D., Han, J., Ng, V., Wong, C.Y.: Maintenance of discovered association rules in large databases: an incremental updating technique. In: Proc. of the Intl. Conf. on Data Engineering, New Orleans, Louisiana, USA, February 1996, pp. 106–114 (1996)
World Wide Web Consortium. The XML Data Model (January 2000), http://www.w3.org/XML/Datamodel.html/
World Wide Web Consortium. Document Object Model (DOM) (April 2001), http://www.w3.org/DOM/
World Wide Web Consortium. XQuery 1.0: An XML Query Language (April 2002), http://www.w3.org/TR/xquery/
World Wide Web Consortium. XQuery 1.0 and XPath 2.0 Functions and Operators (April 2002), http://www.w3.org/TR/xquery-operators/
Cheung, D.W., Ng, V.T., Fu, A.W., Fu, Y.J.: Efficient mining of association rules in distributed databases. IEEE Transactions on Knowledge and Data Engineering 8(6), 911–922 (1996)
Chakrabarti, S., Sarawagi, S., Dom, B.: Mining surprising patterns using temporal description length. In: Proc. of the 24th Intl. Conf. on Very Large Data Bases, New York, USA, August 1998, pp. 606–617 (1998)
Dehaspe, L., Toivonen, H.: Discovery of frequent DATALOG patterns. Data Mining and Knowledge Discovery 3(1), 7–36 (1999)
Feng, L., Dillon, T., Weigand, H., Chang, E.: An XML-enabled association rule framework. In: Proc. of the 14th Intl. Conf. on Database and Expert Systems Applications, Prague, Czech Republic, September 2003, pp. 88–97 (2003)
Fukuda, T., Morimoto, Y., Morishita, S., Tokuyama, T.: Data mining using two-dimensional optimized association rules: Schema, algorithms, and visualization. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Montreal, Canada, June 1996, pp. 13–23 (1996)
Fukuda, T., Morimoto, Y., Morishita, S., Tokuyama, T.: Mining optimized association rules for numeric attributes. In: Proc. of the 15th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Montreal, Canada, June 1996, pp. 182–191 (1996)
Shen, W., Ong, K., Mitbander, B., Zaniolo, C.: Meta-queries for data mining. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.) Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press (1995)
Fang, M., Shivakumar, N., Garcia-Molina, H., Motwani, R., Ullman, J.D.: Computing iceberg queries efficiently. In: Proc. of the 24th Intl. Conf. on Very Large Data Bases, New York, USA, August 1998, pp. 299–310 (1998)
Han, J., Fu, Y.: Discovery of multiple-level association rules from large databases. In: Proc. of the 21st Intl. Conf. on Very Large Data Bases, Zurich, Switzerland, September 1995, pp. 420–431 (1995)
Han, J., Fu, Y.: Meta-rule-guided mining of association rules in relational databases. In: Proc. of the 1st Intl. Workshop on Integration of Knowledge Discovery with Deductive and Object-Oriented Databases, Singapore, December 1995, pp. 39–46 (1995)
Han, J., Fu, Y., Koperski, K., Wang, W., Zaiane, O.: DMQL: a data mining query language for relational databases. In: Proc. of the ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Montreal, Canada (June 1996)
Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco (2001)
Han, E.-H., Karypis, G., Kumar, V.: Scalable parallel data mining for association rules. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Tucson, Arizona, USA, June 1997, pp. 277–288 (1997)
Houtsma, M., Swami, A.: Set-oriented mining of association rules. In: Proc. of the International Conference on Data Engineering, Taipei, Taiwan (March 1995)
Kamber, M., Han, J., Chiang, J.Y.: Metarule-guided mining of multi-dimensional association rules using data cubes. In: Proc. of the International Conference on Knowledge Discovery and Data Mining, California, USA, August 1997, pp. 207–210 (1997)
Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., Verkamo, A.I.: Finding interesting rules from large sets of discovered association rules. In: Proc. of the 3rd Intl. Conf. on Information and Knowledge Management, Gaithersburg, Maryland, November 1994, pp. 401–408 (1994)
Liu, B., Hsu, W., Ma, Y.: Mining association rules with multiple minimum supports. In: Proc. ACM SIGKDD Intl. Conf. Knowledge Discovery and Data Mining, California, USA, August 1999, pp. 125–134 (1999)
Lakshmanan, L.V.S., Ng, R., Han, J., Pang, A.: Optimization of constrained frequent set queries with 2-variable constraints. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, USA, June 1999, pp. 157–168 (1999)
Morgado, E.J.M.: Semantic networks as abstract data types. Ph.D. thesis, Technical Report 86-1, Department of Computer Science, SUNY at Buffalo, NY (1986)
Meo, R., Psaila, G., Ceri, S.: A new SQL-like operator for mining association rules. In: Proc. of the 22nd Intl. Conf. on Very Large Data Bases, Mumbai, India, September 1996, pp. 122–133 (1996)
Maruyama, K., Uehara, K.: Mining association rules from semi-structured data. In: Proc. of the ICDCS Workshop of Knowledge Discovery and Data Mining in the World-Wide Web, Taiwan (April 2000)
Miller, R.J., Yang, Y.: Association rules over interval data. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Tucson, Arizona, USA, June 1997, pp. 452–461 (1997)
Ng, R., Lakshmanan, L.V.S., Han, J., Pang, A.: Exploratory mining and pruning optimizations of constrained association rules. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Seattle, Washington, June 1998, pp. 13–24 (1998)
Ozden, B., Ramaswamy, A., Silberschatz, A.: Cyclic association rules. In: Proc. of the Intl. Conf. on Data Engineering, Florida, USA, February 1998, pp. 412–421 (1998)
Park, J.-S., Chen, M.-S., Yu, P.S.: An effective hash based algorithm for mining association rules. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, San Jose, CA, May 1995, pp. 175–186 (1995)
Park, J.-S., Chen, M.-S., Yu, P.S.: Mining association rules with adjustable accuracy. IBM Research Report (1995)
Park, J.-S., Chen, M.-S., Yu, P.S.: Data mining for path traversal patterns in a web environment. In: Proc. of the 16th Conference on Distributed Computing Systems, Hong Kong, May 1996, pp. 385–392 (1996)
Ramaswamy, S., Mahajan, S., Silberschatz, A.: On the discovery of interesting patterns in association rules. In: Proc. of the 24th Intl. Conf. on Very Large Data Bases, New York, USA, August 1998, pp. 368–379 (1998)
Rastogi, R., Shim, K.: Mining optimized association rules with categorical and numerical attributes. In: Proc. of the Intl. Conf. on Data Engineering, Florida, USA, February 1998, pp. 503–512 (1998)
Srikant, R., Agrawal, R.: Mining generalized association rules. In: Proc. of the 21st Intl. Conf. on Very Large Data Bases, Zurich, Switzerland, September 1995, pp. 409–419 (1995)
Srikant, R., Agrawal, R.: Mining quantitative association rules in large relational tables. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Montreal, Canada, June 1996, pp. 1–12 (1996)
Silverstein, C., Brin, S., Motwani, R., Ullman, J.D.: Scalable techniques for mining causal structures. In: Proc. of the 24th Intl. Conf. on Very Large Data Bases, New York, USA, August 1998, pp. 594–605 (1998)
Silverstein, C., Brin, S., Motwani, R., Ullman, J.D.: Scalable techniques for mining causal structures. Data Mining and Knowledge Discovery 4(2/3), 163–192 (2000)
Singh, L., Chen, B., Haight, R., Scheuermann, P., Aoki, K.: A robust system architecture for mining semi-structured data. In: Proc. of the 4th International Conference on Knowledge Discovery and Data Mining, New York, USA, August 1998, pp. 329–333 (1998)
Singh, L., Chen, B., Haight, R., Scheuermann, P.: An algorithm for constrained association rule mining in semi-structured data. In: Proc. of the 3rd Pacific-Asia Conference on Knowledge Discovery and Data Mining, Beijing, China, April 1999, pp. 148–158 (1999)
Shapiro, S.C.: Cables, paths, and subconscious reasoning in propositional semantic networks. In: Sowa, J.F. (ed.) Principles of Semantic Networks - Explorations in the Representation of Knowledge (1991)
Savasere, A., Omiecinski, E., Navathe, S.: An efficient algorithm for mining association rules in large databases. In: Proc. of the 21st Intl. Conf. on Very Large Data Bases, Zurich, Switzerland, September 1995, pp. 432–443 (1995)
Singh, L., Scheuermann, P., Chen, B.: Generating association rules from semi-structured documents using an extended concept hierarchy. In: Proc. of the 6th International Conference on Information and Knowledge Management, Las Vegas, USA, November 1997, pp. 193–200 (1997)
Sarawagi, S., Thomas, S., Agrawal, R.: Integrating association rule mining with databases: Alternatives and implications. In: Proc. of the International Conference on Management of Data, USA (1998)
Srikant, R., Vu, Q., Agrawal, R.: Mining association rules with item constraints. In: Proc. of the 3rd Intl. Conf. on Knowledge Discovery and Data Mining, Newport Beach, California, August 1997, pp. 67–73 (1997)
Schmidt, A., Waas, F., Kersten, M.L., Florescu, D., Carey, M.J., Manolescu, I., Busse, R.: Why and how to benchmark XML databases. SIGMOD Record 30(3), 27–32 (2001)
Toivonen, H.: Sampling large databases for association rules. In: Proc. of the 22nd Conference on Very Large Data Bases, Mumbai, India, September 1996, pp. 134–145 (1996)
Thomas, S., Sarawagi, S.: Mining generalized association rules and sequential patterns using SQL queries. In: Proc. of the 4th International Conference on Knowledge Discovery and Data Mining, New York, USA (August 1998)
Tsur, D., Ullman, J.D., Abiteboul, S., Clifton, C., Motwani, R., Nestorov, S.: Query flocks: a generalization of association-rule mining. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, Seattle, Washington, June 1998, pp. 1–12 (1998)
Wang, K., He, Y., Han, J.: Mining frequent itemsets using support constraints. In: Proc. 26th Intl. Conf. Very Large Data Bases, Cairo, Egypt, September 2000, pp. 43–52 (2000)
Wang, H., Park, S., Fan, W., Yu, P.: ViST: A dynamic index method for querying XML data by tree structures. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, California, USA, June 2003, pp. 110–121 (2003)
Wang, K., Liu, H.: Schema discovery for semi-structured data. In: Proc. of the 3rd International Conference on Knowledge Discovery and Data Mining, California, USA, August 1997, pp. 271–274 (1997)
Wang, K., Liu, H.: Discovering typical structures of documents: a road map approach. In: Proc. of the ACM SIGIR International Conference on Research and Development in Information Retrieval, Melbourne, Australia, August 1998, pp. 146–154 (1998)
Wang, K., Liu, H.: Discovering structural association of semistructured data. IEEE Transactions on Knowledge and Data Engineering 12(2), 353–371 (2000)
Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W.: New algorithms for fast discovery of association rules. In: Proc. of the 3rd International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA, USA, August 1997, pp. 283–286 (1997)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Feng, L., Dillon, T. (2005). Mining Interesting XML-Enabled Association Rules with Templates. In: Goethals, B., Siebes, A. (eds) Knowledge Discovery in Inductive Databases. KDID 2004. Lecture Notes in Computer Science, vol 3377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31841-5_5
DOI: https://doi.org/10.1007/978-3-540-31841-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25082-1
Online ISBN: 978-3-540-31841-5
eBook Packages: Computer Science, Computer Science (R0)