Abstract
Much of the existing work in machine learning and data mining has relied on devising efficient techniques to build accurate models from the data. Research on how the accuracy of a model changes as a function of dynamic updates to the databases is very limited. In this work we show that extracting this information (knowing which aspects of the model are changing, and how they are changing as a function of data updates) can be very effective for interactive data mining purposes, where response time is often more important than model quality as long as model quality is not too far off the best (exact) model.
In this paper we consider the problem of generating approximate models within the context of association mining, a key data mining task. We propose a new approach to incrementally generate approximate models of associations in evolving databases. Our approach is able to detect how patterns evolve over time (an interesting result in its own right), and uses this information to generate approximate models with high accuracy at a fraction of the cost of generating the exact model. Extensive experimental evaluation on real databases demonstrates the effectiveness and advantages of the proposed approach.
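To make the idea of an approximate model concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that the support of an itemset is tracked as blocks of transactions are appended, and that the next value is guessed by a simple linear extrapolation instead of a rescan. The function names (`support`, `support_history`, `extrapolate`), the toy transactions, and the choice of linear extrapolation are all illustrative assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(itemset <= t for t in transactions) / len(transactions)


def support_history(blocks, itemset):
    """Exact cumulative support of itemset after each block is appended."""
    seen, history = [], []
    for block in blocks:
        seen.extend(block)
        history.append(support(seen, itemset))
    return history


def extrapolate(history):
    """Hypothetical estimator: extend the last observed change by one step.

    This stands in for whatever trend model one prefers; the result is
    clamped to [0, 1] because support is a fraction.
    """
    if len(history) < 2:
        return history[-1]
    guess = history[-1] + (history[-1] - history[-2])
    return min(1.0, max(0.0, guess))


# Toy evolving database: three blocks of updates (made-up data).
blocks = [
    [frozenset("ab"), frozenset("abc"), frozenset("c"), frozenset("ac")],
    [frozenset("ab"), frozenset("ab"), frozenset("bc"), frozenset("abc")],
    [frozenset("ab"), frozenset("abd"), frozenset("ab"), frozenset("cd")],
]

# Use only the first two blocks to estimate support after the third.
history = support_history(blocks[:2], {"a", "b"})
estimate = extrapolate(history)

# Exact value, obtained by scanning all three blocks.
exact = support(blocks[0] + blocks[1] + blocks[2], {"a", "b"})

print("history:", history)
print("estimate:", estimate, "exact:", exact)
```

The estimate costs no additional scan of the data, while the exact value requires reading every transaction. The gap between the two is the kind of accuracy-versus-cost trade-off the abstract refers to.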
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Veloso, A., Gusmão, B., Meira, W., Carvalho, M., Parthasarathy, S., Zaki, M. (2002). Efficiently Mining Approximate Models of Associations in Evolving Databases. In: Elomaa, T., Mannila, H., Toivonen, H. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2002. Lecture Notes in Computer Science, vol 2431. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45681-3_36
DOI: https://doi.org/10.1007/3-540-45681-3_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44037-6
Online ISBN: 978-3-540-45681-0
eBook Packages: Springer Book Archive