Abstract
Current algorithms for finding associations among the attributes describing data in a database have a number of shortcomings:
1. Applications that require associations with very small support have prohibitively large running times.
2. They assume a static database. Some applications require generating associations in real time from a dynamic database, where transactions are constantly being added and deleted. There are no existing algorithms to accommodate such applications.
3. They can only find associations of the type where a conjunction of attributes implies a conjunction of different attributes. It turns out that there are many cases where a conjunction of attributes implies another conjunction only if certain attributes are excluded. To our knowledge, no current algorithm can generate such excluding associations.
We present a novel method for association generation that addresses all three of the above shortcomings. Our method is inherently different from all existing algorithms, and is especially suited to textual databases with binary attributes. At the heart of our algorithm lies the use of subword trees for quick indexing into the required database statistics. We tested our algorithm on the Reuters-22173 database with satisfactory results.
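The abstract does not spell out the data structure, so the following is only an illustrative sketch and not the authors' algorithm. It assumes each transaction is a small set of binary attributes, and it indexes every sub-itemset up to a bounded size in a plain trie keyed by sorted attribute names. The class `CountTrie`, the `max_size` cap, and the method `excluding_confidence` are hypothetical names introduced here. The sketch shows how tree-indexed counts can support adding and deleting transactions, and how the confidence of an association that excludes one attribute can be derived from ordinary counts.

```python
from itertools import combinations


class CountTrie:
    """Trie keyed by sorted attribute tuples; each node stores a transaction count.

    Illustrative only: a stand-in for the paper's subword trees, whose exact
    construction is not given in the abstract.
    """

    def __init__(self, max_size=3):
        self.max_size = max_size  # hypothetical cap on the indexed itemset size
        self.root = {"count": 0, "children": {}}

    def _update(self, transaction, delta):
        items = sorted(set(transaction))
        self.root["count"] += delta  # the root counts all transactions
        for size in range(1, min(self.max_size, len(items)) + 1):
            for subset in combinations(items, size):
                node = self.root
                for attr in subset:
                    node = node["children"].setdefault(
                        attr, {"count": 0, "children": {}}
                    )
                node["count"] += delta

    def add(self, transaction):
        self._update(transaction, +1)

    def delete(self, transaction):
        # Assumes the transaction was added earlier; this is not checked.
        self._update(transaction, -1)

    def count(self, itemset):
        """Number of stored transactions that contain every attribute in itemset."""
        items = sorted(set(itemset))
        if len(items) > self.max_size:
            raise ValueError("itemset is larger than the indexed max_size")
        node = self.root
        for attr in items:
            node = node["children"].get(attr)
            if node is None:
                return 0
        return node["count"]

    def excluding_confidence(self, lhs, rhs, excluded):
        """Confidence of lhs => rhs among transactions that lack `excluded`."""
        lhs, rhs = set(lhs), set(rhs)
        base = self.count(lhs) - self.count(lhs | {excluded})
        both = self.count(lhs | rhs) - self.count(lhs | rhs | {excluded})
        return both / base if base else 0.0


# Toy data (invented for illustration, not taken from Reuters-22173).
db = CountTrie(max_size=3)
transactions = [
    {"gold", "usa"},
    {"gold", "usa"},
    {"gold", "canada"},
    {"gold", "canada"},
    {"gold", "usa", "canada"},
]
for t in transactions:
    db.add(t)

plain = db.count({"gold", "usa"}) / db.count({"gold"})
excluding = db.excluding_confidence({"gold"}, {"usa"}, "canada")
print(plain, excluding)

db.delete({"gold", "usa", "canada"})  # dynamic update: remove one transaction
print(db.count({"gold"}), db.count({"gold", "usa"}))
```

In this toy data the rule "gold implies usa" holds for only part of the transactions, but it holds for every transaction that does not also mention "canada". Indexing all sub-itemsets costs time exponential in `max_size` per update, so this sketch is only practical for short transactions and a small cap.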
Partially supported by NSF grant CCR-92-23699 and the Israel Ministry of Science and the Arts grant 6297.
Partially supported by the Israel Ministry of Science and the Arts grant 8615.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Amir, A., Feldman, R., Kashi, R. (1997). A new and versatile method for association generation. In: Komorowski, J., Zytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1997. Lecture Notes in Computer Science, vol 1263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63223-9_121
DOI: https://doi.org/10.1007/3-540-63223-9_121
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63223-8
Online ISBN: 978-3-540-69236-2
eBook Packages: Springer Book Archive