Interactive Clustering for Transaction Data
We propose a clustering algorithm, OAK, targeted at transaction data, as typified by market basket data, web documents, and categorical data. OAK is interactive, incremental, and scalable. It uses a dendrogram so that the number of clusters can be modified dynamically. In addition, a condensation technique ensures that the dendrogram can be memory resident regardless of database size. A performance study shows that the quality of OAK's clusters is comparable to that of ROCK, with reduced complexity.
Keywords: Cluster Algorithm, Association Rule, Transaction Data, Interactive Cluster, Cluster Profile
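To illustrate why a dendrogram lets the number of clusters change without re-clustering, the following is a minimal sketch and not the OAK algorithm itself. It assumes single-link agglomerative clustering with Jaccard similarity over item sets; both choices, along with the names `jaccard`, `build_dendrogram`, `cut`, and the sample `baskets`, are hypothetical and introduced only for illustration. The condensation technique mentioned in the abstract is not modeled here.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two item sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def build_dendrogram(transactions):
    """Single-link agglomerative clustering (an assumption, not OAK's method).

    Returns the merge history as a list of (id_a, id_b, new_id) tuples.
    Leaves have ids 0..n-1; the i-th merge creates cluster id n + i.
    """
    n = len(transactions)
    clusters = {i: [i] for i in range(n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Single link: cluster similarity is the best pairwise similarity.
            sim = max(jaccard(transactions[i], transactions[j])
                      for i in clusters[a] for j in clusters[b])
            if best is None or sim > best[0]:
                best = (sim, a, b)
        _, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, next_id))
        next_id += 1
    return merges


def cut(merges, n, k):
    """Return k clusters by replaying only the first n - k merges."""
    clusters = {i: [i] for i in range(n)}
    for a, b, new_id in merges[:max(0, n - k)]:
        clusters[new_id] = clusters.pop(a) + clusters.pop(b)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical market basket data.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"beer", "chips", "salsa"},
    {"beer", "chips", "salsa", "limes"},
    {"milk", "eggs", "butter"},
]

merges = build_dendrogram(baskets)   # built once
print(cut(merges, len(baskets), 3))  # [[0, 1], [2, 3], [4]]
print(cut(merges, len(baskets), 2))  # [[0, 1, 4], [2, 3]]
```

The dendrogram is built once; each call to `cut` only replays a prefix of the stored merge history, so a user can move between 3 and 2 clusters interactively without rescanning the data.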
- 1. Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, and Prabhakar Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, pages 94–105, 1998.
- 2. Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of the Twentieth International Conference on Very Large Databases, pages 487–499, Santiago, Chile, 1994.
- 3. Paul Bradley, Usama Fayyad, and Cory Reina. Scaling clustering algorithms to large databases. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD-98), pages 9–15, 1998.
- 4. Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. The MIT Press, Cambridge, MA, 1990.
- 6. Venkatesh Ganti, Johannes Gehrke, and Raghu Ramakrishnan. CACTUS - clustering categorical data using summaries. In Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining (KDD-99), pages 73–83, 1999.
- 7. Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim. ROCK: A robust clustering algorithm for categorical attributes. In Proceedings of the 1999 IEEE International Conference on Data Engineering, pages 512–521, 1999.
- 8. Eui-Hong Han, George Karypis, Vipin Kumar, and Bamshad Mobasher. Clustering based on association rule hypergraphs. In 1997 SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, 1997.
- 9. Marti A. Hearst and Jan O. Pedersen. Reexamining the cluster hypothesis: Scatter/gather on retrieval results. In Proceedings of the 19th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 76–84, 1996.
- 10. Christian Hidber. Online association rule mining. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pages 145–156, 1999.
- 15. Yongqiao Xiao and Margaret H. Dunham. Interactive clustering for large databases, July 2000. Available from http://www.seas.smu.edu/~xiao/publication/oak-tr.ps.
- 16. Oren Zamir and Oren Etzioni. Web document clustering: A feasibility demonstration. In Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 46–54, 1998.
- 17. Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH: An efficient data clustering method for very large databases. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 103–114, Montreal, Canada, 1996.