Interactive Clustering for Transaction Data

  • Yongqiao Xiao
  • Margaret H. Dunham
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2114)

Abstract

We propose a clustering algorithm, OAK, targeted at transaction data such as market basket data, web documents, and categorical data. OAK is interactive, incremental, and scalable. Its use of a dendrogram allows the number of clusters to be changed dynamically. In addition, a condensation technique ensures that the dendrogram can remain memory-resident regardless of database size. A performance study shows that OAK's cluster quality is comparable to that of ROCK [7], at reduced complexity.
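The abstract does not spell out OAK's procedure, so the following is only a minimal sketch of the general dendrogram idea, not the authors' algorithm. It assumes Jaccard distance between transactions and single-link agglomeration (the method of SLINK [14]). It builds the dendrogram once and then produces a flat clustering for any number of clusters k by replaying a prefix of the recorded merges. The names `jaccard_distance`, `DisjointSet`, `build_dendrogram`, and `cut`, and the sample baskets, are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - len(a & b) / len(a | b); two empty transactions count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


class DisjointSet:
    """Union-find over transaction indices 0..n-1."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> bool:
        """Join the sets containing x and y; return False if already joined."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        self.parent[ry] = rx
        return True


def build_dendrogram(transactions: list[frozenset]) -> list[tuple[float, int, int]]:
    """Return single-link merges (distance, i, j) in merge order.

    Single-link merges are the minimum-spanning-tree edges taken in
    increasing weight (Kruskal's algorithm), so all pairwise distances are
    sorted once. This is O(n^2 log n) and keeps every leaf in memory.
    """
    n = len(transactions)
    edges = sorted(
        (jaccard_distance(transactions[i], transactions[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    ds = DisjointSet(n)
    merges = []
    for d, i, j in edges:
        if ds.union(i, j):
            merges.append((d, i, j))
            if len(merges) == n - 1:
                break
    return merges


def cut(n: int, merges: list[tuple[float, int, int]], k: int) -> list[int]:
    """Cluster labels for k clusters (1 <= k <= n): replay the first n - k merges."""
    ds = DisjointSet(n)
    for _, i, j in merges[: n - k]:
        ds.union(i, j)
    labels: dict[int, int] = {}
    return [labels.setdefault(ds.find(x), len(labels)) for x in range(n)]


if __name__ == "__main__":
    baskets = [
        frozenset({"bread", "milk"}),
        frozenset({"bread", "milk", "eggs"}),
        frozenset({"beer", "chips"}),
        frozenset({"beer", "chips", "salsa"}),
        frozenset({"diapers"}),
    ]
    merges = build_dendrogram(baskets)  # computed once
    print(cut(len(baskets), merges, 3))  # [0, 0, 1, 1, 2]
    print(cut(len(baskets), merges, 5))  # [0, 1, 2, 3, 4]
```

Changing k only replays a different prefix of the stored merges, so no distances are recomputed. This sketch stores all n leaves, which is what the condensation technique described in the abstract is said to avoid. That technique is not modeled here.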

Keywords

Cluster Algorithm, Association Rule, Transaction Data, Interactive Cluster, Cluster Profile
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, and Prabhakar Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, pages 94–105, 1998.
  2. Rakesh Agrawal and Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules in Large Databases. In Proceedings of the Twentieth International Conference on Very Large Databases, pages 487–499, Santiago, Chile, 1994.
  3. Paul Bradley, Usama Fayyad, and Cory Reina. Scaling clustering algorithms to large databases. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD-98), pages 9–15, 1998.
  4. Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. The MIT Press, Cambridge, MA, 1990.
  5. D. Defays. An efficient algorithm for a complete link method. The Computer Journal, 20(4):364–366, 1977.
  6. Venkatesh Ganti, Johannes Gehrke, and Raghu Ramakrishnan. CACTUS: Clustering categorical data using summaries. In Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining (KDD-99), pages 73–83, 1999.
  7. Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim. ROCK: A robust clustering algorithm for categorical attributes. In Proceedings of the 1999 IEEE International Conference on Data Engineering, pages 512–521, 1999.
  8. Eui-Hong Han, George Karypis, Vipin Kumar, and Bamshad Mobasher. Clustering based on association rule hypergraphs. In 1997 SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, 1997.
  9. Marti A. Hearst and Jan O. Pedersen. Reexamining the cluster hypothesis: Scatter/Gather on retrieval results. In Proceedings of the 19th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 76–84, 1996.
  10. Christian Hidber. Online association rule mining. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pages 145–156, 1999.
  11. Anil K. Jain and Richard C. Dubes. Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, New Jersey, 1988.
  12. F. Murtagh. A survey of recent advances in hierarchical clustering algorithms. The Computer Journal, 26(4):354–359, 1983.
  13. G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill Book Co., New York, 1983.
  14. R. Sibson. SLINK: An optimally efficient algorithm for the single link cluster method. The Computer Journal, 16(1):30–34, 1973.
  15. Yongqiao Xiao and Margaret H. Dunham. Interactive clustering for large databases, July 2000. Available from http://www.seas.smu.edu/~xiao/publication/oak-tr.ps.
  16. Oren Zamir and Oren Etzioni. Web document clustering: A feasibility demonstration. In Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 46–54, 1998.
  17. Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH: An efficient data clustering method for very large databases. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 103–114, Montreal, Canada, 1996.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Yongqiao Xiao (1)
  • Margaret H. Dunham (1)
  1. Department of Computer Science and Engineering, Southern Methodist University, Dallas
