Abstract
In this paper we present a partitioning method capable of managing transactions, namely tuples of variable size of categorical data. We adapt the standard definition of mathematical distance used in the K-Means algorithm to represent dissimilarity among transactions, and redefine the notion of cluster centroid. The cluster centroid is used as the representative of the common properties of cluster elements. We show that using our concept of cluster centroid together with Jaccard distance we obtain results that are comparable in quality with the most widely used transactional clustering approaches, while substantially improving on their efficiency.
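To make the idea concrete, the following is a minimal sketch of a K-Means-style loop over transactions using Jaccard distance. The abstract does not state how the centroid is redefined, so the rule used here (keep the items that occur in at least a fraction `min_support` of the cluster's transactions) is an assumption for illustration only, as are all function and parameter names; it should not be read as the authors' definition.

```python
import random
from collections import Counter


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def frequent_item_centroid(cluster, min_support=0.5):
    """Hypothetical centroid (an assumption, not taken from the paper):
    the set of items present in at least min_support of the transactions."""
    counts = Counter(item for t in cluster for item in t)
    threshold = min_support * len(cluster)
    return frozenset(i for i, c in counts.items() if c >= threshold)


def cluster_transactions(transactions, k, max_iter=20, seed=0):
    """K-Means-style partitioning of a list of item sets."""
    rng = random.Random(seed)
    # Start from k randomly chosen transactions as initial centroids.
    centroids = [frozenset(t) for t in rng.sample(transactions, k)]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each transaction goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: jaccard_distance(t, centroids[j]))
            for t in transactions
        ]
        if new_assignment == assignment:
            break  # no transaction changed cluster, so we have converged
        assignment = new_assignment
        # Update step: recompute the centroid of every non-empty cluster.
        for j in range(k):
            members = [t for t, a in zip(transactions, assignment) if a == j]
            if members:
                centroids[j] = frequent_item_centroid(members)
    return assignment, centroids


if __name__ == "__main__":
    data = [frozenset("ab"), frozenset("abc"), frozenset("xy"), frozenset("xyz")]
    labels, centers = cluster_transactions(data, k=2)
    print(labels, centers)
```

Because a centroid in this sketch is itself a set of items, the same Jaccard distance applies to transaction-to-centroid comparisons, and transactions of different sizes need no padding or encoding.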
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Giannotti, F., Gozzi, C., Manco, G. (2002). Clustering Transactional Data. In: Elomaa, T., Mannila, H., Toivonen, H. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2002. Lecture Notes in Computer Science, vol 2431. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45681-3_15
DOI: https://doi.org/10.1007/3-540-45681-3_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44037-6
Online ISBN: 978-3-540-45681-0
eBook Packages: Springer Book Archive