Abstract
This paper introduces a new clustering algorithm, Dynamic Hierarchical Star, which is designed to build a hierarchy of overlapping clusters while handling dynamic data sets. Experimental results on several benchmark text collections show that the method produces smaller hierarchies than traditional algorithms while achieving similar clustering quality. We therefore recommend it for tasks that require dynamic overlapping clustering, such as information organization, the creation of document taxonomies, and hierarchical topic detection.
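The abstract does not spell out how star-based clustering yields overlapping clusters. As background only, the following is a minimal Python sketch of the classic static star clustering of Aslam, Pelekhov and Rus; it is not the Dynamic Hierarchical Star algorithm, whose hierarchy construction and dynamic updates are not described here. Assumptions: documents are dense term-weight vectors, similarity is cosine, and `sigma` is a user-chosen threshold; the function names `cosine` and `star_clusters` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def star_clusters(vectors, sigma):
    """Classic static star clustering (illustrative sketch, not Dynamic Hierarchical Star).

    Returns a list of clusters, each a sorted list of document indices.
    A document may appear in more than one cluster.
    """
    n = len(vectors)
    # Similarity graph: connect documents whose cosine similarity reaches sigma.
    neighbors = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if cosine(vectors[i], vectors[j]) >= sigma:
            neighbors[i].add(j)
            neighbors[j].add(i)
    # Visit vertices by decreasing degree (ties broken by index). Degrees are fixed,
    # so the first unmarked vertex in this order is always the highest-degree one left.
    order = sorted(range(n), key=lambda v: (-len(neighbors[v]), v))
    marked = set()
    clusters = []
    for v in order:
        if v in marked:
            continue
        # v becomes a star center; all its neighbors are satellites, even ones
        # already covered by an earlier star, which is what creates overlap.
        star = {v} | neighbors[v]
        clusters.append(sorted(star))
        marked |= star
    return clusters


# Five unit vectors at 0, 22.5, 45, 67.5 and 90 degrees: only adjacent ones reach sigma.
docs = [[math.cos(math.radians(a)), math.sin(math.radians(a))] for a in (0, 22.5, 45, 67.5, 90)]
print(star_clusters(docs, sigma=0.8))  # [[0, 1, 2], [2, 3, 4]]
```

Document 2 lands in both stars, which illustrates what "overlapped clusters" means. The quadratic similarity loop keeps the sketch short; it makes no attempt at the incremental updates or the multi-level hierarchy that the paper's algorithm targets.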
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gil-García, R., Pons-Porrata, A. (2008). Hierarchical Star Clustering Algorithm for Dynamic Document Collections. In: Ruiz-Shulcloper, J., Kropatsch, W.G. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2008. Lecture Notes in Computer Science, vol 5197. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85920-8_23
DOI: https://doi.org/10.1007/978-3-540-85920-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85919-2
Online ISBN: 978-3-540-85920-8
eBook Packages: Computer Science, Computer Science (R0)