Abstract
Most traditional clustering algorithms require the number of clusters sought to be specified beforehand, and all items to be clustered to be present when the algorithm is run. These two shortcomings, which are very serious in practical applications, are overcome by a straightforward sequential clustering algorithm. Its most crucial constituent is a distance measure, whose suitable choice is discussed. It is shown how sequentially obtained cluster sets can be improved by reclustering, and how items considered outliers can be removed. The method's applicability to text analysis is demonstrated.
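The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common form of sequential clustering, not the authors' method. It assumes numeric feature vectors, Euclidean distance to the running cluster centroid, and a fixed distance threshold; the names `sequential_cluster`, `drop_outliers`, `threshold`, and `min_size` are hypothetical. Each item is handled as it arrives: it joins the nearest cluster if that cluster is close enough, and otherwise opens a new one, so the number of clusters is never fixed in advance.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def sequential_cluster(items, threshold, distance=dist):
    """Group items one at a time; the number of clusters is not preset.

    Assumption: an item joins the nearest cluster when its distance to that
    cluster's centroid is at most `threshold`, else it starts a new cluster.
    """
    clusters = []  # each cluster is a list of the items assigned to it
    for item in items:
        best, best_d = None, float("inf")
        for cluster in clusters:
            # Centroid = coordinate-wise mean of the cluster's members.
            centroid = [sum(coords) / len(cluster) for coords in zip(*cluster)]
            d = distance(item, centroid)
            if d < best_d:
                best, best_d = cluster, d
        if best is not None and best_d <= threshold:
            best.append(item)
        else:
            clusters.append([item])
    return clusters


def drop_outliers(clusters, min_size=2):
    """Assumption: clusters smaller than `min_size` are treated as outliers."""
    return [c for c in clusters if len(c) >= min_size]


points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (20.0, 20.0)]
clusters = sequential_cluster(points, threshold=1.0)
kept = drop_outliers(clusters)
```

Any other distance measure can be passed through the `distance` parameter, which is where the choice discussed in the paper would enter. The result of such a scheme depends on the order in which items arrive, which is one reason a later reclustering pass can improve the cluster set.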
Notes
1. Interested readers may download these datasets (1.3 MB) from http://www.docanalyser.de/cd-clustering-corpora.zip.
Acknowledgement
This work was supported by Rajamangala University of Technology Phra Nakhon.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Komkhao, M., Kubek, M., Halang, W.A. (2018). Sequentially Grouping Items into Clusters of Unspecified Number. In: Meesad, P., Sodsee, S., Unger, H. (eds) Recent Advances in Information and Communication Technology 2017. IC2IT 2017. Advances in Intelligent Systems and Computing, vol 566. Springer, Cham. https://doi.org/10.1007/978-3-319-60663-7_28
DOI: https://doi.org/10.1007/978-3-319-60663-7_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60662-0
Online ISBN: 978-3-319-60663-7
eBook Packages: Engineering, Engineering (R0)