Sequentially Grouping Items into Clusters of Unspecified Number

Komkhao, Maytiyanin; Kubek, Mario; Halang, Wolfgang A.

doi:10.1007/978-3-319-60663-7_28

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 566)

Included in the following conference series:

International Conference on Computing and Information Technology

Abstract

Most traditional clustering algorithms require the number of clusters sought to be specified beforehand, and all items to be clustered to be present when the algorithm is run. These two shortcomings, which are very serious in practical applications, are overcome by a straightforward sequential clustering algorithm. Its most crucial constituent is a distance measure, the suitable choice of which is discussed. It is shown how sequentially obtained cluster sets can be improved by reclustering, and how items considered to be outliers can be removed. The method's applicability to text analysis is demonstrated.
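
The abstract does not spell out the algorithm, so the following is only a generic sketch of threshold-based sequential clustering, not the authors' method. It assumes items are numeric tuples, uses Euclidean distance as a stand-in for the distance measure, and treats small clusters as outliers. The names sequential_cluster, drop_small_clusters, threshold and min_size are hypothetical.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def sequential_cluster(items, threshold):
    """Assign each item, in arrival order, to the nearest cluster,
    or open a new cluster if no centroid lies within `threshold`.

    Assumption: this is a generic leader-style scheme, not the
    algorithm of the paper. The number of clusters is not given
    in advance; it emerges from the data and the threshold.
    """
    clusters = []  # each cluster: {"centroid": tuple, "members": list}
    for item in items:
        best, best_d = None, None
        for cluster in clusters:
            d = dist(item, cluster["centroid"])
            if best_d is None or d < best_d:
                best, best_d = cluster, d
        if best is not None and best_d <= threshold:
            best["members"].append(item)
            n = len(best["members"])
            # Update the centroid as a running mean of the members.
            best["centroid"] = tuple(
                c + (x - c) / n for c, x in zip(best["centroid"], item)
            )
        else:
            clusters.append({"centroid": tuple(item), "members": [item]})
    return clusters


def drop_small_clusters(clusters, min_size):
    """Hypothetical outlier rule: discard clusters with fewer
    than `min_size` members."""
    return [c for c in clusters if len(c["members"]) >= min_size]


items = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (20.0, 20.0)]
clusters = sequential_cluster(items, threshold=1.0)
kept = drop_small_clusters(clusters, min_size=2)
```

Because items are processed one at a time, the result depends on arrival order and on the chosen threshold, which is one reason a later reclustering pass of the kind the abstract mentions can improve the cluster set.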

Notes

1. Interested readers may download these datasets (1.3 MB) from http://www.docanalyser.de/cd-clustering-corpora.zip.

Acknowledgement

This work was supported by Rajamangala University of Technology Phra Nakhon.

Author information

Authors and Affiliations

Faculty of Science and Technology, Rajamangala University of Technology Phra Nakhon, Bangkok, Thailand
Maytiyanin Komkhao
Faculty of Mathematics and Computer Science, Fernuniversität in Hagen, Hagen, Germany
Mario Kubek & Wolfgang A. Halang

Corresponding author

Correspondence to Wolfgang A. Halang.

Editor information

Editors and Affiliations

Faculty of Information Technology, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
Phayung Meesad
Faculty of Information Technology, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
Sunantha Sodsee
Lehrgebiet Kommunikationsnetze, FernUniversität in Hagen, Hagen, Germany
Herwig Unger

About this paper

Cite this paper

Komkhao, M., Kubek, M., Halang, W.A. (2018). Sequentially Grouping Items into Clusters of Unspecified Number. In: Meesad, P., Sodsee, S., Unger, H. (eds) Recent Advances in Information and Communication Technology 2017. IC2IT 2017. Advances in Intelligent Systems and Computing, vol 566. Springer, Cham. https://doi.org/10.1007/978-3-319-60663-7_28

DOI: https://doi.org/10.1007/978-3-319-60663-7_28
Published: 20 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60662-0
Online ISBN: 978-3-319-60663-7
eBook Packages: Engineering, Engineering (R0)
