Abstract
We consider using compression algorithms to compute similarity distances for solving the clustering problem. We propose a hierarchical clustering machine that constructs a binary tree of object dependencies, similar to a taxonomy.
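As an illustration only (the abstract does not state which distance or linkage rule the authors use), one widely used compression-based measure is the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The sketch below assumes zlib as the compressor and greedy average-linkage agglomeration to build the binary tree; the names `csize`, `ncd`, `leaves`, and `cluster` are hypothetical and are not taken from the article.

```python
import zlib
from itertools import combinations


def csize(data: bytes) -> int:
    """Compressed size in bytes (assumption: zlib at level 9 as the compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = csize(x), csize(y)
    return (csize(x + y) - min(cx, cy)) / max(cx, cy)


def leaves(tree):
    """Flatten a nested-tuple tree into the list of document names it contains."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def cluster(docs: dict):
    """Greedy average-linkage agglomeration (an assumed linkage rule).

    `docs` maps document names to bytes. Returns a binary tree expressed as
    nested 2-tuples whose leaves are the document names.
    """
    # Pairwise NCD between all original documents, computed once.
    dist = {frozenset(p): ncd(docs[p[0]], docs[p[1]]) for p in combinations(docs, 2)}

    def linkage(a, b):
        # Mean NCD over all leaf pairs drawn from the two subtrees.
        pairs = [(u, v) for u in leaves(a) for v in leaves(b)]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    nodes = list(docs)
    while len(nodes) > 1:
        # Merge the two closest subtrees into one internal node.
        a, b = min(combinations(nodes, 2), key=lambda p: linkage(*p))
        nodes = [n for n in nodes if n not in (a, b)] + [(a, b)]
    return nodes[0]
```

Calling `cluster` on a dictionary of at least one named byte string returns the tree as nested pairs. Which documents are merged first depends on the compressor and on the texts themselves, and for very short inputs the compressor's fixed overhead can dominate the distances.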
Additional information
Original Russian Text © L.S. Lomakina, V.B. Rodionov, A.S. Surkova, 2012, published in Sistemy Upravleniya i Informatsionnye Tekhnologii, 2012, No. 3, pp. 39–44.
About this article
Cite this article
Lomakina, L.S., Rodionov, V.B. & Surkova, A.S. Hierarchical clustering of text documents. Autom Remote Control 75, 1309–1315 (2014). https://doi.org/10.1134/S000511791407011X