Abstract
In this paper, we propose a new approximate clustering algorithm that improves the precision of top-down clustering. Top-down clustering was proposed by Iwayama et al. to speed up clustering: the cluster tree is generated by sampling some documents, building a cluster from the sample, and assigning the remaining documents to the nearest node; if the number of documents assigned to a node is large, sampling and clustering are repeated for that node, proceeding from the top of the tree downward. To improve the precision of the top-down clustering method, we propose selecting documents by applying a genetic algorithm (GA) to decide a quasi-optimum layer, and using a minimum description length (MDL) criterion to evaluate the layer structure of the cluster tree.
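The sample, assign, and recurse loop of top-down clustering described above can be illustrated with a minimal Python sketch. It is not the implementation of Iwayama et al. or of this paper: it assumes cosine similarity over sparse term-weight dictionaries in place of a Bayesian similarity, treats each sampled document as the seed of one child node instead of clustering the sample, and omits the GA and MDL steps entirely. The names `top_down_cluster`, `sample_size`, and `max_leaf` are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_down_cluster(docs, ids, sample_size=2, max_leaf=3, rng=None):
    """Build a cluster tree as nested dicts {"docs": [...], "children": [...]}."""
    rng = rng or random.Random(0)
    node = {"docs": list(ids), "children": []}
    if len(ids) <= max_leaf:
        return node  # few enough documents: stop splitting here

    # Assumption: each sampled document seeds one child node.
    seeds = rng.sample(list(ids), min(sample_size, len(ids)))
    buckets = {s: [s] for s in seeds}

    # Assign every remaining document to its most similar seed.
    for i in ids:
        if i in buckets:
            continue
        nearest = max(seeds, key=lambda s: cosine(docs[i], docs[s]))
        buckets[nearest].append(i)

    for members in buckets.values():
        if len(members) == len(ids):
            # No split happened (single seed): keep as a leaf to avoid looping.
            node["children"].append({"docs": members, "children": []})
        else:
            # Large nodes are sampled and clustered again, top to bottom.
            node["children"].append(
                top_down_cluster(docs, members, sample_size, max_leaf, rng)
            )
    return node


def leaves(node):
    """Collect the document lists stored at the leaves of the tree."""
    if not node["children"]:
        return [node["docs"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]


if __name__ == "__main__":
    # Toy corpus: ten documents, each with one topical term and one shared term.
    corpus = {i: {"topic%d" % (i % 3): 1.0, "common": 0.5} for i in range(10)}
    tree = top_down_cluster(corpus, list(range(10)))
    print(leaves(tree))
```

Because the seeds are drawn at random, the resulting tree depends on which documents happen to be sampled; this dependence is the precision problem that the GA-based document selection and MDL-based evaluation proposed in the paper are meant to address.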
References
Makoto Iwayama, Takenobu Tokunaga: A Probabilistic Model for Text Categorization: Based on a Single Random Variable with Multiple Values. Proceedings of 4th Conference on Applied Natural Language Processing, pp. 162–167, 1994.
Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
Yutaka Tanaka, Kazuaki Wakimoto: Statistical Analysis of Large Volumes of Information. Modern Mathematics Society, 1983.
Makoto Iwayama, Takenobu Tokunaga: Hierarchical Bayesian Clustering for Automatic Text Classification. Proceedings of IJCAI-95, pp. 1322–1327, 1995.
Iwayama, Tokunaga, Sakurai: Large-Scale Clustering for Document Search. 3rd Annual Meeting of Institute of Language Processing of Japan (March 1997), pp. 245–248, 1997.
Itoh, Kawabata: Universal Data Compression Algorithm using Parameter Dispersion and Estimation Amount. 8th Conference on Information Theory and Applied Research, pp. 239–244, 1985.
Aho, Hopcroft, Ullman: The Design and Analysis of Computer Algorithms. p. 54, Addison-Wesley Pub. Co., 1974.
Aoki, Matsumoto, Hashimoto: Evaluation of Clustering Methods for Large Volumes of Documents. 56th Meeting of the Institute of Information Processing of Japan (first semester, 1998), 3–100, 1998.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aoki, K., Matsumoto, K., Hoashi, K., Hashimoto, K. (1999). A Study of Bayesian Clustering of a Document Set Based on GA. In: McKay, B., Yao, X., Newton, C.S., Kim, JH., Furuhashi, T. (eds) Simulated Evolution and Learning. SEAL 1998. Lecture Notes in Computer Science, vol 1585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48873-1_34
DOI: https://doi.org/10.1007/3-540-48873-1_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65907-5
Online ISBN: 978-3-540-48873-6
eBook Packages: Springer Book Archive