
A Study of Bayesian Clustering of a Document Set Based on GA

  • Conference paper

Simulated Evolution and Learning (SEAL 1998)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1585)


Abstract

In this paper, we propose a new approximate clustering algorithm that improves the precision of top-down clustering. Top-down clustering was proposed by Iwayama et al. to improve clustering speed: the cluster tree is generated by sampling some documents, forming a cluster from them, assigning each remaining document to the nearest node, and, whenever the number of documents assigned to a node is large, repeating the sampling and clustering from the top down. To improve the precision of this method, we propose selecting the sampled documents by applying a genetic algorithm (GA) to decide a quasi-optimal layer, and using an MDL criterion to evaluate the layer structure of the cluster tree.
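The abstract only outlines the procedure, so the sketch below makes it concrete. It is a minimal Python illustration, not the authors' implementation: the token-list document representation, the cosine similarity, the unigram MDL proxy, and every name here (similarity, assign_to_seeds, top_down_cluster, mdl_score, ga_layer) and GA operator are assumptions made for illustration. Only the overall shape (sample seeds, assign the rest, recurse on large nodes, let a GA choose seeds under an MDL score) follows the abstract.

```python
import math
import random
from collections import Counter

# Assumption: documents are token lists; similarity is cosine over raw
# term counts.  Neither choice is fixed by the abstract.
def similarity(doc_a, doc_b):
    va, vb = Counter(doc_a), Counter(doc_b)
    dot = sum(va[t] * vb.get(t, 0) for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def assign_to_seeds(documents, seeds):
    """Assign every non-seed document to its most similar seed."""
    buckets = [[s] for s in seeds]
    for doc in documents:
        if any(doc is s for s in seeds):
            continue
        best = max(range(len(seeds)), key=lambda i: similarity(doc, seeds[i]))
        buckets[best].append(doc)
    return buckets

def top_down_cluster(documents, sample_size=10, node_limit=50):
    """Top-down clustering in the spirit of Iwayama et al.: sample
    seeds, assign the rest, recurse into nodes that are still large."""
    if len(documents) <= node_limit:
        return documents                       # leaf node
    seeds = random.sample(documents, min(sample_size, len(documents)))
    return [top_down_cluster(b, sample_size, node_limit)
            for b in assign_to_seeds(documents, seeds)]

def mdl_score(layer):
    """Two-part MDL proxy for one layer (a list of clusters): data cost
    under per-cluster unigram models plus a (k/2) log2 n parameter
    penalty.  A generic MDL criterion, not necessarily the paper's."""
    total = 0.0
    for cluster in layer:
        counts = Counter(t for doc in cluster for t in doc)
        n = sum(counts.values())
        total += -sum(c * math.log2(c / n) for c in counts.values())
        total += (len(counts) / 2) * math.log2(max(n, 2))
    return total

def ga_layer(documents, pool_size=20, pop_size=12, generations=30):
    """Toy GA: a chromosome is a bitmask over candidate seed documents;
    fitness is the (lower-is-better) MDL score of the induced layer."""
    candidates = random.sample(documents, min(pool_size, len(documents)))

    def layer_of(mask):
        seeds = [d for d, bit in zip(candidates, mask) if bit] or candidates[:1]
        return assign_to_seeds(documents, seeds)

    pop = [[random.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: mdl_score(layer_of(m)))     # elitist selection
        survivors = pop[:pop_size // 2]
        while len(survivors) < pop_size:
            a, b = random.sample(survivors[:pop_size // 2], 2)
            cut = random.randrange(1, len(candidates))
            child = a[:cut] + b[cut:]                      # one-point crossover
            child[random.randrange(len(candidates))] ^= 1  # point mutation
            survivors.append(child)
        pop = survivors
    pop.sort(key=lambda m: mdl_score(layer_of(m)))
    return layer_of(pop[0])
```

A bitmask over candidate seeds is just one plausible encoding of the quasi-optimal layer decision; the paper may encode chromosomes, and score layers, differently.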



References

  1. Iwayama, M., Tokunaga, T.: A Probabilistic Model for Text Categorization: Based on a Single Random Variable with Multiple Values. Proceedings of the 4th Conference on Applied Natural Language Processing, pp. 162–167, 1994.

  2. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.

  3. Tanaka, Y., Wakimoto, K.: Statistical Analysis of Large Volumes of Information. Modern Mathematics Society, 1983.

  4. Iwayama, M., Tokunaga, T.: Hierarchical Bayesian Clustering for Automatic Text Classification. Proceedings of IJCAI-95, pp. 1322–1327, 1995.

  5. Iwayama, M., Tokunaga, T., Sakurai: Large-Scale Clustering for Document Search. 3rd Annual Meeting of the Institute of Language Processing of Japan, pp. 245–248, March 1997.

  6. Itoh, Kawabata: Universal Data Compression Algorithm Using Parameter Dispersion and Estimation Amount. 8th Conference on Information Theory and Applied Research, pp. 239–244, 1985.

  7. Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974, p. 54.

  8. Aoki, K., Matsumoto, K., Hashimoto, K.: Evaluation of Clustering Methods for Large Volumes of Documents. 56th National Convention of the Information Processing Society of Japan (first semester, 1998), 3–100, 1998.



Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aoki, K., Matsumoto, K., Hoashi, K., Hashimoto, K. (1999). A Study of Bayesian Clustering of a Document Set Based on GA. In: McKay, B., Yao, X., Newton, C.S., Kim, J.-H., Furuhashi, T. (eds) Simulated Evolution and Learning. SEAL 1998. Lecture Notes in Computer Science (LNAI), vol. 1585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48873-1_34


  • DOI: https://doi.org/10.1007/3-540-48873-1_34


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65907-5

  • Online ISBN: 978-3-540-48873-6

