
A Study of Bayesian Clustering of a Document Set Based on GA

  • Conference paper

Simulated Evolution and Learning (SEAL 1998)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1585)


Abstract

In this paper, we propose a new approximate clustering algorithm that improves the precision of top-down clustering. Top-down clustering was proposed by Iwayama et al. to improve clustering speed: the cluster tree is generated by sampling some documents, forming a cluster from them, assigning each remaining document to the nearest node, and, whenever the number of documents assigned to a node is large, repeating the sampling and clustering from the top down. To improve the precision of this method, we propose selecting the sampled documents by applying a genetic algorithm (GA) to decide a quasi-optimal layer, and using an MDL criterion to evaluate the layer structure of the cluster tree.
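The abstract only outlines the procedure, so the sketch below makes it concrete. It is a minimal Python illustration, not the authors' implementation: the token-list document representation, the cosine similarity, the unigram MDL proxy, and every name here (similarity, assign_to_seeds, top_down_cluster, mdl_score, ga_layer) and GA operator are assumptions made for illustration. Only the overall shape (sample seeds, assign the rest, recurse on large nodes, let a GA choose seeds under an MDL score) follows the abstract.

```python
import math
import random
from collections import Counter

# Assumption: documents are token lists; similarity is cosine over raw
# term counts.  Neither choice is fixed by the abstract.
def similarity(doc_a, doc_b):
    va, vb = Counter(doc_a), Counter(doc_b)
    dot = sum(va[t] * vb.get(t, 0) for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def assign_to_seeds(documents, seeds):
    """Assign every non-seed document to its most similar seed."""
    buckets = [[s] for s in seeds]
    for doc in documents:
        if any(doc is s for s in seeds):
            continue
        best = max(range(len(seeds)), key=lambda i: similarity(doc, seeds[i]))
        buckets[best].append(doc)
    return buckets

def top_down_cluster(documents, sample_size=10, node_limit=50):
    """Top-down clustering in the spirit of Iwayama et al.: sample
    seeds, assign the rest, recurse into nodes that are still large."""
    if len(documents) <= node_limit:
        return documents                       # leaf node
    seeds = random.sample(documents, min(sample_size, len(documents)))
    return [top_down_cluster(b, sample_size, node_limit)
            for b in assign_to_seeds(documents, seeds)]

def mdl_score(layer):
    """Two-part MDL proxy for one layer (a list of clusters): data cost
    under per-cluster unigram models plus a (k/2) log2 n parameter
    penalty.  A generic MDL criterion, not necessarily the paper's."""
    total = 0.0
    for cluster in layer:
        counts = Counter(t for doc in cluster for t in doc)
        n = sum(counts.values())
        total += -sum(c * math.log2(c / n) for c in counts.values())
        total += (len(counts) / 2) * math.log2(max(n, 2))
    return total

def ga_layer(documents, pool_size=20, pop_size=12, generations=30):
    """Toy GA: a chromosome is a bitmask over candidate seed documents;
    fitness is the (lower-is-better) MDL score of the induced layer."""
    candidates = random.sample(documents, min(pool_size, len(documents)))

    def layer_of(mask):
        seeds = [d for d, bit in zip(candidates, mask) if bit] or candidates[:1]
        return assign_to_seeds(documents, seeds)

    pop = [[random.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: mdl_score(layer_of(m)))     # elitist selection
        survivors = pop[:pop_size // 2]
        while len(survivors) < pop_size:
            a, b = random.sample(survivors[:pop_size // 2], 2)
            cut = random.randrange(1, len(candidates))
            child = a[:cut] + b[cut:]                      # one-point crossover
            child[random.randrange(len(candidates))] ^= 1  # point mutation
            survivors.append(child)
        pop = survivors
    pop.sort(key=lambda m: mdl_score(layer_of(m)))
    return layer_of(pop[0])
```

A bitmask over candidate seeds is just one plausible encoding of the quasi-optimal layer decision; the paper may encode chromosomes, and score layers, differently.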



References

  1. Iwayama, M., Tokunaga, T.: A Probabilistic Model for Text Categorization: Based on a Single Random Variable with Multiple Values. Proceedings of the 4th Conference on Applied Natural Language Processing, pp. 162–167, 1994.

  2. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.

  3. Tanaka, Y., Wakimoto, K.: Statistical Analysis of Large Volumes of Information. Modern Mathematics Society, 1983.

  4. Iwayama, M., Tokunaga, T.: Hierarchical Bayesian Clustering for Automatic Text Classification. Proceedings of IJCAI-95, pp. 1322–1327, 1995.

  5. Iwayama, M., Tokunaga, T., Sakurai: Large-Scale Clustering for Document Search. 3rd Annual Meeting of the Institute of Language Processing of Japan, pp. 245–248, March 1997.

  6. Itoh, Kawabata: Universal Data Compression Algorithm Using Parameter Dispersion and Estimation Amount. 8th Conference on Information Theory and Applied Research, pp. 239–244, 1985.

  7. Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974, p. 54.

  8. Aoki, K., Matsumoto, K., Hashimoto, K.: Evaluation of Clustering Methods for Large Volumes of Documents. 56th National Convention of the Information Processing Society of Japan (first semester, 1998), 3–100, 1998.



Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aoki, K., Matsumoto, K., Hoashi, K., Hashimoto, K. (1999). A Study of Bayesian Clustering of a Document Set Based on GA. In: McKay, B., Yao, X., Newton, C.S., Kim, J.-H., Furuhashi, T. (eds) Simulated Evolution and Learning. SEAL 1998. Lecture Notes in Computer Science (LNAI), vol. 1585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48873-1_34


  • DOI: https://doi.org/10.1007/3-540-48873-1_34


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65907-5

  • Online ISBN: 978-3-540-48873-6

