An Approach to Microscopic Clustering of Terms and Documents

  • Conference paper
PRICAI 2002: Trends in Artificial Intelligence (PRICAI 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2417)

Abstract

In this paper, we present an approach to clustering in text-based information retrieval systems. The proposed method generates overlapping clusters, each of which is composed of subsets of associated terms and documents with normalized significance weights. In the paper, we first briefly introduce the probabilistic formulation of our clustering scheme and then show the procedure for cluster generation. We also report some experimental results, where the generated clusters are investigated in the framework of automatic text categorization.
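The abstract does not give the probabilistic formulation, so the following is only a minimal sketch of the kind of output it describes: overlapping clusters, each holding a subset of terms and a subset of documents with normalized weights. The `MicroCluster` class, the `normalize` helper, the sum-to-one normalization, and the sample terms and documents are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Scale non-negative weights so they sum to 1 (an assumed normalization)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {key: value / total for key, value in weights.items()}


@dataclass
class MicroCluster:
    """Hypothetical container: a subset of terms and a subset of documents."""
    term_weights: dict[str, float]
    doc_weights: dict[str, float]

    def __post_init__(self) -> None:
        # Normalize each side independently so weights are comparable across clusters.
        self.term_weights = normalize(self.term_weights)
        self.doc_weights = normalize(self.doc_weights)


def clusters_containing(item: str, clusters: list[MicroCluster]) -> list[int]:
    """Return the indices of every cluster in which a term or document appears."""
    return [
        i for i, c in enumerate(clusters)
        if item in c.term_weights or item in c.doc_weights
    ]


# Illustrative data only: "retrieval" and "doc2" each belong to two clusters,
# which is what "overlapping" means here.
clusters = [
    MicroCluster({"retrieval": 2.0, "index": 1.0, "query": 1.0},
                 {"doc1": 3.0, "doc2": 1.0}),
    MicroCluster({"cluster": 1.0, "retrieval": 1.0},
                 {"doc2": 1.0, "doc3": 1.0}),
]
```

Calling `clusters_containing("doc2", clusters)` returns both cluster indices, whereas a hard partition would assign each document to exactly one cluster.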

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aizawa, A. (2002). An Approach to Microscopic Clustering of Terms and Documents. In: Ishizuka, M., Sattar, A. (eds) PRICAI 2002: Trends in Artificial Intelligence. PRICAI 2002. Lecture Notes in Computer Science, vol 2417. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45683-X_44

  • DOI: https://doi.org/10.1007/3-540-45683-X_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44038-3

  • Online ISBN: 978-3-540-45683-4

  • eBook Packages: Springer Book Archive
