
Information Access Based on Associative Calculation

  • Akihiko Takano
  • Yoshiki Niwa
  • Shingo Nishioka
  • Makoto Iwayama
  • Toru Hisamitsu
  • Osamu Imaichi
  • Hirofumi Sakurai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1963)

Abstract

Statistical measures of similarity have been widely used in textual information retrieval for many decades. They are the basis for improving the effectiveness of IR systems in tasks including retrieval, clustering, and summarization. We have developed an information retrieval system, DualNAVI, that provides users with rich interaction in both document space and word space. We show that associative calculation, which measures similarity among documents or among words, is the computational basis of this effective information access with DualNAVI. We also discuss new approaches to document clustering (Hierarchical Bayesian Clustering) and to measuring term representativeness (the Baseline method). Both have a sound mathematical basis and depend essentially on associative calculation.
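The abstract describes associative calculation as moving between two spaces. From a set of words it finds related documents, and from a set of documents it finds characteristic words. The sketch below illustrates that two-way idea under loud assumptions:

* The toy corpus is hypothetical.
* The names `idf_table`, `assoc_docs` and `assoc_words` are hypothetical.
* The weighting is plain term frequency times log(N / df), a generic stand-in. It is not DualNAVI's actual similarity measure, and it is not the Hierarchical Bayesian Clustering or Baseline formulas.

```python
import math
from collections import Counter

# Hypothetical toy corpus (illustration only): document id -> list of tokens.
CORPUS = {
    "d1": ["cluster", "document", "bayesian", "cluster"],
    "d2": ["term", "representativeness", "baseline", "term"],
    "d3": ["document", "retrieval", "term", "similarity"],
}


def idf_table(corpus):
    """Inverse document frequency log(N / df) per term (assumed stand-in weighting)."""
    n = len(corpus)
    df = Counter()
    for tokens in corpus.values():
        df.update(set(tokens))  # count each term once per document
    return {t: math.log(n / c) for t, c in df.items()}


def assoc_docs(words, corpus, idf):
    """Word space -> document space: score each document by the
    idf-weighted frequency of the given words, highest score first."""
    scores = {}
    for doc_id, tokens in corpus.items():
        tf = Counter(tokens)
        s = sum(tf[w] * idf.get(w, 0.0) for w in words)
        if s > 0:
            scores[doc_id] = s
    # Ties are broken by document id so the ranking is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def assoc_words(doc_ids, corpus, idf):
    """Document space -> word space: score each word by its
    idf-weighted frequency across the given documents, highest first."""
    scores = Counter()
    for doc_id in doc_ids:
        for w, f in Counter(corpus[doc_id]).items():
            scores[w] += f * idf[w]
    ranked = [(w, s) for w, s in scores.items() if s > 0]
    return sorted(ranked, key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    idf = idf_table(CORPUS)
    print(assoc_docs(["term"], CORPUS, idf))   # documents associated with "term"
    print(assoc_words(["d1"], CORPUS, idf))    # words characterizing d1
```

Both directions read the same document-term counts, just along different axes. Chaining them mirrors the dual document/word views described above: retrieve documents for a query, then summarize those documents as words.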

Keywords

Information Retrieval · Information Access · Document Cluster · Topic Word · Dual View
(Keywords assigned automatically by the publisher, not by the authors.)



Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

All authors: Central Research Laboratory, Hitachi, Ltd., Hatoyama, Saitama, Japan
