Flexible Category Structure for Supporting WWW Retrieval

  • Yoshiaki Takata
  • Kokoro Nakagawa
  • Hiroyuki Seki
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1921)


A method for supporting WWW retrieval by constructing a flexible category structure adaptable to the user's search intention is proposed. The method uses categorization viewpoints as a priori knowledge, where a categorization viewpoint is a finite set of consistent category names. A set of documents retrieved by initial keywords is decomposed by categorization viewpoints and each decomposition is scored by clearness or entropy. The user selects an appropriate decomposition by considering the score. The decomposition is recursively performed until a category structure of reasonable size is obtained. Experimental results show that the sets of documents decomposed by the proposed method have higher precision than those decomposed by clustering (K-means). It is also shown that both the scores based on clearness and entropy of the decomposition have relatively high correlation with the precision.
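The decompose-and-score step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the clearness or entropy scores, so this sketch assumes plain Shannon entropy over the sizes of the resulting categories. It also assumes that a document belongs to the first category whose name occurs in its text. The names `decompose`, `entropy`, and the `(other)` bucket, as well as the toy documents and viewpoints, are hypothetical.

```python
import math


def decompose(docs, viewpoint):
    """Group document ids by the first category name found in the text.

    Assumption: membership is decided by simple substring matching;
    documents matching no category name go to a hypothetical "(other)" bucket.
    """
    groups = {name: [] for name in viewpoint}
    groups["(other)"] = []
    for doc_id, text in docs.items():
        for name in viewpoint:
            if name in text:
                groups[name].append(doc_id)
                break
        else:
            groups["(other)"].append(doc_id)
    return groups


def entropy(groups):
    """Shannon entropy (in bits) of the category size distribution.

    Assumption: this stands in for the paper's entropy-based score,
    whose exact definition is not given in the abstract.
    """
    total = sum(len(ids) for ids in groups.values())
    h = 0.0
    for ids in groups.values():
        if ids:  # empty categories contribute nothing
            p = len(ids) / total
            h -= p * math.log2(p)
    return h


# Toy data: documents retrieved by some initial keywords (hypothetical).
docs = {
    1: "hotel in kyoto",
    2: "temple in kyoto",
    3: "hotel in nara",
    4: "hotel in osaka",
}

# Each categorization viewpoint is a finite set of category names.
viewpoints = {
    "city": ["kyoto", "nara", "osaka"],
    "facility": ["hotel", "temple", "museum"],
}

# Score every candidate decomposition; the user would pick one of them.
scores = {vp: entropy(decompose(docs, names)) for vp, names in viewpoints.items()}
```

The sketch only computes a score per viewpoint and leaves the choice to the user, as in the proposed method. Applying `decompose` again to the documents of one selected category would give the recursive refinement mentioned in the abstract.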


Keywords: Noun Phrase, Category Structure, Concept Hierarchy, Query Processor, Consistent Category
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Yoshiaki Takata (1)
  • Kokoro Nakagawa (1)
  • Hiroyuki Seki (1)
  1. Graduate School of Information Science, Nara Institute of Science and Technology, Japan
