Passage-Based Document Retrieval as a Tool for Text Mining with User’s Information Needs

  • Conference paper
Discovery Science (DS 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2226)

Abstract

Document retrieval is a basic but important tool for text mining because it can take a user’s information need into account. It becomes a hard task, however, when long, multi-topic documents must be retrieved from a very short description of the information need (a few keywords). In this paper, we focus on this problem, which is typical of real-world applications. We experimentally validate that passage-based document retrieval is advantageous in such circumstances compared with conventional document retrieval. Passage-based document retrieval judges a document’s relevance to the information need from only small fractions (passages) of the document. As the passage-based method, we employ one based on density distributions of keywords. It is compared with three conventional document retrieval methods: the vector space model, pseudo-feedback, and latent semantic indexing. Experimental results show that the passage-based method is superior to the conventional methods when long documents have to be retrieved by short queries.
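
As an illustration of scoring a document by its densest passage rather than by its whole text, the following Python sketch smooths query-keyword hits with a Hann window and ranks documents by the peak of the resulting density. The window shape, the `width` parameter, the whitespace tokenisation, and the function names `density_score` and `rank_documents` are assumptions made for this example; the abstract does not specify them, and this should not be read as the authors' implementation.

```python
import math


def density_score(tokens, query_terms, width=8):
    """Return (peak density, token position of the peak) for one document.

    Assumption: keyword hits are smoothed with a Hann window of the given
    width; the abstract only says "density distributions of keywords".
    """
    query = {term.lower() for term in query_terms}
    # 1.0 where a token matches a query keyword, 0.0 elsewhere.
    hits = [1.0 if tok.lower() in query else 0.0 for tok in tokens]
    half = width // 2
    best_score, best_pos = 0.0, -1
    for centre in range(len(tokens)):
        score = 0.0
        for k in range(-half, half + 1):
            pos = centre + k
            if 0 <= pos < len(tokens):
                # Hann weight: 1.0 at the centre, falling to 0.0 at +/- width/2.
                weight = 0.5 * (1.0 + math.cos(2.0 * math.pi * k / width))
                score += hits[pos] * weight
        if score > best_score:
            best_score, best_pos = score, centre
    return best_score, best_pos


def rank_documents(docs, query_terms, width=8):
    """Rank documents (name -> text) by their peak keyword density, best first."""
    scored = [
        (density_score(text.split(), query_terms, width)[0], name)
        for name, text in docs.items()
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    docs = {
        "close": "x x x apple pie x x x",
        "spread": "apple x x x x x x pie",
        "none": "x x x",
    }
    for score, name in rank_documents(docs, ["apple", "pie"], width=4):
        print(f"{name}: {score:.2f}")
```

In this toy collection, the document whose two keywords sit next to each other ranks above the one whose keywords are far apart, even though both contain the same keywords the same number of times. A whole-document term count would not separate the two.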

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

Cite this paper

Kise, K., Junker, M., Dengel, A., Matsumoto, K. (2001). Passage-Based Document Retrieval as a Tool for Text Mining with User’s Information Needs. In: Jantke, K.P., Shinohara, A. (eds) Discovery Science. DS 2001. Lecture Notes in Computer Science, vol 2226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45650-3_16

  • DOI: https://doi.org/10.1007/3-540-45650-3_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42956-2

  • Online ISBN: 978-3-540-45650-6

  • eBook Packages: Springer Book Archive
