Passage-Based Document Retrieval as a Tool for Text Mining with User’s Information Needs

Kise, Koichi; Junker, Markus; Dengel, Andreas; Matsumoto, Keinosuke

doi:10.1007/3-540-45650-3_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2226)

Included in the following conference series:

International Conference on Discovery Science

Abstract

Document retrieval can be considered a basic but important tool for text mining, one that is capable of taking a user’s information need into account. However, document retrieval is hard when lengthy, multi-topic documents have to be retrieved with a very short description (a few keywords) of the information need. In this paper, we focus on this problem, which is typical in real-world applications. We experimentally validate that passage-based document retrieval is advantageous in such circumstances compared to conventional document retrieval. Passage-based document retrieval judges a document's relevance to the information need from only small fractions (passages) of the document. As a passage-based method, we employ one based on density distributions of keywords. It is compared with three conventional methods for document retrieval: the vector space model, pseudo-feedback, and latent semantic indexing. Experimental results show that the passage-based method is superior to the conventional methods if long documents have to be retrieved by short queries.
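
To make the idea of scoring a document by the density distribution of its keywords concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the window function, the keyword weighting, or how the density is turned into a document score, so the Hann-shaped window, the unit weight per keyword hit, the use of the maximum density as the score, and the names `hann_window`, `density_distribution`, and `passage_score` are all assumptions made for illustration.

```python
import math


def hann_window(width):
    """Symmetric Hann-shaped weights of odd length `width`, peaking at 1.0 in the center.

    Assumption: the abstract does not name a window function; Hann is one common choice.
    """
    half = width // 2
    return [0.5 * (1.0 + math.cos(math.pi * k / (half + 1)))
            for k in range(-half, half + 1)]


def density_distribution(tokens, query_terms, width=5):
    """Smooth the positions of query-term hits into a per-position density.

    Assumption: every hit has weight 1.0 (no term weighting such as idf).
    """
    terms = {q.lower() for q in query_terms}
    hits = [1.0 if t.lower() in terms else 0.0 for t in tokens]
    window = hann_window(width)
    half = width // 2
    density = []
    for i in range(len(hits)):
        total = 0.0
        for k, weight in enumerate(window):
            j = i + k - half  # position covered by this window cell
            if 0 <= j < len(hits):
                total += weight * hits[j]
        density.append(total)
    return density


def passage_score(tokens, query_terms, width=5):
    """Score a document by its densest passage (assumed: maximum of the density)."""
    return max(density_distribution(tokens, query_terms, width), default=0.0)


if __name__ == "__main__":
    query = ["passage", "retrieval"]
    clustered = "x x x passage retrieval x x x x x".split()
    scattered = "passage x x x x x x x x retrieval".split()
    print(passage_score(clustered, query) > passage_score(scattered, query))
```

Under these assumptions, a document whose query terms occur close together scores higher than one of the same length in which they are far apart. That locality is what lets a short query pick out a relevant passage inside a long, multi-topic document.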

Author information

Authors and Affiliations

German Research Center for Artificial Intelligence (DFKI GmbH), P.O.Box 2080, 67608, Kaiserslautern, Germany
Koichi Kise, Markus Junker & Andreas Dengel
Department of Computer and Systems Sciences Graduate School of Engineering, Osaka Prefecture University, 1-1 Gakuencho, Sakai, 599-8531, Osaka, Japan
Koichi Kise & Keinosuke Matsumoto

Editor information

Editors and Affiliations

DFKI GmbH Saarbrücken, 66123, Saarbrücken, Germany
Klaus P. Jantke
Department of Informatics, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, 812-8581, Fukuoka, Japan
Ayumi Shinohara

About this paper

Cite this paper

Kise, K., Junker, M., Dengel, A., Matsumoto, K. (2001). Passage-Based Document Retrieval as a Tool for Text Mining with User’s Information Needs. In: Jantke, K.P., Shinohara, A. (eds) Discovery Science. DS 2001. Lecture Notes in Computer Science, vol 2226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45650-3_16

DOI: https://doi.org/10.1007/3-540-45650-3_16
Published: 20 December 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42956-2
Online ISBN: 978-3-540-45650-6
eBook Packages: Springer Book Archive
