Abstract
Document retrieval can be regarded as a basic yet important text-mining tool because it can take a user's information need into account. However, document retrieval becomes difficult when lengthy, multitopic documents must be retrieved using only a very short description of the information need (a few keywords). In this paper, we focus on this problem, which is typical of real-world applications. We experimentally validate that passage-based document retrieval is advantageous in such circumstances compared with conventional document retrieval. Passage-based document retrieval judges a document's relevance to the information need by considering only small fractions (passages) of the document. As a passage-based method, we employ a method based on density distributions of keywords, and we compare it with three conventional document-retrieval methods: the vector space model, pseudo-feedback, and latent semantic indexing. Experimental results show that the passage-based method is superior to the conventional methods when long documents have to be retrieved with short queries.
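To illustrate why a passage-level view helps with long documents and short queries, here is a minimal sketch of density-based passage scoring next to a whole-document baseline. It assumes that each keyword occurrence spreads a Hanning-shaped weight over its neighbourhood, that overlapping weights add up to a keyword density, and that a document is scored by its peak density (its best-matching passage). The window shape, the 31-token width, the absence of term weighting, and all function names are hypothetical choices for illustration, not the authors' exact formulation. The baseline is a plain term-frequency cosine, a simplified stand-in for the vector space model (no tf-idf weighting).

```python
import math
from collections import Counter
from typing import Sequence


def hanning_window(width: int) -> list[float]:
    """Hanning-shaped weights of (preferably odd) length `width`, peaking at 1.0 in the centre."""
    half = width // 2
    return [0.5 * (1 + math.cos(math.pi * (i - half) / (half + 1))) for i in range(width)]


def keyword_density(tokens: Sequence[str], keywords: Sequence[str], width: int = 31) -> list[float]:
    """Keyword density at every token position (assumed formulation).

    Each occurrence of a query keyword adds a windowed weight to the
    positions around it; overlapping windows add up, so positions
    surrounded by many keywords receive a high density.
    """
    kw = set(keywords)
    window = hanning_window(width)
    half = width // 2
    density = [0.0] * len(tokens)
    for pos, tok in enumerate(tokens):
        if tok not in kw:
            continue
        for offset, weight in enumerate(window):
            j = pos + offset - half
            if 0 <= j < len(tokens):
                density[j] += weight
    return density


def passage_score(tokens: Sequence[str], keywords: Sequence[str], width: int = 31) -> float:
    """Passage-based score: the peak density, i.e. the best-matching passage."""
    return max(keyword_density(tokens, keywords, width), default=0.0)


def cosine_score(tokens: Sequence[str], keywords: Sequence[str]) -> float:
    """Whole-document baseline: cosine similarity of raw term-frequency vectors."""
    d, q = Counter(tokens), Counter(keywords)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in d.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Usage: two documents with the same bag of words but different keyword layout.
query = ["passage", "retrieval"]
clustered = ["x"] * 100 + ["passage", "retrieval"] + ["x"] * 100
scattered = ["passage"] + ["x"] * 200 + ["retrieval"]
print(cosine_score(clustered, query) == cosine_score(scattered, query))  # True
print(passage_score(clustered, query) > passage_score(scattered, query))  # True
```

Because both toy documents contain exactly the same terms, the whole-document cosine cannot tell them apart, whereas the density-based score rewards the document in which the query keywords co-occur within one short passage. This is the intuition behind judging relevance from passages rather than from the document as a whole.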
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kise, K., Junker, M., Dengel, A., Matsumoto, K. (2001). Passage-Based Document Retrieval as a Tool for Text Mining with User's Information Needs. In: Jantke, K.P., Shinohara, A. (eds) Discovery Science. DS 2001. Lecture Notes in Computer Science, vol 2226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45650-3_16
DOI: https://doi.org/10.1007/3-540-45650-3_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42956-2
Online ISBN: 978-3-540-45650-6
eBook Packages: Springer Book Archive