Abstract
Many documents contain images, tables, and so on, in addition to text. This chapter concentrates on the text part only. Traditionally, systems handling text documents are called information storage and retrieval systems. Before the World-Wide Web emerged, such systems were almost exclusively used by professional users, so-called indexers and searchers, e.g., in medical research, in libraries, and by governmental organizations and archives. Typically, professional users act as "search intermediaries" for end users. In an interactive dialogue with the system and the end user, they try to figure out what the end user needs and how this information should be used in a successful search. Professionals know the collection, they know how documents in the collection are represented in the system, and they know how to use Boolean search operators to control the number of retrieved documents.
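As a rough illustration of that last point, here is a minimal sketch of set-based Boolean retrieval over an inverted index: AND narrows the result set, OR widens it, and NOT excludes documents. The toy collection, the whitespace tokenizer, and the helper names (`build_index`, `term`, `and_`, `or_`, `not_`) are hypothetical and chosen only for illustration; this is not the chapter's own system, and real systems add stemming, stop-word removal, and often ranking.

```python
from collections import defaultdict

# Hypothetical toy collection; documents and terms are illustrative only.
DOCS = {
    1: "aspirin reduces heart attack risk",
    2: "heart disease and diet",
    3: "aspirin dosage for children",
    4: "diet and exercise reduce heart disease risk",
}


def build_index(docs):
    """Map each lower-cased term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():  # naive whitespace tokenizer (assumption)
            index[token].add(doc_id)
    return index


INDEX = build_index(DOCS)
ALL_IDS = set(DOCS)


def term(t):
    """Posting set for a single query term (empty set if the term is unknown)."""
    return set(INDEX.get(t.lower(), set()))


def and_(*sets):
    """Boolean AND: documents in every operand (narrows the result set)."""
    return set.intersection(*sets)


def or_(*sets):
    """Boolean OR: documents in any operand (widens the result set)."""
    return set.union(*sets)


def not_(s):
    """Boolean NOT: every document in the collection except those in s."""
    return ALL_IDS - s


if __name__ == "__main__":
    queries = [
        ("heart", term("heart")),
        ("heart AND aspirin", and_(term("heart"), term("aspirin"))),
        ("heart OR aspirin", or_(term("heart"), term("aspirin"))),
        ("heart AND NOT aspirin", and_(term("heart"), not_(term("aspirin")))),
    ]
    for label, result in queries:
        print(f"{label:<22} -> {sorted(result)} ({len(result)} docs)")
```

With this toy collection, `heart` alone retrieves three documents; adding `AND aspirin` cuts that to one, while `OR aspirin` raises it to all four. This is the kind of control over result-set size that a trained searcher exercises by adding or removing operators.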
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Blanken, H., Hiemstra, D. (2007). Searching for Text Documents. In: Blanken, H.M., Blok, H.E., Feng, L., de Vries, A.P. (eds) Multimedia Retrieval. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72895-5_4
DOI: https://doi.org/10.1007/978-3-540-72895-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72894-8
Online ISBN: 978-3-540-72895-5
eBook Packages: Computer Science, Computer Science (R0)