© 2006

Information Extraction: Algorithms and Prospects in a Retrieval Context


  • Comprehensive overview of current and past technology for information extraction

  • Innovative integration of information extraction and retrieval

  • Novel avenues for research and algorithmic development

  • Focus on applicability of the techniques in many domains


About this book


Information extraction concerns the processes of structuring and combining content that is explicitly stated or implied in one or more unstructured information sources. It involves the semantic classification and linking of certain pieces of information and is considered a light form of content understanding by the machine. There is currently considerable interest in integrating the results of information extraction into retrieval systems, because of the growing demand for search engines that return precise answers to flexible information queries. Advanced retrieval models meet that need; they rely on tools that automatically build a probabilistic model of the content of a (multimedia) document.

The book focuses on content recognition in text. It elaborates on the most successful past and current algorithms and their application in a variety of domains (e.g., news filtering, mining of biomedical text, intelligence gathering, competitive intelligence, legal information searching, and processing of informal text). An important part of the book discusses current statistical and machine learning algorithms for information detection and classification, and integrates their results into probabilistic retrieval models. The book also puts forward a number of ideas towards an advanced understanding and synthesis of textual content.

The book is aimed at researchers and software developers interested in information extraction and retrieval, but its many illustrations and real-world examples also make it suitable as a handbook for students.


Keywords: Berck, DOM, Hidden Markov Model, Information Technology, Performance, algorithms, classification, cognition, filtering, information processing, intelligence, learning, machine learning

Authors and affiliations

  1. Katholieke Universiteit, Belgium



From the reviews:

"Information Extraction (IE) and Information Retrieval (IR) are core enabling technologies … . In this text, Moens brings these two techniques together to illustrate how information derived using IE could be highly beneficial in IR systems. … the text is highly readable and aimed at both practitioners and researchers … . One trait that I offer particular praise to the author for is the pragmatic presentation of ideas. … the text should be beneficial both to seasoned professionals in this area and relative newcomers." (Tom Betts, Informer, Winter 2006/2007)

"After definition and explanation of the basic concepts and description of the historical development of the area, the past and current most successful algorithms and their application in a variety of domains are discussed. Especially important is the explanation of statistical and machine learning algorithms for information detection and classification and integration of their results in probabilistic retrieval models. … Because [of] its broad coverage and clear and sound explanation it is suitable and valuable both for researchers and for students." (Antonín Ríha, Zentralblatt MATH, Vol. 1108 (10), 2007)

"This book … provide a comprehensive overview of text-extraction algorithms. It does well in … explaining the intricacies of the basic approaches and concepts used. … for advanced undergraduate students, graduate students, researchers, and people working in the field, the book is a good starting point for learning the basics. … I would recommend the book for those who need to get into … the field. … the book is one that should be on your must-read list if you are involved in this field." (Karthik Gajjala, ACM Computing Reviews, Vol. 49 (2), February, 2008)