This book explains how to create information extraction (IE) applications that can tap the vast amount of relevant information available in natural language sources: Internet pages, official documents such as laws and regulations, books and newspapers, and the social web. Readers are introduced to the problem of IE and its current challenges and limitations, supported by examples. The book discusses the need to bridge the gap between documents, data, and people, and provides a broad overview of the technology supporting IE. The authors present a generic architecture for developing systems that are able to learn how to extract relevant information from natural language documents, and illustrate how to implement working systems using state-of-the-art and freely available software tools. The book also discusses concrete applications that illustrate uses of IE.
· Provides an overview of state-of-the-art technology in information extraction (IE), discussing achievements and limitations for the software developer and providing references to specialized literature in the area
· Presents a comprehensive list of freely available, high-quality software for several subtasks of IE and for several natural languages
· Describes a generic architecture that can learn how to extract information for a given application domain