Document Classification and Information Extraction

  • Qianhong Liu
  • Peter A. Ng

Abstract

In Chapters 4 and 5, we turn our attention to the techniques used for document classification and information extraction [60, 61, 62, 174, 175]. In TEXPROS, the task of document classification is to determine the type of an office document: given an office document, the document classification subsystem identifies its corresponding frame template. Once the type of a document has been identified, efficient storage and access methods can be implemented to improve retrieval performance. The task of information extraction is to extract from the contents of the document the information most relevant to the user: given an office document, the information extraction subsystem forms its frame instance by instantiating the corresponding frame template. Both document classification and information extraction can be achieved with the aid of an analysis of the document structures.
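
The two tasks described above can be illustrated with a minimal sketch. This is not the TEXPROS implementation: the `FrameTemplate` class, the cue-word overlap used for classification, the "Name: value" line convention used for extraction, and the sample templates are all hypothetical, chosen only to show how a frame template is selected and then instantiated as a frame instance.

```python
from dataclasses import dataclass, field


@dataclass
class FrameTemplate:
    """Hypothetical frame template: a document type plus its attribute names."""
    doc_type: str
    attributes: list[str]
    cue_words: set[str] = field(default_factory=set)


def classify(text: str, templates: list[FrameTemplate]) -> FrameTemplate:
    """Pick the template whose cue words overlap most with the document's tokens.

    Assumption: a simple token-overlap score stands in for real structural analysis.
    """
    words = set(text.lower().split())
    return max(templates, key=lambda t: len(t.cue_words & words))


def extract(text: str, template: FrameTemplate) -> dict[str, str]:
    """Form a frame instance by filling template attributes from 'Name: value' lines."""
    instance = {"type": template.doc_type}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        key = name.strip().lower()
        if sep and key in template.attributes:
            instance[key] = value.strip()
    return instance


# Illustrative templates only; real frame templates would be defined by the system.
TEMPLATES = [
    FrameTemplate("memo", ["to", "from", "subject"], {"memo", "to:", "from:"}),
    FrameTemplate("meeting_notice", ["date", "place", "agenda"], {"meeting", "agenda:"}),
]

document = "MEMO\nTo: Faculty\nFrom: Dean\nSubject: Budget review"
template = classify(document, TEMPLATES)      # classification: document -> frame template
frame_instance = extract(document, template)  # extraction: template -> frame instance
```

In this sketch, classification and extraction are kept as separate functions so that the classifier's output (a frame template) is the only input the extractor needs besides the document itself, mirroring the division into two subsystems described in the abstract.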

Keywords

Noun Phrase; Information Extraction; Conceptual Structure; Edit Distance; Content Structure

Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • Qianhong Liu (1)
  • Peter A. Ng (1)
  1. New Jersey Institute of Technology, Newark, USA
