The input to the overall content management process consists of product catalogs that come in various formats and with little standardization. The information must therefore first be extracted from these heterogeneous sources and moulded into a structured, well-defined form. Currently, most of this work is done manually. There is thus a strong need to automate the extraction of information from unstructured and semi-structured multimedia sources in order to improve the overall productivity of the content management process.
Keywords: Information Extraction, Product Schema, Word Sense, Extraction Rule, Extraction Pattern