A Knowledge-Based Information Extraction System for Semi-structured Labeled Documents
This paper presents a scheme of knowledge-based wrapper generation for semi-structured and labeled documents. The implementation of an agent-oriented information extraction system, XTROS, is described. In contrast with previous wrapper learning agents, XTROS represents both the domain knowledge and the wrappers by XML documents to increase modularity, flexibility, and interoperability. XTROS shows good performance on several Web sites in the domain of real estate, and it is expected to be easily adaptable to different domains by plugging in appropriate XML-based domain knowledge.
Keywords: Real Estate, Domain Knowledge, Extraction Rule, Logical Line, Label Document
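As a rough illustration of the idea in the abstract (domain knowledge kept in XML and used to pull labeled values out of a document line), the following minimal sketch may help. It is not the XTROS implementation: the XML element and attribute names (`knowledge`, `object`, `term`, `position`), the function names `load_knowledge` and `extract`, and the sample listing are all assumptions made for this example. Regular expressions stand in for whatever extraction rules the system actually learns.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical domain knowledge for real estate. The schema below is an
# assumption for illustration only, not the format used by XTROS.
DOMAIN_KNOWLEDGE = """
<knowledge domain="real-estate">
  <object name="PRICE">
    <term position="before">$</term>
  </object>
  <object name="BED">
    <term position="after">bedrooms</term>
    <term position="after">beds</term>
    <term position="after">br</term>
  </object>
  <object name="BATH">
    <term position="after">bathrooms</term>
    <term position="after">baths</term>
    <term position="after">ba</term>
  </object>
</knowledge>
"""

NUMBER = r"(\d[\d,.]*)"  # a digit followed by digits, commas, or dots


def load_knowledge(xml_text):
    """Parse the XML into {object name: [(label term, position), ...]}."""
    root = ET.fromstring(xml_text)
    return {
        obj.get("name"): [(t.text, t.get("position")) for t in obj.findall("term")]
        for obj in root.findall("object")
    }


def extract(line, knowledge):
    """Return the value found next to the first matching label of each object."""
    record = {}
    for name, terms in knowledge.items():
        for term, position in terms:
            if position == "before":
                # The label precedes the value, e.g. "$ 185,000".
                pattern = re.escape(term) + r"\s*" + NUMBER
            else:
                # The label follows the value, e.g. "3 BR".
                pattern = NUMBER + r"\s*" + re.escape(term) + r"\b"
            match = re.search(pattern, line, re.IGNORECASE)
            if match:
                record[name] = match.group(1)
                break
    return record


if __name__ == "__main__":
    knowledge = load_knowledge(DOMAIN_KNOWLEDGE)
    print(extract("$ 185,000 3 BR 2.5 BA Quiet street near park", knowledge))
```

Because the labels live in the XML rather than in the code, moving this sketch to another domain would only require a different knowledge document, which is the kind of adaptability the abstract describes.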