A Knowledge-Based Information Extraction System for Semi-structured Labeled Documents

  • Jaeyoung Yang
  • Heekuck Oh
  • Kyung-Goo Doh
  • Joongmin Choi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2412)

Abstract

This paper presents a scheme for knowledge-based wrapper generation for semi-structured, labeled documents, and describes the implementation of XTROS, an agent-oriented information extraction system. In contrast with previous wrapper learning agents, XTROS represents both the domain knowledge and the wrappers as XML documents to increase modularity, flexibility, and interoperability. XTROS shows good performance on several Web sites in the real estate domain, and it is expected to be easily adaptable to other domains by plugging in the appropriate XML-based domain knowledge.

Keywords

Real Estate, Domain Knowledge, Extraction Rule, Logical Line, Label Document
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Jaeyoung Yang, Heekuck Oh, Kyung-Goo Doh, Joongmin Choi
    Department of Computer Science and Engineering, Hanyang University, Kyunggi-Do, Korea