A Semi-automatic System for Knowledge Base Population

  • Conference paper
Knowledge Discovery, Knowlege Engineering and Knowledge Management (IC3K 2009)

Abstract

Key information is typically transferred from unstructured text to knowledge bases through laborious manual entry, because automated information extraction is not yet accurate enough to replace it. A viable alternative is a user interface that lets users correct and validate assertions proposed by the automated extractor before they are entered into the knowledge base. In this paper, we discuss our system for semi-automatic database population and how it addresses issues arising in content extraction and knowledge base population. The major contributions are detailing the challenges in building a semi-automated tool, classifying expected extraction errors, identifying the gaps in current extraction technology with regard to databasing, and designing and developing the FEEDE system, which supports human correction of automated content extractors in order to speed up data entry into knowledge bases.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Goldstein-Stewart, J., Winder, R.K. (2011). A Semi-automatic System for Knowledge Base Population. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowlege Engineering and Knowledge Management. IC3K 2009. Communications in Computer and Information Science, vol 128. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19032-2_21

  • DOI: https://doi.org/10.1007/978-3-642-19032-2_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19031-5

  • Online ISBN: 978-3-642-19032-2

  • eBook Packages: Computer Science, Computer Science (R0)
