
Analysis and Improvement of Minimally Supervised Machine Learning for Relation Extraction

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5723)

Abstract

The main contribution of this paper is a systematic analysis of a minimally supervised machine learning method for learning relation extraction grammars. The method follows a bootstrapping approach in which the bootstrapping is triggered by semantic seeds. The starting point of our analysis is the pattern-learning graph, a subgraph of the bipartite graph that represents all connections between linguistic patterns and relation instances exhibited by the data. We show that the performance of such a general learning framework on actual tasks depends on certain properties of the data and on the selection of seeds. Several experiments were conducted to gain explanatory insight into the interaction of these two factors. From the investigation of more effective seeds and benevolent data, we learn how to improve learning in less fortunate configurations. A relation extraction method based only on positive examples cannot avoid all false positives, especially when the data properties yield high recall. Therefore, negative seeds are employed to learn negative patterns, which boost precision.
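The pattern-learning graph can be read as the part of the pattern/instance bipartite graph that bootstrapping reaches from the seeds. The learner alternates between patterns that match known instances and new instances matched by those patterns, so what gets learned depends both on the seeds and on how densely the data connects patterns to instances. The following is a minimal Python sketch of that reachability view, not the authors' implementation. It assumes the bipartite graph is given as hypothetical (pattern, instance) pairs. It also uses a simple assumed rule for negative patterns, namely that any pattern matching a negative seed is blocked. It omits any pattern ranking or confidence filtering a real system would apply, and all names and the toy data are illustrative.

```python
from collections import defaultdict, deque

# Hypothetical toy data: (pattern, relation instance) edges of the bipartite graph.
EDGES = [
    ("X marries Y", ("Brad", "Angelina")),
    ("X marries Y", ("Tom", "Katie")),
    ("X and Y wed", ("Tom", "Katie")),
    ("X and Y wed", ("Ann", "Bob")),
    ("X meets Y", ("Ann", "Bob")),
    ("X meets Y", ("Joe", "Sue")),
    ("X meets Y", ("Max", "Eva")),
    ("X divorced Y", ("Kim", "Lee")),
]


def pattern_learning_graph(edges, seeds, blocked=frozenset()):
    """Return (patterns, instances) reachable from the seed instances.

    Breadth-first traversal that alternates instance -> pattern -> instance.
    Patterns in `blocked` (e.g. assumed negative patterns) are never used.
    """
    inst_to_pat = defaultdict(set)
    pat_to_inst = defaultdict(set)
    for pattern, instance in edges:
        inst_to_pat[instance].add(pattern)
        pat_to_inst[pattern].add(instance)

    instances = set(seeds)
    patterns = set()
    frontier = deque(("inst", s) for s in seeds)
    while frontier:
        kind, node = frontier.popleft()
        if kind == "inst":
            # Learn every non-blocked pattern that matches a known instance.
            for p in inst_to_pat[node]:
                if p not in blocked and p not in patterns:
                    patterns.add(p)
                    frontier.append(("pat", p))
        else:
            # Extract every instance matched by a learned pattern.
            for i in pat_to_inst[node]:
                if i not in instances:
                    instances.add(i)
                    frontier.append(("inst", i))
    return patterns, instances


def negative_patterns(edges, negative_seeds):
    """Assumed rule: a pattern is negative if it matches any negative seed."""
    neg = set(negative_seeds)
    return {p for p, i in edges if i in neg}


if __name__ == "__main__":
    seeds = [("Brad", "Angelina")]
    pats, insts = pattern_learning_graph(EDGES, seeds)
    print("positive only:", sorted(pats), sorted(insts))
    neg = negative_patterns(EDGES, [("Max", "Eva")])
    pats2, insts2 = pattern_learning_graph(EDGES, seeds, blocked=neg)
    print("with negative patterns:", sorted(pats2), sorted(insts2))
```

With this toy data, the positive-only run never reaches ("Kim", "Lee"), because no path connects it to the seed. This mirrors the dependence on data properties and seed choice. The same run does pick up ("Joe", "Sue") through the overly general "X meets Y" pattern. Blocking the pattern learned from the negative seed ("Max", "Eva") removes that false positive and keeps the three marriage instances.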





Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Uszkoreit, H., Xu, F., Li, H. (2010). Analysis and Improvement of Minimally Supervised Machine Learning for Relation Extraction. In: Horacek, H., Métais, E., Muñoz, R., Wolska, M. (eds) Natural Language Processing and Information Systems. NLDB 2009. Lecture Notes in Computer Science, vol 5723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12550-8_2


  • DOI: https://doi.org/10.1007/978-3-642-12550-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12549-2

  • Online ISBN: 978-3-642-12550-8

  • eBook Packages: Computer Science, Computer Science (R0)
