Deep Distant Supervision: Learning Statistical Relational Models for Weak Supervision in Natural Language Extraction

Chapter in: Solving Large Scale Learning Tasks: Challenges and Algorithms

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9580)

Abstract

One of the challenges in information extraction is the need for human-annotated examples, commonly called gold-standard examples. Many successful approaches alleviate this problem by employing some form of distant supervision, i.e., consulting knowledge bases such as Freebase as a source of supervision to create more examples. While this is perfectly reasonable, most distant supervision methods rely on a given set of propositions as a source of supervision. We propose a different approach: we infer weakly supervised examples for relations from statistical relational models learned using knowledge outside the natural language task. We argue that this deep distant supervision creates more robust examples that are particularly useful when learning the entire model (the structure and parameters). We demonstrate on several domains that this form of weak supervision yields superior results when learning structure, compared to using distant-supervision labels or a smaller set of labels.


Notes

  1. http://www.nist.gov/tac/2015/KBP/ColdStart/index.html

  2. The ratio is actually the log-odds of their weights; we refer the reader to the book [4] for more details.

  3. With probabilistic training examples, it can be shown that minimizing the KL-divergence between the examples and the current model yields \(\text{true probability} - \text{predicted probability}\) as the gradient. This has a similar effect, pushing the predicted probabilities closer to the true probabilities.
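The gradient in note 3 can be illustrated with a minimal sketch. It assumes a sigmoid link between the model's regression score and its predicted probability, in the style of functional-gradient boosting of relational models [12]; the function names and example values below are illustrative, not taken from the chapter.

```python
import math

def sigmoid(score):
    """Map a regression score to a predicted probability."""
    return 1.0 / (1.0 + math.exp(-score))

def kl_gradient(true_prob, score):
    """Pointwise gradient from minimizing the KL-divergence between a
    probabilistic (weakly labeled) example and the current model:
    true probability minus predicted probability."""
    return true_prob - sigmoid(score)

# A soft label of 0.9 against a model score of 0.0 (predicted
# probability 0.5) yields a positive gradient, nudging the prediction
# toward the weak label; a prediction that already matches the soft
# label yields a zero gradient.
g = kl_gradient(0.9, 0.0)
```

Note how the gradient vanishes exactly when the predicted probability equals the soft label, which is what "pushing the predicted probabilities closer to the true probabilities" amounts to.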

References

  1. Bell, B., Koren, Y., Volinsky, C.: The BellKor solution to the Netflix Grand Prize (2009)

  2. Craven, M., Kumlien, J.: Constructing biological knowledge bases by extracting information from text sources. In: ISMB (1999)

  3. Devlin, S., Kudenko, D., Grzes, M.: An empirical study of potential-based reward shaping and advice in complex, multi-agent systems. Adv. Complex Syst. 14(2), 251–278 (2011)

  4. Domingos, P., Lowd, D.: Markov Logic: An Interface Layer for AI. Morgan & Claypool, San Rafael (2009)

  5. Hoffmann, R., Zhang, C., Ling, X., Zettlemoyer, L., Weld, D.S.: Knowledge-based weak supervision for information extraction of overlapping relations. In: ACL (2011)

  6. Kersting, K., Driessens, K.: Non-parametric policy gradients: a unified treatment of propositional and relational domains. In: ICML (2008)

  7. Khot, T., Natarajan, S., Kersting, K., Shavlik, J.: Learning Markov logic networks via functional gradient boosting. In: ICDM (2011)

  8. Kim, J., Ohta, T., Pyysalo, S., Kano, Y., Tsujii, J.: Overview of BioNLP’09 shared task on event extraction. In: BioNLP Workshop Companion Volume for Shared Task (2009)

  9. Kuhlmann, G., Stone, P., Mooney, R.J., Shavlik, J.W.: Guiding a reinforcement learner with natural language advice: initial results in RoboCup soccer. In: AAAI Workshop on Supervisory Control of Learning and Adaptive Systems (2004)

  10. Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. In: ACL and AFNLP (2009)

  11. Natarajan, S., Kersting, K., Khot, T., Shavlik, J.: Boosted Statistical Relational Learners: From Benchmarks to Data-Driven Medicine. SpringerBriefs in Computer Science. Springer, Heidelberg (2015)

  12. Natarajan, S., Khot, T., Kersting, K., Guttmann, B., Shavlik, J.: Gradient-based boosting for statistical relational learning: the relational dependency network case. Mach. Learn. 86(1), 25–56 (2012)

  13. Natarajan, S., Picado, J., Khot, T., Kersting, K., Re, C., Shavlik, J.: Effectively creating weakly labeled training examples via approximate domain knowledge. In: Davis, J., Ramon, J. (eds.) ILP 2014. LNCS, vol. 9046, pp. 92–107. Springer, Heidelberg (2015). doi:10.1007/978-3-319-23708-4_7

  14. Neville, J., Jensen, D.: Relational dependency networks. In: Getoor, L., Taskar, B. (eds.) Introduction to Statistical Relational Learning, pp. 653–692. MIT Press, Cambridge (2007)

  15. Niu, F., Ré, C., Doan, A., Shavlik, J.W.: Tuffy: scaling up statistical inference in Markov logic networks using an RDBMS. PVLDB 4(6), 373–384 (2011)

  16. Poon, H., Vanderwende, L.: Joint inference for knowledge extraction from biomedical literature. In: NAACL (2010)

  17. Raghavan, S., Mooney, R.: Online inference-rule learning from natural-language extractions. In: International Workshop on Statistical Relational AI (2013)

  18. Riedel, S., Chun, H., Takagi, T., Tsujii, J.: A Markov logic approach to bio-molecular event extraction. In: BioNLP (2009)

  19. Riedel, S., Yao, L., McCallum, A.: Modeling relations and their mentions without labeled text. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010, Part III. LNCS, vol. 6323, pp. 148–163. Springer, Heidelberg (2010)

  20. Sorower, S., Dietterich, T., Doppa, J., Orr, W., Tadepalli, P., Fern, X.: Inverting Grice’s maxims to learn rules from natural language extractions. In: NIPS, pp. 1053–1061 (2011)

  21. Surdeanu, M., Ciaramita, M.: Robust information extraction with perceptrons. In: NIST ACE (2007)

  22. Surdeanu, M., Tibshirani, J., Nallapati, R., Manning, C.: Multi-instance multi-label learning for relation extraction. In: EMNLP-CoNLL (2012)

  23. Takamatsu, S., Sato, I., Nakagawa, H.: Reducing wrong labels in distant supervision for relation extraction. In: ACL (2012)

  24. Torrey, L., Shavlik, J., Walker, T., Maclin, R.: Transfer learning via advice taking. In: Koronacki, J., Raś, Z.W., Wierzchoń, S.T., Kacprzyk, J. (eds.) Advances in Machine Learning I. SCI, vol. 262, pp. 147–170. Springer, Heidelberg (2010)

  25. Verhagen, M., Gaizauskas, R., Schilder, F., Hepple, M., Katz, G., Pustejovsky, J.: SemEval-2007 task 15: TempEval temporal relation identification. In: SemEval (2007)

  26. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: CVPR (2001)

  27. Yoshikawa, K., Riedel, S., Asahara, M., Matsumoto, Y.: Jointly identifying temporal relations with Markov logic. In: ACL and AFNLP (2009)

  28. Zhou, G., Su, J., Zhang, J., Zhang, M.: Exploring various knowledge in relation extraction. In: ACL (2005)


Acknowledgements

Sriraam Natarajan, Anurag Wazalwar and Dileep Viswanathan gratefully acknowledge the support of the DARPA Machine Reading Program and DEFT Program under the Air Force Research Laboratory (AFRL) prime contract nos. FA8750-09-C-0181 and FA8750-13-2-0039, respectively. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA, AFRL, or the US government. Kristian Kersting was supported by the Fraunhofer ATTRACT fellowship STREAM and by the European Commission under contract number FP7-248258-First-MM.

Author information


Corresponding author

Correspondence to Sriraam Natarajan.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Natarajan, S., Soni, A., Wazalwar, A., Viswanathan, D., Kersting, K. (2016). Deep Distant Supervision: Learning Statistical Relational Models for Weak Supervision in Natural Language Extraction. In: Michaelis, S., Piatkowski, N., Stolpe, M. (eds) Solving Large Scale Learning Tasks: Challenges and Algorithms. Lecture Notes in Computer Science, vol. 9580. Springer, Cham. https://doi.org/10.1007/978-3-319-41706-6_18

  • DOI: https://doi.org/10.1007/978-3-319-41706-6_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41705-9

  • Online ISBN: 978-3-319-41706-6

  • eBook Packages: Computer Science (R0)
