Mining Relations from Unstructured Content

Lourentzou, Ismini; Alba, Alfredo; Coden, Anni; Gentile, Anna Lisa; Gruhl, Daniel; Welch, Steve

doi:10.1007/978-3-319-93037-4_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10938)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Extracting relations from unstructured Web content is a challenging task, and every new relation requires significant effort to design, train, and tune the extraction models. In this work, we investigate how to obtain suitable relation extraction results with modest human effort, relying on a dynamic active learning approach. We propose a method to reliably generate high-quality training and test data for any generic user-demonstrated relation, starting from a few user-provided examples and extracting valuable samples from unstructured, unlabeled Web content. To this end, we propose a strategy that learns the best order in which to present data for human annotation, maximizing learning performance early in the process. We demonstrate the viability of the approach (i) on state-of-the-art datasets for relation extraction and (ii) in a real case study that identifies text expressing a causal relation between a drug and an adverse reaction in user-generated Web content.
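
To make the idea of ordering data for human annotation concrete, the following is a minimal sketch of batch-based, pool-based active learning. It uses plain uncertainty sampling, a standard baseline, and not the learned ordering strategy proposed in the paper. The function names, the one-dimensional toy model, and the `oracle` callback (standing in for the human annotator) are all hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence


def uncertainty(p: float) -> float:
    """Uncertainty of a binary prediction: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def select_batch(probs: Dict[int, float], batch_size: int) -> List[int]:
    """Return the ids of the `batch_size` most uncertain unlabeled examples.

    Ties are broken by id so the selection is deterministic.
    """
    ranked = sorted(probs, key=lambda i: (-uncertainty(probs[i]), i))
    return ranked[:batch_size]


def fit_toy_model(xs: Sequence[float],
                  labeled: Dict[int, int]) -> Callable[[float], float]:
    """Toy stand-in for a relation classifier (assumption, not the paper's model).

    Places a threshold midway between the class means of a 1-D feature and
    returns a function mapping a feature value to P(label = 1).
    """
    pos = [xs[i] for i, y in labeled.items() if y == 1]
    neg = [xs[i] for i, y in labeled.items() if y == 0]
    if not pos or not neg:
        # Only one class seen so far: every example is maximally uncertain.
        return lambda x: 0.5
    mu_pos = sum(pos) / len(pos)
    mu_neg = sum(neg) / len(neg)
    threshold = (mu_pos + mu_neg) / 2.0
    sign = 1.0 if mu_pos >= mu_neg else -1.0
    return lambda x: 1.0 / (1.0 + math.exp(-sign * (x - threshold)))


def active_learn(xs: Sequence[float],
                 oracle: Callable[[int], int],
                 seed_ids: Sequence[int],
                 batch_size: int = 100,
                 rounds: int = 5) -> Dict[int, int]:
    """Run `rounds` of: fit model, score the pool, ask the oracle for one batch."""
    labeled = {i: oracle(i) for i in seed_ids}  # a few user-provided examples
    for _ in range(rounds):
        model = fit_toy_model(xs, labeled)
        pool = {i: model(xs[i]) for i in range(len(xs)) if i not in labeled}
        if not pool:
            break  # nothing left to annotate
        for i in select_batch(pool, batch_size):
            labeled[i] = oracle(i)  # human-in-the-loop annotation step
    return labeled
```

Here `batch_size` plays the role of the adjustable annotation batch described in the Notes; the default of 100 mirrors the size reported there, while everything else in the sketch is an assumption.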

Notes

1. http://surdeanu.info/kbp2013/.
2. In our experiments we use pairs of entities; however, our models can also handle n-ary relations. We leave this to future work.
3. The size of the batch is adjustable and can be specified by the human-in-the-loop. In our experiments, the medical doctor involved indicated 100 as a good size for keeping focus.
4. http://doi.org/10.4225/08/570FB102BDAD2.
5. https://github.com/Isminoula/CausalADEs.
6. On a Linux server with 48 Intel Xeon CPUs @ 2.20 GHz, 231 GB RAM, and an NVIDIA GeForce GTX 1080 GPU, albl (the libact implementation, https://github.com/ntucllab/libact) took 3 h 10 min on the causalADE task, whereas our pruning method took 7 min.

Author information

Authors and Affiliations

University of Illinois at Urbana-Champaign, Champaign, USA
Ismini Lourentzou
IBM Watson Research Laboratory, New York, NY, USA
Anni Coden
IBM Research Almaden, San Jose, CA, USA
Alfredo Alba, Anna Lisa Gentile, Daniel Gruhl & Steve Welch

Corresponding author

Correspondence to Anna Lisa Gentile.

Editor information

Editors and Affiliations

Deakin University, Geelong, Victoria, Australia
Dinh Phung
National Chiao Tung University, Hsinchu City, Taiwan
Vincent S. Tseng
Monash University, Clayton, Victoria, Australia
Geoffrey I. Webb
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Bao Ho
University of Melbourne, Melbourne, Victoria, Australia
Mohadeseh Ganji
University of Melbourne, Melbourne, Victoria, Australia
Lida Rashidi

Cite this paper

Lourentzou, I., Alba, A., Coden, A., Gentile, A.L., Gruhl, D., Welch, S. (2018). Mining Relations from Unstructured Content. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_29

DOI: https://doi.org/10.1007/978-3-319-93037-4_29
Published: 20 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93036-7
Online ISBN: 978-3-319-93037-4
eBook Packages: Computer Science, Computer Science (R0)
