Semantic Relation Extraction. Resources, Tools and Strategies

Garcia, Marcos

doi:10.1007/978-3-319-41552-9_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9727)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

Relation extraction is a subtask of information extraction that aims to obtain instances of the semantic relations present in texts. This information can be arranged in machine-readable formats, which are useful for applications that need structured semantic knowledge. The work presented in this paper explores different strategies to automate the extraction of semantic relations from texts in Portuguese, Galician and Spanish. Both machine learning (distant-supervised and supervised) and rule-based techniques are investigated, and the impact of different levels of linguistic knowledge is analyzed for each approach. Regarding domains, the experiments focus on the extraction of encyclopedic knowledge, through the development of biographical relation classifiers (in a closed domain) and the evaluation of an open information extraction tool. To implement the extraction systems, several natural language processing tools have been built for the three research languages, from sentence splitting and tokenization modules to part-of-speech taggers, named entity recognizers and coreference resolution systems. Furthermore, several lexica and corpora have been compiled and enriched with different levels of linguistic annotation, which are useful for both training and testing probabilistic and symbolic models. As a result of this work, new resources and tools are available for the automated processing of texts in Portuguese, Galician and Spanish.

This work has been partially supported by the Spanish Ministry of Economy and Competitiveness through the project FFI2014-51978-C2-1-R, and by a Juan de la Cierva formación grant, reference FJCI-2014-22853.

Notes

1. A possible English translation could be: “John A. Garcia (born in 1949 in Galicia) is one of the pioneers of the modern American computer game industry and the current president of Novalogic.”
2. All of them are freely available at http://gramatica.usc.es/~marcos/phd.html.

Author information

Authors and Affiliations

Grupo LyS, Departamento de Galego-Português, Francês e Linguística, Faculdade de Filologia, Universidade da Coruña, Campus da Coruña, Coruña, Spain
Marcos Garcia

Corresponding author

Correspondence to Marcos Garcia.

Editor information

Editors and Affiliations

Universidade de Lisboa, Portugal
João Silva
ISCTE-IUL, Lisbon, Portugal
Ricardo Ribeiro
Universidade de Évora, Évora, Portugal
Paulo Quaresma
Universidade de Caxias do Sul, Caxias do Sul, Brazil
André Adami
Universidade de Lisboa, Lisboa, Portugal
António Branco

Cite this paper

Garcia, M. (2016). Semantic Relation Extraction. Resources, Tools and Strategies. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science (LNAI), vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_15

DOI: https://doi.org/10.1007/978-3-319-41552-9_15
Published: 21 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41551-2
Online ISBN: 978-3-319-41552-9
eBook Packages: Computer Science (R0)