From Textual Information Sources to Linked Data in the Agatha Project

  • Conference paper
Declarative Programming and Knowledge Management (INAP 2019, WLP 2019, WFLP 2019)

Abstract

Automatic reasoning about textual information is a challenging task in modern Natural Language Processing (NLP) systems. In this work we describe our proposal for representing and reasoning about Portuguese documents by means of Linked Data resources such as ontologies and thesauri. Our approach relies on a specialized NLP pipeline (part-of-speech tagging, named entity recognition, and semantic role labeling) to populate an ontology for the domain of criminal investigations. The proposed architecture and ontology are language independent. Although some of the NLP modules are language dependent, they can be built using suitable AI methodologies.
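
As a rough illustration of the last pipeline step, the sketch below maps one hand-written semantic-role frame to Linked Data triples and serializes them as N-Triples. It is not the authors' implementation: the namespace `http://example.org/agatha#`, the class `Event`, the properties `hasPredicate`, `hasAgent`, and `hasPlace`, and the sample frame are all hypothetical names chosen for the example, and the frame stands in for the output that a real tagger, entity recognizer, and role labeler would produce.

```python
from dataclasses import dataclass, field

# Hypothetical namespace; the actual ontology IRIs are not given in this abstract.
NS = "http://example.org/agatha#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Frame:
    """One predicate with its labelled arguments, as an SRL step might emit."""
    event_id: str
    predicate: str
    roles: dict = field(default_factory=dict)  # role name -> entity string


def slug(text: str) -> str:
    """Turn a surface string into a crude local name (whitespace -> '_')."""
    return "_".join(text.strip().split())


def frame_to_triples(frame: Frame) -> list:
    """Map a frame to (subject, predicate, object) triples of IRIs."""
    event = NS + frame.event_id
    triples = [
        (event, RDF_TYPE, NS + "Event"),
        (event, NS + "hasPredicate", NS + slug(frame.predicate)),
    ]
    # Sort roles so the output order is deterministic.
    for role, entity in sorted(frame.roles.items()):
        triples.append((event, NS + "has" + role, NS + slug(entity)))
    return triples


def to_ntriples(triples: list) -> str:
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hand-written stand-in for the output of the NLP pipeline.
frame = Frame("event1", "roubar", {"Agent": "João Silva", "Place": "Évora"})
triples = frame_to_triples(frame)
print(to_ntriples(triples))
```

Keeping the frame as a plain data structure separates the language-dependent steps, which produce it, from the language-independent step that writes triples, which mirrors the split described in the abstract.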

Notes

  1. http://www.agatha-osi.com/en/

  2. https://talp-upc.gitbook.io/freeling-4-0-user-manual/tagsets/tagset-pt

Acknowledgments

The authors would like to thank COMPETE 2020, PORTUGAL 2020 Program, the European Union, and ALENTEJO 2020 for supporting this research as part of the Agatha Project, SI & IDT number 18022 (Intelligent analysis system of open sources information for surveillance/crime control). The authors would also like to thank LISP - Laboratory of Informatics, Systems and Parallelism.

Author information

Corresponding author

Correspondence to Vitor Beires Nogueira.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Quaresma, P., Beires Nogueira, V., Raiyani, K., Bayot, R., Gonçalves, T. (2020). From Textual Information Sources to Linked Data in the Agatha Project. In: Hofstedt, P., Abreu, S., John, U., Kuchen, H., Seipel, D. (eds) Declarative Programming and Knowledge Management. INAP 2019, WLP 2019, WFLP 2019. Lecture Notes in Computer Science, vol 12057. Springer, Cham. https://doi.org/10.1007/978-3-030-46714-2_5

  • DOI: https://doi.org/10.1007/978-3-030-46714-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46713-5

  • Online ISBN: 978-3-030-46714-2

  • eBook Packages: Computer Science (R0)
