Improving Question-Answering for Portuguese Using Triples Extracted from Corpora

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9727)

Abstract

We present an evolution of a question-answering (QA) system for Portuguese that uses subject-predicate-object triples extracted from the sentences of a corpus. The system relies on indices that store those triples together with the related sentences and documents. It processes each question and retrieves answers based on the triples.

For testing and evaluation, we used the CHAVE corpus, which has been used in multiple editions of the CLEF multilingual QA tracks. The questions from those editions were used to query and benchmark our system, which currently answers up to 42% of them. This document describes the modules that compose the system and how they are combined, with a brief analysis of each, and presents current results along with some expectations regarding future work.
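The idea of answering questions from indexed subject-predicate-object triples can be illustrated with a minimal sketch. This is not the system described in the paper: the `Triple` and `TripleIndex` names, the token-overlap scoring, and the hand-written English triples (based on the Mel Blanc example in the notes) are all assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    sentence_id: int  # hypothetical pointer back to the source sentence


def tokens(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return {t.strip(".,?!").lower() for t in text.split()} - {""}


class TripleIndex:
    """Toy inverted index from subject/predicate tokens to triples (illustrative)."""

    def __init__(self):
        self._by_token = defaultdict(set)

    def add(self, triple):
        # Index the triple under every token of its subject and predicate.
        for tok in tokens(triple.subject) | tokens(triple.predicate):
            self._by_token[tok].add(triple)

    def answer(self, question):
        """Return the object of the triple sharing the most tokens with the question."""
        scores = defaultdict(int)
        for tok in tokens(question):
            for triple in self._by_token.get(tok, ()):
                scores[triple] += 1
        if not scores:
            return None
        return max(scores, key=scores.get).obj


# Hand-written triples for illustration; a real system would extract them from text.
index = TripleIndex()
index.add(Triple("Mel Blanc", "was allergic to", "carrots", sentence_id=1))
index.add(Triple("Mel Blanc", "lent his voice to", "Bugs Bunny", sentence_id=1))
print(index.answer("What was Mel Blanc allergic to?"))
```

In this sketch the answer is simply the object of the best-matching triple. Keeping a sentence identifier on each triple mirrors the abstract's point that the indices also store the related sentences and documents, so an answer could be traced back to its source.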

Notes

  1. http://incubator.apache.org/opennlp/

  2. http://www.linguateca.pt/floresta/BibliaFlorestal/completa.html

  3. Loosely translated as: “Mel Blanc, the man who lent his voice to the world’s most famous rabbit, Bugs Bunny, was allergic to carrots.”

  4. Loosely translated as: “What was Mel Blanc allergic to?”

  5. In 2004, one of the questions was unintentionally duplicated, hence 599 and not 600.

Author information

Corresponding author

Correspondence to Ricardo Rodrigues.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Rodrigues, R., Gomes, P. (2016). Improving Question-Answering for Portuguese Using Triples Extracted from Corpora. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_3

  • DOI: https://doi.org/10.1007/978-3-319-41552-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41551-2

  • Online ISBN: 978-3-319-41552-9

  • eBook Packages: Computer Science (R0)
