Results and Lessons of the Question Answering Track at CLEF

  • Anselmo Peñas
  • Álvaro Rodrigo
  • Bernardo Magnini
  • Pamela Forner
  • Eduard Hovy
  • Richard Sutcliffe
  • Danilo Giampiccolo
Chapter
Part of The Information Retrieval Series book series (INRE, volume 41)

Abstract

The Question Answering track at CLEF ran for 13 years, from 2003 until 2015. Over those years, many different tasks, resources and evaluation methodologies were developed. We divide the CLEF Question Answering campaigns into four eras: (1) ungrouped, mainly factoid questions asked against monolingual newspaper collections (2003–2006); (2) grouped questions asked against newspapers and Wikipedias (2007–2008); (3) ungrouped questions asked against multilingual, parallel-aligned EU legislative documents (2009–2010); and (4) questions about a single document, answered using a related document collection as background information (2011–2015). We describe each of these eras and present their main results, together with the pilot exercises and other Question Answering tasks that ran at CLEF. Finally, we conclude with some of the lessons learnt over the years.

Acknowledgements

This work has been partially funded by the Spanish Research Agency (Agencia Estatal de Investigación) through the LIHLITH project (PCIN-2017-085/AEI).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Anselmo Peñas (1)
  • Álvaro Rodrigo (1), corresponding author
  • Bernardo Magnini (2)
  • Pamela Forner (3)
  • Eduard Hovy (4)
  • Richard Sutcliffe (5)
  • Danilo Giampiccolo (3)

  1. NLP & IR Group at UNED, Madrid, Spain
  2. Natural Language Processing Research Unit, FBK, Trento, Italy
  3. FBK - PMG, Trento, Italy
  4. Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA
  5. CSIS Department, University of Limerick, Limerick, Ireland
