Neural Learning for Question Answering in Italian

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11298)

Abstract

Recent breakthroughs in deep learning have led to state-of-the-art results in several NLP tasks, including Question Answering (QA). Nevertheless, the training requirements are often not met in cross-linguistic settings: datasets suitable for training question answering systems for non-English languages are frequently unavailable, which represents a significant barrier for most neural methods. This paper explores the possibility of acquiring a large-scale, although lower-quality, dataset for an open-domain factoid question answering system in Italian. The dataset consists of more than 60 thousand question-answer pairs and was used to train a system able to answer factoid questions against the Italian Wikipedia. The paper describes the dataset and the experiments, which are inspired by an equivalent counterpart for English. The experiments show that the results achievable for Italian are worse than those for English, even though they are already applicable to concrete QA tasks.

Notes

  1. The dataset can be downloaded at http://sag.art.uniroma2.it/squadit.html.

  2. A 3-dimensional binary vector represents whether the given input token matches in its original, lowercase or lemma form.

  3. One of the best-performing systems currently available, deepl.com, was used.

  4. This measure is only indicative, since the dataset is different and the quality of the subset used as the test set was not manually validated.

  5. The script is available at https://rajpurkar.github.io/SQuAD-explorer/. We updated it to replace the English stop words with their Italian equivalents.

  6. https://it.wikipedia.org/wiki/Danubio.

  7. https://it.wikipedia.org/wiki/Isole_Marshall.
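
The match vector described in note 2 can be illustrated with a minimal sketch. It assumes that the vector marks whether a paragraph token appears in the question, and that tokens and lemmas have already been produced by some tokenizer and lemmatizer. The function name `exact_match_features` and the example sentence are hypothetical, and this is not the authors' implementation.

```python
from typing import List


def exact_match_features(
    paragraph_tokens: List[str],
    paragraph_lemmas: List[str],
    question_tokens: List[str],
    question_lemmas: List[str],
) -> List[List[int]]:
    """Return one 3-dimensional binary vector per paragraph token.

    Component 0: the token matches a question token in its original form.
    Component 1: the token matches a question token after lowercasing.
    Component 2: the token's lemma matches the lemma of a question token.
    """
    question_original = set(question_tokens)
    question_lower = {token.lower() for token in question_tokens}
    question_lemma = set(question_lemmas)

    features = []
    for token, lemma in zip(paragraph_tokens, paragraph_lemmas):
        features.append([
            int(token in question_original),
            int(token.lower() in question_lower),
            int(lemma in question_lemma),
        ])
    return features


# Hypothetical example; the lemmas are written by hand for illustration.
question_tokens = ["Dove", "nasce", "il", "Danubio", "?"]
question_lemmas = ["dove", "nascere", "il", "Danubio", "?"]
paragraph_tokens = ["Il", "Danubio", "nasce", "in", "Germania"]
paragraph_lemmas = ["il", "Danubio", "nascere", "in", "Germania"]

vectors = exact_match_features(
    paragraph_tokens, paragraph_lemmas, question_tokens, question_lemmas
)
for token, vector in zip(paragraph_tokens, vectors):
    print(token, vector)
```

In this example "Il" differs from the question token "il" only in case, so it matches in the lowercase and lemma components but not in the original-form component.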

Author information

Corresponding author

Correspondence to Danilo Croce.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Croce, D., Zelenanska, A., Basili, R. (2018). Neural Learning for Question Answering in Italian. In: Ghidini, C., Magnini, B., Passerini, A., Traverso, P. (eds) AI*IA 2018 – Advances in Artificial Intelligence. AI*IA 2018. Lecture Notes in Computer Science, vol 11298. Springer, Cham. https://doi.org/10.1007/978-3-030-03840-3_29

  • DOI: https://doi.org/10.1007/978-3-030-03840-3_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03839-7

  • Online ISBN: 978-3-030-03840-3

  • eBook Packages: Computer Science, Computer Science (R0)
