Neural Learning for Question Answering in Italian

Croce, Danilo; Zelenanska, Alexandra; Basili, Roberto

doi:10.1007/978-3-030-03840-3_29

Conference paper
First Online: 09 November 2018

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11298)

Abstract

The recent breakthroughs in the field of deep learning have led to state-of-the-art results in several NLP tasks, such as Question Answering (QA). Nevertheless, the training requirements are not satisfied in cross-linguistic settings: datasets suitable for training question answering systems in languages other than English are often not available, which represents a significant barrier for most neural methods. This paper explores the possibility of acquiring a large-scale, albeit lower-quality, dataset for an open-domain factoid question answering system in Italian. The dataset consists of more than 60,000 question-answer pairs and was used to train a system able to answer factoid questions against the Italian Wikipedia. The paper describes the dataset and the experiments, which are inspired by an equivalent English counterpart. The experiments show that the results achievable for Italian are lower than those for English, although they are already applicable to concrete QA tasks.

Notes

1.
The dataset can be downloaded at http://sag.art.uniroma2.it/squadit.html.
2.
A 3-dimensional binary vector represents the match with respect to the original, lowercase or lemma form of the given input token.
3.
The DeepL system (deepl.com), one of the best-performing translation systems at the time, was used.
4.
This measure is only indicative, since the dataset is different and the quality of the subset used as the test set was not manually validated.
5.
The script is available at https://rajpurkar.github.io/SQuAD-explorer/. We modified it to replace the English stop words with their Italian counterparts.
6.
https://it.wikipedia.org/wiki/Danubio.
7.
https://it.wikipedia.org/wiki/Isole_Marshall.
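Note 2 describes a 3-dimensional binary feature attached to each input token. One plausible reading, sketched below, is a per-token exact-match indicator against the question in original, lowercase, and lemma form. The `Token` type, the `exact_match_features` function, the assumption that passage tokens are compared against the question, and the hand-written example lemmas are all hypothetical; in practice the lemmas would come from an external Italian lemmatizer.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """A token with its surface form and lemma (hypothetical type; the lemma
    is assumed to come from an external Italian lemmatizer)."""
    text: str
    lemma: str


def exact_match_features(tokens: List[Token], question: List[Token]) -> List[List[int]]:
    """Return one 3-dimensional binary vector per token:
    [original-form match, lowercase match, lemma match] against the question."""
    q_original = {t.text for t in question}
    q_lower = {t.text.lower() for t in question}
    q_lemma = {t.lemma for t in question}
    return [
        [
            int(tok.text in q_original),       # exact surface form
            int(tok.text.lower() in q_lower),  # case-insensitive form
            int(tok.lemma in q_lemma),         # same lemma
        ]
        for tok in tokens
    ]


# Usage: question "Dove nasce il Danubio?" against a short passage
# (lemmas are hand-written for illustration).
question = [Token("Dove", "dove"), Token("nasce", "nascere"), Token("il", "il"),
            Token("Danubio", "Danubio"), Token("?", "?")]
passage = [Token("Il", "il"), Token("Danubio", "Danubio"), Token("e", "e"),
           Token("la", "il"), Token("Drava", "Drava"), Token("nascono", "nascere")]
for tok, vec in zip(passage, exact_match_features(passage, question)):
    print(tok.text, vec)
```

For this input, "Il" yields [0, 1, 1] (case-insensitive and lemma match only), while "nascono" yields [0, 0, 1], matching "nasce" only through the shared lemma *nascere*. The three bits therefore let a model distinguish strict, case-folded, and morphological overlap, which matters for a richly inflected language such as Italian.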
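Note 5 mentions that the SQuAD evaluation script was adapted by replacing its English stop words with Italian ones. As an illustration of what such an adaptation involves, the sketch below mirrors, in simplified form, the answer normalization, exact match, and token-level F1 logic of the SQuAD v1.1 script. The Italian article list, the apostrophe handling, and the constant names are assumptions; this is not the authors' modified script.

```python
import re
import string
from collections import Counter

# Hypothetical Italian article list standing in for the English "a|an|the";
# the note does not say which Italian stop words were actually used.
ITALIAN_ARTICLES = ("il", "lo", "la", "i", "gli", "le", "un", "uno", "una", "l")
ARTICLE_RE = re.compile(r"\b(" + "|".join(ITALIAN_ARTICLES) + r")\b")
PUNCTUATION = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation, drop articles, collapse whitespace."""
    s = s.lower()
    # Assumption: apostrophes become spaces so that elided articles such as
    # "l'" in "l'Italia" survive as separate tokens and can be removed.
    s = s.replace("'", " ").replace("\u2019", " ")
    s = "".join(ch for ch in s if ch not in PUNCTUATION)
    s = ARTICLE_RE.sub(" ", s)
    return " ".join(s.split())


def exact_match_score(prediction: str, ground_truth: str) -> bool:
    """True if the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match_score("il Danubio", "Danubio"))  # True
print(f1_score("la Foresta Nera in Germania", "Foresta Nera"))
```

Removing articles before comparison is what lets "il Danubio" count as an exact match for "Danubio"; the over-long answer in the second call gets partial credit from F1 (2/3) but none from exact match.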

Author information

Authors and Affiliations

Department of Enterprise Engineering, University of Roma Tor Vergata, Rome, Italy
Danilo Croce, Alexandra Zelenanska & Roberto Basili

Corresponding author

Correspondence to Danilo Croce.

Editor information

Editors and Affiliations

Fondazione Bruno Kessler, Povo (TN), Italy
Chiara Ghidini
Fondazione Bruno Kessler, Povo (TN), Italy
Bernardo Magnini
University of Trento, Povo (TN), Italy
Andrea Passerini
Fondazione Bruno Kessler, Povo (TN), Italy
Paolo Traverso

About this paper

Cite this paper

Croce, D., Zelenanska, A., Basili, R. (2018). Neural Learning for Question Answering in Italian. In: Ghidini, C., Magnini, B., Passerini, A., Traverso, P. (eds) AI*IA 2018 – Advances in Artificial Intelligence. AI*IA 2018. Lecture Notes in Computer Science, vol 11298. Springer, Cham. https://doi.org/10.1007/978-3-030-03840-3_29

DOI: https://doi.org/10.1007/978-3-030-03840-3_29
Published: 09 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03839-7
Online ISBN: 978-3-030-03840-3
eBook Packages: Computer Science, Computer Science (R0)
