Abstract
The recent breakthroughs in the field of deep learning have lead to state-of-the-art results in several NLP tasks such as Question Answering (QA). Nevertheless, the training requirements in cross-linguistic settings are not satisfied: the datasets suitable for training of question answering systems for non English languages are often not available, which represents a significant barrier for most neural methods. This paper explores the possibility of acquiring a large scale although lower quality dataset for an open-domain factoid questions answering system in Italian. It consists of more than 60 thousands question-answer pairs and was used to train a system able to answer factoid questions against the Italian Wikipedia. The paper describes the dataset and the experiments, inspired by an equivalent counterpart for English. These show that results achievable for Italian are worse, even though they are already applicable to concrete QA tasks.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
The dataset can be downloaded at http://sag.art.uniroma2.it/squadit.html.
- 2.
A 3-dimensional binary vector represents the match with respect to the original, lowercase or lemma form of the given input token.
- 3.
One of the currently best performing systems, deepl.com system, was used.
- 4.
Obviously, this measure is indicative, since the dataset will be different and the quality of the subset used as test set was not manually validated.
- 5.
The script is available at https://rajpurkar.github.io/SQuAD-explorer/. We applied some updates to translate stop words from the English to the Italian ones.
- 6.
- 7.
References
Baudiš, P., Šedivý, J.: Modeling of the question answering task in the YodaQA system. In: Mothe, J., et al. (eds.) CLEF 2015. LNCS, vol. 9283, pp. 222–228. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24027-5_20
Berant, J., Chou, A., Frostig, R., Liang, P.: Semantic parsing on freebase from question-answer pairs. In: EMNLP, pp. 1533–1544. ACL (2013)
Brill, E., Dumais, S., Banko, M., Brill, E., Banko, M., Dumais, S.: An analysis of the AskMSR question-answering system. In: Proceedings of EMNLP 2002, January 2002
Caputo, A., de Gemmis, M., Lops, P., Lovecchio, F., Manzari, V.: Overview of the EVALITA 2016 question answering for frequently asked questions (QA4FAQ) task. In: CLiC-it/EVALITA. CEUR Workshop Proceedings, vol. 1749. CEUR-WS.org (2016)
Chen, D., Fisch, A., Weston, J., Bordes, A.: Reading Wikipedia to answer open-domain questions. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Long Papers, vol. 1, pp. 1870–1879 (2017)
Ferrucci, D.A., et al.: Building Watson: an overview of the DeepQA project. AI Mag. 31(3), 59–79 (2010)
Harabagiu, S.M., et al.: FALCON: boosting knowledge for answer engines. In: Proceedings of The Ninth Text REtrieval Conference, TREC 2000, Gaithersburg, Maryland, USA, 13–16 November 2000 (2000)
Hirschman, L., Gaizauskas, R.: Natural language question answering: the view from here. Nat. Lang. Eng. 7(4), 275–300 (2001)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
Honnibal, M., Montani, I.: spaCy 2: natural language understanding with bloom embeddings, convolutional neural networks and incremental parsing (2017, to appear)
Kwok, C.C.T., Etzioni, O., Weld, D.S.: Scaling question answering to the web. In: WWW, pp. 150–161 (2001)
Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Association for Computational Linguistics (ACL) System Demonstrations, pp. 55–60 (2014)
Miller, A.H., Fisch, A., Dodge, J., Karimi, A.H., Bordes, A., Weston, J.: Key-value memory networks for directly reading documents. In: EMNLP (2016)
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: EMNLP, vol. 14, pp. 1532–1543 (2014)
Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100.000+ questions for machine comprehension of text. CoRR abs/1606.05250 (2016)
Sun, H., Ma, H., Yih, W.T., Tsai, C.T., Liu, J., Chang, M.W.: Open domain question answering via semantic enrichment. In: WWW (2015)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Croce, D., Zelenanska, A., Basili, R. (2018). Neural Learning for Question Answering in Italian. In: Ghidini, C., Magnini, B., Passerini, A., Traverso, P. (eds) AI*IA 2018 – Advances in Artificial Intelligence. AI*IA 2018. Lecture Notes in Computer Science(), vol 11298. Springer, Cham. https://doi.org/10.1007/978-3-030-03840-3_29
Download citation
DOI: https://doi.org/10.1007/978-3-030-03840-3_29
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03839-7
Online ISBN: 978-3-030-03840-3
eBook Packages: Computer ScienceComputer Science (R0)