Query Expansion Based on WordNet and Word2vec for Italian Question Answering Systems

Damiano, Emanuele; Minutolo, Aniello; Silvestri, Stefano; Esposito, Massimo

doi:10.1007/978-3-319-69835-9_29

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 13)

Included in the following conference series:

International Conference on P2P, Parallel, Grid, Cloud and Internet Computing

Abstract

Recently, Question Answering (QA) systems have emerged as efficient solutions for helping users find proper answers to questions pertaining to a specific situation. One of the major modern paradigms for QA is based on Information Retrieval (IR) techniques: the text of a user question is analyzed to extract a collection of relevant keywords, queries for a search engine are formulated from those keywords, and candidate answers are extracted from the documents matching the queries. Nevertheless, in semantically complex and rich languages such as Italian, many concepts can be expressed in a variety of distinct linguistic forms. This problem arises in particular when QA is applied to smaller sets of documents pertaining to a closed domain, where an answer might appear only once and its exact wording might differ partially or completely from the wording used in the query. To address this issue, this paper proposes a hybrid approach to Query Expansion (QE) in which lexical resources and word embeddings (WEs) are combined to generate synonyms and hypernyms of the relevant words extracted from the user question, and to contextualize this set with respect to both the corpus of interest and the specific question. An experimental session was arranged to compare the proposed QE approach with other techniques and to evaluate its impact on the accuracy of a QA system in extracting proper answers to factoid questions from documents pertaining to the Cultural Heritage domain. The experiments showed the effectiveness of the proposed solution with respect to three different evaluation metrics typically used in the literature.
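The hybrid idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the contextualization is computed, so the sketch assumes one plausible reading, namely that candidate synonyms and hypernyms are kept only if they occur in the corpus vocabulary and are close (by cosine similarity) to the average vector of the question keywords. The `LEXICON` and `EMBEDDINGS` dictionaries, the `expand_query` function, and the `threshold` parameter are hypothetical stand-ins for a lexical resource such as WordNet and for word2vec vectors trained on the corpus of interest; all entries and values are made up for illustration.

```python
import math

# Toy stand-in for a lexical resource such as WordNet (hypothetical entries).
LEXICON = {
    "dipinto": {"synonyms": ["quadro", "pittura"], "hypernyms": ["opera"]},
    "autore": {"synonyms": ["artista", "scrittore"], "hypernyms": ["persona"]},
}

# Toy stand-in for word vectors trained on the target corpus (made-up values).
# "persona" is deliberately missing, to model a term absent from the corpus.
EMBEDDINGS = {
    "dipinto": [0.9, 0.1, 0.0],
    "quadro": [0.8, 0.2, 0.1],
    "pittura": [0.7, 0.3, 0.0],
    "opera": [0.6, 0.4, 0.1],
    "autore": [0.5, 0.5, 0.0],
    "artista": [0.6, 0.5, 0.1],
    "scrittore": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(words):
    """Average the vectors of the words that have an embedding."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def expand_query(keywords, threshold=0.8):
    """Map each keyword to the lexicon candidates that fit the question context."""
    context = mean_vector(keywords)  # assumed representation of the question
    expanded = {}
    for keyword in keywords:
        entry = LEXICON.get(keyword, {})
        candidates = entry.get("synonyms", []) + entry.get("hypernyms", [])
        kept = []
        for candidate in candidates:
            if candidate not in EMBEDDINGS:
                continue  # corpus filter: term never seen in the corpus
            if context is not None and cosine(EMBEDDINGS[candidate], context) >= threshold:
                kept.append(candidate)  # question filter: close to the context
        expanded[keyword] = kept
    return expanded


expansion = expand_query(["dipinto", "autore"])
```

In this toy setup the lexicon proposes "scrittore" (writer) and "persona" (person) for "autore", but the first is far from the question context and the second is missing from the corpus vocabulary, so neither would be added to the query. A real system would replace the dictionaries with an Italian WordNet and a trained embedding model.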

Author information

Authors and Affiliations

Institute for High Performance Computing and Networking, ICAR-CNR, via P. Castellino, 111-80131, Naples, Italy
Emanuele Damiano, Aniello Minutolo, Stefano Silvestri & Massimo Esposito

Corresponding author

Correspondence to Emanuele Damiano.

Editor information

Editors and Affiliations

Technical University of Catalonia, Barcelona, Spain
Fatos Xhafa
Open University of Catalonia, Barcelona, Spain
Santi Caballé
Fukuoka Institute of Technology (FIT), Fukuoka, Japan
Leonard Barolli

About this paper

Cite this paper

Damiano, E., Minutolo, A., Silvestri, S., Esposito, M. (2018). Query Expansion Based on WordNet and Word2vec for Italian Question Answering Systems. In: Xhafa, F., Caballé, S., Barolli, L. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2017. Lecture Notes on Data Engineering and Communications Technologies, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-69835-9_29

DOI: https://doi.org/10.1007/978-3-319-69835-9_29
Published: 03 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69834-2
Online ISBN: 978-3-319-69835-9
eBook Packages: Engineering, Engineering (R0)
