Abstract
Question Answering (QA) systems have recently emerged as efficient solutions for helping users find proper answers to questions pertaining to a specific situation. One of the major modern paradigms for QA is based on Information Retrieval (IR) techniques: the text of a user question is analyzed to extract a collection of relevant keywords, search-engine queries are formulated from those keywords, and candidate answers are extracted from the documents matching the queries. However, in semantically complex and rich languages such as Italian, many concepts can be expressed in a variety of distinct linguistic forms. This problem arises in particular when QA is applied to smaller sets of documents pertaining to a closed domain, where an answer might appear only once and its exact wording might differ partially or completely from the wording used in the query. To address this issue, this paper proposes a hybrid approach to Query Expansion (QE) in which lexical resources and word embeddings (WEs) are combined to generate synonyms and hypernyms of the relevant words extracted from the user question, and to contextualize this set with respect to both the corpus of interest and the specific question. An experimental session was arranged to compare the proposed QE approach with other techniques and to evaluate its impact on the accuracy of a QA system in extracting proper answers to factoid questions from documents pertaining to the Cultural Heritage domain. The experiments showed the effectiveness of the proposed solution with respect to three different evaluation metrics typically used in the literature.
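The hybrid idea described above can be illustrated with a minimal sketch. The abstract does not specify how the lexical resource and the embeddings are combined, so everything below is an assumption made for illustration: the toy `LEXICON` stands in for a WordNet-like resource, the three-dimensional `EMBEDDINGS` stand in for vectors trained on the domain corpus, and the `expand_query` function, its centroid-based filter, and its `threshold` parameter are hypothetical choices rather than the authors' method.

```python
import math

# Hypothetical toy lexicon standing in for a WordNet-like lexical resource.
LEXICON = {
    "dipinto": {"synonyms": ["quadro", "pittura"], "hypernyms": ["opera"]},
    "autore": {"synonyms": ["artista", "creatore"], "hypernyms": ["persona"]},
}

# Hypothetical 3-dimensional vectors standing in for word embeddings
# trained on the closed-domain corpus.
EMBEDDINGS = {
    "dipinto": [0.9, 0.1, 0.0],
    "quadro": [0.8, 0.2, 0.1],
    "pittura": [0.7, 0.3, 0.0],
    "opera": [0.6, 0.4, 0.1],
    "autore": [0.1, 0.9, 0.0],
    "artista": [0.2, 0.8, 0.1],
    "creatore": [0.0, 0.2, 0.9],
    "persona": [0.1, 0.3, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def expand_query(keywords, threshold=0.7):
    """Add lexicon synonyms and hypernyms that are close to the question context.

    The context is approximated (an assumption) by the centroid of the
    embeddings of the question keywords.
    """
    known = [EMBEDDINGS[k] for k in keywords if k in EMBEDDINGS]
    if not known:
        return list(keywords)
    context = centroid(known)
    expanded = list(keywords)
    for keyword in keywords:
        entry = LEXICON.get(keyword, {})
        candidates = entry.get("synonyms", []) + entry.get("hypernyms", [])
        for candidate in candidates:
            vector = EMBEDDINGS.get(candidate)
            if vector is None or candidate in expanded:
                continue
            # Keep only candidates that fit the question context.
            if cosine(vector, context) >= threshold:
                expanded.append(candidate)
    return expanded


expanded_terms = expand_query(["dipinto", "autore"])
```

With these toy vectors, the lexicon proposes six candidates and the embedding filter discards the two that lie far from the question context ("creatore" and "persona"). In a real system the lexicon lookup and the vectors would come from an actual lexical database and from embeddings trained on the corpus of interest.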
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Damiano, E., Minutolo, A., Silvestri, S., Esposito, M. (2018). Query Expansion Based on WordNet and Word2vec for Italian Question Answering Systems. In: Xhafa, F., Caballé, S., Barolli, L. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2017. Lecture Notes on Data Engineering and Communications Technologies, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-69835-9_29
DOI: https://doi.org/10.1007/978-3-319-69835-9_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69834-2
Online ISBN: 978-3-319-69835-9
eBook Packages: Engineering, Engineering (R0)