Query Expansion Based on WordNet and Word2vec for Italian Question Answering Systems

  • Conference paper
  • Book: Advances on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC 2017)

Abstract

Question Answering (QA) systems have recently emerged as efficient solutions for helping users find proper answers to questions pertaining to a specific situation. One of the major modern paradigms for QA is based on Information Retrieval (IR) techniques: the text of a user question is analyzed to extract a collection of relevant keywords, queries for a search engine are formulated from those keywords, and candidate answers are extracted from the documents matching the queries. Nevertheless, in semantically complex and rich languages such as Italian, many concepts can be expressed in a variety of distinct linguistic forms. This problem arises particularly when QA is applied to smaller sets of documents pertaining to a closed domain, where an answer might appear only once and its exact wording might differ partially or completely from the one used in the query. To address this issue, this paper proposes a hybrid approach to Query Expansion (QE) in which lexical resources and word embeddings (WEs) are combined to generate synonyms and hypernyms of relevant words extracted from the user question, and to contextualize this set with respect to both the corpus of interest and the specific question. An experimental session was arranged to compare the proposed QE approach with other techniques and to evaluate its impact on the accuracy of a QA system in extracting proper answers to factoid questions from documents pertaining to the Cultural Heritage domain. The experiments showed the effectiveness of the proposed solution with respect to three different evaluation metrics typically used in the literature.
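
The general idea of combining a lexical resource with word embeddings can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes a toy lexicon standing in for WordNet, hand-made three-dimensional vectors standing in for word2vec embeddings trained on the corpus of interest, and a hypothetical function `expand_query` with an arbitrary similarity threshold. Candidates (synonyms and hypernyms) come from the lexicon, and only those close to the centroid of the question keywords are kept, which is one simple way to contextualize the expansion with respect to the question.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def expand_query(keywords, lexicon, embeddings, threshold=0.5):
    """Hypothetical helper: propose synonyms/hypernyms from the lexicon and
    keep those whose embedding is close to the question's keyword centroid."""
    known = [embeddings[k] for k in keywords if k in embeddings]
    if not known:
        return []
    context = centroid(known)
    scored = {}
    for kw in keywords:
        entry = lexicon.get(kw, {})
        for cand in entry.get("synonyms", []) + entry.get("hypernyms", []):
            # Skip terms already in the query or unseen in the corpus embeddings.
            if cand in keywords or cand not in embeddings:
                continue
            score = cosine(embeddings[cand], context)
            if score >= threshold:
                scored[cand] = max(score, scored.get(cand, 0.0))
    # Highest similarity first; ties broken alphabetically.
    return sorted(scored, key=lambda w: (-scored[w], w))

# Toy data (assumed for illustration only, not taken from the paper).
LEXICON = {
    "dipinto": {"synonyms": ["quadro", "pittura"], "hypernyms": ["opera"]},
    "autore": {"synonyms": ["artista", "scrittore"], "hypernyms": ["persona"]},
}
EMBEDDINGS = {
    "dipinto": [0.9, 0.1, 0.0],
    "autore": [0.6, 0.4, 0.1],
    "quadro": [0.8, 0.2, 0.1],
    "pittura": [0.85, 0.15, 0.0],
    "opera": [0.7, 0.3, 0.2],
    "artista": [0.7, 0.35, 0.05],
    "scrittore": [0.0, 0.2, 0.9],
    "persona": [0.1, 0.1, 0.8],
}

expansion = expand_query(["dipinto", "autore"], LEXICON, EMBEDDINGS, threshold=0.9)
print(expansion)
```

In this toy setting the lexicon proposes "scrittore" (writer) as a synonym of "autore", but its vector lies far from the painting-related context of the question, so the embedding filter discards it. In a real system the lexicon and vectors would come from an Italian WordNet and a word2vec model, and the threshold would need tuning.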

Notes

  1. http://tint.fbk.eu/

  2. http://snowball.tartarus.org/algorithms/italian/stop.txt

Author information

Corresponding author

Correspondence to Emanuele Damiano.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Damiano, E., Minutolo, A., Silvestri, S., Esposito, M. (2018). Query Expansion Based on WordNet and Word2vec for Italian Question Answering Systems. In: Xhafa, F., Caballé, S., Barolli, L. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2017. Lecture Notes on Data Engineering and Communications Technologies, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-69835-9_29

  • DOI: https://doi.org/10.1007/978-3-319-69835-9_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69834-2

  • Online ISBN: 978-3-319-69835-9

  • eBook Packages: Engineering, Engineering (R0)
