Cross-lingual embedding for cross-lingual question retrieval in low-resource community question answering

HajiAminShirazi, Shahrzad; Momtazi, Saeedeh

doi:10.1007/s10590-020-09257-7

Published: 29 January 2021

Machine Translation, Volume 34, pages 287–303 (2020)

Abstract

In today’s digital world, people look for the knowledge they need by searching the internet for answers to their questions. To this end, many Community Question Answering (CQA) systems have been established, in which people can ask questions and receive the information they need. The data gathered in such systems is a rich repository that users can search for questions that have already been answered. CQA users, however, do not always find their answers in the CQA systems of their native language. One way to enrich the search process is to translate input questions and search for them in other CQA systems. This solution is impractical, however, because translating each question is time-consuming. To make non-English CQA systems richer in available answers, these systems can instead use a model that finds similar English questions. To help Persian CQA systems provide answers, we propose a cross-lingual question retrieval model that retrieves English questions relevant to an input Persian question. The proposed model uses translation-model-based retrieval with neural cross-lingual word embeddings. Our experiments show that the proposed model achieves 71.4% MRR and 83.5% success@5 using supervised cross-lingual word embeddings.
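The abstract names two technical ingredients: translation-model-based retrieval over cross-lingual word embeddings, and evaluation by MRR and success@5. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. It ranks English questions Q for a Persian query q with a statistical translation language model in the style of Berger and Lafferty (1999), log P(q | Q) = Σ_w log Σ_t P(w | t) P_ml(t | Q), and it assumes the translation probability P(w | t) is a softmax over cosine similarities in the shared embedding space. The vectors, the romanized Persian words, the question archive, and the temperature of 0.1 are all hypothetical; smoothing and stemming are omitted. MRR averages 1/rank of the first relevant question over all queries, and success@k is the fraction of queries with at least one relevant question in the top k.

```python
import math

# Hypothetical toy embeddings. Romanized Persian words and English words are
# assumed to already share one vector space (e.g. after a supervised mapping).
EMB = {
    "ketab": [0.9, 0.1],   # Persian "book" (invented vector)
    "mashin": [0.1, 0.9],  # Persian "car" (invented vector)
    "book": [0.88, 0.15],
    "novel": [0.8, 0.3],
    "car": [0.12, 0.95],
    "engine": [0.2, 0.85],
}
PERSIAN_VOCAB = ["ketab", "mashin"]
# Hypothetical English question archive: question id -> tokens.
ENGLISH_QUESTIONS = {"q1": ["book", "novel"], "q2": ["car", "engine"]}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translation_prob(fa_word, en_word, temperature=0.1):
    """P(fa_word | en_word) as a softmax of cosine similarities over the
    Persian vocabulary (an illustrative assumption, not the paper's formula)."""
    weights = {w: math.exp(cosine(EMB[w], EMB[en_word]) / temperature)
               for w in PERSIAN_VOCAB}
    return weights[fa_word] / sum(weights.values())


def score(query, question):
    """log P(query | question) = sum_w log sum_t P(w | t) * P_ml(t | question)."""
    tokens = [t for t in question if t in EMB]
    if not tokens:
        return float("-inf")
    total = 0.0
    for w in query:
        if w not in PERSIAN_VOCAB:  # skip out-of-vocabulary query words
            continue
        # Each token occurrence carries weight 1/|question|, i.e. P_ml(t | question).
        total += math.log(sum(translation_prob(w, t) for t in tokens) / len(tokens))
    return total


def rank(query):
    """English question ids sorted by descending translation-model score."""
    return sorted(ENGLISH_QUESTIONS,
                  key=lambda qid: score(query, ENGLISH_QUESTIONS[qid]),
                  reverse=True)


def reciprocal_rank(ranking, relevant):
    """1/rank of the first relevant id in the ranking, or 0.0 if none is found."""
    for i, qid in enumerate(ranking, start=1):
        if qid in relevant:
            return 1.0 / i
    return 0.0


def mrr(rankings, relevants):
    """Mean reciprocal rank over all queries."""
    return sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants)) / len(rankings)


def success_at_k(rankings, relevants, k):
    """Fraction of queries with at least one relevant id among the top k."""
    hits = sum(1 for r, rel in zip(rankings, relevants) if any(q in rel for q in r[:k]))
    return hits / len(rankings)


if __name__ == "__main__":
    queries = [["ketab"], ["mashin"]]
    relevant = [{"q1"}, {"q2"}]
    rankings = [rank(q) for q in queries]
    print(rankings, mrr(rankings, relevant), success_at_k(rankings, relevant, 5))
```

With real data, `EMB` would hold supervised cross-lingual vectors (for example, monolingual Persian and English embeddings aligned with a bilingual seed dictionary), and the rankings for all test queries would be passed to `mrr` and `success_at_k` with `k=5`.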

Similar content being viewed by others

Natural language processing: state of the art, current trends and challenges (Article, 14 July 2022)

Impact of word embedding models on text analytics in deep learning environment: a review (Article, 22 February 2023)

The Breakthrough of Large Language Models Release for Medical Applications: 1-Year Timeline and Perspectives (Open access article, 17 February 2024)

Notes

Entezar is a Persian word with the same meaning as “expect”; cosine similarity can find words with related or similar meanings.
http://snowball.tartarus.org/algorithms/porter/stemmer.html.
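The first note relies on nearest-neighbour search by cosine similarity. A minimal sketch of that lookup follows, using invented three-dimensional vectors: the romanized word `entezar`, the English words, and every value below are hypothetical, not the paper's embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, emb, k=2):
    """Return the k words whose vectors are most cosine-similar to `word`'s vector."""
    target = emb[word]
    scored = [(w, cosine(target, vec)) for w, vec in emb.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:k]]


# Hypothetical vectors in a shared Persian-English space.
EMB = {
    "entezar": [0.8, 0.2, 0.1],
    "expect": [0.75, 0.25, 0.05],
    "anticipate": [0.7, 0.3, 0.2],
    "banana": [0.0, 0.1, 0.9],
}

print(nearest_neighbors("entezar", EMB))  # prints ['expect', 'anticipate']
```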

Author information

Authors and Affiliations

Computer Engineering Department, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
Shahrzad HajiAminShirazi & Saeedeh Momtazi

Corresponding author

Correspondence to Saeedeh Momtazi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

HajiAminShirazi, S., Momtazi, S. Cross-lingual embedding for cross-lingual question retrieval in low-resource community question answering. Machine Translation 34, 287–303 (2020). https://doi.org/10.1007/s10590-020-09257-7

Received: 25 February 2020
Accepted: 14 December 2020
Published: 29 January 2021
Issue Date: December 2020
DOI: https://doi.org/10.1007/s10590-020-09257-7
