In today’s digital world, people increasingly turn to the internet to find answers to their questions. To this end, many Community Question Answering (CQA) systems have been established, in which people can ask questions and receive the information they need. The data gathered in such systems form a rich repository that users can search for questions that have already been answered. CQA users, however, do not always find their answers in their native-language CQA systems. One way to enrich the search is to translate input questions and look them up in other CQA systems, but this is impractical because translating each question is time-consuming. Instead, non-English CQA systems can widen their pool of available answers by using a model that finds similar English questions directly. To help Persian CQA systems provide answers, we propose a cross-lingual question retrieval model that retrieves English questions relevant to any input Persian question. The proposed model uses translation model-based retrieval built on neural cross-lingual word embeddings. Our experiments show that the proposed model achieves 71.4% MRR and 83.5% success@5 using supervised cross-lingual word embeddings.
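The two reported measures, MRR and success@5, can be illustrated with a minimal sketch. It assumes the usual definitions: MRR averages the reciprocal rank of the first relevant English question per query, and success@k is the fraction of queries with at least one relevant question in the top k. The function names, question identifiers, and toy rankings below are hypothetical and are not taken from the article's experiments.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, question_id in enumerate(ranked, start=1):
        if question_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Dict[str, List[str]],
                         gold: Dict[str, Set[str]]) -> float:
    """Average the reciprocal rank over all queries in `runs`."""
    total = sum(reciprocal_rank(ranked, gold.get(query, set()))
                for query, ranked in runs.items())
    return total / len(runs)


def success_at_k(runs: Dict[str, List[str]],
                 gold: Dict[str, Set[str]],
                 k: int = 5) -> float:
    """Fraction of queries with at least one relevant item in the top k."""
    hits = sum(
        1 for query, ranked in runs.items()
        if any(qid in gold.get(query, set()) for qid in ranked[:k])
    )
    return hits / len(runs)


# Hypothetical toy data: Persian query id -> ranked English question ids.
runs = {
    "fa_q1": ["en_3", "en_7", "en_1"],                             # relevant at rank 1
    "fa_q2": ["en_9", "en_2", "en_5"],                             # relevant at rank 2
    "fa_q3": ["en_4", "en_6", "en_8", "en_10", "en_11", "en_12"],  # relevant at rank 6
}
gold = {"fa_q1": {"en_3"}, "fa_q2": {"en_2"}, "fa_q3": {"en_12"}}

mrr = mean_reciprocal_rank(runs, gold)
s5 = success_at_k(runs, gold, k=5)
print(f"MRR = {mrr:.3f}, success@5 = {s5:.3f}")
```

In this toy run the third query only finds its relevant question at rank 6, so it contributes to MRR but counts as a miss for success@5, which is why the two measures can diverge.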
Entezar is a Persian word that has the same meaning as “expect”; cosine similarity can find words that have related or similar meanings.
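As a rough illustration of this note, the sketch below ranks English words by cosine similarity to a Persian word vector, assuming both languages have already been mapped into one shared embedding space. The three-dimensional vectors, the English vocabulary, and the helper names are invented for illustration; real cross-lingual embeddings have hundreds of dimensions and are learned from data.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_words(query_vec: Vector,
                  vocab: Dict[str, Vector],
                  top_n: int = 2) -> List[Tuple[str, float]]:
    """Return the top_n vocabulary words most similar to query_vec."""
    scored = [(word, cosine(query_vec, vec)) for word, vec in vocab.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Made-up vectors in a hypothetical shared Persian-English space.
entezar = [0.9, 0.1, 0.2]
english_vocab = {
    "expect": [0.8, 0.2, 0.1],
    "wait": [0.7, 0.1, 0.4],
    "banana": [0.0, 1.0, 0.0],
}

neighbours = nearest_words(entezar, english_vocab, top_n=2)
for word, score in neighbours:
    print(f"{word}: {score:.3f}")
```

With these toy values the semantically related words rank above the unrelated one, which is the behaviour the note describes.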
Cite this article
HajiAminShirazi, S., Momtazi, S. Cross-lingual embedding for cross-lingual question retrieval in low-resource community question answering. Machine Translation (2021). https://doi.org/10.1007/s10590-020-09257-7
Keywords
- Community question answering
- Cross-lingual embedding
- Question retrieval
- Low-resource languages