A Possibilistic Approach for Arabic Domain Terminology Extraction and Translation

Lahbib, Wiem; Bounhas, Ibrahim; Slimani, Yahya

doi:10.1007/978-3-030-00840-6_25

Conference paper
First Online: 16 September 2018

Part of the book series: Communications in Computer and Information Science (CCIS, volume 935)

Abstract

This paper proposes a hybrid possibilistic approach to bilingual terminology extraction based on possibility and necessity measures. First, we extract domain-relevant terms from the source language. Second, we build a co-occurrence-based translation graph, which is mined to translate these terms into the target language. We compare our approach with several state-of-the-art approaches. Experimental results show that the possibilistic approach achieves better Recall, Precision and Mean Average Precision (MAP), and the differences between the compared approaches are statistically significant according to their p-values.
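To make the abstract's vocabulary concrete, the following is a minimal sketch of the standard textbook definitions of possibility and necessity measures over a possibility distribution, together with Mean Average Precision for ranked candidate lists. It is not the authors' scoring model, which this page does not describe. The function names, the candidate labels (`t1`, `t2`, `t3`) and the degrees assigned to them are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple


def possibility(pi: Dict[str, float], event: Iterable[str]) -> float:
    """Possibility of an event: the largest degree among its elements."""
    return max((pi.get(x, 0.0) for x in event), default=0.0)


def necessity(pi: Dict[str, float], event: Iterable[str]) -> float:
    """Necessity of an event: 1 minus the possibility of its complement."""
    event_set = set(event)
    complement = [x for x in pi if x not in event_set]
    return 1.0 - possibility(pi, complement)


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of precision@k taken at each rank k holding a relevant item."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Mean of average precision over (ranked list, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical possibility degrees for three candidate translations.
pi = {"t1": 1.0, "t2": 0.4, "t3": 0.1}
poss_t1 = possibility(pi, {"t1"})  # max over {t1}
nec_t1 = necessity(pi, {"t1"})     # 1 - max over {t2, t3}

# Hypothetical ranked outputs for two source terms.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
map_score = mean_average_precision(runs)
```

In this sketch a candidate is considered certain only to the extent that every alternative is implausible, which is why necessity is computed from the complement. Ranking candidates by a combination of the two measures is one common use in possibilistic classifiers; whether the paper combines them this way is not stated here.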

Notes

1. In parallel corpora, documents are translated sentence by sentence.
2. In comparable corpora, documents deal with the same topics and subjects.
3. http://www.statmt.org/moses/giza/GIZA++.html
4. http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
5. https://camel.abudhabi.nyu.edu/madamira/
6. http://www.jarir.tn/kunuzproject

Author information

Authors and Affiliations

LISI Laboratory of Computer Science for Industrial Systems, Carthage University, Tunis, Tunisia
Wiem Lahbib, Ibrahim Bounhas & Yahya Slimani
JARIR: Joint Group for Artificial Reasoning and Information Retrieval, Manouba, Tunisia
Wiem Lahbib, Ibrahim Bounhas & Yahya Slimani
Higher Institute of Documentation, La Manouba University, Manouba, Tunisia
Ibrahim Bounhas
Higher Institute of Multimedia Arts, La Manouba University, Manouba, Tunisia
Yahya Slimani

Corresponding author

Correspondence to Wiem Lahbib.

Editor information

Editors and Affiliations

Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Gliwice, Poland
Tadeusz Czachórski
Department of Electrical and Electronic Engineering, Imperial College London, London, UK
Erol Gelenbe
Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Gliwice, Poland
Krzysztof Grochla
University of Houston, Houston, TX, USA
Ricardo Lent

About this paper

Cite this paper

Lahbib, W., Bounhas, I., Slimani, Y. (2018). A Possibilistic Approach for Arabic Domain Terminology Extraction and Translation. In: Czachórski, T., Gelenbe, E., Grochla, K., Lent, R. (eds) Computer and Information Sciences. ISCIS 2018. Communications in Computer and Information Science, vol 935. Springer, Cham. https://doi.org/10.1007/978-3-030-00840-6_25

DOI: https://doi.org/10.1007/978-3-030-00840-6_25
Published: 16 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00839-0
Online ISBN: 978-3-030-00840-6
eBook Packages: Computer Science, Computer Science (R0)
