A Possibilistic Approach for Arabic Domain Terminology Extraction and Translation

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 935)

Abstract

This paper proposes a hybrid possibilistic approach to bilingual terminology extraction based on possibility and necessity measures. First, we extract domain-relevant terms from the source language; second, we build a co-occurrence-based translation graph, which is mined to translate these terms into the target language. We compare our approach with several state-of-the-art approaches. Experimental results show that the possibilistic approach achieves better Recall, Precision and Mean Average Precision (MAP). The differences between the compared approaches are statistically significant in terms of p-value.
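As a point of reference for the possibility and necessity measures mentioned above, the following is a minimal sketch of their standard definitions in possibility theory: the possibility of an event is the maximum of the possibility distribution over its elements, and its necessity is one minus the possibility of the complement. The function names, the candidate labels and the distribution values are hypothetical and chosen only for illustration; this is not the scoring scheme used in the paper, whose details are not given in the abstract.

```python
from typing import Dict, Set


def possibility(pi: Dict[str, float], event: Set[str]) -> float:
    """Pi(A) = max of pi(x) over x in A; 0.0 for the empty event."""
    return max((pi[x] for x in event), default=0.0)


def necessity(pi: Dict[str, float], event: Set[str]) -> float:
    """N(A) = 1 - Pi(complement of A) within the universe defined by pi."""
    complement = set(pi) - event
    return 1.0 - possibility(pi, complement)


# Hypothetical normalized possibility distribution over three
# candidate translations of one source term (illustrative values only).
pi = {"candidate_a": 1.0, "candidate_b": 0.5, "candidate_c": 0.25}
event = {"candidate_a", "candidate_b"}

print(possibility(pi, event))  # 1.0
print(necessity(pi, event))    # 0.75
```

An event can be fully possible without being fully necessary: here the event has possibility 1.0, but its necessity is only 0.75 because the excluded candidate still has possibility 0.25.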

Notes

  1. In parallel corpora, documents are translated sentence by sentence.

  2. In comparable corpora, documents deal with the same topics and subjects.

  3. http://www.statmt.org/moses/giza/GIZA++.html

  4. http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

  5. https://camel.abudhabi.nyu.edu/madamira/

  6. http://www.jarir.tn/kunuzproject

Author information

Correspondence to Wiem Lahbib.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Lahbib, W., Bounhas, I., Slimani, Y. (2018). A Possibilistic Approach for Arabic Domain Terminology Extraction and Translation. In: Czachórski, T., Gelenbe, E., Grochla, K., Lent, R. (eds) Computer and Information Sciences. ISCIS 2018. Communications in Computer and Information Science, vol 935. Springer, Cham. https://doi.org/10.1007/978-3-030-00840-6_25

  • DOI: https://doi.org/10.1007/978-3-030-00840-6_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00839-0

  • Online ISBN: 978-3-030-00840-6

  • eBook Packages: Computer Science, Computer Science (R0)
