Using Sentence Similarity Measure for Plagiarism Detection of Arabic Documents

Wali, Wafa; Gargouri, Bilel; Ben Hamadou, Abdelmajid

doi:10.1007/978-3-319-76348-4_6

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 736)

Included in the following conference series:

International Conference on Intelligent Systems Design and Applications

Abstract

Plagiarism detection is a challenging task, particularly in natural language texts. Some plagiarism detection tools have been developed for diverse natural languages, especially English. In this paper, we propose a new plagiarism detection system devoted to Arabic text documents. This system is based on an algorithm that uses a semantic sentence similarity measure. The sentence similarity measure aggregates three components in a linear function: the lexical-based component LS, which covers the common words; the semantic-based component SS, which uses synonymy relationships; and the syntactico-semantic-based component SSS, which uses semantic argument properties, notably the semantic argument and the thematic role. This last component measures the semantic similarity between words that play the same syntactic role. Concerning the word-based semantic similarity, an information content-based measure is used to estimate the SS degree between words by exploiting the LMF-standardized Arabic dictionary ElMadar. The performance of the proposed system was confirmed through experiments with student thesis reports, which showed promising capabilities in identifying literal and some types of intelligent plagiarism. We also demonstrate its advantages over other plagiarism detection tools, including Aplag.
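
The linear aggregation of LS, SS and SSS described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the weights (0.3, 0.3, 0.4), the toy synonym table and the role dictionaries are all hypothetical. The lexical component is approximated here by Jaccard overlap, and word-level semantic similarity by a plain synonym lookup rather than the information content-based measure over ElMadar that the abstract mentions.

```python
from typing import Dict, List, Set, Tuple


def _related(word: str, others: Set[str], synonyms: Dict[str, Set[str]]) -> bool:
    """True if `word` or one of its listed synonyms occurs in `others`."""
    return word in others or bool(synonyms.get(word, set()) & others)


def lexical_similarity(s1: List[str], s2: List[str]) -> float:
    """LS stand-in: Jaccard overlap of the two word sets."""
    a, b = set(s1), set(s2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def semantic_similarity(
    s1: List[str], s2: List[str], synonyms: Dict[str, Set[str]]
) -> float:
    """SS stand-in: share of words that have an identical or synonymous
    counterpart in the other sentence (assumption: a toy synonym table
    replaces the dictionary-based measure)."""
    a, b = set(s1), set(s2)
    if not a and not b:
        return 1.0
    hits = sum(_related(w, b, synonyms) for w in a)
    hits += sum(_related(w, a, synonyms) for w in b)
    return hits / (len(a) + len(b))


def role_similarity(
    roles1: Dict[str, str], roles2: Dict[str, str], synonyms: Dict[str, Set[str]]
) -> float:
    """SSS stand-in: compare only words that fill the same role in both
    sentences. Each dict maps a role label (hypothetical, e.g. "agent")
    to the word filling it."""
    all_roles = set(roles1) | set(roles2)
    if not all_roles:
        return 1.0
    shared = set(roles1) & set(roles2)
    hits = sum(_related(roles1[r], {roles2[r]}, synonyms) for r in shared)
    return hits / len(all_roles)


def sentence_similarity(
    ls: float,
    ss: float,
    sss: float,
    weights: Tuple[float, float, float] = (0.3, 0.3, 0.4),  # hypothetical weights
) -> float:
    """Linear combination of the three component scores."""
    alpha, beta, gamma = weights
    return alpha * ls + beta * ss + gamma * sss
```

In a plagiarism detector built along these lines, a pair of sentences would be flagged when the combined score exceeds a chosen threshold. A paraphrase that swaps a word for a synonym lowers LS but leaves SS and SSS high, which is why the combined score can catch some non-literal copying that pure word overlap would miss.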

Author information

Authors and Affiliations

MIRACL Laboratory, Sfax University, Sfax, Tunisia
Wafa Wali, Bilel Gargouri & Abdelmajid Ben Hamadou

Corresponding author

Correspondence to Wafa Wali.

Editor information

Editors and Affiliations

Machine Intelligence Research Labs, Auburn, Washington, USA
Ajith Abraham
Department of Computer Science, South Asian University, Chanakyapuri, Delhi, India
Pranab Kr. Muhuri
Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Durian Tunggal, Melaka, Malaysia
Azah Kamilah Muda
Machine Intelligence Research Labs, Auburn, Washington, USA
Niketa Gandhi

About this paper

Cite this paper

Wali, W., Gargouri, B., Ben Hamadou, A. (2018). Using Sentence Similarity Measure for Plagiarism Detection of Arabic Documents. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2017. Advances in Intelligent Systems and Computing, vol 736. Springer, Cham. https://doi.org/10.1007/978-3-319-76348-4_6

DOI: https://doi.org/10.1007/978-3-319-76348-4_6
Published: 22 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76347-7
Online ISBN: 978-3-319-76348-4
eBook Packages: Engineering, Engineering (R0)
