Abstract
Plagiarism detection is a challenging task, particularly in natural language texts. Plagiarism detection tools have been developed for various natural languages, especially English. In this paper, we propose a new plagiarism detection system devoted to Arabic text documents. The system is based on an algorithm that uses a semantic sentence similarity measure. This measure aggregates three components in a linear function: the lexical component LS, based on common words; the semantic component SS, based on synonymy relationships; and the syntactico-semantic component SSS, based on semantic argument properties, notably the semantic argument and the thematic role. This last component measures the semantic similarity between words that play the same syntactic role. For word-level semantic similarity, an information content-based measure estimates the SS degree between words by exploiting ElMadar, the LMF-standardized Arabic dictionary. The performance of the proposed system was confirmed through experiments on student thesis reports, which showed promising capabilities in identifying literal plagiarism and some types of intelligent plagiarism. We also demonstrate its advantages over other plagiarism detection tools, including Aplag.
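The linear aggregation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weights or the exact form of each component, so everything named here is an assumption for illustration: the function names, the equal default weights, and the use of Jaccard overlap as a stand-in for the lexical component LS. The SS and SSS scores are taken as precomputed inputs in [0, 1], since computing them would require the dictionary and the syntactic analysis the paper relies on.

```python
from typing import Iterable, Tuple


def lexical_similarity(words_a: Iterable[str], words_b: Iterable[str]) -> float:
    """Jaccard overlap of two word sets (assumed stand-in for the LS component)."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def sentence_similarity(
    ls: float,
    ss: float,
    sss: float,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Linear combination of the three component scores.

    The weights are hypothetical; they are only required to sum to 1 so
    that the result stays in [0, 1] when each component is in [0, 1].
    """
    alpha, beta, gamma = weights
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return alpha * ls + beta * ss + gamma * sss


if __name__ == "__main__":
    # Tokenised sentences; tokenisation itself is outside this sketch.
    ls = lexical_similarity(["a", "b", "c"], ["b", "c", "d"])
    # ss and sss are placeholder values, not results from the paper.
    print(sentence_similarity(ls, ss=0.5, sss=0.0, weights=(0.5, 0.25, 0.25)))
```

In a detection setting, a sentence pair would typically be flagged when this combined score exceeds a chosen threshold; the threshold value is likewise not stated in the abstract.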
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Wali, W., Gargouri, B., Ben Hamadou, A. (2018). Using Sentence Similarity Measure for Plagiarism Detection of Arabic Documents. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2017. Advances in Intelligent Systems and Computing, vol 736. Springer, Cham. https://doi.org/10.1007/978-3-319-76348-4_6
DOI: https://doi.org/10.1007/978-3-319-76348-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76347-7
Online ISBN: 978-3-319-76348-4
eBook Packages: Engineering, Engineering (R0)