
Using Sentence Similarity Measure for Plagiarism Detection of Arabic Documents

  • Conference paper
Intelligent Systems Design and Applications (ISDA 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 736)

Abstract

Plagiarism detection is a challenging task, particularly for natural language texts. Several plagiarism detection tools have been developed for various natural languages, especially English. In this paper, we propose a new plagiarism detection system dedicated to Arabic text documents. The system is based on an algorithm that uses a semantic sentence similarity measure. This measure aggregates three components in a linear function: a lexical component (LS) based on the words the two sentences have in common, a semantic component (SS) based on synonymy relationships, and a syntactico-semantic component (SSS) based on semantic argument properties, notably semantic arguments and thematic roles. The SSS component measures the semantic similarity between words that play the same syntactic role. For word-level semantic similarity, an information content-based measure estimates the SS degree between words by exploiting ElMadar, an Arabic dictionary standardized according to LMF (Lexical Markup Framework). Experiments on student thesis reports confirmed the performance of the proposed system, showing promising capabilities in identifying literal plagiarism and some types of intelligent plagiarism. We also demonstrate its advantages over other plagiarism detection tools, including Aplag.
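The abstract describes the similarity measure only at a high level. The following Python sketch shows one possible way the three components could be computed and combined linearly. It is an illustration under stated assumptions, not the authors' implementation. Everything concrete here is hypothetical: the toy taxonomy, concept probabilities and lexicon (`PARENT`, `PROB`, `LEXICON`) stand in for the ElMadar dictionary, and English placeholder words replace Arabic lemmas. Jaccard word overlap stands in for LS. Lin's information content measure is used as one example of an IC-based word similarity. SS uses best-match averaging, and SSS compares only words that fill the same role label. The weights `(0.3, 0.4, 0.3)` are likewise arbitrary.

```python
import math
from typing import Dict, List, Optional

# HYPOTHETICAL toy taxonomy standing in for the ElMadar dictionary:
# each concept points to its parent concept (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "person": "entity", "learner": "person", "teacher": "person",
    "artifact": "entity", "document": "artifact",
    "report": "document", "thesis": "document",
    "act": "entity", "write": "act", "read": "act",
}

# HYPOTHETICAL corpus probabilities p(c); information content IC(c) = -log p(c).
PROB: Dict[str, float] = {
    "entity": 1.0, "person": 0.3, "learner": 0.1, "teacher": 0.1,
    "artifact": 0.3, "document": 0.2, "report": 0.05, "thesis": 0.05,
    "act": 0.4, "write": 0.1, "read": 0.1,
}

# HYPOTHETICAL lexicon: synonyms map to the same concept (synonymy relation).
LEXICON: Dict[str, str] = {
    "student": "learner", "pupil": "learner", "teacher": "teacher",
    "report": "report", "thesis": "thesis",
    "wrote": "write", "composed": "write", "read": "read",
}


def ic(concept: str) -> float:
    """Information content of a concept: -log p(c)."""
    return -math.log(PROB[concept])


def lcs(c1: str, c2: str) -> str:
    """Lowest common subsumer: first ancestor of c2 that also subsumes c1."""
    ancestors = set()
    node: Optional[str] = c1
    while node is not None:
        ancestors.add(node)
        node = PARENT[node]
    node = c2
    while node not in ancestors:  # terminates: the root subsumes everything
        node = PARENT[node]
    return node


def lin(c1: str, c2: str) -> float:
    """Lin's IC-based similarity: 2 * IC(lcs) / (IC(c1) + IC(c2))."""
    if c1 == c2:
        return 1.0
    return 2 * ic(lcs(c1, c2)) / (ic(c1) + ic(c2))


def word_sim(w1: str, w2: str) -> float:
    """Word similarity: identical words score 1, unknown words score 0."""
    if w1 == w2:
        return 1.0
    if w1 not in LEXICON or w2 not in LEXICON:
        return 0.0
    return lin(LEXICON[w1], LEXICON[w2])


def lexical_sim(s1: List[str], s2: List[str]) -> float:
    """LS: Jaccard overlap of the two sentences' word sets (common words)."""
    a, b = set(s1), set(s2)
    return len(a & b) / len(a | b) if a | b else 0.0


def semantic_sim(s1: List[str], s2: List[str]) -> float:
    """SS: average best-match word similarity, computed in both directions."""
    if not s1 or not s2:
        return 0.0

    def directed(x: List[str], y: List[str]) -> float:
        return sum(max(word_sim(w, v) for v in y) for w in x) / len(x)

    return (directed(s1, s2) + directed(s2, s1)) / 2


def syntactico_semantic_sim(r1: Dict[str, str], r2: Dict[str, str]) -> float:
    """SSS: compare only the words that fill the same role in both sentences."""
    shared = r1.keys() & r2.keys()
    if not shared:
        return 0.0
    return sum(word_sim(r1[r], r2[r]) for r in shared) / len(shared)


def sentence_sim(s1: List[str], s2: List[str],
                 r1: Dict[str, str], r2: Dict[str, str],
                 weights=(0.3, 0.4, 0.3)) -> float:
    """Linear aggregation of LS, SS and SSS (weights are HYPOTHETICAL)."""
    a, b, c = weights
    return (a * lexical_sim(s1, s2) + b * semantic_sim(s1, s2)
            + c * syntactico_semantic_sim(r1, r2))


if __name__ == "__main__":
    s1 = ["student", "wrote", "report"]
    s2 = ["pupil", "composed", "report"]
    r1 = {"agent": "student", "action": "wrote", "theme": "report"}
    r2 = {"agent": "pupil", "action": "composed", "theme": "report"}
    print(round(sentence_sim(s1, s2, r1, r2), 2))  # 0.76
```

In this example, the paraphrased pair shares only one surface word, so LS is low (0.2). The synonym substitutions are still recovered by SS and SSS, which is the kind of "intelligent" plagiarism a purely lexical overlap would miss. A detector built on this sketch would then flag sentence pairs whose score exceeds some threshold. The abstract does not state the threshold value.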



Author information

Corresponding author

Correspondence to Wafa Wali.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Wali, W., Gargouri, B., Ben Hamadou, A. (2018). Using Sentence Similarity Measure for Plagiarism Detection of Arabic Documents. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2017. Advances in Intelligent Systems and Computing, vol 736. Springer, Cham. https://doi.org/10.1007/978-3-319-76348-4_6


  • DOI: https://doi.org/10.1007/978-3-319-76348-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76347-7

  • Online ISBN: 978-3-319-76348-4

  • eBook Packages: Engineering, Engineering (R0)
