Fuzzy Cross Language Plagiarism Detection Approach Based on Semantic Similarity and Hadoop MapReduce

  • Chapter

Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 372))

Abstract

Cross-language plagiarism takes many forms, ranging from rewriting a text into a semantically equivalent one to translating it or adopting its ideas without proper reference to the originator. Reliable large-scale text comparison is among the most common problems in data processing systems, especially for fuzzy, semantic-based similarity, because of the complexity of natural languages (Arabic in particular) and the growing number of publications, which increases the number of candidate source documents for plagiarism. Cross-language plagiarism detection (CLPD) is more complicated than monolingual plagiarism detection because it goes beyond copy, translate and paste. The detection process therefore calls for vague-concept and fuzzy-set techniques in a big data environment to reveal hidden plagiarism in Arabic documents translated from English or French sources. In this paper, we propose a fuzzy-semantic similarity for CLPD of Arabic documents that uses the WordNet taxonomy and three semantic similarity measures: Wu and Palmer, Lin, and Leacock-Chodorow. The work has been parallelized using Apache Hadoop with the HDFS file system and the MapReduce programming model.
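The three taxonomy-based measures named in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the tiny is-a taxonomy, the concept probabilities, and all function names below are hypothetical stand-ins for WordNet and corpus statistics, and the formulas follow the commonly cited textbook definitions of each measure.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); a stand-in for WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Hypothetical concept probabilities, used only by the Lin measure.
PROB = {"entity": 1.0, "animal": 0.5, "vehicle": 0.5,
        "dog": 0.25, "cat": 0.25, "car": 0.5}


def path_to_root(concept):
    """Return [concept, parent, ..., root]."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def lcs(a, b):
    """Least common subsumer: the deepest shared ancestor of a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward from b, deepest first
        if node in ancestors_of_a:
            return node
    raise ValueError("concepts share no ancestor")


def wu_palmer(a, b):
    """2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


def leacock_chodorow(a, b):
    """-log(len / (2 * D)), with len the shortest path counted in nodes
    and D the maximum depth of the taxonomy."""
    length = depth(a) + depth(b) - 2 * depth(lcs(a, b)) + 1
    max_depth = max(depth(c) for c in PARENT)
    return -math.log(length / (2 * max_depth))


def lin(a, b):
    """2 * IC(lcs) / (IC(a) + IC(b)), with IC(c) = -log p(c)."""
    ic = lambda c: -math.log(PROB[c])
    return 2 * ic(lcs(a, b)) / (ic(a) + ic(b))


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "car")]:
        print(pair, wu_palmer(*pair), leacock_chodorow(*pair), lin(*pair))
```

In this toy taxonomy, "dog" and "cat" share the ancestor "animal" and therefore score higher on all three measures than "dog" and "car", whose only shared ancestor is the root. In a cross-language setting, such word-level scores would be computed after mapping Arabic terms to WordNet concepts, a step this sketch leaves out.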

References

  1. G.A. Miller, WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)

  2. M.A. Beyer, D. Laney, The importance of big data: a definition, Stamford CT Gart., 2014–2018 (2012)

  3. M.M. Najafabadi, F. Villanustre, T.M. Khoshgoftaar, N. Seliya, R. Wald, E. Muharemagic, Deep learning applications and challenges in big data analytics. J. Big Data 2 (2015)

  4. B. Parhami, A highly parallel computing system for information retrieval, in Proceedings of the December 5–7, 1972, Fall Joint Computer Conference, Part II (New York, NY, USA, 1972), pp. 681–690

  5. Q. Zhang, Y. Zhang, H. Yu, X. Huang, Efficient partial-duplicate detection based on sequence matching, in Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 675–682 (2010)

  6. J. Dwivedi, A. Tiwary, Plagiarism detection on bigdata using modified map-reduced based SCAM algorithm. Int. Conference on Innovative Mechanisms Ind. Appl. (ICIMIA) 2017, 608–610 (2017)

  7. H. Ezzikouri, M. Erritali, M. Oukessou, Fuzzy-semantic similarity for automatic multilingual plagiarism detection. Int. J. Adv. Comput. Sci. Appl. 8(9), 86–90 (2017)

  8. C. Leacock, M. Chodorow, Combining local context and WordNet similarity for word sense identification. WordNet Electron. Lex. Database 49(2), 265–283 (1998)

  9. Z. Wu, M. Palmer, Verbs semantics and lexical selection, in Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, pp. 133–138 (1994)

  10. P. Resnik, Using information content to evaluate semantic similarity, in Proceedings of the 14th International Joint Conference on Artificial Intelligence, pp. 448–453 (1995)

  11. D. Lin, An information-theoretic definition of similarity, in ICML, vol. 98, pp. 296–304 (1998)

  12. G. Hirst, D. St-Onge, Lexical chains as representations of context for the detection and correction of malapropisms. WordNet Electron. Lex. Database 305, 305–332 (1998)

  13. T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, Second Edition (McGraw-Hill, 2001)

  14. D. Gupta, K. Vani, C.K. Singh, Using natural language processing techniques and fuzzy-semantic similarity for automatic external plagiarism detection, in International Conference on Advances in Computing, Communications and Informatics (ICACCI, 2014), pp. 2694–2699

  15. N. Werro, Fuzzy Classification of Online Customers (Thesis University of Fribourg (Switzerland), Fuzzy Management Methods, 2008)

  16. C. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard, D. McClosky, The Stanford CoreNLP natural language processing toolkit, in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60 (2014)

  17. H. Ezzikouri, M. Erritali, M. Oukessou, Semantic similarity/relatedness for cross language plagiarism detection. Indones. J. Electr. Eng. Comput. Sci. 1(2), 371–374 (2016)

  18. S. Alzahrani, N. Salim, Fuzzy semantic-based string similarity for extrinsic plagiarism detection. Braschler Harman 1176, 1–8 (2010)

  19. R. Yerra, Y.-K. Ng, A sentence-based copy detection approach for web documents, in International Conference on Fuzzy Systems and Knowledge Discovery, vol. 2005, pp. 557–570 (2005)

Corresponding author

Correspondence to M. Oukessou.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Ezzikouri, H., Oukessou, M., Erritali, M., Madani, Y. (2019). Fuzzy Cross Language Plagiarism Detection Approach Based on Semantic Similarity and Hadoop MapReduce. In: Melliani, S., Castillo, O. (eds) Recent Advances in Intuitionistic Fuzzy Logic Systems. Studies in Fuzziness and Soft Computing, vol 372. Springer, Cham. https://doi.org/10.1007/978-3-030-02155-9_15
