Fuzzy Cross Language Plagiarism Detection Approach Based on Semantic Similarity and Hadoop MapReduce

Ezzikouri, H.; Oukessou, M.; Erritali, M.; Madani, Y.

doi:10.1007/978-3-030-02155-9_15

Chapter
First Online: 09 October 2018

Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 372))

Abstract

Cross-language plagiarism takes many forms, ranging from rewriting a text into a semantically equivalent one to translating it or adopting its ideas without proper reference to the originator. Reliable large-scale text comparison is among the most common problems in any data processing system, especially for fuzzy, semantic-based similarity, because of the complexity of natural languages (Arabic in particular) and the growing number of publications, which increases the number of suspicious documents and potential plagiarism sources. Cross-language plagiarism detection (CLPD) is more complicated than monolingual plagiarism detection because it goes beyond copy, translate, and paste. The detection process therefore calls for vague concepts and fuzzy set techniques in a big data environment to reveal hidden plagiarism in Arabic documents translated from English or French sources. In this paper, we propose a fuzzy-semantic similarity for CLPD using the WordNet taxonomy and three semantic similarity measures (Wu and Palmer, Lin, and Leacock-Chodorow) for Arabic documents. The work has been parallelized using Apache Hadoop with the HDFS file system and the MapReduce programming model.
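The three similarity measures named in the abstract can be illustrated with a minimal sketch. It uses the textbook definitions of Wu and Palmer, Leacock-Chodorow, and Lin on a tiny single-rooted taxonomy. The taxonomy `PARENT`, the frequency table `COUNTS`, and all function names are hypothetical and chosen only for illustration; this is not WordNet data and not the chapter's implementation, which may differ in depth conventions, fuzzy aggregation, and the Hadoop MapReduce layer.

```python
import math

# Hypothetical toy taxonomy (child -> parent); an assumption, not WordNet data.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}

# Hypothetical corpus frequencies, used only for information content (Lin).
COUNTS = {"entity": 0, "animal": 2, "plant": 1, "dog": 5, "cat": 4, "tree": 3}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lcs(a, b):
    """Lowest common subsumer: the deepest ancestor shared by a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    raise ValueError("concepts do not share a root")


def wu_palmer(a, b):
    """Wu and Palmer: 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


def leacock_chodorow(a, b):
    """Leacock-Chodorow: -log(len / (2 * D)), with len counted in nodes."""
    length = depth(a) + depth(b) - 2 * depth(lcs(a, b)) + 1
    max_depth = max(depth(n) for n in PARENT)
    return -math.log(length / (2 * max_depth))


def information_content(node):
    """IC = -log p(node), where p includes the counts of all descendants."""
    total = sum(COUNTS.values())
    subsumed = sum(COUNTS[n] for n in PARENT if node in path_to_root(n))
    return -math.log(subsumed / total)


def lin(a, b):
    """Lin: 2 * IC(lcs) / (IC(a) + IC(b)); 0.0 when both ICs are zero."""
    denom = information_content(a) + information_content(b)
    if denom <= 0:
        return 0.0
    return 2 * information_content(lcs(a, b)) / denom


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "tree"), ("dog", "dog")]:
        print(pair, wu_palmer(*pair), leacock_chodorow(*pair), lin(*pair))
```

In this toy setting, sibling concepts such as "dog" and "cat" score higher than concepts that meet only at the root, which is the behavior a semantic plagiarism detector relies on when a translated word does not match its source literally.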

References

G.A. Miller, WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
M.A. Beyer, D. Laney, The importance of big data: a definition, Stamford CT Gart., 2014–2018 (2012)
M.M. Najafabadi, F. Villanustre, T.M. Khoshgoftaar, N. Seliya, R. Wald, E. Muharemagic, Deep learning applications and challenges in big data analytics. J. Big Data, 2 (2015)
B. Parhami, A Highly Parallel Computing System for Information Retrieval, in Proceedings of the December 5–7, 1972, Fall Joint Computer Conference, Part II (New York, NY, USA, 1972) pp. 681–690
Q. Zhang, Y. Zhang, H. Yu, X. Huang, Efficient partial-duplicate detection based on sequence matching, in Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 675–682 (2010)
J. Dwivedi, A. Tiwary, Plagiarism detection on bigdata using modified map-reduced based SCAM algorithm. Int. Conference on Innovative Mechanisms Ind. Appl. (ICIMIA) 2017, 608–610 (2017)
H. Ezzikouri, M. Erritali, M. Oukessou, Fuzzy-semantic similarity for automatic multilingual plagiarism detection. Int. J. Adv. Comput. Sci. Appl. 8(9), 86–90 (2017)
C. Leacock, M. Chodorow, Combining local context and WordNet similarity for word sense identification. WordNet Electron. Lex. Database 49(2), 265–283 (1998)
Z. Wu, M. Palmer, Verbs semantics and lexical selection, in Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, pp. 133–138 (1994)
P. Resnik, Using information content to evaluate semantic similarity, in Proceedings of the 14th International Joint Conference on Artificial Intelligence, pp. 448–453 (1995)
D. Lin, An information-theoretic definition of similarity, in ICML, 98, 296–304 (1998)
G. Hirst, D. St-Onge, Lexical chains as representations of context for the detection and correction of malapropisms. WordNet Electron. Lex. Database 305, 305–332 (1998)
T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, Second Edition (McGraw-Hill, 2001)
D. Gupta, K. Vani, C.K. Singh, Using Natural Language Processing techniques and fuzzy-semantic similarity for automatic external plagiarism detection, in International Conference on Advances in Computing, Communications and Informatics (ICACCI, 2014) pp. 2694–2699
N. Werro, Fuzzy Classification of Online Customers (Thesis University of Fribourg (Switzerland), Fuzzy Management Methods, 2008)
C. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard, D. McClosky, The Stanford CoreNLP natural language processing toolkit, in Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60 (2014)
H. Ezzikouri, M. Erritali, M. Oukessou, Semantic similarity/relatedness for cross language plagiarism detection. Indones. J. Electr. Eng. Comput. Sci. 1(2), 371–374 (2016)
S. Alzahrani, N. Salim, Fuzzy semantic-based string similarity for extrinsic plagiarism detection. Braschler Harman 1176, 1–8 (2010)
R. Yerra, Y.-K. Ng, A sentence-based copy detection approach for web documents, in International Conference on Fuzzy Systems and Knowledge Discovery. vol. 2005, pp. 557–570 (2005)

Author information

Authors and Affiliations

Sultan Moulay Slimane University, BP 523, Beni Mellal, Morocco
H. Ezzikouri, M. Oukessou, M. Erritali & Y. Madani

Corresponding author

Correspondence to M. Oukessou.

Editor information

Editors and Affiliations

Department of Mathematics, Université Sultan Moulay Slimane, Beni Mellal, Morocco
Said Melliani
Division of Graduate Studies and Research, Tijuana Institute of Technology, Tijuana, Baja California, Mexico
Oscar Castillo

About this chapter

Cite this chapter

Ezzikouri, H., Oukessou, M., Erritali, M., Madani, Y. (2019). Fuzzy Cross Language Plagiarism Detection Approach Based on Semantic Similarity and Hadoop MapReduce. In: Melliani, S., Castillo, O. (eds) Recent Advances in Intuitionistic Fuzzy Logic Systems. Studies in Fuzziness and Soft Computing, vol 372. Springer, Cham. https://doi.org/10.1007/978-3-030-02155-9_15

DOI: https://doi.org/10.1007/978-3-030-02155-9_15
Published: 09 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02154-2
Online ISBN: 978-3-030-02155-9
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
