Abstract
Cross-language plagiarism takes many forms, ranging from rewriting a text into a semantically equivalent one to translating it or adopting its ideas, all without proper reference to the originator. Reliable large-scale text comparison is among the most common problems in data processing systems, especially for fuzzy, semantics-based similarity, because of the complexity of natural languages (Arabic in particular) and the growing number of publications, which increases the number of potential plagiarism sources. Cross-language plagiarism detection (CLPD) is more complicated than monolingual plagiarism detection because it goes beyond copy, translate and paste. The detection process therefore calls for vague-concept and fuzzy-set techniques in a big data environment to reveal hidden plagiarism in Arabic documents translated from English or French sources. In this paper, we propose a fuzzy-semantic similarity approach for CLPD in Arabic documents that uses the WordNet taxonomy and three semantic similarity measures: Wu and Palmer, Lin, and Leacock-Chodorow. The work has been parallelized using Apache Hadoop with the HDFS file system and the MapReduce programming model.
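To make the taxonomy-based measures named in the abstract concrete, the following is a minimal sketch in Python, not the chapter's implementation. It assumes a tiny hypothetical is-a hierarchy (`PARENT`) in place of WordNet, counts depth in nodes, and aggregates word scores into a sentence score by averaging each word's best match, which is one common fuzzy-style choice rather than a scheme stated in the abstract. The Lin measure is omitted because it needs corpus-derived information content.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); a stand-in for WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

def path_to_root(node):
    """Return the list [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))

def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward from b, deepest first
        if node in ancestors_a:
            return node
    raise ValueError("no common ancestor")

def wu_palmer(a, b):
    """Wu and Palmer: 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))

def leacock_chodorow(a, b):
    """Leacock-Chodorow: -log(path_len / (2 * D)), path length in nodes."""
    path_len = depth(a) + depth(b) - 2 * depth(lcs(a, b)) + 1
    max_depth = max(depth(n) for n in PARENT)  # D, the taxonomy depth
    return -math.log(path_len / (2 * max_depth))

def fuzzy_sentence_similarity(suspicious, source, word_sim=wu_palmer):
    """Average, over suspicious words, of the best match in the source.

    Assumed aggregation for illustration only.
    """
    best = [max(word_sim(w, s) for s in source) for w in suspicious]
    return sum(best) / len(best)

if __name__ == "__main__":
    print(wu_palmer("dog", "cat"))
    print(leacock_chodorow("dog", "cat"))
    print(fuzzy_sentence_similarity(["dog", "car"], ["cat", "car"]))
```

In a cross-language setting the words would first be translated or mapped to shared WordNet synsets. In a MapReduce setting, a scoring function such as `fuzzy_sentence_similarity` would be the kind of per-pair work a mapper could perform. Both steps are outside this sketch.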
References
G.A. Miller, WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
M.A. Beyer, D. Laney, The importance of big data: a definition, Stamford CT Gart., 2014–2018 (2012)
M.M. Najafabadi, F. Villanustre, T.M. Khoshgoftaar, N. Seliya, R. Wald, E. Muharemagic, Deep learning applications and challenges in big data analytics. J. Big Data 2 (2015)
B. Parhami, A Highly Parallel Computing System for Information Retrieval, in Proceedings of the December 5–7, 1972, Fall Joint Computer Conference, Part II (New York, NY, USA, 1972) pp. 681–690
Q. Zhang, Y. Zhang, H. Yu, X. Huang, Efficient partial-duplicate detection based on sequence matching, in Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 675–682 (2010)
J. Dwivedi, A. Tiwary, Plagiarism detection on bigdata using modified map-reduced based SCAM algorithm. Int. Conference on Innovative Mechanisms Ind. Appl. (ICIMIA) 2017, 608–610 (2017)
H. Ezzikouri, M. Erritali, M. Oukessou, Fuzzy-semantic similarity for automatic multilingual plagiarism detection. Int. J. Adv. Comput. Sci. Appl. 8(9), 86–90 (2017)
C. Leacock, M. Chodorow, Combining local context and WordNet similarity for word sense identification. WordNet Electron. Lex. Database 49(2), 265–283 (1998)
Z. Wu, M. Palmer, Verbs semantics and lexical selection, in Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, pp. 133–138 (1994)
P. Resnik, Using information content to evaluate semantic similarity, in Proceedings of the 14th International Joint Conference on Artificial Intelligence, pp. 448–453 (1995)
D. Lin, An information-theoretic definition of similarity, in ICML, vol. 98, pp. 296–304 (1998)
G. Hirst, D. St-Onge, Lexical chains as representations of context for the detection and correction of malapropisms. WordNet Electron. Lex. Database 305, 305–332 (1998)
T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, Second Edition (McGraw-Hill, 2001)
D. Gupta, K. Vani, C.K. Singh, Using Natural Language Processing techniques and fuzzy-semantic similarity for automatic external plagiarism detection, in International Conference on Advances in Computing, Communications and Informatics (ICACCI, 2014) pp. 2694–2699
N. Werro, Fuzzy Classification of Online Customers (Thesis University of Fribourg (Switzerland), Fuzzy Management Methods, 2008)
C. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard, D. McClosky, The Stanford CoreNLP natural language processing toolkit, in Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations, pp. 55–60 (2014)
H. Ezzikouri, M. Erritali, M. Oukessou, Semantic similarity/relatedness for cross language plagiarism detection. Indones. J. Electr. Eng. Comput. Sci. 1(2), 371–374 (2016)
S. Alzahrani, N. Salim, Fuzzy semantic-based string similarity for extrinsic plagiarism detection. Braschler Harman 1176, 1–8 (2010)
R. Yerra, Y.-K. Ng, A sentence-based copy detection approach for web documents, in International Conference on Fuzzy Systems and Knowledge Discovery. vol. 2005, pp. 557–570 (2005)
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Ezzikouri, H., Oukessou, M., Erritali, M., Madani, Y. (2019). Fuzzy Cross Language Plagiarism Detection Approach Based on Semantic Similarity and Hadoop MapReduce. In: Melliani, S., Castillo, O. (eds) Recent Advances in Intuitionistic Fuzzy Logic Systems. Studies in Fuzziness and Soft Computing, vol 372. Springer, Cham. https://doi.org/10.1007/978-3-030-02155-9_15
DOI: https://doi.org/10.1007/978-3-030-02155-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02154-2
Online ISBN: 978-3-030-02155-9
eBook Packages: Intelligent Technologies and Robotics (R0)