A Mixed Fuzzy Similarity Approach to Detect Plagiarism in Persian Texts

Ahangarbahan, Hamid; Montazer, Gholam Ali

doi:10.1007/978-3-319-19258-1_43

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9094)

Abstract

Many methods and metrics have been proposed for measuring similarity between documents and for building plagiarism detection systems. However, most of them do not account for the ambiguity inherent in natural language. This paper therefore proposes a new method that combines lexical and semantic features and similarity measures. First, after preprocessing and stop-word removal, the words of a text are divided into two groups: general words and domain-specific knowledge words. A mixed lexical and semantic fuzzy inference system is then used to assess text similarity. The method was evaluated on the Persian paper abstracts of the International Conference on e-Learning and e-Teaching (ICELET) corpus, using an IT domain knowledge ontology. The results indicate that the proposed method achieves a precision of 79% and detects 83% of the plagiarism cases.
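The pipeline outlined in the abstract (stop-word removal, splitting words into general and domain-specific groups, then a fuzzy combination of a lexical and a semantic score) can be illustrated with a minimal Python sketch. The abstract does not give the membership functions, the rule base, or the similarity measures, so everything below is an assumption made for illustration: the English stop-word and domain-term lists stand in for the Persian resources and the IT ontology, Jaccard overlap stands in for both similarity measures, and the four-rule zero-order Sugeno system is hypothetical rather than the authors' design.

```python
from typing import List, Set, Tuple

# Hypothetical stand-ins for the paper's Persian stop-word list and IT ontology.
STOP_WORDS = {"the", "a", "of", "in", "and", "is", "to"}
DOMAIN_TERMS = {"ontology", "e-learning", "server", "database", "network"}


def preprocess(text: str) -> List[str]:
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def split_words(tokens: List[str]) -> Tuple[Set[str], Set[str]]:
    """Split tokens into (general words, domain-specific words)."""
    domain = {t for t in tokens if t in DOMAIN_TERMS}
    general = set(tokens) - domain
    return general, domain


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuzzy_similarity(lexical: float, semantic: float) -> float:
    """Assumed zero-order Sugeno system with two fuzzy sets per input.

    Membership: high(x) = x, low(x) = 1 - x. AND is min.
    Rules (hypothetical): high/high -> 1.0, mixed -> 0.5, low/low -> 0.0.
    """
    rules = [
        (min(lexical, semantic), 1.0),          # both high
        (min(lexical, 1 - semantic), 0.5),      # lexical high, semantic low
        (min(1 - lexical, semantic), 0.5),      # lexical low, semantic high
        (min(1 - lexical, 1 - semantic), 0.0),  # both low
    ]
    total = sum(weight for weight, _ in rules)
    # total is at least 0.5 for inputs in [0, 1], so the division is safe.
    return sum(weight * out for weight, out in rules) / total


def text_similarity(text_a: str, text_b: str) -> float:
    """Score two texts: general words feed the lexical input,
    domain-specific words feed the semantic input (an assumed mapping)."""
    general_a, domain_a = split_words(preprocess(text_a))
    general_b, domain_b = split_words(preprocess(text_b))
    return fuzzy_similarity(jaccard(general_a, general_b),
                            jaccard(domain_a, domain_b))
```

Calling `text_similarity(doc_a, doc_b)` returns a score in [0, 1]; a detection system would compare it against a threshold, which the abstract does not specify. A real implementation for Persian would also need language-specific tokenization and normalization, and an ontology-based semantic measure in place of the Jaccard stand-in.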

Author information

Authors and Affiliations

School of Engineering, Tarbiat Modares University, Tehran, Iran
Hamid Ahangarbahan & Gholam Ali Montazer

Corresponding author

Correspondence to Gholam Ali Montazer.

Editor information

Editors and Affiliations

University of Granada, Granada, Spain
Ignacio Rojas
University of Malaga, Malaga, Spain
Gonzalo Joya
Polytechnic University of Catalonia, Vilanova i la Geltrú, Spain
Andreu Catala

About this paper

Cite this paper

Ahangarbahan, H., Montazer, G.A. (2015). A Mixed Fuzzy Similarity Approach to Detect Plagiarism in Persian Texts. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2015. Lecture Notes in Computer Science, vol 9094. Springer, Cham. https://doi.org/10.1007/978-3-319-19258-1_43

DOI: https://doi.org/10.1007/978-3-319-19258-1_43
Published: 06 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19257-4
Online ISBN: 978-3-319-19258-1
eBook Packages: Computer Science, Computer Science (R0)
