Abstract
A variety of methods and metrics have been proposed for measuring the similarity between documents and for building plagiarism detection systems. However, most of them do not take the ambiguity inherent in natural language into account. This paper therefore proposes a new method that combines lexical and semantic features and similarity measures. In the first step, after preprocessing and stop-word removal, the words of a text are divided into two groups: general words and domain-specific knowledge words. A mixed lexical and semantic fuzzy inference system is then designed to assess text similarity. The proposed method was evaluated on Persian paper abstracts from the International Conference on e-Learning and e-Teaching (ICELET) corpus, using an IT domain knowledge ontology. The results indicate that the proposed method achieves a precision of 79% and detects 83% of the plagiarism cases.
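The pipeline outlined in the abstract (stop-word removal, a split into general and domain-specific words, then a fuzzy combination of similarity scores) can be illustrated with a minimal Python sketch. Everything specific here is an assumption and not the authors' design: the English stop-word list, the tiny `DOMAIN_TERMS` set standing in for the IT ontology, the use of Jaccard overlap for both scores, the linear membership functions, and the four rules. The abstract does not give these details, and the paper works on Persian text.

```python
# Toy sketch only: the word lists, membership functions, and rules below are
# assumptions for illustration and are not taken from the paper.

STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "to"}   # assumed list
DOMAIN_TERMS = {"database", "server", "ontology", "network"}     # stand-in for an IT ontology


def preprocess(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (t.strip(".,;:!?()\"'").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def split_words(tokens):
    """Separate tokens into (general, domain-specific) sets."""
    domain = {t for t in tokens if t in DOMAIN_TERMS}
    general = set(tokens) - domain
    return general, domain


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as no evidence (0.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuzzy_similarity(text_a, text_b):
    """Combine general-word and domain-word overlap with four fuzzy rules."""
    gen_a, dom_a = split_words(preprocess(text_a))
    gen_b, dom_b = split_words(preprocess(text_b))
    g = jaccard(gen_a, gen_b)   # lexical overlap of general words
    d = jaccard(dom_a, dom_b)   # crude proxy for ontology-based semantic similarity

    # Linear memberships: high(x) = x, low(x) = 1 - x; fuzzy AND is min.
    rules = [
        (min(g, d), 1.0),           # both high -> similarity high
        (min(g, 1 - d), 0.5),       # general high, domain low -> medium
        (min(1 - g, d), 0.5),       # general low, domain high -> medium
        (min(1 - g, 1 - d), 0.0),   # both low -> similarity low
    ]
    total = sum(w for w, _ in rules)   # always > 0 for g, d in [0, 1]
    return sum(w * out for w, out in rules) / total
```

Calling `fuzzy_similarity` on two identical texts that contain both general and domain words yields 1.0, and on two texts with no shared words it yields 0.0. A faithful implementation would replace the domain-term overlap with an ontology-based semantic measure and would tune the rule base on labelled data.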
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ahangarbahan, H., Montazer, G.A. (2015). A Mixed Fuzzy Similarity Approach to Detect Plagiarism in Persian Texts. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2015. Lecture Notes in Computer Science, vol 9094. Springer, Cham. https://doi.org/10.1007/978-3-319-19258-1_43
DOI: https://doi.org/10.1007/978-3-319-19258-1_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19257-4
Online ISBN: 978-3-319-19258-1
eBook Packages: Computer Science, Computer Science (R0)