A Mixed Fuzzy Similarity Approach to Detect Plagiarism in Persian Texts

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9094)

Abstract

A variety of methods and metrics have been proposed to measure the similarity between documents in plagiarism detection systems. However, most of them do not take the ambiguity inherent in natural language into account. This paper therefore proposes a new method that combines lexical and semantic features and similarity measures. In the first step, after preprocessing and stop-word removal, the words of a text are divided into two groups: general words and domain-specific knowledge words. A mixed lexical and semantic fuzzy inference system is then designed to assess text similarity. The proposed method was evaluated on Persian paper abstracts from the International Conference on e-Learning and e-Teaching (ICELET) corpus, using an IT domain knowledge ontology. The results indicate that the proposed method achieves a precision of 79% and detects 83% of the plagiarism cases.
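
The steps named in the abstract (stop-word removal, splitting words into general and domain-specific groups, and fuzzy combination of a lexical and a semantic score) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the stop-word list, the toy `ONTOLOGY` mapping, the use of Jaccard overlap for both scores, and the zero-order Sugeno-style rule base with linear membership functions. The abstract does not describe the actual membership functions, rules, or ontology, and the example uses English text for readability although the paper targets Persian.

```python
import re

# Hypothetical resources: the paper's stop-word list and IT ontology are not given.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "to", "for"}
# Toy ontology: domain term -> parent concept (assumed for illustration only).
ONTOLOGY = {
    "server": "hardware",
    "router": "hardware",
    "database": "software",
    "sql": "software",
}


def preprocess(text):
    """Lowercase, tokenize, and drop stop words."""
    tokens = re.findall(r"[\w-]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def split_words(tokens):
    """Split tokens into general words and ontology concepts of domain words."""
    general = {t for t in tokens if t not in ONTOLOGY}
    concepts = {ONTOLOGY[t] for t in tokens if t in ONTOLOGY}
    return general, concepts


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as no evidence (0.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuzzy_combine(lexical, semantic):
    """Zero-order Sugeno-style inference with two inputs in [0, 1].

    Memberships (assumed): high(x) = x, low(x) = 1 - x; AND is min.
    Each rule maps to a crisp output; the result is their weighted average.
    """
    rules = [
        (min(lexical, semantic), 1.0),          # high AND high -> similar
        (min(lexical, 1 - semantic), 0.5),      # high AND low  -> partly similar
        (min(1 - lexical, semantic), 0.5),      # low  AND high -> partly similar
        (min(1 - lexical, 1 - semantic), 0.0),  # low  AND low  -> dissimilar
    ]
    total = sum(weight for weight, _ in rules)  # always >= 0.5 for inputs in [0, 1]
    return sum(weight * out for weight, out in rules) / total


def fuzzy_similarity(text_a, text_b):
    """Lexical score on general words, concept-level score on domain words."""
    general_a, concepts_a = split_words(preprocess(text_a))
    general_b, concepts_b = split_words(preprocess(text_b))
    lexical = jaccard(general_a, general_b)
    semantic = jaccard(concepts_a, concepts_b)
    return fuzzy_combine(lexical, semantic)
```

Under these assumptions, `fuzzy_similarity("The server stores the student records in a database", "A router keeps student records in SQL")` evaluates to 0.75: half of the general words overlap, while the domain words differ on the surface but map to the same ontology concepts. A surface-only comparison would miss that second signal, which is the motivation for combining the two scores.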

Author information

Corresponding author

Correspondence to Gholam Ali Montazer.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ahangarbahan, H., Montazer, G.A. (2015). A Mixed Fuzzy Similarity Approach to Detect Plagiarism in Persian Texts. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2015. Lecture Notes in Computer Science, vol 9094. Springer, Cham. https://doi.org/10.1007/978-3-319-19258-1_43

  • DOI: https://doi.org/10.1007/978-3-319-19258-1_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19257-4

  • Online ISBN: 978-3-319-19258-1

  • eBook Packages: Computer Science (R0)
