
Identification of Plagiarism Using Syntactic and Semantic Filters

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Abstract

We present an approach to detecting manual paraphrasing in documents by comparing them against a set of source documents. Manual paraphrasing is a realistic type of plagiarism in which the obfuscation is introduced by hand. We used the PAN-PC-10 data set to develop and evaluate our algorithm. The proposed approach consists of two steps: identifying probable plagiarized passages using the Dice similarity measure, and filtering the obtained passages using syntactic rules and lexical semantic features extracted from obfuscation patterns. The algorithm works at the sentence level. The results are encouraging in difficult cases of plagiarism that most existing approaches fail to detect.
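The first step, candidate identification with the Dice similarity measure at sentence level, can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the function names (`tokens`, `dice`, `candidate_pairs`), and the threshold of 0.5 are assumptions chosen for illustration, and the syntactic and semantic filtering of the second step is not shown. For two token sets A and B, the Dice coefficient is 2|A ∩ B| / (|A| + |B|).

```python
import re


def tokens(sentence):
    """Lowercase a sentence and return its set of alphanumeric word tokens.

    This simple regex tokenizer is an assumption made for illustration.
    """
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def dice(a, b):
    """Dice coefficient between two token sets: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0  # two empty sets: treat as no similarity
    return 2 * len(a & b) / (len(a) + len(b))


def candidate_pairs(suspicious, sources, threshold=0.5):
    """Return (suspicious_index, source_index, score) for sentence pairs
    whose Dice score reaches the threshold.

    The threshold value is hypothetical, not taken from the paper.
    """
    pairs = []
    for i, susp_sentence in enumerate(suspicious):
        susp_tokens = tokens(susp_sentence)
        for j, src_sentence in enumerate(sources):
            score = dice(susp_tokens, tokens(src_sentence))
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs
```

Pairs returned by `candidate_pairs` would then be passed to a separate filtering stage, which in the paper relies on syntactic rules and lexical semantic features.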




Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ram, R.V.S., Stamatatos, E., Devi, S.L. (2014). Identification of Plagiarism Using Syntactic and Semantic Filters. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_41


  • DOI: https://doi.org/10.1007/978-3-642-54903-8_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54902-1

  • Online ISBN: 978-3-642-54903-8

  • eBook Packages: Computer Science (R0)
