Abstract
We present work on detecting manual paraphrasing in documents by comparing them against a set of source documents. Manual paraphrasing is a realistic type of plagiarism in which the obfuscation is introduced by hand. We used the PAN-PC-10 data set to develop and evaluate our algorithm. The proposed approach consists of two steps: identifying probable plagiarized passages using the Dice similarity measure, and filtering the obtained passages using syntactic rules and lexical semantic features extracted from obfuscation patterns. The algorithm works at the sentence level. The results are encouraging for difficult cases of plagiarism that most existing approaches fail to detect.
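The first step, candidate identification with the Dice similarity measure at the sentence level, can be illustrated with a minimal sketch. The abstract does not specify the tokenization, the unit being compared, or the decision threshold, so the lowercase word-set tokenizer, the `threshold=0.5` default, and the function names `tokenize`, `dice_similarity`, and `candidate_pairs` below are all assumptions made for illustration, not the authors' implementation. The second step (syntactic and lexical semantic filtering) is not shown.

```python
import re


def tokenize(sentence):
    """Lowercase word tokens; an assumed tokenizer, not the paper's."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def dice_similarity(a_tokens, b_tokens):
    """Dice coefficient over token sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0  # define the similarity of two empty sentences as 0
    return 2 * len(a & b) / (len(a) + len(b))


def candidate_pairs(suspicious, sources, threshold=0.5):
    """Return (suspicious_idx, source_idx, score) for sentence pairs
    whose Dice score meets the (assumed) threshold."""
    pairs = []
    for i, s in enumerate(suspicious):
        s_tok = tokenize(s)
        for j, t in enumerate(sources):
            score = dice_similarity(s_tok, tokenize(t))
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    suspicious = ["The cat sat on the mat."]
    sources = ["A cat sat upon the mat.", "Completely unrelated text here."]
    for i, j, score in candidate_pairs(suspicious, sources):
        print(f"suspicious[{i}] ~ source[{j}]: {score:.3f}")
```

In a pipeline of this shape, the pairs that survive the threshold would then be passed to the filtering step, which is where paraphrase-specific evidence would be checked.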
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ram, R.V.S., Stamatatos, E., Devi, S.L. (2014). Identification of Plagiarism Using Syntactic and Semantic Filters. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_41
DOI: https://doi.org/10.1007/978-3-642-54903-8_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science (R0)