Plag-Inn: Intrinsic Plagiarism Detection Using Grammar Trees

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7337)

Abstract

Intrinsic plagiarism detection is the task of finding plagiarized sections of a text document without using a reference corpus. This paper describes a novel approach to this task that processes and analyzes the grammar of a suspicious document. The main idea is to split the text into single sentences and to compute a grammar tree for each sentence. To find suspicious sentences, these grammar trees are compared pairwise in a distance matrix using the pq-gram distance, an alternative to the tree edit distance. Finally, sentences whose grammar differs significantly from the rest of the document, judged against a Gaussian normal distribution, are marked as suspicious.
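The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: grammar trees are written by hand as `(label, [children])` tuples instead of being produced by a parser, the pq-gram distance follows the bag-based definition of Augsten et al. with `p=2` and `q=3`, and the cutoff `mean + k * std` with `k=1.5` is a hypothetical stand-in for the Gaussian criterion, whose exact form the abstract does not state. The names `node`, `pq_gram_profile`, `pq_gram_distance`, and `suspicious_sentences` are hypothetical as well.

```python
from collections import Counter
from statistics import mean, pstdev


def node(label, *children):
    """Hypothetical helper: build a tree node as (label, [children])."""
    return (label, list(children))


def pq_gram_profile(tree, p=2, q=3):
    """Bag of pq-grams: p ancestor labels (stem) plus q sibling labels (base)."""
    profile = Counter()

    def visit(current, ancestors):
        label, children = current
        stem = (ancestors + (label,))[-p:]  # last p labels on the root path
        if not children:
            profile[stem + ("*",) * q] += 1  # a leaf has an all-dummy base
            return
        pad = ("*",) * (q - 1)
        labels = pad + tuple(child[0] for child in children) + pad
        for i in range(len(labels) - q + 1):  # slide a window of q siblings
            profile[stem + labels[i:i + q]] += 1
        for child in children:
            visit(child, stem)

    visit(tree, ("*",) * p)
    return profile


def pq_gram_distance(t1, t2, p=2, q=3):
    """1 - 2 * |P1 bag-intersect P2| / (|P1| + |P2|); 0.0 for identical trees."""
    a, b = pq_gram_profile(t1, p, q), pq_gram_profile(t2, p, q)
    shared = sum((a & b).values())
    return 1.0 - 2.0 * shared / (sum(a.values()) + sum(b.values()))


def suspicious_sentences(trees, k=1.5):
    """Indices of sentences whose mean distance exceeds mean + k * std.

    The threshold rule is an assumption, not taken from the paper.
    Requires at least two trees.
    """
    n = len(trees)
    matrix = [[pq_gram_distance(a, b) for b in trees] for a in trees]
    avg = [sum(row) / (n - 1) for row in matrix]  # diagonal entries are 0
    mu, sigma = mean(avg), pstdev(avg)
    return [i for i, d in enumerate(avg) if d > mu + k * sigma]


# Five structurally identical sentences and one with a different structure.
simple = node("S", node("NP", node("DT"), node("NN")), node("VP", node("VBZ")))
odd = node(
    "S",
    node("SBAR", node("IN"), node("S", node("NP", node("PRP")), node("VP", node("VBD")))),
    node("NP", node("NNP")),
    node("VP", node("MD"), node("VP", node("VB"), node("NP", node("DT"), node("JJ"), node("NN")))),
)
print(suspicious_sentences([simple] * 5 + [odd]))  # flags index 5
```

In a real setting the trees would come from a syntactic parser run on each detected sentence, and both the pq-gram parameters and the cutoff would need tuning on labelled data.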

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tschuggnall, M., Specht, G. (2012). Plag-Inn: Intrinsic Plagiarism Detection Using Grammar Trees. In: Bouma, G., Ittoo, A., Métais, E., Wortmann, H. (eds) Natural Language Processing and Information Systems. NLDB 2012. Lecture Notes in Computer Science, vol 7337. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31178-9_35

  • DOI: https://doi.org/10.1007/978-3-642-31178-9_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31177-2

  • Online ISBN: 978-3-642-31178-9

  • eBook Packages: Computer Science, Computer Science (R0)
