
Robust Plagiary Detection Using Semantic Compression Augmented SHAPD

  • Conference paper
Computational Collective Intelligence. Technologies and Applications (ICCCI 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7653)


Abstract

This work presents results of ongoing, novel research in the areas of semantic networks, plagiarism detection and general natural language processing. The results demonstrate that semantic compression is a valuable addition to existing methods of plagiary detection: applying it boosts the efficiency of the Sentence Hashing Algorithm for Plagiarism Detection (SHAPD) and of the authors’ implementation of the w-shingling algorithm. Tests were also run with the traditional Vector Space Model method; they demonstrated that, contrary to general belief, this technique is not well suited to plagiary detection. All experiments were performed on a generally available corpus, so that the analysis can be compared with the efforts of other research teams.
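For readers unfamiliar with w-shingling, the sketch below illustrates the general idea in Python: each document is turned into the set of its contiguous w-word sequences (shingles), and two documents are compared by the Jaccard resemblance of those sets. This is a minimal illustration only, not the authors' implementation; the function names `shingles` and `resemblance`, the whitespace tokenisation and the default `w = 3` are assumptions made for the example.

```python
from typing import Set, Tuple


def shingles(text: str, w: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of contiguous w-word sequences found in text.

    Tokenisation is a plain lowercase whitespace split (an assumption
    made for this sketch, not a detail taken from the paper).
    """
    tokens = text.lower().split()
    if not tokens:
        return set()
    if len(tokens) < w:
        # A text shorter than w words yields a single, shorter shingle.
        return {tuple(tokens)}
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a: str, b: str, w: int = 3) -> float:
    """Jaccard resemblance of the two shingle sets, in the range 0.0 to 1.0."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        # Two empty texts have nothing in common to measure.
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "the quick brown fox jumps over the lazy dog"
suspect = "the quick brown fox jumps over the lazy cat"
score = resemblance(original, suspect)
```

In this toy pair the two sentences share six of eight distinct 3-shingles, so `score` is 0.75. A single changed word already removes a shingle from the intersection, which hints at why a pre-processing step that maps differently worded passages onto common terms could help such a method.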


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ceglarek, D., Haniewicz, K., Rutkowski, W. (2012). Robust Plagiary Detection Using Semantic Compression Augmented SHAPD. In: Nguyen, N.T., Hoang, K., Jędrzejowicz, P. (eds) Computational Collective Intelligence. Technologies and Applications. ICCCI 2012. Lecture Notes in Computer Science, vol 7653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34630-9_32

  • DOI: https://doi.org/10.1007/978-3-642-34630-9_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34629-3

  • Online ISBN: 978-3-642-34630-9

  • eBook Packages: Computer Science, Computer Science (R0)
