Robust Plagiary Detection Using Semantic Compression Augmented SHAPD

Ceglarek, Dariusz; Haniewicz, Konstanty; Rutkowski, Wojciech

doi:10.1007/978-3-642-34630-9_32

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7653)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

This work presents results of ongoing research in the areas of semantic networks, plagiarism detection and general natural language processing. The results demonstrate that semantic compression is a valuable addition to existing methods of plagiary detection. Applying semantic compression boosts the efficiency of the Sentence Hashing Algorithm for Plagiarism Detection (SHAPD) and of the authors’ implementation of the w-shingling algorithm. Tests were also run with the traditional Vector Space Model method; they demonstrated that, contrary to general belief, this technique is not well suited to plagiary detection. All experiments were performed on a publicly available corpus so that the analysis can be compared with the efforts of other research teams.
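
To make the idea concrete, the following is a minimal sketch of generic w-shingling with a Jaccard resemblance score, plus an optional term-generalization step standing in for semantic compression. It is not the authors' implementation and it is not SHAPD. The function names, the shingle width `w=3`, and the toy generalization map `TOY_GENERALIZATION` are all assumptions made for illustration; a real system would presumably derive generalizations from a semantic network rather than a hand-written dictionary.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(tokens, w=3):
    """Return the set of all contiguous w-token sequences (w-shingles)."""
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(doc_a, doc_b, w=3, generalize=None):
    """Jaccard resemblance of the two documents' w-shingle sets.

    If `generalize` (a dict mapping a term to a more general term) is given,
    every token is replaced by its generalization before shingling. This is
    only a toy stand-in for semantic compression.
    """
    tokens_a, tokens_b = tokenize(doc_a), tokenize(doc_b)
    if generalize:
        tokens_a = [generalize.get(t, t) for t in tokens_a]
        tokens_b = [generalize.get(t, t) for t in tokens_b]
    set_a, set_b = shingles(tokens_a, w), shingles(tokens_b, w)
    if not set_a and not set_b:
        return 0.0  # both documents are shorter than w tokens
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical generalization map (an assumption, not data from the paper).
TOY_GENERALIZATION = {
    "dog": "animal", "hound": "animal",
    "cat": "animal", "kitten": "animal",
}

original = "The dog chased the cat across the yard"
reworded = "The hound chased the kitten across the yard"

plain_score = resemblance(original, reworded)
compressed_score = resemblance(original, reworded,
                               generalize=TOY_GENERALIZATION)
print(plain_score, compressed_score)
```

In this toy pair, synonym substitution breaks almost every shared shingle, so the plain score is low, while mapping both texts onto the same general terms makes their shingle sets coincide. This only illustrates why a generalization step might help a shingle-based detector; it says nothing about the magnitude of the gains reported in the paper.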

Author information

Authors and Affiliations

Poznan School of Banking, Poland
Dariusz Ceglarek
Poznan University of Economics, Poland
Konstanty Haniewicz
Ciber, Poland
Wojciech Rutkowski

Editor information

Editors and Affiliations

Institute of Informatics, Wroclaw University of Technology, Wyb. Wyspianskiego 27, 50-370, Wroclaw, Poland
Ngoc-Thanh Nguyen
University of Information Technology, National Vietnam University VNU-HCM, Ho Chi Minh city, Vietnam
Kiem Hoang
Gdynia Maritime University, Str. Morska 81-87, 81-225, Gdynia, Poland
Piotr Jȩdrzejowicz

About this paper

Cite this paper

Ceglarek, D., Haniewicz, K., Rutkowski, W. (2012). Robust Plagiary Detection Using Semantic Compression Augmented SHAPD. In: Nguyen, NT., Hoang, K., Jȩdrzejowicz, P. (eds) Computational Collective Intelligence. Technologies and Applications. ICCCI 2012. Lecture Notes in Computer Science(), vol 7653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34630-9_32

DOI: https://doi.org/10.1007/978-3-642-34630-9_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34629-3
Online ISBN: 978-3-642-34630-9
eBook Packages: Computer Science, Computer Science (R0)
