Towards the Detection of Cross-Language Source Code Reuse

Flores, Enrique; Barrón-Cedeño, Alberto; Rosso, Paolo; Moreno, Lidia

doi:10.1007/978-3-642-22327-3_31

Towards the Detection of Cross-Language Source Code Reuse

Enrique Flores¹⁹,
Alberto Barrón-Cedeño¹⁹,
Paolo Rosso¹⁹ &
…
Lidia Moreno¹⁹

Conference paper

1848 Accesses
19 Citations
3 Altmetric

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6716))

Abstract

Internet has made available huge amounts of information, also source code. Source code repositories and, in general, programming related websites, facilitate its reuse. In this work, we propose a simple approach to the detection of cross-language source code reuse, a nearly investigated problem. Our preliminary experiments, based on character n-grams comparison, show that considering different sections of the code (i.e., comments, code, reserved words, etc.), leads to different results. When considering three programming languages: C+ +, Java, and Python, the best result is obtained when comments are discarded and the entire source code is considered.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Arwin, C., Tahaghoghi, S.M.M.: Plagiarism Detection across Programming Languages. In: Proceedings of the 29th Australasian Computer Science Conference, vol. 48, pp. 277–286 (2006)
Google Scholar
Faidhi, J., Robinson, S.: An empirical approach for detecting program similarity and plagiarism within a university programming environment. Comput. Educ. 11, 11–19 (1987)
Article Google Scholar
Jankowitz, H.T.: Detecting plagiarism in student pascal programs. The Computer Journal 31(1) (1988)
Google Scholar
Pinto, D., Civera, J., Barrón-Cedeño, A., Juan, A., Rosso, P.: A statistical approach to crosslingual natural language tasks. Journal of Algorithms 64(1), 51–60 (2009)
Article MATH Google Scholar
Potthast, M., Barrón-Cedeño, A., Stein, B., Rosso, P.: Cross-Language Plagiarism Detection. Languages Resources and Evaluation. Special Issue on Plagiarism and Authorship Analysis 45(1) (2011)
Google Scholar
Rosales, F., García, A., Rodríguez, S., Pedraza, J.L., Méndez, R., Nieto, M.M.: Detection of plagiarism in programming assignments. IEEE Transactions on Education 51(2), 174–183 (2008)
Article Google Scholar
Stamatatos, E.: Intrinsic Plagiarism Detection Using Character n-gram Profiles. In: Proc. SEPLN 2009, Donostia, Spain, pp. 38–46 (2009)
Google Scholar

Download references

Author information

Authors and Affiliations

Dpto. de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Spain
Enrique Flores, Alberto Barrón-Cedeño, Paolo Rosso & Lidia Moreno

Authors

Enrique Flores
View author publications
You can also search for this author in PubMed Google Scholar
Alberto Barrón-Cedeño
View author publications
You can also search for this author in PubMed Google Scholar
Paolo Rosso
View author publications
You can also search for this author in PubMed Google Scholar
Lidia Moreno
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Computing, University of Alicante, 03080, Alicante, Spain
Rafael Muñoz
Department of Software and Computing Systems, University of Alicante, Aptdo. de Correos 99, 03080, Alicante, Spain
Andrés Montoyo
CNAM- Laboratoire Cédric, 292 Rue St. Martin, 75141, Paris Cedex 03, France
Elisabeth Métais

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Flores, E., Barrón-Cedeño, A., Rosso, P., Moreno, L. (2011). Towards the Detection of Cross-Language Source Code Reuse. In: Muñoz, R., Montoyo, A., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2011. Lecture Notes in Computer Science, vol 6716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22327-3_31

Download citation

DOI: https://doi.org/10.1007/978-3-642-22327-3_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22326-6
Online ISBN: 978-3-642-22327-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics