On the Use of Similarity Search to Detect Fake Scientific Papers

Williams, Kyle; Giles, C. Lee

doi:10.1007/978-3-319-25087-8_32

Conference paper
First Online: 17 October 2015

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9371)

Abstract

Fake scientific papers have recently attracted attention in the academic community following the identification of fake papers in the digital libraries of major academic publishers [8]. Detecting and removing these papers is important for many reasons. We investigate the use of similarity search for detecting fake scientific papers by comparing several methods for signature construction and similarity scoring, and we describe a pseudo-relevance feedback technique that can improve the effectiveness of these methods. Experiments on a dataset of 40,000 computer science papers show that precision, recall and MAP scores of 0.96, 0.99 and 0.99, respectively, can be achieved, demonstrating the usefulness of similarity search in detecting fake scientific papers and ranking them highly.
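
The abstract names three ingredients: a signature built from each paper, a similarity score between signatures, and pseudo-relevance feedback. The preview does not say which signatures or scoring functions the authors compare, so the following is only a minimal sketch under stated assumptions: word shingles as the signature and Jaccard similarity as the score are swapped in as common choices, and feedback is modelled as expanding the query signature with the top-ranked results. All function names and parameters (`shingles`, `jaccard`, `rank`, `rank_with_feedback`, `k`, `top_n`) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Signature = Set[Tuple[str, ...]]


def shingles(text: str, k: int = 3) -> Signature:
    """Build a signature as the set of k-word shingles (an assumed choice)."""
    words = text.lower().split()
    if len(words) < k:
        # Texts shorter than k words yield one short shingle, or nothing.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Signature, b: Signature) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank(query: Signature, corpus: Dict[str, Signature]) -> List[Tuple[str, float]]:
    """Score every document against the query and sort by descending score."""
    scored = [(doc_id, jaccard(query, sig)) for doc_id, sig in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def rank_with_feedback(
    query: Signature, corpus: Dict[str, Signature], top_n: int = 1
) -> List[Tuple[str, float]]:
    """Pseudo-relevance feedback, sketched: treat the top_n results of a
    first pass as relevant, merge their signatures into the query, re-rank."""
    first_pass = rank(query, corpus)
    expanded = set(query)
    for doc_id, _ in first_pass[:top_n]:
        expanded |= corpus[doc_id]
    return rank(expanded, corpus)
```

In this sketch the query would be the signature of a known fake paper, and documents scoring near the top of the ranking would be candidates for manual inspection. Set-based shingling is exact but memory-hungry; a real system over tens of thousands of papers would more likely hash or sample the shingles.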

References

1. Broder, A., Glassman, S., Manasse, M., Zweig, G.: Syntactic clustering of the Web. Computer Networks and ISDN Systems 29(8–13), 1157–1166 (1997)
2. Butler, D.: Investigating journals: The dark side of publishing. Nature 495(7442), 433–435 (2013)
3. Gad-el Hak, M.: Publish or perish - an ailing enterprise? Physics Today 57(3), 61–62 (2004)
4. Labbé, C., Labbé, D.: Duplicate and fake publications in the scientific literature: how many SCIgen papers in computer science? Scientometrics 94(1), 379–396 (2012)
5. Manku, G., Jain, A., Sarma, A.D.: Detecting near-duplicates for web crawling. In: WWW, pp. 141–149 (2007)
6. Medelyan, O., Frank, E., Witten, I.H.: Human-competitive tagging using automatic keyphrase extraction. In: EMNLP, vol. 3, pp. 1318–1327 (2009)
7. Potthast, M., Hagen, M., Beyer, A., Busse, M., Tippmann, M., Rosso, P., Stein, B.: Overview of the 6th international competition on plagiarism detection. In: CLEF (2014)
8. Van Noorden, R.: Publishers withdraw more than 120 gibberish papers. Nature, February 2014
9. Williams, K., Giles, C.L.: Near duplicate detection in an academic digital library. In: DocEng, pp. 91–94 (2013)
10. Xiong, J., Huang, T.: An effective method to identify machine automatically generated paper. In: KESE, pp. 101–102. IEEE (2009)

Author information

Authors and Affiliations

Information Sciences and Technology, The Pennsylvania State University, University Park, State College, PA, 16802, USA
Kyle Williams & C. Lee Giles
Computer Science and Engineering, The Pennsylvania State University, University Park, State College, PA, 16802, USA
C. Lee Giles

Corresponding author

Correspondence to Kyle Williams.

Editor information

Editors and Affiliations

ISTI-CNR, Pisa, Italy
Giuseppe Amato
University of Strathclyde, Glasgow, United Kingdom
Richard Connor
ISTI-CNR, Pisa, Italy
Fabrizio Falchi
ISTI-CNR, Pisa, Italy
Claudio Gennaro

Cite this paper

Williams, K., Giles, C.L. (2015). On the Use of Similarity Search to Detect Fake Scientific Papers. In: Amato, G., Connor, R., Falchi, F., Gennaro, C. (eds) Similarity Search and Applications. SISAP 2015. Lecture Notes in Computer Science, vol 9371. Springer, Cham. https://doi.org/10.1007/978-3-319-25087-8_32

DOI: https://doi.org/10.1007/978-3-319-25087-8_32
Published: 17 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25086-1
Online ISBN: 978-3-319-25087-8
eBook Packages: Computer Science, Computer Science (R0)
