
On the Use of Similarity Search to Detect Fake Scientific Papers

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9371)

Abstract

Fake scientific papers have recently drawn attention in the academic community after fake papers were identified in the digital libraries of major academic publishers [8]. Detecting and removing these papers is important for many reasons. We investigate the use of similarity search for detecting fake scientific papers, comparing several methods for signature construction and similarity scoring, and we describe a pseudo-relevance feedback technique that can improve the effectiveness of these methods. Experiments on a dataset of 40,000 computer science papers show that precision, recall, and mean average precision (MAP) of 0.96, 0.99, and 0.99, respectively, can be achieved. These results demonstrate that similarity search is useful both for detecting fake scientific papers and for ranking them highly.
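The abstract does not spell out which signatures, scoring functions, or feedback scheme are compared, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that signatures are sets of word 3-shingles (the shingling idea of Broder et al. [1]), that similarity is the Jaccard coefficient, and that a known fake paper serves as the query. The function names `shingles`, `jaccard`, `rank`, `prf_rank` and `average_precision`, the parameters `k`, `top_k` and `min_support`, and the toy texts are all hypothetical. The feedback step is one simple variant of pseudo-relevance feedback and is not necessarily the authors' technique.

```python
from collections import Counter


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Assumed signature: the set of word k-shingles of a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Assumed score: size of intersection over size of union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(query: set, corpus: dict[str, set]) -> list[tuple[str, float]]:
    """Score every document against the query; highest score first, ties by id."""
    scored = [(doc_id, jaccard(query, sig)) for doc_id, sig in corpus.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def prf_rank(query: set, corpus: dict[str, set],
             top_k: int = 2, min_support: int = 2) -> list[tuple[str, float]]:
    """Hypothetical pseudo-relevance feedback: assume the top_k hits are fake,
    add the shingles shared by at least min_support of them, then re-rank."""
    feedback = [corpus[doc_id] for doc_id, _ in rank(query, corpus)[:top_k]]
    support = Counter(s for sig in feedback for s in sig)
    expanded = query | {s for s, n in support.items() if n >= min_support}
    return rank(expanded, corpus)


def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """AP of one ranked list; MAP is the mean of this value over all queries."""
    hits, total = 0, 0.0
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


# Toy data invented for illustration; it is not from the paper's dataset.
texts = {
    "fake1": "we propose a novel framework for the emulation of lambda calculus "
             "our heuristic is maximally efficient",
    "fake2": "we propose a novel framework for the study of compilers "
             "our heuristic is maximally efficient",
    "fake3": "unlike prior work our heuristic is maximally efficient and highly available",
    "real1": "this paper studies near duplicate detection in academic digital libraries",
}
corpus = {doc_id: shingles(text) for doc_id, text in texts.items()}
query = shingles("we propose a novel framework for the emulation of consistent hashing")

before = dict(rank(query, corpus))
after = prf_rank(query, corpus)
for doc_id, score in after:
    print(f"{doc_id}: before={before[doc_id]:.3f} after={score:.3f}")
print("AP:", average_precision([d for d, _ in after], {"fake1", "fake2", "fake3"}))
```

In this toy run, `fake3` shares no shingles with the query and scores 0.0 before feedback. Once the shingles common to the two top hits are added to the query, it receives a nonzero score while `real1` stays at 0.0. Requiring `min_support` agreeing hits is a design choice meant to keep one misleading top result from dragging unrelated shingles into the expanded query.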

References

  1. Broder, A., Glassman, S., Manasse, M., Zweig, G.: Syntactic clustering of the Web. Computer Networks and ISDN Systems 29(8–13), 1157–1166 (1997)
  2. Butler, D.: Investigating journals: The dark side of publishing. Nature 495(7442), 433–435 (2013)
  3. Gad-el-Hak, M.: Publish or perish - an ailing enterprise? Physics Today 57(3), 61–62 (2004)
  4. Labbé, C., Labbé, D.: Duplicate and fake publications in the scientific literature: how many SCIgen papers in computer science? Scientometrics 94(1), 379–396 (2012)
  5. Manku, G., Jain, A., Sarma, A.D.: Detecting near-duplicates for web crawling. In: WWW, pp. 141–149 (2007)
  6. Medelyan, O., Frank, E., Witten, I.H.: Human-competitive tagging using automatic keyphrase extraction. In: EMNLP, vol. 3, pp. 1318–1327 (2009)
  7. Potthast, M., Hagen, M., Beyer, A., Busse, M., Tippmann, M., Rosso, P., Stein, B.: Overview of the 6th international competition on plagiarism detection. In: CLEF (2014)
  8. Van Noorden, R.: Publishers withdraw more than 120 gibberish papers. Nature, February 2014
  9. Williams, K., Giles, C.L.: Near duplicate detection in an academic digital library. In: DocEng, pp. 91–94 (2013)
  10. Xiong, J., Huang, T.: An effective method to identify machine automatically generated paper. In: KESE, pp. 101–102. IEEE (2009)

Author information

Corresponding author

Correspondence to Kyle Williams.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Williams, K., Giles, C.L. (2015). On the Use of Similarity Search to Detect Fake Scientific Papers. In: Amato, G., Connor, R., Falchi, F., Gennaro, C. (eds) Similarity Search and Applications. SISAP 2015. Lecture Notes in Computer Science, vol 9371. Springer, Cham. https://doi.org/10.1007/978-3-319-25087-8_32

  • DOI: https://doi.org/10.1007/978-3-319-25087-8_32

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25086-1

  • Online ISBN: 978-3-319-25087-8

  • eBook Packages: Computer Science, Computer Science (R0)