Detection of Computer-Generated Papers Using One-Class SVM and Cluster Approaches

Avros, Renata; Volkovich, Zeev

doi:10.1007/978-3-319-96133-0_4


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10935)

Included in the following conference series:

International Conference on Machine Learning and Data Mining in Pattern Recognition


Abstract

The paper presents a novel methodology for distinguishing between real and artificially generated manuscripts. The approach exploits inherent differences between human and artificially generated writing styles. Taking into account the nature of the generation process, we suggest that human style is essentially more “diverse” and “rich” than artificial style. To assess dissimilarities between fake and real papers, the distance between writing styles is evaluated via the dynamic dissimilarity methodology. From this standpoint, generated papers are highly similar to one another in style and differ significantly from human-written documents. A set of fake documents is taken as the training data, so that a real document is expected to appear as an outlier relative to this collection. We therefore treat the task as a one-class classification problem, using a one-class SVM approach and comparing it with a clustering-based procedure. Numerical experiments demonstrate a very high ability of the proposed methodology to recognize artificially generated papers.
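The abstract frames detection as one-class classification: a model is fit on generated papers only, and a human-written paper should fall outside the region those papers occupy. Below is a minimal, dependency-free sketch of that setup under stated assumptions. The `style_vector` features (function-word frequencies plus a type-token ratio as a crude "richness" measure) are hypothetical stand-ins for the paper's dynamic dissimilarity, and `CentroidNoveltyDetector` is a simple centroid-distance rule swapped in for the one-class SVM. Neither is the authors' method. The +1/-1 output mirrors the convention of scikit-learn's `OneClassSVM.predict`.

```python
import math
import re
from typing import List, Sequence

# Hypothetical style features (an assumption, not the paper's features):
# relative frequencies of a few function words plus the type-token ratio.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "a", "is", "that")


def style_vector(text: str) -> List[float]:
    """Map a document to a fixed-length stylometric vector."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    if n == 0:
        return [0.0] * (len(FUNCTION_WORDS) + 1)
    freqs = [tokens.count(w) / n for w in FUNCTION_WORDS]
    type_token_ratio = len(set(tokens)) / n  # vocabulary "richness"
    return freqs + [type_token_ratio]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidNoveltyDetector:
    """One-class detector trained on generated papers only.

    A document whose distance to the training centroid exceeds the chosen
    quantile of training distances is flagged as an outlier (-1), i.e. a
    candidate human-written paper.
    """

    def __init__(self, quantile: float = 1.0) -> None:
        if not 0.0 <= quantile <= 1.0:
            raise ValueError("quantile must lie in [0, 1]")
        self.quantile = quantile
        self.center: List[float] = []
        self.threshold = 0.0

    def fit(self, fake_vectors: List[Sequence[float]]) -> "CentroidNoveltyDetector":
        if not fake_vectors:
            raise ValueError("need at least one training vector")
        n = len(fake_vectors)
        # Mean of the generated-paper vectors, computed column by column.
        self.center = [sum(col) / n for col in zip(*fake_vectors)]
        dists = sorted(euclidean(v, self.center) for v in fake_vectors)
        # Threshold = training distance at the requested quantile.
        self.threshold = dists[int(self.quantile * (len(dists) - 1))]
        return self

    def predict(self, vector: Sequence[float]) -> int:
        """Return +1 if the vector looks generated (inlier), -1 otherwise."""
        return 1 if euclidean(vector, self.center) <= self.threshold else -1


# Toy usage on hand-made 2-D vectors standing in for generated papers.
detector = CentroidNoveltyDetector().fit([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(detector.predict([0.5, 0.5]))  # 1  (close to the generated-paper centroid)
print(detector.predict([3.0, 3.0]))  # -1 (outlier, i.e. a candidate real paper)
```

If scikit-learn is available, `sklearn.svm.OneClassSVM` could be fitted on the same vectors in place of the centroid rule. Its `nu` parameter, an upper bound on the fraction of training documents treated as outliers, plays a role roughly analogous to `1 - quantile` here.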



Author information

Authors and Affiliations

Department of Software Engineering, ORT Braude College, 21982, Karmiel, Israel
Renata Avros & Zeev Volkovich


Corresponding author

Correspondence to Zeev Volkovich.

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, Leipzig, Germany
Petra Perner


About this paper

Cite this paper

Avros, R., Volkovich, Z. (2018). Detection of Computer-Generated Papers Using One-Class SVM and Cluster Approaches. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science, vol 10935. Springer, Cham. https://doi.org/10.1007/978-3-319-96133-0_4


DOI: https://doi.org/10.1007/978-3-319-96133-0_4
Published: 08 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96132-3
Online ISBN: 978-3-319-96133-0
eBook Packages: Computer Science, Computer Science (R0)
