Abstract
The paper presents a novel methodology for distinguishing real manuscripts from artificially generated ones. The approach exploits inherent differences between human and artificially generated writing styles. Taking into account the nature of the generation process, we suggest that the human style is essentially more “diverse” and “rich” than an artificial one. To assess dissimilarities between fake and real papers, the distance between writing styles is evaluated via the dynamic dissimilarity methodology. From this standpoint, the generated papers are very similar to one another in style and differ significantly from human-written documents. A set of fake documents serves as the training data, so that a real document is expected to appear as an outlier relative to this collection. Thus, we treat the task as one-class classification, using a one-class SVM approach compared with a clustering-based procedure. The numerical experiments demonstrate a very high ability of the proposed methodology to recognize artificially generated papers.
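The one-class setup described in the abstract (train on generated papers only, flag real papers as outliers) can be illustrated with a minimal sketch. This is not the authors' dynamic dissimilarity method and not a one-class SVM: it is a simplified centroid-distance detector, loosely in the spirit of the clustering-based procedure. The feature vectors, the `slack` factor, and all function names are hypothetical and chosen only for illustration.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_one_class(train_vectors, slack=1.5):
    """Fit on the generated ("fake") class only.

    The decision threshold is the largest training distance to the
    centroid, multiplied by a slack factor (an assumed tuning knob).
    """
    center = centroid(train_vectors)
    radius = max(euclidean(v, center) for v in train_vectors)
    return center, slack * radius


def is_outlier(vector, model):
    """True if the vector lies farther from the centroid than the threshold."""
    center, threshold = model
    return euclidean(vector, center) > threshold


# Toy style-feature vectors, made up for illustration only.
fake_docs = [[0.10, 0.20], [0.12, 0.18], [0.09, 0.21], [0.11, 0.19]]
model = fit_one_class(fake_docs)

new_fake = [0.10, 0.19]   # resembles the training collection
real_doc = [0.60, 0.75]   # far from the training collection

print(is_outlier(new_fake, model))
print(is_outlier(real_doc, model))
```

In this sketch a document close to the tightly grouped training collection is accepted as generated, while a distant one is reported as an outlier, that is, as a candidate human-written paper. A kernel-based one-class SVM from a machine learning library could replace the centroid rule when the generated class is not well described by a single compact region.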
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Avros, R., Volkovich, Z. (2018). Detection of Computer-Generated Papers Using One-Class SVM and Cluster Approaches. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science, vol 10935. Springer, Cham. https://doi.org/10.1007/978-3-319-96133-0_4
DOI: https://doi.org/10.1007/978-3-319-96133-0_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96132-3
Online ISBN: 978-3-319-96133-0
eBook Packages: Computer Science (R0)