Abstract
A combination of three hypothesis-testing methods is proposed for authorship attribution: Student's t-test, the Kolmogorov-Smirnov test and the chi-square test. Combining these methods makes authorship attribution more reliable. The methods also minimize the number of phoneme groups needed: the texts can be differentiated by one or three phoneme groups, and this reduction simplifies authorship attribution. On the basis of the results obtained, statistical models have been built. The models show the author-differentiating capability of phoneme groups and so identify the groups with the highest author-differentiating capability. The models are implemented in new software, which speeds up the transcription of a text by using the word transcription variants stored in the program.
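The abstract does not spell out how the three tests are combined. Below is a minimal sketch of one way to apply them to a single phoneme group. It assumes that each text has already been transcribed and split into samples, that per-sample relative frequencies of the group are available for the t-test and Kolmogorov-Smirnov test, and that total counts (group occurrences vs. all phonemes) are available for a 2×2 chi-square test. The function names, the 5% critical values (including the normal approximation to Student's t) and the rule "the group differentiates the texts only if all three tests reject the null hypothesis" are illustrative assumptions, not the authors' published procedure.

```python
import math
from bisect import bisect_right
from statistics import mean, variance

# Assumed critical values for a 5% significance level (illustrative defaults).
T_CRIT = 1.96          # normal approximation to Student's t; adequate only for large samples
KS_C_ALPHA = 1.358     # asymptotic two-sample Kolmogorov-Smirnov coefficient, alpha = 0.05
CHI2_CRIT_DF1 = 3.841  # chi-square critical value for 1 degree of freedom, alpha = 0.05


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    n, m = len(a), len(b)
    pooled = ((n - 1) * variance(a) + (m - 1) * variance(b)) / (n + m - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n + 1 / m))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in set(a) | set(b):
        # Empirical CDFs at x: share of sample values <= x.
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def chi_square_2x2(group_a, total_a, group_b, total_b):
    """Chi-square statistic for a 2x2 table: group vs. other phonemes, text A vs. text B."""
    observed = [[group_a, total_a - group_a], [group_b, total_b - group_b]]
    row_totals = [total_a, total_b]
    col_totals = [group_a + group_b, (total_a - group_a) + (total_b - group_b)]
    grand = total_a + total_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def compare_group(freqs_a, freqs_b, counts_a, counts_b):
    """Apply all three tests to one phoneme group; counts_* are (group count, total phonemes)."""
    n, m = len(freqs_a), len(freqs_b)
    t_reject = abs(student_t(freqs_a, freqs_b)) > T_CRIT
    ks_reject = ks_statistic(freqs_a, freqs_b) > KS_C_ALPHA * math.sqrt((n + m) / (n * m))
    chi_reject = chi_square_2x2(*counts_a, *counts_b) > CHI2_CRIT_DF1
    # Assumed combination rule: the group differentiates the texts only if every test rejects H0.
    return {"t": t_reject, "ks": ks_reject, "chi2": chi_reject,
            "differ": t_reject and ks_reject and chi_reject}


# Hypothetical per-sample relative frequencies of one phoneme group in two texts.
freqs_a = [0.40, 0.42, 0.41, 0.43, 0.39, 0.41, 0.42, 0.40]
freqs_b = [0.35, 0.34, 0.36, 0.33, 0.35, 0.34, 0.36, 0.35]
print(compare_group(freqs_a, freqs_b, (820, 2000), (695, 2000)))
# → {'t': True, 'ks': True, 'chi2': True, 'differ': True}
```

With only 8 + 8 samples the normal value 1.96 understates the Student's t critical value (about 2.145 for 14 degrees of freedom at the two-sided 5% level), so in practice the threshold should come from the t distribution for the actual sample sizes. Requiring agreement of all three tests is a deliberately conservative choice; a majority rule would be an equally plausible assumption.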
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Khomytska, I., Teslyuk, V. (2020). Statistical Models for Authorship Attribution. In: Shakhovska, N., Medykovskyy, M.O. (eds) Advances in Intelligent Systems and Computing IV. CSIT 2019. Advances in Intelligent Systems and Computing, vol 1080. Springer, Cham. https://doi.org/10.1007/978-3-030-33695-0_39
DOI: https://doi.org/10.1007/978-3-030-33695-0_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33694-3
Online ISBN: 978-3-030-33695-0
eBook Packages: Intelligent Technologies and Robotics (R0)