Abstract
We address the problem of predicting when a book was written, using the Google Books Ngram corpus. Such a prediction could be useful for authorship and plagiarism detection, identification of literary movements, and forensic document examination. We propose an unsupervised approach and compare it with four baseline measures on a dataset of 36 books written between 1551 and 1969. The proposed approach could be applied to other languages, provided that corpora comparable to the Google Books Ngram corpus are available for them.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Islam, A., Mei, J., Milios, E.E., Kešelj, V. (2015). When was Macbeth Written? Mapping Book to Time. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9041. Springer, Cham. https://doi.org/10.1007/978-3-319-18111-0_6
DOI: https://doi.org/10.1007/978-3-319-18111-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18110-3
Online ISBN: 978-3-319-18111-0
eBook Packages: Computer Science, Computer Science (R0)