Abstract
Latent Dirichlet Allocation (LDA) is a probabilistic framework in which each topic is assumed to be a probability distribution over words and each document a probability distribution over topics. By gathering all the documents written by each author into one collection per author, it becomes possible to identify authors. Here we show that, within the LDA framework, author identification is fully reliable independently of the documents' domains, by learning from massive and incomplete document collections.
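The idea of pooling documents per author and fitting LDA over the pooled collections can be illustrated with a minimal sketch. This is not the authors' implementation: the collapsed Gibbs sampler, the hyperparameters `alpha` and `beta`, the function names `pool_by_author`, `fit_lda` and `identify`, and the likelihood-based decision rule are all assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict


def pool_by_author(docs):
    """Merge (author, tokens) pairs into one token list per author."""
    pooled = defaultdict(list)
    for author, tokens in docs:
        pooled[author].extend(tokens)
    return dict(pooled)


def fit_lda(collections, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, not the paper's code).

    Each author's pooled collection plays the role of one LDA document.
    Returns theta (author -> topic mixture) and phi (topic -> word distribution).
    """
    rng = random.Random(seed)
    authors = sorted(collections)
    vocab = sorted({w for a in authors for w in collections[a]})
    n_words = len(vocab)
    n_ak = {a: [0] * n_topics for a in authors}          # topic counts per author
    n_kw = [defaultdict(int) for _ in range(n_topics)]   # word counts per topic
    n_k = [0] * n_topics                                 # total tokens per topic
    z = {a: [] for a in authors}                         # topic of each token

    # Random initial topic assignment for every token.
    for a in authors:
        for w in collections[a]:
            k = rng.randrange(n_topics)
            z[a].append(k)
            n_ak[a][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    # Resample each token's topic from its conditional distribution.
    for _ in range(n_iter):
        for a in authors:
            for i, w in enumerate(collections[a]):
                k = z[a][i]
                n_ak[a][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                weights = [
                    (n_ak[a][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[a][i] = k
                n_ak[a][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = {
        a: [(n_ak[a][t] + alpha) / (len(collections[a]) + n_topics * alpha)
            for t in range(n_topics)]
        for a in authors
    }
    phi = [
        {w: (n_kw[t][w] + beta) / (n_k[t] + n_words * beta) for w in vocab}
        for t in range(n_topics)
    ]
    return theta, phi


def identify(tokens, theta, phi):
    """Pick the author whose topic mixture best explains the tokens.

    Assumed decision rule: maximum log-likelihood, ignoring unseen words.
    """
    def loglik(author):
        return sum(
            math.log(sum(theta[author][k] * phi[k][w] for k in range(len(phi))))
            for w in tokens if w in phi[0]
        )
    return max(theta, key=loglik)
```

In this sketch, a call such as `identify(tokens, *fit_lda(pool_by_author(docs)))` returns the name of the author whose pooled collection assigns the highest likelihood to `tokens`. Words that never occurred in training are skipped, which is one simple way to cope with incomplete documents.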
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shirai, M., Miura, T. (2011). On Domain Independence of Author Identification. In: Yin, H., Wang, W., Rayward-Smith, V. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2011. IDEAL 2011. Lecture Notes in Computer Science, vol 6936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23878-9_2
DOI: https://doi.org/10.1007/978-3-642-23878-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23877-2
Online ISBN: 978-3-642-23878-9
eBook Packages: Computer Science, Computer Science (R0)