Abstract
Topic modeling techniques are used to discover the topics of (unlabeled) texts. As such, they are unsupervised learning techniques that learn the patterns in a text by treating its words as observations. Latent Dirichlet allocation (LDA) is a Bayesian topic modeling approach whose properties make it particularly useful in practical applications (Blei et al. 2003). In this section, we introduce LDA, starting with a review of the Dirichlet distribution, the basic distribution on which LDA is built.
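To make the role of the Dirichlet distribution concrete, the following is a minimal sketch of the LDA generative process in plain Python. It is an illustration under stated assumptions, not the authors' implementation. Each topic k draws a word distribution phi_k ~ Dir(beta). Each document d draws topic proportions theta_d ~ Dir(alpha). Every word is produced by first picking a topic z ~ Cat(theta_d) and then a word w ~ Cat(phi_z). Several choices are assumptions for illustration only. The priors are symmetric, the Dirichlet prior on phi follows the "smoothed" variant, and every document has a fixed length, whereas Blei et al. (2003) draw the length from a Poisson distribution. Dirichlet samples are obtained by the standard construction of normalizing independent Gamma(alpha_i, 1) draws. The function names (`sample_dirichlet`, `dirichlet_log_pdf`, `generate_corpus`) are hypothetical, and the sketch covers generation only, not inference.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one point from Dir(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def dirichlet_log_pdf(x: Sequence[float], alpha: Sequence[float]) -> float:
    """Log density of Dir(alpha) at a point x in the interior of the probability simplex.

    log p(x) = log Gamma(sum_i alpha_i) - sum_i log Gamma(alpha_i) + sum_i (alpha_i - 1) log x_i
    """
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def generate_corpus(
    num_docs: int,
    doc_len: int,
    vocab: Sequence[str],
    num_topics: int,
    alpha: float,
    beta: float,
    seed: int = 0,
) -> Tuple[List[List[str]], List[List[float]], List[List[float]]]:
    """Hypothetical helper: sample a toy corpus from the LDA generative process.

    Assumes symmetric priors (alpha for topic proportions, beta for topic-word
    distributions) and a fixed document length doc_len instead of a Poisson draw.
    """
    rng = random.Random(seed)
    # One word distribution phi_k ~ Dir(beta) per topic.
    phis = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(num_topics)]
    docs: List[List[str]] = []
    thetas: List[List[float]] = []
    for _ in range(num_docs):
        # Per-document topic proportions theta_d ~ Dir(alpha).
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            # Pick a topic z ~ Cat(theta_d), then a word w ~ Cat(phi_z).
            z = rng.choices(range(num_topics), weights=theta)[0]
            w = rng.choices(vocab, weights=phis[z])[0]
            words.append(w)
        docs.append(words)
        thetas.append(theta)
    return docs, thetas, phis


if __name__ == "__main__":
    vocab = ["flight", "hotel", "price", "date", "room", "city"]
    docs, thetas, _ = generate_corpus(
        num_docs=3, doc_len=8, vocab=vocab, num_topics=2, alpha=0.5, beta=0.1, seed=42
    )
    for words, theta in zip(docs, thetas):
        print([round(t, 2) for t in theta], words)
```

A small alpha (below 1) pushes each theta_d toward a corner of the simplex, so most documents are dominated by few topics. A small beta does the same for the topic-word distributions. This sparsity effect is the usual motivation for placing Dirichlet priors in LDA. Recovering theta and phi from observed words (inference) is a separate problem that this sketch does not address.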
References
Balakrishnan, N., & Nevzorov, V. (2004). A primer on statistical distributions. John Wiley & Sons.
Bishop, C. M. (2006). Pattern recognition and machine learning. Secaucus, NJ: Springer.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
Brown, L. D. (1986). Fundamentals of statistical exponential families: With applications in statistical decision theory. Hayward, CA: Institute of Mathematical Statistics.
Church, K. W. (1988). A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the 2nd Conference on Applied Natural Language Processing (ANLP’88), Austin, TX.
Darmois, G. (1935). Sur les lois de probabilité à estimation exhaustive. Comptes Rendus de l’Académie des Sciences Paris, 200, 1265–1266.
Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39, 1–38.
Fisher, R. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 222(594–604), 309–368.
Fox, E. B. (2009). Bayesian nonparametric learning of complex dynamical phenomena. Ph.D. thesis, Massachusetts Institute of Technology.
Hazewinkel, M. (Ed.). (2002). Encyclopedia of mathematics. Berlin: Springer.
Hofmann, T. (1999). Probabilistic latent semantic analysis. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI’99), Stockholm.
Huang, J. (2005). Maximum likelihood estimation of Dirichlet distribution parameters. CMU Technical Report.
Jurafsky, D., & Martin, J. H. (2009). Speech and language processing (2nd ed.). Upper Saddle River, NJ: Prentice-Hall.
Koopman, B. O. (1936). On distributions admitting a sufficient statistic. Transactions of the American Mathematical Society, 39, 399–409.
Kotz, S., Johnson, N., & Balakrishnan, N. (2000). Continuous multivariate distributions: Models and applications (Vol. 1). New York: Wiley-Interscience.
Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.
Neapolitan, R. (2004). Learning Bayesian networks. Upper Saddle River, NJ: Pearson Prentice Hall.
Neapolitan, R. (2009). Probabilistic methods for bioinformatics: With an introduction to Bayesian networks. New York: Morgan Kaufmann.
Pitman, E. (1936). Sufficient statistics and intrinsic accuracy. Proceedings of the Cambridge Philosophical Society, 32, 567–579.
Rabiner, L. R. (1990). A tutorial on hidden Markov models and selected applications in speech recognition. In Readings in speech recognition (pp. 267–296). San Francisco: Morgan Kaufmann Publishers.
Robert, C. P., & Casella, G. (2005). Monte Carlo statistical methods. Springer texts in statistics. Secaucus, NJ: Springer.
Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach. New York: Prentice Hall.
Sudderth, E. B. (2006). Graphical models for visual object recognition and tracking. Ph.D. thesis, Massachusetts Institute of Technology.
Welch, L. (2003). Hidden Markov models and the Baum-Welch algorithm. IEEE Information Theory Society Newsletter, 53(4), 1–10.
Copyright information
© 2016 The Authors
About this chapter
Cite this chapter
Chinaei, H., Chaib-draa, B. (2016). A Few Words on Topic Modeling. In: Building Dialogue POMDPs from Expert Dialogues. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-26200-0_2
DOI: https://doi.org/10.1007/978-3-319-26200-0_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26198-0
Online ISBN: 978-3-319-26200-0
eBook Packages: Engineering, Engineering (R0)