A Few Words on Topic Modeling

Building Dialogue POMDPs from Expert Dialogues

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

Topic modeling techniques discover the topics of unlabeled texts. They are therefore unsupervised learning techniques, which learn patterns in the text by treating words as observations. In this context, latent Dirichlet allocation (LDA) is a Bayesian topic modeling approach whose properties are particularly useful in practical applications (Blei et al. 2003). In this section, we go through LDA by first reviewing the Dirichlet distribution, which is the basic distribution used in LDA.
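As a rough illustration of how the Dirichlet distribution enters LDA, the following minimal sketch follows the generative process described by Blei et al. (2003): each document draws topic proportions from a Dirichlet prior, and each word is then drawn by first picking a topic and then a word from that topic. The vocabulary, the topic-word probabilities, the value of `alpha`, and the helper names `sample_dirichlet` and `generate_document` are hypothetical choices made for this example only; they are not taken from the chapter.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one point from Dirichlet(alpha) by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, topic_word, vocab, n_words, rng):
    """Generate one document following the LDA generative process."""
    # Per-document topic proportions, drawn from the Dirichlet prior.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # Pick a topic for this word position according to theta.
        z = rng.choices(range(len(alpha)), weights=theta)[0]
        # Pick a word from the chosen topic's word distribution.
        w = rng.choices(vocab, weights=topic_word[z])[0]
        words.append(w)
    return theta, words


# Hypothetical toy setup: two topics over a four-word vocabulary.
rng = random.Random(0)
vocab = ["flight", "hotel", "price", "book"]
topic_word = [
    [0.7, 0.1, 0.1, 0.1],  # topic 0 favors "flight"
    [0.1, 0.7, 0.1, 0.1],  # topic 1 favors "hotel"
]
alpha = [0.5, 0.5]  # symmetric Dirichlet prior (assumed value)

theta, words = generate_document(alpha, topic_word, vocab, 8, rng)
print("topic proportions:", theta)
print("words:", words)
```

Learning a topic model runs in the opposite direction: only the words are observed, and the topic proportions and topic-word distributions are inferred from them.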

References

  • Balakrishnan, N., & Nevzorov, V. (2004). A primer on statistical distributions. John Wiley & Sons.

  • Bishop, C. M. (2006). Pattern recognition and machine learning. Secaucus, NJ: Springer.

  • Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.

  • Brown, L. D. (1986). Fundamentals of statistical exponential families: With applications in statistical decision theory. Hayward, CA: Institute of Mathematical Statistics.

  • Church, K. W. (1988). A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the 2nd Conference on Applied Natural Language Processing (ANLP’88), Austin, TX.

  • Darmois, G. (1935). Sur les lois de probabilité à estimation exhaustive. Comptes Rendus de l’Académie des Sciences Paris, 260, 1265–1266.

  • Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39, 1–38.

  • Fisher, R. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 222(594–604), 309–368.

  • Fox, E. B. (2009). Bayesian nonparametric learning of complex dynamical phenomena. Ph.D. thesis, Massachusetts Institute of Technology.

  • Hazewinkel, M. (Ed.). (2002). Encyclopedia of mathematics. Berlin: Springer.

  • Hofmann, T. (1999). Probabilistic latent semantic analysis. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI’99), Stockholm.

  • Huang, J. (2005). Maximum likelihood estimation of Dirichlet distribution parameters. CMU Technical Report.

  • Jurafsky, D., & Martin, J. H. (2009). Speech and language processing (2nd ed.). Upper Saddle River, NJ: Prentice-Hall.

  • Koopman, B. O. (1936). On distributions admitting a sufficient statistic. Transactions of the American Mathematical Society, 39, 399–409.

  • Kotz, S., Johnson, N., & Balakrishnan, N. (2000). Continuous multivariate distributions: Models and applications (Vol. 1). New York: Wiley-Interscience.

  • Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.

  • Neapolitan, R. (2004). Learning Bayesian networks. Upper Saddle River, NJ: Pearson Prentice Hall.

  • Neapolitan, R. (2009). Probabilistic methods for bioinformatics: With an introduction to Bayesian networks. New York: Morgan Kaufmann.

  • Pitman, E. (1936). Sufficient statistics and intrinsic accuracy. Proceedings of the Cambridge Philosophical Society, 32, 567–579.

  • Rabiner, L. R. (1990). A tutorial on hidden Markov models and selected applications in speech recognition. In Readings in speech recognition (pp. 267–296). San Francisco: Morgan Kaufmann Publishers.

  • Robert, C. P., & Casella, G. (2005). Monte Carlo statistical methods. Springer texts in statistics. Secaucus, NJ: Springer.

  • Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach. New York: Prentice Hall.

  • Sudderth, E. B. (2006). Graphical models for visual object recognition and tracking. Ph.D. thesis, Massachusetts Institute of Technology.

  • Welch, L. (2003). Hidden Markov models and the Baum-Welch algorithm. IEEE Information Theory Society Newsletter, 53(4), 1–10.

Copyright information

© 2016 The Authors

About this chapter

Cite this chapter

Chinaei, H., Chaib-draa, B. (2016). A Few Words on Topic Modeling. In: Building Dialogue POMDPs from Expert Dialogues. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-26200-0_2

  • DOI: https://doi.org/10.1007/978-3-319-26200-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26198-0

  • Online ISBN: 978-3-319-26200-0

  • eBook Packages: Engineering, Engineering (R0)
