Probabilistic Models for Text Mining

  • Chapter in: Mining Text Data

Abstract

A number of probabilistic methods, such as latent Dirichlet allocation (LDA), hidden Markov models, and Markov random fields, have arisen in recent years for the probabilistic analysis of text data. This chapter provides an overview of a variety of probabilistic models for text mining. It focuses on the fundamental probabilistic techniques and also covers their applications to different text mining problems, including topic modeling, language modeling, document classification, document clustering, and information extraction.
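
As a concrete illustration of one technique the abstract names, the following is a minimal sketch of LDA-based topic modeling. It is not taken from the chapter itself; it assumes scikit-learn's LatentDirichletAllocation, and the toy corpus and the choice of two topics are illustrative assumptions only.

```python
# Minimal sketch (not from the chapter) of LDA topic modeling
# using scikit-learn; corpus and topic count are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus; a real application would use a large document collection.
docs = [
    "the hidden markov model assigns tags to word sequences",
    "topic models such as lda describe documents as topic mixtures",
    "gibbs sampling and variational inference estimate model parameters",
    "named entity recognition extracts structured facts from raw text",
]

# Build a bag-of-words document-term matrix.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Fit LDA with two latent topics (an arbitrary illustrative choice).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # per-document topic proportions

# Print the highest-weight words for each inferred topic.
vocab = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top_words = [vocab[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {k}: {top_words}")
```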

Author information

Authors: Yizhou Sun, Hongbo Deng, Jiawei Han

Corresponding author: Yizhou Sun

Editor information

Editors: Charu C. Aggarwal, ChengXiang Zhai

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Sun, Y., Deng, H., Han, J. (2012). Probabilistic Models for Text Mining. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_8

  • DOI: https://doi.org/10.1007/978-1-4614-3223-4_8

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4614-3222-7

  • Online ISBN: 978-1-4614-3223-4

  • eBook Packages: Computer Science, Computer Science (R0)
