Abstract
A number of probabilistic methods, such as latent Dirichlet allocation (LDA), hidden Markov models, and Markov random fields, have arisen in recent years for the probabilistic analysis of text data. This chapter provides an overview of a variety of probabilistic models for text mining. It focuses on the fundamental probabilistic techniques, and also covers their applications to different text mining problems, including topic modeling, language modeling, document classification, document clustering, and information extraction.
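As a concrete illustration of the topic-modeling application mentioned above, the following sketch fits an LDA model to a toy four-document corpus. The corpus and parameter choices are hypothetical, and scikit-learn's `LatentDirichletAllocation` is used here purely as one readily available implementation; the chapter itself discusses the underlying probabilistic model in general terms.

```python
# Minimal LDA sketch: each document is modeled as a mixture of topics,
# and each topic as a multinomial distribution over words.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy corpus: two "pets" documents, two "finance" documents.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets rose sharply today",
    "investors traded shares on the market",
]

# Bag-of-words counts form the observed data for the model.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Fit LDA with 2 latent topics; fit_transform returns, for each document,
# its inferred distribution over topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

print(doc_topics.shape)  # one topic-mixture row per document
```

Each row of `doc_topics` is a probability distribution over the two topics, so its entries are nonnegative and sum to one; inspecting `lda.components_` similarly gives the (unnormalized) per-topic word weights.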
© 2012 Springer Science+Business Media, LLC
Cite this chapter
Sun, Y., Deng, H., Han, J. (2012). Probabilistic Models for Text Mining. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_8
Print ISBN: 978-1-4614-3222-7
Online ISBN: 978-1-4614-3223-4