Probabilistic Models for Text Mining

  • Chapter in: Mining Text Data

Abstract

A number of probabilistic methods, such as latent Dirichlet allocation (LDA), hidden Markov models, and Markov random fields, have arisen in recent years for the probabilistic analysis of text data. This chapter provides an overview of a variety of probabilistic models for text mining. It focuses on the fundamental probabilistic techniques and also covers their applications to different text mining problems, including topic modeling, language modeling, document classification, document clustering, and information extraction.
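
As a concrete illustration of one technique the abstract names, the following is a minimal sketch of LDA-based topic modeling. It is not taken from the chapter itself; it assumes scikit-learn's LatentDirichletAllocation, and the toy corpus and the choice of two topics are illustrative assumptions only.

```python
# Minimal sketch (not from the chapter) of LDA topic modeling
# using scikit-learn; corpus and topic count are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus; a real application would use a large document collection.
docs = [
    "the hidden markov model assigns tags to word sequences",
    "topic models such as lda describe documents as topic mixtures",
    "gibbs sampling and variational inference estimate model parameters",
    "named entity recognition extracts structured facts from raw text",
]

# Build a bag-of-words document-term matrix.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Fit LDA with two latent topics (an arbitrary illustrative choice).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # per-document topic proportions

# Print the highest-weight words for each inferred topic.
vocab = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top_words = [vocab[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {k}: {top_words}")
```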

Author information

Authors: Yizhou Sun, Hongbo Deng, Jiawei Han

Corresponding author: Yizhou Sun

Editor information

Editors: Charu C. Aggarwal, ChengXiang Zhai

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Sun, Y., Deng, H., Han, J. (2012). Probabilistic Models for Text Mining. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_8

  • DOI: https://doi.org/10.1007/978-1-4614-3223-4_8

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4614-3222-7

  • Online ISBN: 978-1-4614-3223-4

  • eBook Packages: Computer Science, Computer Science (R0)
