Learning Continuous Latent Variable Models with Bregman Divergences

  • Conference paper
Algorithmic Learning Theory (ALT 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2842)

Abstract

We present a class of unsupervised statistical learning algorithms formulated in terms of minimizing Bregman divergences, a family of generalized entropy measures defined by convex functions. We obtain novel training algorithms that extract latent structure by minimizing a Bregman divergence over the training data, subject to a set of non-linear constraints that involve the hidden variables. An alternating minimization procedure with nested iterative scaling is proposed to find feasible solutions to the resulting constrained optimization problem. The convergence and information-geometric properties of this algorithm are characterized.
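
For background, since the page itself does not define the term: given a strictly convex, differentiable generator F, the Bregman divergence between points p and q is (this is the standard definition, not something specific to this paper)

    D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle

Choosing F(x) = \tfrac{1}{2}\|x\|^2 recovers the squared Euclidean distance, while F(p) = \sum_i p_i \log p_i gives the generalized Kullback-Leibler divergence, so least-squares and maximum-entropy estimation both arise as special cases of the minimization described in the abstract.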

Index Terms — statistical machine learning, unsupervised learning, Bregman divergence, information geometry, alternating minimization, forward projection, backward projection, iterative scaling.
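
To illustrate the iterative-scaling building block named in the index terms, the Python sketch below performs a KL (backward) projection of a reference distribution onto a single linear constraint E_p[f] = c via multiplicative updates. This is a minimal, fully observed, one-binary-feature special case written for this page, not the paper's algorithm, which nests such projections inside an alternating minimization over hidden variables; the names q, f, and c are illustrative.

    import numpy as np

    def kl(p, q):
        """Bregman divergence of the negative-entropy generator F(p) = sum(p * log p):
        the generalized Kullback-Leibler divergence."""
        return float(np.sum(p * np.log(p / q) - p + q))

    def iterative_scaling(q, f, c, iters=100):
        """Project q onto {p : sum(p) = 1, E_p[f] = c} in the KL sense.
        Assumes f is a binary (0/1) feature, so the multiplicative update
        p <- p * exp(lam * f) with lam = log(c / E_p[f]) converges to the
        constraint for any c in (0, 1)."""
        p = q / q.sum()
        for _ in range(iters):
            s = float(np.dot(p, f))            # current expectation of f
            p = p * np.exp(np.log(c / s) * f)  # exponential tilt toward c
            p = p / p.sum()                    # renormalize onto the simplex
        return p

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        q = rng.random(6)                       # arbitrary positive reference weights
        f = np.array([1., 1., 0., 0., 0., 0.])  # binary feature
        p = iterative_scaling(q, f, c=0.5)
        print("E_p[f] =", float(np.dot(p, f)))  # approximately 0.5
        print("KL(p || q/sum q) =", kl(p, q / q.sum()))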

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, S., Schuurmans, D. (2003). Learning Continuous Latent Variable Models with Bregman Divergences. In: Gavaldà, R., Jantke, K.P., Takimoto, E. (eds) Algorithmic Learning Theory. ALT 2003. Lecture Notes in Computer Science (LNAI), vol. 2842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39624-6_16

  • DOI: https://doi.org/10.1007/978-3-540-39624-6_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20291-2

  • Online ISBN: 978-3-540-39624-6

  • eBook Packages: Springer Book Archive
