Learning Continuous Latent Variable Models with Bregman Divergences

  • Conference paper
Algorithmic Learning Theory (ALT 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2842)

Abstract

We present a class of unsupervised statistical learning algorithms formulated in terms of minimizing Bregman divergences, a family of generalized entropy measures defined by convex functions. We obtain novel training algorithms that extract latent structure by minimizing a Bregman divergence over the training data, subject to a set of non-linear constraints that involve the hidden variables. An alternating minimization procedure with nested iterative scaling is proposed to find feasible solutions to the resulting constrained optimization problem. The convergence and information-geometric properties of this algorithm are characterized.
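
For background, since the page itself does not define the term: given a strictly convex, differentiable generator F, the Bregman divergence between points p and q is (this is the standard definition, not something specific to this paper)

    D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle

Choosing F(x) = \tfrac{1}{2}\|x\|^2 recovers the squared Euclidean distance, while F(p) = \sum_i p_i \log p_i gives the generalized Kullback-Leibler divergence, so least-squares and maximum-entropy estimation both arise as special cases of the minimization described in the abstract.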

Index Terms — statistical machine learning, unsupervised learning, Bregman divergence, information geometry, alternating minimization, forward projection, backward projection, iterative scaling.
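
To illustrate the iterative-scaling building block named in the index terms, the Python sketch below performs a KL (backward) projection of a reference distribution onto a single linear constraint E_p[f] = c via multiplicative updates. This is a minimal, fully observed, one-binary-feature special case written for this page, not the paper's algorithm, which nests such projections inside an alternating minimization over hidden variables; the names q, f, and c are illustrative.

    import numpy as np

    def kl(p, q):
        """Bregman divergence of the negative-entropy generator F(p) = sum(p * log p):
        the generalized Kullback-Leibler divergence."""
        return float(np.sum(p * np.log(p / q) - p + q))

    def iterative_scaling(q, f, c, iters=100):
        """Project q onto {p : sum(p) = 1, E_p[f] = c} in the KL sense.
        Assumes f is a binary (0/1) feature, so the multiplicative update
        p <- p * exp(lam * f) with lam = log(c / E_p[f]) converges to the
        constraint for any c in (0, 1)."""
        p = q / q.sum()
        for _ in range(iters):
            s = float(np.dot(p, f))            # current expectation of f
            p = p * np.exp(np.log(c / s) * f)  # exponential tilt toward c
            p = p / p.sum()                    # renormalize onto the simplex
        return p

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        q = rng.random(6)                       # arbitrary positive reference weights
        f = np.array([1., 1., 0., 0., 0., 0.])  # binary feature
        p = iterative_scaling(q, f, c=0.5)
        print("E_p[f] =", float(np.dot(p, f)))  # approximately 0.5
        print("KL(p || q/sum q) =", kl(p, q / q.sum()))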

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, S., Schuurmans, D. (2003). Learning Continuous Latent Variable Models with Bregman Divergences. In: Gavaldà, R., Jantke, K.P., Takimoto, E. (eds) Algorithmic Learning Theory. ALT 2003. Lecture Notes in Computer Science (LNAI), vol. 2842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39624-6_16

  • DOI: https://doi.org/10.1007/978-3-540-39624-6_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20291-2

  • Online ISBN: 978-3-540-39624-6

  • eBook Packages: Springer Book Archive
