
Maximum Entropy Distribution Estimation with Generalized Regularization

  • Conference paper
Learning Theory (COLT 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4005)

Abstract

We present a unified and complete account of maximum entropy distribution estimation subject to constraints represented by convex potential functions or, alternatively, by convex regularization. We provide fully general performance guarantees and an algorithm with a complete convergence proof. As special cases, we can easily derive performance guarantees for many known regularization types, including ℓ1, ℓ2, ℓ2², and ℓ1+ℓ2²-style regularization. Furthermore, our general approach enables us to use information about the structure of the feature space or about sample selection bias to derive entirely new regularization functions with superior guarantees. We propose an algorithm that solves a large and general subclass of generalized maxent problems, including all of those discussed in this paper, and prove its convergence. Our approach generalizes techniques based on information geometry and Bregman divergences as well as those based more directly on compactness.
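As a concrete illustration of the setting the abstract describes (not the paper's own algorithm, which handles a much more general class of regularizers and comes with a convergence proof), the ℓ2²-regularized maxent problem over a finite sample space can be attacked through its convex dual: minimize log Z(w) − w·(empirical feature means) + (α/2)‖w‖². The sketch below, which assumes the sample space is given as the rows of a feature matrix and uses plain gradient descent, is a minimal stand-in for the paper's method:

```python
import numpy as np

def gibbs(features, w):
    """Gibbs distribution q_w(x) ∝ exp(w · f(x)) over the rows of `features`."""
    s = features @ w
    s -= s.max()                      # subtract the max for numerical stability
    q = np.exp(s)
    return q / q.sum()

def maxent_l2(features, emp_means, alpha=1e-3, lr=0.5, steps=2000):
    """Gradient descent on the ℓ2²-regularized maxent dual
         L(w) = log Z(w) − w · emp_means + (alpha/2) ||w||²,
    whose gradient is E_{q_w}[f] − emp_means + alpha·w."""
    w = np.zeros(features.shape[1])
    for _ in range(steps):
        q = gibbs(features, w)
        w -= lr * (features.T @ q - emp_means + alpha * w)
    return w, gibbs(features, w)
```

At the optimum, the model's feature expectations match the empirical means up to an α·w slack, which is the dual view of the relaxed moment constraints that ℓ2²-style regularization corresponds to; for vanishing α the moments match exactly, recovering basic maxent.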




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dudík, M., Schapire, R.E. (2006). Maximum Entropy Distribution Estimation with Generalized Regularization. In: Lugosi, G., Simon, H.U. (eds) Learning Theory. COLT 2006. Lecture Notes in Computer Science, vol 4005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11776420_12

  • DOI: https://doi.org/10.1007/11776420_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35294-5

  • Online ISBN: 978-3-540-35296-9

  • eBook Packages: Computer Science, Computer Science (R0)
