Abstract
L1 regularization is effective for feature selection, but the resulting optimization is challenging due to the non-differentiability of the 1-norm. In this paper we compare state-of-the-art optimization techniques for solving this problem across several loss functions. Furthermore, we propose two new techniques. The first is based on a smooth (differentiable) convex approximation of the L1 regularizer that does not depend on any assumptions about the loss function used. The second addresses the non-differentiability of the L1 regularizer by casting the problem as a constrained optimization problem, which is then solved using a specialized gradient projection method. Extensive comparisons, measured by the number of function evaluations required, show that our proposed approaches consistently rank among the best in terms of convergence speed and efficiency.
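To make the two ideas concrete, here is a minimal sketch on L1-regularized least squares. This is an illustrative reconstruction, not the authors' exact algorithms: the surrogate sqrt(w^2 + eps), the step-size choice, and the function names are assumptions chosen for brevity (the paper itself covers several loss functions). The first routine smooths the 1-norm so plain gradient descent applies; the second splits w = w+ - w- into non-negative parts, turning the L1 term into a linear one over a constraint set where projection is a simple clamp at zero.

```python
import numpy as np

def smooth_l1_gd(A, b, lam, eps=1e-6, iters=5000):
    """Gradient descent on 0.5*||A w - b||^2 + lam * sum(sqrt(w^2 + eps)).

    sqrt(w^2 + eps) is a differentiable convex surrogate for |w|
    (one standard choice; the paper's surrogate may differ), so
    ordinary gradient descent applies directly.
    """
    lr = 1.0 / np.linalg.norm(A, 2) ** 2  # step size from the Lipschitz constant
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ w - b) + lam * w / np.sqrt(w ** 2 + eps)
        w -= lr * grad
    return w

def projected_gd(A, b, lam, iters=5000):
    """Projected gradient on the constrained reformulation.

    With w = wp - wn and wp, wn >= 0, the L1 penalty becomes the
    linear term lam * sum(wp + wn); projecting onto the non-negative
    orthant is just clamping at zero.
    """
    lr = 1.0 / np.linalg.norm(A, 2) ** 2
    n = A.shape[1]
    wp, wn = np.zeros(n), np.zeros(n)
    for _ in range(iters):
        g = A.T @ (A @ (wp - wn) - b)  # gradient of the smooth loss w.r.t. w
        wp = np.maximum(0.0, wp - lr * (g + lam))
        wn = np.maximum(0.0, wn - lr * (-g + lam))
    return wp - wn
```

On a small sparse-recovery problem both routines converge to nearly the same sparse minimizer, with the smooth surrogate leaving small coefficients near, rather than exactly at, zero.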
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Schmidt, M., Fung, G., Rosales, R. (2007). Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. Lecture Notes in Computer Science, vol. 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5