Abstract
L1 regularization is effective for feature selection, but the resulting optimization is challenging due to the non-differentiability of the 1-norm. In this paper we compare state-of-the-art optimization techniques for solving this problem across several loss functions. Furthermore, we propose two new techniques. The first is based on a smooth (differentiable) convex approximation of the L1 regularizer that does not depend on any assumptions about the loss function used. The second addresses the non-differentiability of the L1 regularizer by casting the problem as a constrained optimization problem, which is then solved using a specialized gradient projection method. Extensive comparisons, measured by the number of function evaluations required, show that our proposed approaches consistently rank among the best in terms of convergence speed and efficiency.
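To make the two ideas concrete, here is a minimal sketch on L1-regularized least squares. This is an illustrative reconstruction, not the authors' exact algorithms: the surrogate sqrt(w^2 + eps), the step-size choice, and the function names are assumptions chosen for brevity (the paper itself covers several loss functions). The first routine smooths the 1-norm so plain gradient descent applies; the second splits w = w+ - w- into non-negative parts, turning the L1 term into a linear one over a constraint set where projection is a simple clamp at zero.

```python
import numpy as np

def smooth_l1_gd(A, b, lam, eps=1e-6, iters=5000):
    """Gradient descent on 0.5*||A w - b||^2 + lam * sum(sqrt(w^2 + eps)).

    sqrt(w^2 + eps) is a differentiable convex surrogate for |w|
    (one standard choice; the paper's surrogate may differ), so
    ordinary gradient descent applies directly.
    """
    lr = 1.0 / np.linalg.norm(A, 2) ** 2  # step size from the Lipschitz constant
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ w - b) + lam * w / np.sqrt(w ** 2 + eps)
        w -= lr * grad
    return w

def projected_gd(A, b, lam, iters=5000):
    """Projected gradient on the constrained reformulation.

    With w = wp - wn and wp, wn >= 0, the L1 penalty becomes the
    linear term lam * sum(wp + wn); projecting onto the non-negative
    orthant is just clamping at zero.
    """
    lr = 1.0 / np.linalg.norm(A, 2) ** 2
    n = A.shape[1]
    wp, wn = np.zeros(n), np.zeros(n)
    for _ in range(iters):
        g = A.T @ (A @ (wp - wn) - b)  # gradient of the smooth loss w.r.t. w
        wp = np.maximum(0.0, wp - lr * (g + lam))
        wn = np.maximum(0.0, wn - lr * (-g + lam))
    return wp - wn
```

On a small sparse-recovery problem both routines converge to nearly the same sparse minimizer, with the smooth surrogate leaving small coefficients near, rather than exactly at, zero.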
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Schmidt, M., Fung, G., Rosales, R. (2007). Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. Lecture Notes in Computer Science, vol. 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5