Abstract
Joint sparsity is widely acknowledged as a powerful structural cue for feature selection in settings where variables are expected to exhibit "grouped" behavior. Such grouped behavior is commonly modeled by Group-Lasso or Multitask-Lasso-type problems, where feature selection is effected via ℓ1,q mixed norms. Several particular formulations of groupwise sparsity have received substantial attention in the literature, and in some cases efficient algorithms are available. Surprisingly, for constrained formulations of fundamental importance (e.g., regression with an ℓ1,∞-norm constraint), highly scalable methods seem to be missing. We address this deficiency by presenting a method based on spectral projected gradient (SPG) that can tackle ℓ1,q-constrained convex regression problems. The crucial component of our method is an algorithm for projecting onto ℓ1,q-norm balls. We present numerical results showing that our methods attain speedups of up to 30× on large ℓ1,∞ multitask-lasso problems. Even more dramatic are the gains on the ℓ1,∞-projection subproblem alone: we observe speedups of almost three orders of magnitude over the current standard method.
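The ℓ1,∞-ball projection mentioned above is the workhorse of the SPG solver: for a coefficient matrix whose rows are groups, it computes the Euclidean projection onto {X : Σ_i max_j |X_ij| ≤ τ}. As a point of reference for what this subproblem computes (this is not the paper's fast algorithm), the sketch below takes a generic route: by Moreau decomposition, the prox of λ‖·‖_∞ on each row reduces to an ℓ1-ball projection, and a bisection on the multiplier λ enforces the global budget τ. This is a minimal NumPy illustration; the function names and tolerances are our own.

```python
import numpy as np

def project_l1_ball(v, z):
    """Euclidean projection of v onto the l1-ball {x : ||x||_1 <= z},
    via the standard sort-and-threshold method."""
    if np.abs(v).sum() <= z:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]            # |v| sorted in decreasing order
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - z)[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)    # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(a, lam):
    """prox_{lam*||.||_inf}(a) = a - P_{l1-ball of radius lam}(a)
    (Moreau decomposition; the dual norm of l_inf is l_1)."""
    return a - project_l1_ball(a, lam)

def project_l1_inf(A, tau, tol=1e-8, max_iter=100):
    """Project matrix A (rows = groups) onto the l1,inf-ball
    {X : sum_i max_j |X_ij| <= tau}, by bisection on the
    Lagrange multiplier of the norm constraint."""
    if np.abs(A).max(axis=1).sum() <= tau:
        return A.copy()                     # already feasible
    lo, hi = 0.0, np.abs(A).sum(axis=1).max()  # lam >= max row l1-norm zeroes every row
    X = A.copy()
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        X = np.vstack([prox_linf(a, lam) for a in A])
        s = np.abs(X).max(axis=1).sum()     # attained l1,inf-norm
        if abs(s - tau) <= tol * max(tau, 1.0):
            break
        if s > tau:                         # not shrunk enough; raise lam
            lo = lam
        else:
            hi = lam
    return X
```

Each bisection step costs O(nm log m) for an n×m matrix, so this generic projection is roughly a log(1/ε) factor more expensive than the specialized projection algorithms developed in the paper; it serves only to make the subproblem concrete.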
Cite this paper
Sra, S.: Fast Projections onto ℓ1,q-Norm Balls for Grouped Feature Selection. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2011). LNCS, vol. 6913. Springer, Berlin, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23808-6_20