Sublinear Algorithms for Penalized Logistic Regression in Massive Datasets

Abstract
Penalized logistic regression (PLR) is a widely used supervised learning model. In this paper, we consider its application to large-scale data problems and adopt a stochastic primal-dual approach for solving PLR. In particular, we employ a random sampling technique in the primal step and a multiplicative weights method in the dual step. This combination yields an optimization method whose running time depends sublinearly on both the volume and the dimensionality of the training data. We develop concrete algorithms for PLR with the ℓ2-norm and ℓ1-norm penalties, respectively. Experimental results on several large-scale, high-dimensional datasets demonstrate both the efficiency and the accuracy of our algorithms.
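The abstract names the two ingredients without spelling out the loop, so the following is a minimal, illustrative Python sketch of how a primal-dual scheme of this kind can be wired together for the ℓ2-penalized case: the dual step maintains a multiplicative-weights distribution over the training examples, and the primal step takes a stochastic gradient step on an example sampled from that distribution. All names and step sizes are hypothetical, and the dense dual update below deliberately scans every example, so this sketch is not sublinear and is not the paper's algorithm; it only illustrates the primal-dual interplay the abstract describes.

import numpy as np

def primal_dual_mw_plr(X, y, lam=0.1, eta_w=0.1, eta_p=0.01, T=1000, seed=0):
    """Illustrative primal-dual loop for l2-penalized logistic regression.

    X: (n, d) feature matrix; y: labels in {-1, +1}.
    Hypothetical sketch, NOT the paper's sublinear algorithm: the dual
    update below touches all n examples, whereas the paper's method uses
    sampling to stay sublinear in both n and d.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)            # primal iterate (model weights)
    p = np.full(n, 1.0 / n)    # dual distribution over examples

    for _ in range(T):
        # Primal step: sample one example from the dual distribution and
        # take a stochastic gradient step on its penalized logistic loss.
        i = rng.choice(n, p=p)
        grad = -y[i] * X[i] / (1.0 + np.exp(y[i] * (X[i] @ w))) + lam * w
        w -= eta_w * grad

        # Dual step: multiplicative weights shift mass toward examples
        # the current model still classifies poorly (high logistic loss).
        losses = np.logaddexp(0.0, -y * (X @ w))
        p *= np.exp(eta_p * losses)
        p /= p.sum()
    return w

For the ℓ1-penalized variant, a standard way to handle the non-smooth penalty would be to replace the ridge term lam * w with a soft-thresholding (proximal) step after the gradient update; the abstract does not state the paper's exact choice.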
Keywords
- Logistic Regression
- Test Error
- Neural Information Processing Systems
- Stochastic Gradient Descent
- Machine Learning Research
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Peng, H., Wang, Z., Chang, E.Y., Zhou, S., Zhang, Z. (2012). Sublinear Algorithms for Penalized Logistic Regression in Massive Datasets. In: Flach, P.A., De Bie, T., Cristianini, N. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol. 7523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33460-3_41
DOI: https://doi.org/10.1007/978-3-642-33460-3_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33459-7
Online ISBN: 978-3-642-33460-3
eBook Packages: Computer Science, Computer Science (R0)