Sublinear Algorithms for Penalized Logistic Regression in Massive Datasets

Abstract
Penalized logistic regression (PLR) is a widely used supervised learning model. In this paper, we consider its application to large-scale data problems and adopt a stochastic primal-dual approach for solving PLR. In particular, we employ a random sampling technique in the primal step and a multiplicative weights method in the dual step. This combination yields an optimization method whose running time depends sublinearly on both the volume and the dimensionality of the training data. We develop concrete algorithms for PLR with the ℓ2-norm and ℓ1-norm penalties, respectively. Experimental results on several large-scale, high-dimensional datasets demonstrate both the efficiency and the accuracy of our algorithms.
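The abstract names the two ingredients without spelling out the loop, so the following is a minimal, illustrative Python sketch of how a primal-dual scheme of this kind can be wired together for the ℓ2-penalized case: the dual step maintains a multiplicative-weights distribution over the training examples, and the primal step takes a stochastic gradient step on an example sampled from that distribution. All names and step sizes are hypothetical, and the dense dual update below deliberately scans every example, so this sketch is not sublinear and is not the paper's algorithm; it only illustrates the primal-dual interplay the abstract describes.

import numpy as np

def primal_dual_mw_plr(X, y, lam=0.1, eta_w=0.1, eta_p=0.01, T=1000, seed=0):
    """Illustrative primal-dual loop for l2-penalized logistic regression.

    X: (n, d) feature matrix; y: labels in {-1, +1}.
    Hypothetical sketch, NOT the paper's sublinear algorithm: the dual
    update below touches all n examples, whereas the paper's method uses
    sampling to stay sublinear in both n and d.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)            # primal iterate (model weights)
    p = np.full(n, 1.0 / n)    # dual distribution over examples

    for _ in range(T):
        # Primal step: sample one example from the dual distribution and
        # take a stochastic gradient step on its penalized logistic loss.
        i = rng.choice(n, p=p)
        grad = -y[i] * X[i] / (1.0 + np.exp(y[i] * (X[i] @ w))) + lam * w
        w -= eta_w * grad

        # Dual step: multiplicative weights shift mass toward examples
        # the current model still classifies poorly (high logistic loss).
        losses = np.logaddexp(0.0, -y * (X @ w))
        p *= np.exp(eta_p * losses)
        p /= p.sum()
    return w

For the ℓ1-penalized variant, a standard way to handle the non-smooth penalty would be to replace the ridge term lam * w with a soft-thresholding (proximal) step after the gradient update; the abstract does not state the paper's exact choice.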
Keywords
- Logistic Regression
- Test Error
- Neural Information Processing Systems
- Stochastic Gradient Descent
- Machine Learning Research
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Peng, H., Wang, Z., Chang, E.Y., Zhou, S., Zhang, Z. (2012). Sublinear Algorithms for Penalized Logistic Regression in Massive Datasets. In: Flach, P.A., De Bie, T., Cristianini, N. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol. 7523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33460-3_41
DOI: https://doi.org/10.1007/978-3-642-33460-3_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33459-7
Online ISBN: 978-3-642-33460-3
eBook Packages: Computer Science, Computer Science (R0)