Abstract
The sublinear primal-dual algorithm (SUPDA) is a well-established sublinear-time algorithm. However, SUPDA performs the primal step in every iteration, which is unnecessary because the overall regret of SUPDA is dominated by the dual step. To improve the efficiency of SUPDA, we propose an improved SUPDA (ISUPDA) and apply it to linear support vector machines, yielding an improved sublinear primal-dual algorithm for linear support vector machines (ISUPDA-SVM). Specifically, unlike SUPDA, which conducts the primal step in every iteration, ISUPDA executes the primal step only with a certain probability at each iteration, which reduces the time complexity of SUPDA. We prove that the expected regret of ISUPDA is still dominated by the dual step, and hence ISUPDA retains the convergence guarantee. We further convert linear support vector machines into saddle-point form in order to apply ISUPDA to them, and provide theoretical guarantees on both the solution quality and the efficiency of ISUPDA-SVM. Comparison experiments on multiple datasets demonstrate that ISUPDA outperforms SUPDA and that ISUPDA-SVM is an efficient algorithm for linear support vector machines.
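The core modification described in the abstract can be sketched as follows. This is an illustrative Python skeleton, not the authors' implementation: the names `primal_step`, `dual_step`, and the probability parameter `alpha` are placeholders standing in for the paper's actual update rules.

```python
import random

def isupda(T, alpha, primal_step, dual_step, w0, p0):
    """Illustrative skeleton of the ISUPDA idea: the dual step runs in
    every iteration, but the primal step runs only with probability alpha."""
    w, p = w0, p0
    for _ in range(T):
        if random.random() < alpha:
            w = primal_step(w, p)   # executed with probability alpha
        p = dual_step(w, p)         # executed in every iteration
    return w, p
```

By contrast, SUPDA corresponds to the special case `alpha = 1`, where the primal step runs every iteration.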
References
Arora, S., Hazan, E., Kale, S.: The multiplicative weights update method: a meta-algorithm and applications. Theory Comput. 8(1), 121–164 (2012)
Chang, C., Lin, C.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (2011)
Chang, K., Hsieh, C., Lin, C.: Coordinate descent method for large-scale l2-loss linear support vector machines. J. Mach. Learn. Res. 9(3), 1369–1398 (2008)
Cherkassky, V.: The nature of statistical learning theory. IEEE Trans. Neural Netw. Learn. Syst. 8(6), 1–30 (1997)
Clarkson, K.L., Hazan, E., Woodruff, D.P.: Sublinear optimization for machine learning. J. ACM 59(5), 23:1–23:49 (2012)
Cotter, A., Shalev-Shwartz, S., Srebro, N.: The kernelized stochastic batch perceptron. In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 943–950 (2012)
Garber, D., Hazan, E.: Approximating semidefinite programs in sublinear time. In: Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS), pp. 1080–1088 (2011)
Hazan, E., Agarwal, A., Kale, S.: Logarithmic regret algorithms for online convex optimization. Mach. Learn. 69(2), 169–192 (2007)
Hazan, E., Koren, T.: Linear regression with limited observation. In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 1865–1872 (2012)
Hazan, E., Koren, T., Srebro, N.: Beating SGD: learning SVMs in sublinear time. In: Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS), pp. 1233–1241 (2011)
Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI), pp. 1137–1145 (1995)
Peng, H., Wang, Z., Chang, E.Y., Zhou, S., Zhang, Z.: Sublinear algorithms for penalized logistic regression in massive datasets. In: Flach, P.A., De Bie, T., Cristianini, N. (eds.) ECML PKDD 2012. LNCS (LNAI), vol. 7523, pp. 553–568. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33460-3_41
Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: Primal estimated sub-gradient solver for SVM. In: Proceedings of the 24th International Conference on Machine Learning (ICML), pp. 807–814 (2007)
Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)
Slavakis, K., Kim, S., Mateos, G., Giannakis, G.B.: Stochastic approximation vis-a-vis online learning for big data analytics. IEEE Sig. Process. Mag. 31(6), 124–129 (2014)
Wang, W., Peng, Z., Liu, Z., Zhu, T., Hong, X.: Learning the influence probabilities based on multipolar factors in social network. In: Zhang, S., Wirsing, M., Zhang, Z. (eds.) KSEM 2015. LNCS (LNAI), vol. 9403, pp. 512–524. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25159-2_46
Xiao, L.: Dual averaging methods for regularized stochastic learning and online optimization. J. Mach. Learn. Res. 11(1), 2543–2596 (2010)
Zhang, T.: Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the 21st International Conference on Machine Learning (ICML), pp. 9–16 (2004)
Acknowledgments
The work was supported in part by the National Natural Science Foundation of China under grant No. 61673293.
Appendix
1.1 Proof of Lemma 3
Proof
We first give a lemma from [8] used in the proof.
Lemma 5
Consider functions \(f_1,\ldots ,f_T\) whose gradient norms are bounded by \(G\ge \max _t\max _{\varvec{\omega }\in \mathbb {R}^d}\Vert \nabla f_t(\varvec{\omega })\Vert \), and let \(\varvec{\omega }_{t+1}\leftarrow \arg \max _{\varvec{\omega }\in \mathbb {R}^d}\sum _{\tau =1}^{t}f_{\tau }(\varvec{\omega })\). Then
Consider the sequence of functions \(\widehat{f}_t\) defined by \(\widehat{f}_t=\frac{f_t}{\alpha }\) with probability \(\alpha \) and \(\widehat{f}_t=\varvec{0}\) with probability \(1-\alpha \), where \(\varvec{0}\) denotes the all-zero function. The gradients of \(\widehat{f}_t\) are then bounded by \(\frac{G}{\alpha }\). Applying Lemma 5 to \(\widehat{f}_t\), we have
which ends the proof.
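The construction in the proof above relies on \(\widehat{f}_t\) being an unbiased surrogate for \(f_t\): scaling by \(\frac{1}{\alpha }\) on the iterations where the step actually runs exactly compensates for the iterations where it is skipped, so \(\mathbb {E}[\widehat{f}_t]=f_t\). A small Monte Carlo check of this unbiasedness, using a scalar stand-in for the function value (illustrative only; `hat_f`, `alpha`, and `f_value` are names introduced here, not from the paper):

```python
import random

def hat_f(f_value, alpha, rng):
    """With probability alpha return f_value / alpha, otherwise 0,
    so that the expected value equals f_value."""
    return f_value / alpha if rng.random() < alpha else 0.0

rng = random.Random(42)
alpha, f_value, n = 0.25, 2.0, 200_000
avg = sum(hat_f(f_value, alpha, rng) for _ in range(n)) / n
# avg should be close to f_value, confirming E[hat_f] = f_value
```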
1.2 Proof of Theorem 1
Proof
We first give some basic lemmas from [12] used in the proof.
Lemma 6
For any \(\sqrt{\frac{\log (n)}{T}}\le \eta \le \frac{1}{6}\), the following two formulas hold with probability at least \(1-O(\frac{1}{n})\):
Lemma 7
With probability at least \(\frac{3}{4}\),
Let \(\varvec{\omega }^*,\varvec{\xi }^*\) be the optimal solution of problem (7). According to Lemma 3, we have \(G=1\) for the objective function of problem (7), so we have
The expectation is taken with respect to the randomness of the update of \(\varvec{\omega }_t\), denoted \(c_t\). Let \(\gamma ^*\) be the optimal value of problem (7). Then we have
So we obtain
Applying Lemma 2 to the clipped vector \(\varvec{v}_t\) we have
From Lemma 6, with probability at least \(1-O(1/n)\) we obtain
According to Lemma 7, with probability \(\frac{3}{4}-O(1/n)\ge \frac{1}{2}\),
Taking the expectation with respect to \(c_t\), with probability at least \(\frac{1}{2}\) we obtain
Combining (8) and (9) and dividing by \(T\), with probability at least \(\frac{1}{2}\) we have
and, using our choices of \(T\), \(\eta \), and \(\alpha \), with probability at least \(\frac{1}{2}\),
which implies that \((\bar{\varvec{\omega }},\bar{\varvec{\xi }})\) forms an expected \(\varepsilon \)-approximate solution to problem (7).
Next, we analyse the runtime.
For the objective function of problem (7), we have \(G=1\) in Lemma 3; combining this with the time complexity of ISUPDA given in (4), we obtain the runtime of ISUPDA-SVM:
© 2018 Springer Nature Switzerland AG
Gu, M., Liao, S. (2018). Improved Sublinear Primal-Dual Algorithm for Support Vector Machines. In: Liu, W., Giunchiglia, F., Yang, B. (eds) Knowledge Science, Engineering and Management. KSEM 2018. Lecture Notes in Computer Science(), vol 11062. Springer, Cham. https://doi.org/10.1007/978-3-319-99247-1_30