Abstract
The sublinear primal-dual algorithm (SUPDA) is a well-established sublinear-time algorithm. However, SUPDA performs the primal step in every iteration, which is unnecessary because the overall regret of SUPDA is dominated by the dual step. To improve the efficiency of SUPDA, we propose an improved SUPDA (ISUPDA) and apply it to linear support vector machines, yielding an improved sublinear primal-dual algorithm for linear support vector machines (ISUPDA-SVM). Specifically, unlike SUPDA, which conducts the primal step in every iteration, ISUPDA executes the primal step only with a certain probability at each iteration, which reduces the time complexity of SUPDA. We prove that the expected regret of ISUPDA is still dominated by the dual step, and hence ISUPDA retains the convergence guarantee. We further convert linear support vector machines into saddle-point form in order to apply ISUPDA to them, and provide theoretical guarantees on both the solution quality and the efficiency of ISUPDA-SVM. Comparison experiments on multiple datasets demonstrate that ISUPDA outperforms SUPDA and that ISUPDA-SVM is an efficient algorithm for linear support vector machines.
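The core modification described in the abstract can be sketched as follows. This is an illustrative Python skeleton, not the authors' implementation: the names `primal_step`, `dual_step`, and the probability parameter `alpha` are placeholders standing in for the paper's actual update rules.

```python
import random

def isupda(T, alpha, primal_step, dual_step, w0, p0):
    """Illustrative skeleton of the ISUPDA idea: the dual step runs in
    every iteration, but the primal step runs only with probability alpha."""
    w, p = w0, p0
    for _ in range(T):
        if random.random() < alpha:
            w = primal_step(w, p)   # executed with probability alpha
        p = dual_step(w, p)         # executed in every iteration
    return w, p
```

By contrast, SUPDA corresponds to the special case `alpha = 1`, where the primal step runs every iteration.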
References
Arora, S., Hazan, E., Kale, S.: The multiplicative weights update method: a meta-algorithm and applications. Theory Comput. 8(1), 121–164 (2012)
Chang, C., Lin, C.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (2011)
Chang, K., Hsieh, C., Lin, C.: Coordinate descent method for large-scale l2-loss linear support vector machines. J. Mach. Learn. Res. 9(3), 1369–1398 (2008)
Cherkassky, V.: The nature of statistical learning theory. IEEE Trans. Neural Netw. Learn. Syst. 8(6), 1–30 (1997)
Clarkson, K.L., Hazan, E., Woodruff, D.P.: Sublinear optimization for machine learning. J. ACM 59(5), 23:1–23:49 (2012)
Cotter, A., Shalev-Shwartz, S., Srebro, N.: The kernelized stochastic batch perceptron. In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 943–950 (2012)
Garber, D., Hazan, E.: Approximating semidefinite programs in sublinear time. In: Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS), pp. 1080–1088 (2011)
Hazan, E., Agarwal, A., Kale, S.: Logarithmic regret algorithms for online convex optimization. Mach. Learn. 69(2), 169–192 (2007)
Hazan, E., Koren, T.: Linear regression with limited observation. In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 1865–1872 (2012)
Hazan, E., Koren, T., Srebro, N.: Beating SGD: learning SVMs in sublinear time. In: Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS), pp. 1233–1241 (2011)
Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI), pp. 1137–1145 (1995)
Peng, H., Wang, Z., Chang, E.Y., Zhou, S., Zhang, Z.: Sublinear algorithms for penalized logistic regression in massive datasets. In: Flach, P.A., De Bie, T., Cristianini, N. (eds.) ECML PKDD 2012. LNCS (LNAI), vol. 7523, pp. 553–568. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33460-3_41
Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: Primal estimated sub-gradient solver for SVM. In: Proceedings of the 24th International Conference on Machine Learning (ICML), pp. 807–814 (2007)
Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)
Slavakis, K., Kim, S., Mateos, G., Giannakis, G.B.: Stochastic approximation vis-a-vis online learning for big data analytics. IEEE Sig. Process. Mag. 31(6), 124–129 (2014)
Wang, W., Peng, Z., Liu, Z., Zhu, T., Hong, X.: Learning the influence probabilities based on multipolar factors in social network. In: Zhang, S., Wirsing, M., Zhang, Z. (eds.) KSEM 2015. LNCS (LNAI), vol. 9403, pp. 512–524. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25159-2_46
Xiao, L.: Dual averaging methods for regularized stochastic learning and online optimization. J. Mach. Learn. Res. 11(1), 2543–2596 (2010)
Zhang, T.: Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the 21st International Conference on Machine Learning (ICML), pp. 9–16 (2004)
Acknowledgments
The work was supported in part by the National Natural Science Foundation of China under grant No. 61673293.
Appendix
1.1 Proof of Lemma 3
Proof
We first give a lemma from [8] used in the proof.
Lemma 5
Consider functions \(f_1,\ldots ,f_T\) whose gradient norms are bounded by \(G\ge \max _t\max _{\varvec{\omega }\in \mathbb {R}^d}\Vert \nabla f_t(\varvec{\omega })\Vert \), and let \(\varvec{\omega }_{t+1}\leftarrow \arg \max _{\varvec{\omega }\in \mathbb {R}^d}\sum _{\tau =1}^{t}f_{\tau }(\varvec{\omega })\). Then
Consider the sequence of functions \(\widehat{f}_t\) defined by \(\widehat{f}_t=\frac{f_t}{\alpha }\) with probability \(\alpha \) and \(\widehat{f}_t=\varvec{0}\) with probability \(1-\alpha \), where \(\varvec{0}\) denotes the all-zero function. The gradients of \(\widehat{f}_t\) are then bounded by \(\frac{G}{\alpha }\). Applying Lemma 5 to \(\widehat{f}_t\), we have
which ends the proof.
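The construction in the proof above relies on \(\widehat{f}_t\) being an unbiased surrogate for \(f_t\): scaling by \(\frac{1}{\alpha }\) on the iterations where the step actually runs exactly compensates for the iterations where it is skipped, so \(\mathbb {E}[\widehat{f}_t]=f_t\). A small Monte Carlo check of this unbiasedness, using a scalar stand-in for the function value (illustrative only; `hat_f`, `alpha`, and `f_value` are names introduced here, not from the paper):

```python
import random

def hat_f(f_value, alpha, rng):
    """With probability alpha return f_value / alpha, otherwise 0,
    so that the expected value equals f_value."""
    return f_value / alpha if rng.random() < alpha else 0.0

rng = random.Random(42)
alpha, f_value, n = 0.25, 2.0, 200_000
avg = sum(hat_f(f_value, alpha, rng) for _ in range(n)) / n
# avg should be close to f_value, confirming E[hat_f] = f_value
```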
1.2 Proof of Theorem 1
Proof
We first give some basic lemmas from [12] used in the proof.
Lemma 6
For any \(\sqrt{\frac{\log (n)}{T}}\le \eta \le \frac{1}{6}\), the following two formulas hold with probability at least \(1-O(\frac{1}{n})\):
Lemma 7
With probability at least \(\frac{3}{4}\),
Let \(\varvec{\omega }^*,\varvec{\xi }^*\) be the optimal solution of problem (7). According to Lemma 3, we have \(G=1\) for the objective function of problem (7), so we have
The expectation is taken with respect to the randomness of the update of \(\varvec{\omega }_t\), denoted \(c_t\). Let \(\gamma ^*\) be the optimal value of problem (7). Then we have
So we obtain
Applying Lemma 2 to the clipped vector \(\varvec{v}_t\) we have
From Lemma 6, with probability at least \(1-O(1/n)\) we obtain
According to Lemma 7, with probability \(\frac{3}{4}-O(1/n)\ge \frac{1}{2}\),
Taking the expectation with respect to \(c_t\), with probability at least \(\frac{1}{2}\) we obtain
Combining (8) and (9) and dividing by \(T\), with probability at least \(\frac{1}{2}\) we have
and, using our choices of \(T\), \(\eta \), and \(\alpha \), with probability at least \(\frac{1}{2}\),
which implies that \((\bar{\varvec{\omega }},\bar{\varvec{\xi }})\) forms an expected \(\varepsilon \)-approximate solution to problem (7).
Next, we analyse the runtime.
For the objective function of problem (7), we have \(G=1\) in Lemma 3; combining this with the time complexity of ISUPDA given in (4), we obtain the runtime of ISUPDA-SVM:
© 2018 Springer Nature Switzerland AG
Gu, M., Liao, S. (2018). Improved Sublinear Primal-Dual Algorithm for Support Vector Machines. In: Liu, W., Giunchiglia, F., Yang, B. (eds) Knowledge Science, Engineering and Management. KSEM 2018. Lecture Notes in Computer Science(), vol 11062. Springer, Cham. https://doi.org/10.1007/978-3-319-99247-1_30