Improved Sublinear Primal-Dual Algorithm for Support Vector Machines

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11062)

Abstract

The sublinear primal-dual algorithm (SUPDA) is a well-established sublinear-time algorithm. However, SUPDA performs the primal step in every iteration, which is unnecessary because the overall regret of SUPDA is dominated by the dual step. To improve the efficiency of SUPDA, we propose an improved SUPDA (ISUPDA) and apply it to linear support vector machines, yielding an improved sublinear primal-dual algorithm for linear support vector machines (ISUPDA-SVM). Specifically, unlike SUPDA, which conducts the primal step in every iteration, ISUPDA executes the primal step only with a certain probability at each iteration, which reduces the time complexity of SUPDA. We prove that the expected regret of ISUPDA is still dominated by the dual step, so ISUPDA retains the convergence guarantee. We further convert linear support vector machines into a saddle-point form in order to apply ISUPDA to them, and provide theoretical guarantees on the solution quality and efficiency of ISUPDA-SVM. Comparison experiments on multiple datasets demonstrate that ISUPDA outperforms SUPDA and that ISUPDA-SVM is an efficient algorithm for linear support vector machines.
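
The core modification can be sketched in a few lines of Python. This is a minimal illustration of the probabilistic primal step only; `primal_step` and `dual_step` are hypothetical placeholders standing in for the paper's actual updates, not the authors' implementation.

```python
import random

def isupda_sketch(primal_step, dual_step, w0, p0, T, alpha):
    """Minimal sketch of the ISUPDA idea: the dual step runs in every
    iteration, while the primal step is executed only with probability
    alpha, which is what cuts the per-iteration cost relative to SUPDA.
    `primal_step` and `dual_step` are placeholders, not the paper's code."""
    w, p = w0, p0
    primal_iterates = []
    for _ in range(T):
        if random.random() < alpha:   # probabilistic primal step
            w = primal_step(w, p)
        p = dual_step(w, p)           # dual step in every iteration
        primal_iterates.append(w)
    # in primal-dual schemes of this kind the averaged primal iterates
    # are typically returned as the approximate solution
    return primal_iterates, p
```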


References

  1. Arora, S., Hazan, E., Kale, S.: The multiplicative weights update method: a meta-algorithm and applications. Theory Comput. 8(1), 121–164 (2012)

  2. Chang, C., Lin, C.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (2011)

  3. Chang, K., Hsieh, C., Lin, C.: Coordinate descent method for large-scale L2-loss linear support vector machines. J. Mach. Learn. Res. 9(3), 1369–1398 (2008)

  4. Cherkassky, V.: The nature of statistical learning theory. IEEE Trans. Neural Netw. Learn. Syst. 8(6), 1–30 (1997)

  5. Clarkson, K.L., Hazan, E., Woodruff, D.P.: Sublinear optimization for machine learning. J. ACM 59(5), 23:1–23:49 (2012)

  6. Cotter, A., Shalev-Shwartz, S., Srebro, N.: The kernelized stochastic batch perceptron. In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 943–950 (2012)

  7. Garber, D., Hazan, E.: Approximating semidefinite programs in sublinear time. In: Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS), pp. 1080–1088 (2011)

  8. Hazan, E., Agarwal, A., Kale, S.: Logarithmic regret algorithms for online convex optimization. Mach. Learn. 69(2), 169–192 (2007)

  9. Hazan, E., Koren, T.: Linear regression with limited observation. In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 1865–1872 (2012)

  10. Hazan, E., Koren, T., Srebro, N.: Beating SGD: learning SVMs in sublinear time. In: Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS), pp. 1233–1241 (2011)

  11. Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI), pp. 1137–1145 (1995)

  12. Peng, H., Wang, Z., Chang, E.Y., Zhou, S., Zhang, Z.: Sublinear algorithms for penalized logistic regression in massive datasets. In: Flach, P.A., De Bie, T., Cristianini, N. (eds.) ECML PKDD 2012. LNCS (LNAI), vol. 7523, pp. 553–568. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33460-3_41

  13. Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of the 24th International Conference on Machine Learning (ICML), pp. 807–814 (2007)

  14. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)

  15. Slavakis, K., Kim, S., Mateos, G., Giannakis, G.B.: Stochastic approximation vis-à-vis online learning for big data analytics. IEEE Sig. Process. Mag. 31(6), 124–129 (2014)

  16. Wang, W., Peng, Z., Liu, Z., Zhu, T., Hong, X.: Learning the influence probabilities based on multipolar factors in social network. In: Zhang, S., Wirsing, M., Zhang, Z. (eds.) KSEM 2015. LNCS (LNAI), vol. 9403, pp. 512–524. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25159-2_46

  17. Xiao, L.: Dual averaging methods for regularized stochastic learning and online optimization. J. Mach. Learn. Res. 11(1), 2543–2596 (2010)

  18. Zhang, T.: Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the 21st International Conference on Machine Learning (ICML), pp. 9–16 (2004)


Acknowledgments

The work was supported in part by the National Natural Science Foundation of China under grant No. 61673293.

Author information


Correspondence to Shizhong Liao.


Appendix


1.1 Proof of Lemma 3

Proof

We first state a lemma from [8] that will be used in the proof.

Lemma 5

Consider functions \(f_1,\ldots ,f_T\) whose gradient norms are bounded by \(G\ge \max _t\max _{\varvec{\omega }\in \mathbb {R}^d}\Vert \nabla f_t(\varvec{\omega })\Vert \), and let \(\varvec{\omega }_{t+1}\leftarrow \arg \max _{\varvec{\omega }\in \mathbb {R}^d}\sum _{\tau =1}^{t}f_{\tau }(\varvec{\omega })\). Then

$$ \max _{\varvec{\omega }\in \mathbb {R}^d}\sum _{t=1}^{T}f_t(\varvec{\omega })-\sum _{t=1}^{T}f_t(\varvec{\omega }_t)\le 2G\log T. $$

Consider the sequence of functions \(\widehat{f}_t\): with probability \(\alpha \), let \(\widehat{f}_t=\frac{f_t}{\alpha }\), and with probability \(1-\alpha \), let \(\widehat{f}_t=\varvec{0}\), where \(\varvec{0}\) denotes the all-zero function. Then their gradients are bounded by \(\frac{G}{\alpha }\). Applying Lemma 5 to \(\widehat{f}_t\), we have

$$\mathbb {E}\left[ \sum _{t=1}^{T}f_t(\varvec{\omega }^*)-\sum _{t=1}^{T}f_t(\varvec{\omega }_t)\right] =\mathbb {E}\left[ \sum _{t=1}^{T}\widehat{f}_t(\varvec{\omega }^*)-\sum _{t=1}^{T}\widehat{f}_t(\varvec{\omega }_t)\right] \le \frac{2G}{\alpha }\log T,$$

which completes the proof.
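
For clarity, the equality of expectations in the display above holds because \(\widehat{f}_t\) is an unbiased estimate of \(f_t\) and is drawn independently of \(\varvec{\omega }_t\): for every fixed \(\varvec{\omega }\),

$$\mathbb {E}\left[ \widehat{f}_t(\varvec{\omega })\right] =\alpha \cdot \frac{f_t(\varvec{\omega })}{\alpha }+(1-\alpha )\cdot 0=f_t(\varvec{\omega }),\qquad \Vert \nabla \widehat{f}_t(\varvec{\omega })\Vert \le \frac{1}{\alpha }\Vert \nabla f_t(\varvec{\omega })\Vert \le \frac{G}{\alpha },$$

so Lemma 5 applies to \(\widehat{f}_1,\ldots ,\widehat{f}_T\) with gradient bound \(\frac{G}{\alpha }\), and taking expectations yields the stated \(\frac{2G}{\alpha }\log T\) bound.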

1.2 Proof of Theorem 1

Proof

We first state some basic lemmas from [12] that will be used in the proof.

Lemma 6

For any \(\sqrt{\frac{\log n}{T}}\le \eta \le \frac{1}{6}\), the following two inequalities hold with probability at least \(1-O(\frac{1}{n})\):

$$\begin{aligned} \max _{i\in [n]}\sum _{t\in [T]}v_t(i)-\sum _{t\in [T]}\left[ \varvec{x}_i^{T }\varvec{\omega }_t+\xi _t(i)\right] \le 4\eta T,\\ \left| \sum _{t\in [T]}\varvec{p}_t^{T }\varvec{v}_t-\sum _{t\in [T]}\varvec{p}_t^{T }(\varvec{X} \varvec{\omega }_t+\varvec{\xi }_t)\right| \le 4\eta T. \end{aligned}$$

Lemma 7

With probability at least \(\frac{3}{4}\),

$$\sum _{t=1}^{T}\varvec{p}^{\mathrm {T}}_t\varvec{v}^2_t\le 48T.$$

Let \((\varvec{\omega }^*,\varvec{\xi }^*)\) be an optimal solution of problem (7). By Lemma 3, we have \(G=1\) for the objective function of problem (7), so

$$ \mathbb {E}_{\{c_t\}}\left[ \sum _{t\in [T]}\varvec{p}_t^{T }(\varvec{X} \varvec{\omega }_t+\varvec{\xi }_t)\right] \ge \mathbb {E}_{\{c_t\}}\left[ \sum _{t\in [T]}\varvec{p}_t^{T }(\varvec{X} \varvec{\omega }^*+\varvec{\xi }^*)\right] -\frac{2}{\alpha }\log T. $$

The expectation is taken over the random choices used to update \(\varvec{\omega }_t\), denoted by \(c_t\). Letting \(\gamma ^*\) denote the optimal value of problem (7), we have

$$ \mathbb {E}_{\{c_t\}}\left[ \sum _{t\in [T]}\varvec{p}_t^{T }(\varvec{X} \varvec{\omega }^*+\varvec{\xi }^*)\right] \ge T\gamma ^*.$$

So we obtain

$$\begin{aligned} \mathbb {E}_{\{c_t\}}\left[ \sum _{t\in [T]}\varvec{p}_t^{T }(\varvec{X} \varvec{\omega }_t+\varvec{\xi }_t)\right] \ge T\gamma ^*-\frac{2}{\alpha }\log T. \end{aligned}$$
(8)

Applying Lemma 2 to the clipped vector \(\varvec{v}_t\) we have

$$ \sum _{t\in [T]}\varvec{p}_t^{T }\varvec{v}_t\le \min _{i\in [n]}\sum _{t\in [T]}v_t(i)+\frac{\log n}{\eta }+\eta \sum _{t\in [T]}\varvec{p}_t^{T }\varvec{v}_t^2. $$
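
Lemma 2 is the standard multiplicative-weights regret bound applied to the clipped payoff vectors \(\varvec{v}_t\) (cf. [1, 5]). As an illustration only, a minimal sketch of such a dual update is given below; the exponential weighting and the clipping threshold \(1/\eta \) are one common variant and may differ in detail from the update actually used by ISUPDA-SVM.

```python
import numpy as np

def mw_dual_step(p, v, eta):
    """One multiplicative-weights update on a clipped payoff vector.
    This only illustrates the kind of dual step that Lemma 2 bounds:
    payoffs are clipped to [-1/eta, 1/eta] and the distribution p is
    reweighted exponentially, concentrating mass on the coordinates
    (training points) with the smallest cumulative payoff."""
    v = np.clip(v, -1.0 / eta, 1.0 / eta)  # clip the (estimated) payoffs
    w = p * np.exp(-eta * v)               # downweight large payoffs
    return w / w.sum()                     # renormalise to a distribution

# usage sketch: start from the uniform distribution over the n points
# p = np.ones(n) / n;  p = mw_dual_step(p, v_t, eta)
```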

From Lemma 6, with probability at least \(1-O(1/n)\) we obtain

$$ \sum _{t\in [T]}\varvec{p}_t^{T }(\varvec{X} \varvec{\omega }_t+\varvec{\xi }_t)\le \min _{i\in [n]}\sum _{t\in [T]}\left[ \varvec{x}_i^{T }\varvec{\omega }_t+\xi _t(i)\right] +\frac{\log n}{\eta }+\eta \sum _{t\in [T]}\varvec{p}_t^{T }\varvec{v}_t^2+8\eta T. $$

According to Lemma 7, with probability \(\frac{3}{4}-O(1/n)\ge \frac{1}{2}\),

$$ \sum _{t\in [T]}\varvec{p}_t^{T }(\varvec{X} \varvec{\omega }_t+\varvec{\xi }_t)\le \min _{i\in [n]}\sum _{t\in [T]}\left[ \varvec{x}_i^{T }\varvec{\omega }_t+\xi _t(i)\right] +\frac{\log n}{\eta }+56\eta T. $$

Taking expectation with respect to \(c_t\), with probability at least \(\frac{1}{2}\) we obtain

$$\begin{aligned} \mathbb {E}_{\{c_t\}}\left[ \sum _{t\in [T]}\varvec{p}_t^{T }(\varvec{X} \varvec{\omega }_t+\varvec{\xi }_t)\right] \le \mathbb {E}_{\{c_t\}}\left[ \min _{i\in [n]}\sum _{t\in [T]}\left[ \varvec{x}_i^{T }\varvec{\omega }_t+\xi _t(i)\right] \right] +\frac{\log n}{\eta }+56\eta T. \end{aligned}$$
(9)

Combining (8) and (9) and dividing by \(T\), with probability at least \(\frac{1}{2}\) we have

$$ \frac{1}{T}\mathbb {E}_{\{c_t\}}\left[ \min _{i\in [n]}\sum _{t\in [T]}\left[ \varvec{x}_i^{T }\varvec{\omega }_t+\xi _t(i)\right] \right] \ge \gamma ^*-\frac{2}{T\alpha }\log T-\frac{\log n}{T\eta }-56\eta , $$

and using our choices of \(T\), \(\eta \) and \(\alpha \), with probability at least \(\frac{1}{2}\),

$$ \mathbb {E}_{\{c_t\}}\left[ \min _{i\in [n]}\left[ \varvec{x}_i^{T }\bar{\varvec{\omega }}+\bar{\xi }(i)\right] \right] \ge \gamma ^*-\varepsilon , $$

which implies that \((\bar{\varvec{\omega }},\bar{\varvec{\xi }})\) forms an expected \(\varepsilon \)-approximate solution to problem (7).

Next, we analyse the runtime.

For the objective function of problem (7), we have \(G=1\) in Lemma 3; combining this with the time complexity of ISUPDA given in (4), we obtain the runtime of ISUPDA-SVM:

$$O\left( \varepsilon ^{-2}n\log n+\varepsilon ^{-2}d\sqrt{\frac{\log n}{T}}\log T\right) .$$


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Gu, M., Liao, S. (2018). Improved Sublinear Primal-Dual Algorithm for Support Vector Machines. In: Liu, W., Giunchiglia, F., Yang, B. (eds) Knowledge Science, Engineering and Management. KSEM 2018. Lecture Notes in Computer Science (LNAI), vol 11062. Springer, Cham. https://doi.org/10.1007/978-3-319-99247-1_30


  • DOI: https://doi.org/10.1007/978-3-319-99247-1_30


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99246-4

  • Online ISBN: 978-3-319-99247-1

