A Cost Based Reweighted Scheme of Principal Support Vector Machine

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 74)

Abstract

Principal Support Vector Machine (PSVM) is a recently proposed method that uses Support Vector Machines to achieve linear and nonlinear sufficient dimension reduction under a unified framework. In this work, a reweighted scheme is used to improve the performance of the algorithm. We present basic theoretical results and demonstrate the effectiveness of the reweighted algorithm through simulations and a real data application.
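
To make the idea concrete, the sketch below illustrates one way a cost-based reweighting can be wired into a PSVM-style estimator: each slice of the response is turned into a binary problem, the two pseudo-classes receive costs inversely proportional to their sizes, and the directions recovered from the slices are pooled through an eigendecomposition. This is a minimal illustrative sketch, not the authors' implementation; the weight choice, the subgradient solver, and all names and parameters (reweighted_psvm_directions, n_slices, lam, lr) are assumptions made here for illustration only.

```python
import numpy as np

def reweighted_psvm_directions(X, Y, d, n_slices=5, lam=10.0,
                               n_iter=2000, lr=1e-3, seed=0):
    """Illustrative cost-reweighted, PSVM-style estimator (hypothetical names).

    For each dividing point q of the response, pseudo-labels y~ = sign(Y - q)
    are formed and the convex objective
        psi' Sigma psi + (lam / n) * sum_i w_i * [1 - y~_i (psi' Z_i - t)]_+
    is minimised by subgradient descent, where Z_i are centred predictors and
    w_i gives the rarer pseudo-class the larger cost (the reweighting step).
    The leading d eigenvectors of sum_q psi_q psi_q' span the estimated
    dimension-reduction subspace.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Z = X - X.mean(axis=0)                      # centred predictors
    Sigma = np.cov(Z, rowvar=False)             # sample covariance of X
    cutpoints = np.quantile(Y, np.linspace(0, 1, n_slices + 1)[1:-1])

    M = np.zeros((p, p))                        # candidate matrix
    for q in cutpoints:
        y = np.where(Y > q, 1.0, -1.0)
        # cost-based reweighting: cost inversely proportional to class size
        w = np.where(y > 0, n / (2.0 * np.sum(y > 0)), n / (2.0 * np.sum(y < 0)))
        psi = 0.01 * rng.standard_normal(p)
        t = 0.0
        for _ in range(n_iter):
            margin = y * (Z @ psi - t)
            active = (margin < 1.0).astype(float)     # hinge-active points
            g_psi = 2.0 * Sigma @ psi - (lam / n) * ((w * y * active) @ Z)
            g_t = (lam / n) * np.sum(w * y * active)
            psi -= lr * g_psi
            t -= lr * g_t
        M += np.outer(psi, psi)

    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:d]]                # p x d basis estimate
```

For example, reweighted_psvm_directions(X, Y, d=2) would return a p × 2 basis whose span estimates the reduction subspace; in the paper the per-slice problems are instead solved exactly through the quadratic programming formulation discussed around (1.9)–(1.10), not by subgradient descent.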


References

  1. Bache, K., Lichman, M.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine (2013). http://archive.ics.uci.edu/ml

  2. Cook, R.D.: Principal Hessian directions revisited (with discussion). J. Am. Stat. Assoc. 93, 84–100 (1998a)

  3. Cook, R.D.: Regression Graphics: Ideas for Studying Regressions through Graphics. Wiley, New York (1998b)

  4. Cook, R.D., Weisberg, S.: Discussion of “Sliced inverse regression for dimension reduction”. J. Am. Stat. Assoc. 86, 316–342 (1991)

  5. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)

  6. Ein-Dor, P., Feldmesser, J.: Attributes of the performance of central processing units: a relative performance prediction model. Commun. ACM 30(4), 308–317 (1987)

  7. Fukumizu, K., Bach, F.R., Jordan, M.I.: Kernel dimension reduction in regression. Ann. Stat. 37(4), 1871–1905 (2009)

  8. Lee, K.K., Gunn, S.R., Harris, C.J., Reed, P.A.S.: Classification of imbalanced data with transparent kernels. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN ’01), vol. 4, pp. 2410–2415, Washington, D.C. (2001)

  9. Li, K.-C.: Sliced inverse regression for dimension reduction (with discussion). J. Am. Stat. Assoc. 86, 316–342 (1991)

  10. Li, K.-C.: On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma. J. Am. Stat. Assoc. 87, 1025–1039 (1992)

  11. Li, B., Wang, S.: On directional regression for dimension reduction. J. Am. Stat. Assoc. 102, 997–1008 (2007)

  12. Li, B., Zha, H., Chiaromonte, F.: Contour regression: a general approach to dimension reduction. Ann. Stat. 33, 1580–1616 (2005)

  13. Li, B., Artemiou, A., Li, L.: Principal support vector machines for linear and nonlinear sufficient dimension reduction. Ann. Stat. 39, 3182–3210 (2011)

  14. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)

  15. Veropoulos, K., Campbell, C., Cristianini, N.: Controlling the sensitivity of support vector machines. In: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI ’99), Workshop ML3, pp. 55–60, Stockholm (1999)

  16. Weisberg, S.: Dimension reduction regression in R. J. Stat. Softw. 7(1) (2002)

  17. Wu, H.M.: Kernel sliced inverse regression with applications to classification. J. Comput. Graph. Stat. 17, 590–610 (2008)

  18. Yeh, Y.-R., Huang, S.-Y., Lee, Y.-J.: Nonlinear dimension reduction with kernel sliced inverse regression. IEEE Trans. Knowl. Data Eng. 21, 1590–1603 (2009)

  19. Zhu, L.X., Miao, B., Peng, H.: On sliced inverse regression with high-dimensional covariates. J. Am. Stat. Assoc. 101, 630–643 (2006)


Acknowledgements

Andreas Artemiou is supported in part by NSF grant DMS-12-07651. The authors would like to thank the editors and the referees for their valuable comments.

Author information

Correspondence to Andreas Artemiou.


Appendix

Proof of Theorem 1.

Without loss of generality, assume that \(E\boldsymbol{X} =\boldsymbol{ 0}\). First we note that for \(i = 1,-1\)

$$\displaystyle\begin{array}{rcl} E\left (\lambda _{\tilde{Y }}[1 -\tilde{ Y }(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X} - t)]^{+}\right ) =& E\{E[\left (\lambda _{\tilde{ Y }}[1 -\tilde{ Y }(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X} - t)]^{+}\right )\vert Y,\boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}]\}& {}\\ =& E\{E[\left (\lambda _{\tilde{Y }}[1 -\tilde{ Y }(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X} - t)]\right )^{+}\vert Y,\boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}]\}& {}\\ \end{array}$$

where the last equality holds because \(\lambda _{\tilde{Y}}\) is positive. Since the function \(a\mapsto a^{+}\) is convex, by Jensen’s inequality we have

$$\displaystyle\begin{array}{rcl} E[\left (\lambda _{\tilde{Y }}[1 -\tilde{ Y }(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X} - t)]\right )^{+}\vert Y,\boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}] \geq & \{E[\lambda _{\tilde{ Y }}[1 -\tilde{ Y }(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X} - t)]\vert Y,\boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}]\}^{+}& {}\\ =& \{\lambda _{\tilde{Y }}[1 -\tilde{ Y }(E(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}) - t)]\}^{+} & {}\\ \end{array}$$

where the equality follows from the model assumption (1.1). Thus

$$\displaystyle\begin{array}{rcl} E\left (\lambda _{\tilde{Y }}[1 -\tilde{ Y }(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X} - t)]^{+}\right ) \geq E& \{\lambda _{\tilde{ Y }}[1 -\tilde{ Y }(E(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}) - t)]\}^{+}.&{}\end{array}$$
(1.12)

For the first term, we have:

$$\displaystyle\begin{array}{rcl} \boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{\varSigma }\boldsymbol{\psi } =\mathrm{ var}(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}) =& \mathrm{var}[E(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X})] + E[\mathrm{var}(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X})]& \\ \geq & \mathrm{var}[E(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X})]. &{}\end{array}$$
(1.13)

Combining (1.12) and (1.13),

$$\displaystyle\begin{array}{rcl} L_{R}(\boldsymbol{\psi },t) \geq \mathrm{ var}[E(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X})] + E& \{\lambda _{\tilde{ Y }}[1 -\tilde{ Y }(E(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}) - t)]\}^{+}.& {}\\ \end{array}$$

By the linearity condition assumed in the theorem, \(E(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}) = \boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{P}_{\boldsymbol{\beta }}^{\mathsf{T}}(\boldsymbol{\varSigma })\boldsymbol{X}\), and therefore the right-hand side of the inequality equals \(L_{R}(\boldsymbol{P}_{\boldsymbol{\beta }}(\boldsymbol{\varSigma })\boldsymbol{\psi },t)\). If \(\boldsymbol{\psi }\) is not in the CDRS, then the inequality is strict, which implies that \(\boldsymbol{\psi }\) is not the minimizer. □
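
For readability, we recall the form of the projection appearing above; in the notation of Li et al. [13] (restated here as a reminder, not taken from the present text) it can be written as

$$\displaystyle\begin{array}{rcl} \boldsymbol{P}_{\boldsymbol{\beta }}(\boldsymbol{\varSigma }) = \boldsymbol{\beta }(\boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{\varSigma }\boldsymbol{\beta })^{-1}\boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{\varSigma },& & {}\\ \end{array}$$

so that, under the linearity condition and \(E\boldsymbol{X} =\boldsymbol{0}\), \(E(\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}) = \boldsymbol{P}_{\boldsymbol{\beta }}^{\mathsf{T}}(\boldsymbol{\varSigma })\boldsymbol{X}\); this is the identity used in the last step of the proof.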

Proof of Theorem 2.

Following the same argument as in Vapnik [14], it can be shown that minimizing (1.9) is equivalent to

$$\displaystyle\begin{array}{rcl} \begin{array}{ll} &\mbox{ minimizing}\ \ \boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{\zeta } + \frac{1} {n}(\boldsymbol{\lambda }^{{\ast}})^{\mathsf{T}}\boldsymbol{\xi }\ \ \mbox{ over}\ \ (\boldsymbol{\zeta },t,\boldsymbol{\xi }) \\ &\mbox{ subject to}\ \ \ \boldsymbol{\xi } \geq \boldsymbol{ 0},\ \ \tilde{y} \odot (\boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{Z} - t\boldsymbol{1}) \geq \boldsymbol{ 1} -\boldsymbol{\xi }.\end{array} & &{}\end{array}$$
(1.14)

The Lagrangian function of this problem is

$$\displaystyle\begin{array}{rcl} L(\boldsymbol{\zeta },t,\boldsymbol{\xi },\boldsymbol{\alpha },\boldsymbol{\beta }) = \boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{\zeta } + \frac{1} {n}(\boldsymbol{\lambda }^{{\ast}})^{\mathsf{T}}\boldsymbol{\xi } -\boldsymbol{\alpha }^{\mathsf{T}}[\tilde{y} \odot (\boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{Z} - t\boldsymbol{1}) -\boldsymbol{ 1} + \boldsymbol{\xi }] -\boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{\xi }.& &{}\end{array}$$
(1.15)

where \(\boldsymbol{\xi } = (\xi _{1},\ldots,\xi _{n})\). Let \((\boldsymbol{\zeta }^{{\ast}},\boldsymbol{\xi }^{{\ast}},t^{{\ast}})\) be a solution to problem (1.14). Using the Kuhn–Tucker Theorem, one can show that minimizing over \((\boldsymbol{\zeta },t,\boldsymbol{\xi })\) is equivalent to maximizing over \((\boldsymbol{\alpha },\boldsymbol{\beta })\). So, we differentiate with respect to \(\boldsymbol{\zeta }\), t, and \(\boldsymbol{\xi }\) to obtain the system of equations:

$$\displaystyle\begin{array}{rcl} \left \{\begin{array}{@{}l@{\quad }l@{}} \partial L/\partial \boldsymbol{\zeta } = 2\boldsymbol{\zeta } -\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha } \odot \tilde{ y}) =\boldsymbol{ 0}\quad \\ \partial L/\partial t = \boldsymbol{\alpha }^{\mathsf{T}}\tilde{y} =\boldsymbol{ 0} \quad \\ \partial L/\partial \boldsymbol{\xi } = \frac{1} {n}\boldsymbol{\lambda }^{{\ast}}-\boldsymbol{\alpha }-\boldsymbol{\beta } =\boldsymbol{ 0}. \quad \end{array} \right.& &{}\end{array}$$
(1.16)

Substitute the last two equations above into (1.15) to obtain

$$\displaystyle\begin{array}{rcl} \boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{\zeta } -\boldsymbol{\alpha }^{\mathsf{T}}[\tilde{y} \odot (\boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{Z}) -\boldsymbol{ 1}]& &{}\end{array}$$
(1.17)

Now substitute the first equation in (1.16), \(\boldsymbol{\zeta } = \frac{1} {2}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha } \odot \tilde{ y})\), into the above:

$$\displaystyle\begin{array}{rcl} \boldsymbol{1}^{\mathsf{T}}\boldsymbol{\alpha } -\frac{1} {4}(\boldsymbol{\alpha } \odot \tilde{ y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha } \odot \tilde{ y}).& &{}\end{array}$$
(1.18)
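
For completeness, the algebra behind this substitution is spelled out below as a filled-in step (reading \(\tilde{y}\odot (\boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{Z})\) as the vector with ith entry \(\tilde{y}_{i}\,\boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{Z}_{i}\)): with \(\boldsymbol{\zeta } = \frac{1} {2}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha } \odot \tilde{ y})\),

$$\displaystyle\begin{array}{rcl} \boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{\zeta } = \frac{1} {4}(\boldsymbol{\alpha } \odot \tilde{ y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha } \odot \tilde{ y}),\qquad \boldsymbol{\alpha }^{\mathsf{T}}[\tilde{y} \odot (\boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{Z})] = (\boldsymbol{\alpha } \odot \tilde{ y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{\zeta } = \frac{1} {2}(\boldsymbol{\alpha } \odot \tilde{ y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha } \odot \tilde{ y}),& & {}\\ \end{array}$$

so (1.17) becomes \(\boldsymbol{1}^{\mathsf{T}}\boldsymbol{\alpha } + \frac{1} {4}Q -\frac{1} {2}Q = \boldsymbol{1}^{\mathsf{T}}\boldsymbol{\alpha } -\frac{1} {4}Q\) with \(Q = (\boldsymbol{\alpha } \odot \tilde{ y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha } \odot \tilde{ y})\), which is exactly (1.18).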

Thus, to minimize (1.14) we need to maximize (1.18) subject to the constraints

$$\displaystyle\begin{array}{rcl} \left \{\begin{array}{@{}l@{\quad }l@{}} \boldsymbol{\alpha }^{\mathsf{T}}\tilde{y} =\boldsymbol{ 0} \quad \\ \frac{1} {n}\boldsymbol{\lambda }^{{\ast}}-\boldsymbol{\alpha }-\boldsymbol{\beta } =\boldsymbol{ 0}.\quad \end{array} \right.& &{}\end{array}$$
(1.19)

which are equivalent to the constraints in (1.10): since \(\boldsymbol{\alpha }\geq \boldsymbol{0}\) and \(\boldsymbol{\beta }\geq \boldsymbol{0}\), eliminating \(\boldsymbol{\beta }\) from the second equation yields the elementwise bounds \(\boldsymbol{0} \leq \boldsymbol{\alpha } \leq \frac{1} {n}\boldsymbol{\lambda }^{{\ast}}\). □


Copyright information

© 2014 Springer Science+Business Media New York

Cite this paper

Artemiou, A., Shu, M. (2014). A Cost Based Reweighted Scheme of Principal Support Vector Machine. In: Akritas, M., Lahiri, S., Politis, D. (eds) Topics in Nonparametric Statistics. Springer Proceedings in Mathematics & Statistics, vol 74. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0569-0_1
