A Cost Based Reweighted Scheme of Principal Support Vector Machine

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 74)

Abstract

Principal Support Vector Machine (PSVM) is a recently proposed method that uses Support Vector Machines to achieve linear and nonlinear sufficient dimension reduction under a unified framework. In this work, a reweighted scheme is used to improve the performance of the algorithm. We present basic theoretical results and demonstrate the effectiveness of the reweighted algorithm through simulations and a real data application.
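
To make the idea concrete, the sketch below illustrates one way a cost-based reweighting can be wired into a PSVM-style estimator: each slice of the response is turned into a binary problem, the two pseudo-classes receive costs inversely proportional to their sizes, and the directions recovered from the slices are pooled through an eigendecomposition. This is a minimal illustrative sketch, not the authors' implementation; the weight choice, the subgradient solver, and all names and parameters (reweighted_psvm_directions, n_slices, lam, lr) are assumptions made here for illustration only.

```python
import numpy as np

def reweighted_psvm_directions(X, Y, d, n_slices=5, lam=10.0,
                               n_iter=2000, lr=1e-3, seed=0):
    """Illustrative cost-reweighted, PSVM-style estimator (hypothetical names).

    For each dividing point q of the response, pseudo-labels y~ = sign(Y - q)
    are formed and the convex objective
        psi' Sigma psi + (lam / n) * sum_i w_i * [1 - y~_i (psi' Z_i - t)]_+
    is minimised by subgradient descent, where Z_i are centred predictors and
    w_i gives the rarer pseudo-class the larger cost (the reweighting step).
    The leading d eigenvectors of sum_q psi_q psi_q' span the estimated
    dimension-reduction subspace.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Z = X - X.mean(axis=0)                      # centred predictors
    Sigma = np.cov(Z, rowvar=False)             # sample covariance of X
    cutpoints = np.quantile(Y, np.linspace(0, 1, n_slices + 1)[1:-1])

    M = np.zeros((p, p))                        # candidate matrix
    for q in cutpoints:
        y = np.where(Y > q, 1.0, -1.0)
        # cost-based reweighting: cost inversely proportional to class size
        w = np.where(y > 0, n / (2.0 * np.sum(y > 0)), n / (2.0 * np.sum(y < 0)))
        psi = 0.01 * rng.standard_normal(p)
        t = 0.0
        for _ in range(n_iter):
            margin = y * (Z @ psi - t)
            active = (margin < 1.0).astype(float)     # hinge-active points
            g_psi = 2.0 * Sigma @ psi - (lam / n) * ((w * y * active) @ Z)
            g_t = (lam / n) * np.sum(w * y * active)
            psi -= lr * g_psi
            t -= lr * g_t
        M += np.outer(psi, psi)

    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:d]]                # p x d basis estimate
```

For example, reweighted_psvm_directions(X, Y, d=2) would return a p × 2 basis whose span estimates the reduction subspace; in the paper the per-slice problems are instead solved exactly through the quadratic programming formulation discussed around (1.9)–(1.10), not by subgradient descent.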


References

  1. Bache, K., Lichman, M.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine (2013). http://archive.ics.uci.edu/ml

  2. Cook, R.D.: Principal Hessian directions revisited (with discussion). J. Am. Stat. Assoc. 93, 84–100 (1998a)

  3. Cook, R.D.: Regression Graphics: Ideas for Studying Regressions through Graphics. Wiley, New York (1998b)

  4. Cook, R.D., Weisberg, S.: Discussion of “Sliced inverse regression for dimension reduction”. J. Am. Stat. Assoc. 86, 316–342 (1991)

  5. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)

  6. Ein-Dor, P., Feldmesser, J.: Attributes of the performance of central processing units: a relative performance prediction model. Commun. ACM 30(4), 308–317 (1987)

  7. Fukumizu, K., Bach, F.R., Jordan, M.I.: Kernel dimension reduction in regression. Ann. Stat. 37(4), 1871–1905 (2009)

  8. Lee, K.K., Gunn, S.R., Harris, C.J., Reed, P.A.S.: Classification of imbalanced data with transparent kernels. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN ’01), vol. 4, pp. 2410–2415, Washington, D.C. (2001)

  9. Li, K.-C.: Sliced inverse regression for dimension reduction (with discussion). J. Am. Stat. Assoc. 86, 316–342 (1991)

  10. Li, K.-C.: On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma. J. Am. Stat. Assoc. 87, 1025–1039 (1992)

  11. Li, B., Wang, S.: On directional regression for dimension reduction. J. Am. Stat. Assoc. 102, 997–1008 (2007)

  12. Li, B., Zha, H., Chiaromonte, F.: Contour regression: a general approach to dimension reduction. Ann. Stat. 33, 1580–1616 (2005)

  13. Li, B., Artemiou, A., Li, L.: Principal support vector machines for linear and nonlinear sufficient dimension reduction. Ann. Stat. 39, 3182–3210 (2011)

  14. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)

  15. Veropoulos, K., Campbell, C., Cristianini, N.: Controlling the sensitivity of support vector machines. In: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI ’99), Workshop ML3, pp. 55–60, Stockholm (1999)

  16. Weisberg, S.: Dimension reduction regression in R. J. Stat. Softw. 7(1) (2002)

  17. Wu, H.M.: Kernel sliced inverse regression with applications to classification. J. Comput. Graph. Stat. 17, 590–610 (2008)

  18. Yeh, Y.-R., Huang, S.-Y., Lee, Y.-J.: Nonlinear dimension reduction with kernel sliced inverse regression. IEEE Trans. Knowl. Data Eng. 21, 1590–1603 (2009)

  19. Zhu, L.X., Miao, B., Peng, H.: On sliced inverse regression with high-dimensional covariates. J. Am. Stat. Assoc. 101, 630–643 (2006)


Acknowledgements

Andreas Artemiou is supported in part by NSF grant DMS-12-07651. The authors would like to thank the editors and the referees for their valuable comments.

Author information

Correspondence to Andreas Artemiou.


Appendix

Proof of Theorem 1.

Without loss of generality, assume that \(E\boldsymbol{X} =\boldsymbol{ 0}\). First we note that for \(i = 1,-1\)

$$\displaystyle\begin{array}{rcl} E\left (\lambda _{\tilde{Y }}[1 -\tilde{ Y }(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X} - t)]^{+}\right ) =& E\{E[\left (\lambda _{\tilde{ Y }}[1 -\tilde{ Y }(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X} - t)]^{+}\right )\vert Y,\boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}]\}& {}\\ =& E\{E[\left (\lambda _{\tilde{Y }}[1 -\tilde{ Y }(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X} - t)]\right )^{+}\vert Y,\boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}]\}& {}\\ \end{array}$$

where the last equality holds because \(\lambda _{\tilde{Y}}\) is positive. Since the function \(a\mapsto a^{+}\) is convex, by Jensen’s inequality we have

$$\displaystyle\begin{array}{rcl} E[\left (\lambda _{\tilde{Y }}[1 -\tilde{ Y }(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X} - t)]\right )^{+}\vert Y,\boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}] \geq & \{E[\lambda _{\tilde{ Y }}[1 -\tilde{ Y }(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X} - t)]\vert Y,\boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}]\}^{+}& {}\\ =& \{\lambda _{\tilde{Y }}[1 -\tilde{ Y }(E(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}) - t)]\}^{+} & {}\\ \end{array}$$

where the equality follows from the model assumption (1.1). Thus

$$\displaystyle\begin{array}{rcl} E\left (\lambda _{\tilde{Y }}[1 -\tilde{ Y }(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X} - t)]^{+}\right ) \geq E& \{\lambda _{\tilde{ Y }}[1 -\tilde{ Y }(E(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}) - t)]\}^{+}.&{}\end{array}$$
(1.12)

For the first term, we have:

$$\displaystyle\begin{array}{rcl} \boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{\varSigma }\boldsymbol{\psi } =\mathrm{ var}(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}) =& \mathrm{var}[E(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X})] + E[\mathrm{var}(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X})]& \\ \geq & \mathrm{var}[E(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X})]. &{}\end{array}$$
(1.13)

Combining (1.12) and (1.13),

$$\displaystyle\begin{array}{rcl} L_{R}(\boldsymbol{\psi },t) \geq \mathrm{ var}[E(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X})] + E& \{\lambda _{\tilde{ Y }}[1 -\tilde{ Y }(E(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}) - t)]\}^{+}.& {}\\ \end{array}$$

By the linearity condition assumed in the theorem, \(E(\boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}) = \boldsymbol{\psi }^{\mathsf{T}}\boldsymbol{P}_{\boldsymbol{\beta }}^{\mathsf{T}}(\boldsymbol{\varSigma })\boldsymbol{X}\), and therefore the right-hand side of the inequality equals \(L_{R}(\boldsymbol{P}_{\boldsymbol{\beta }}(\boldsymbol{\varSigma })\boldsymbol{\psi },t)\). If \(\boldsymbol{\psi }\) is not in the CDRS, then the inequality is strict, which implies that \(\boldsymbol{\psi }\) is not the minimizer. □
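
For readability, we recall the form of the projection appearing above; in the notation of Li et al. [13] (restated here as a reminder, not taken from the present text) it can be written as

$$\displaystyle\begin{array}{rcl} \boldsymbol{P}_{\boldsymbol{\beta }}(\boldsymbol{\varSigma }) = \boldsymbol{\beta }(\boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{\varSigma }\boldsymbol{\beta })^{-1}\boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{\varSigma },& & {}\\ \end{array}$$

so that, under the linearity condition and \(E\boldsymbol{X} =\boldsymbol{0}\), \(E(\boldsymbol{X}\vert \boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{X}) = \boldsymbol{P}_{\boldsymbol{\beta }}^{\mathsf{T}}(\boldsymbol{\varSigma })\boldsymbol{X}\); this is the identity used in the last step of the proof.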

Proof of Theorem 2.

Following the same argument as in Vapnik [14], it can be shown that minimizing (1.9) is equivalent to

$$\displaystyle\begin{array}{rcl} \begin{array}{ll} &\mbox{ minimizing}\ \ \boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{\zeta } + \frac{1} {n}(\boldsymbol{\lambda }^{{\ast}})^{\mathsf{T}}\boldsymbol{\xi }\ \ \mbox{ over}\ \ (\boldsymbol{\zeta },t,\boldsymbol{\xi }) \\ &\mbox{ subject to}\ \ \ \boldsymbol{\xi } \geq \boldsymbol{ 0},\ \ \tilde{y} \odot (\boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{Z} - t\boldsymbol{1}) \geq \boldsymbol{ 1} -\boldsymbol{\xi }.\end{array} & &{}\end{array}$$
(1.14)

The Lagrangian function of this problem is

$$\displaystyle\begin{array}{rcl} L(\boldsymbol{\zeta },t,\boldsymbol{\xi },\boldsymbol{\alpha },\boldsymbol{\beta }) = \boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{\zeta } + \frac{1} {n}(\boldsymbol{\lambda }^{{\ast}})^{\mathsf{T}}\boldsymbol{\xi } -\boldsymbol{\alpha }^{\mathsf{T}}[\tilde{y} \odot (\boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{Z} - t\boldsymbol{1}) -\boldsymbol{ 1} + \boldsymbol{\xi }] -\boldsymbol{\beta }^{\mathsf{T}}\boldsymbol{\xi }.& &{}\end{array}$$
(1.15)

where \(\boldsymbol{\xi } = (\xi _{1},\ldots,\xi _{n})\). Let \((\boldsymbol{\zeta }^{{\ast}},\boldsymbol{\xi }^{{\ast}},t^{{\ast}})\) be a solution to problem (1.14). Using the Kuhn–Tucker Theorem, one can show that minimizing over \((\boldsymbol{\zeta },t,\boldsymbol{\xi })\) is equivalent to maximizing over \((\boldsymbol{\alpha },\boldsymbol{\beta })\). So, we differentiate with respect to \(\boldsymbol{\zeta }\), t, and \(\boldsymbol{\xi }\) to obtain the system of equations:

$$\displaystyle\begin{array}{rcl} \left \{\begin{array}{@{}l@{\quad }l@{}} \partial L/\partial \boldsymbol{\zeta } = 2\boldsymbol{\zeta } -\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha } \odot \tilde{ y}) =\boldsymbol{ 0}\quad \\ \partial L/\partial t = \boldsymbol{\alpha }^{\mathsf{T}}\tilde{y} =\boldsymbol{ 0} \quad \\ \partial L/\partial \boldsymbol{\xi } = \frac{1} {n}\boldsymbol{\lambda }^{{\ast}}-\boldsymbol{\alpha }-\boldsymbol{\beta } =\boldsymbol{ 0}. \quad \end{array} \right.& &{}\end{array}$$
(1.16)

Substitute the last two equations above into (1.15) to obtain

$$\displaystyle\begin{array}{rcl} \boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{\zeta } -\boldsymbol{\alpha }^{\mathsf{T}}[\tilde{y} \odot (\boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{Z}) -\boldsymbol{ 1}]& &{}\end{array}$$
(1.17)

Now substitute the first equation in (1.16), \(\boldsymbol{\zeta } = \frac{1} {2}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha } \odot \tilde{ y})\), into the above:

$$\displaystyle\begin{array}{rcl} \boldsymbol{1}^{\mathsf{T}}\boldsymbol{\alpha } -\frac{1} {4}(\boldsymbol{\alpha } \odot \tilde{ y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha } \odot \tilde{ y}).& &{}\end{array}$$
(1.18)
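
For completeness, the algebra behind this substitution is spelled out below as a filled-in step (reading \(\tilde{y}\odot (\boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{Z})\) as the vector with ith entry \(\tilde{y}_{i}\,\boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{Z}_{i}\)): with \(\boldsymbol{\zeta } = \frac{1} {2}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha } \odot \tilde{ y})\),

$$\displaystyle\begin{array}{rcl} \boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{\zeta } = \frac{1} {4}(\boldsymbol{\alpha } \odot \tilde{ y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha } \odot \tilde{ y}),\qquad \boldsymbol{\alpha }^{\mathsf{T}}[\tilde{y} \odot (\boldsymbol{\zeta }^{\mathsf{T}}\boldsymbol{Z})] = (\boldsymbol{\alpha } \odot \tilde{ y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{\zeta } = \frac{1} {2}(\boldsymbol{\alpha } \odot \tilde{ y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha } \odot \tilde{ y}),& & {}\\ \end{array}$$

so (1.17) becomes \(\boldsymbol{1}^{\mathsf{T}}\boldsymbol{\alpha } + \frac{1} {4}Q -\frac{1} {2}Q = \boldsymbol{1}^{\mathsf{T}}\boldsymbol{\alpha } -\frac{1} {4}Q\) with \(Q = (\boldsymbol{\alpha } \odot \tilde{ y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha } \odot \tilde{ y})\), which is exactly (1.18).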

Thus, to minimize (1.14) we need to maximize (1.18) subject to the constraints

$$\displaystyle\begin{array}{rcl} \left \{\begin{array}{@{}l@{\quad }l@{}} \boldsymbol{\alpha }^{\mathsf{T}}\tilde{y} =\boldsymbol{ 0} \quad \\ \frac{1} {n}\boldsymbol{\lambda }^{{\ast}}-\boldsymbol{\alpha }-\boldsymbol{\beta } =\boldsymbol{ 0}.\quad \end{array} \right.& &{}\end{array}$$
(1.19)

which are equivalent to the constraints in (1.10): since \(\boldsymbol{\alpha }\geq \boldsymbol{0}\) and \(\boldsymbol{\beta }\geq \boldsymbol{0}\), eliminating \(\boldsymbol{\beta }\) from the second equation yields the elementwise bounds \(\boldsymbol{0} \leq \boldsymbol{\alpha } \leq \frac{1} {n}\boldsymbol{\lambda }^{{\ast}}\). □


Copyright information

© 2014 Springer Science+Business Media New York

Cite this paper

Artemiou, A., Shu, M. (2014). A Cost Based Reweighted Scheme of Principal Support Vector Machine. In: Akritas, M., Lahiri, S., Politis, D. (eds) Topics in Nonparametric Statistics. Springer Proceedings in Mathematics & Statistics, vol 74. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0569-0_1
