Abstract
Principal Support Vector Machine (PSVM) is a recently proposed method that uses Support Vector Machines to achieve linear and nonlinear sufficient dimension reduction under a unified framework. In this work, we use a reweighted scheme to improve the performance of the algorithm. We present basic theoretical results and demonstrate the effectiveness of the reweighted algorithm through simulations and a real-data application.
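To make the idea concrete, the following is a minimal sketch of a cost-reweighted linear PSVM step, assuming the general recipe of Li, Artemiou, and Li [13]: slice the response, fit a soft-margin SVM to each induced dichotomy, and extract dimension-reduction directions from the fitted normal vectors. The inverse-class-size cost choice and the simple subgradient solver below are illustrative assumptions, not the authors' implementation; all function names are hypothetical.

```python
import numpy as np

def weighted_linear_svm(Z, y, C_pos, C_neg, lr=0.01, epochs=200):
    """Fit a linear SVM by subgradient descent on the weighted hinge loss,
    with separate misclassification costs for the two classes."""
    n, p = Z.shape
    w = np.zeros(p)
    b = 0.0
    cost = np.where(y > 0, C_pos, C_neg)
    for _ in range(epochs):
        margins = y * (Z @ w - b)
        active = margins < 1  # points violating the margin
        # subgradient of (1/2)||w||^2 + sum_i cost_i * hinge_i
        grad_w = w - (cost[active] * y[active]) @ Z[active]
        grad_b = np.sum(cost[active] * y[active])
        w -= lr * grad_w
        b -= lr * grad_b
    return w

def reweighted_psvm_directions(X, y, n_slices=5, d=1):
    """Estimate d sufficient-dimension-reduction directions."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize predictors
    # interior quantile cut points define the response dichotomies
    cuts = np.quantile(y, np.linspace(0, 1, n_slices + 1)[1:-1])
    M = np.zeros((X.shape[1], X.shape[1]))
    for c in cuts:
        labels = np.where(y > c, 1.0, -1.0)
        n_pos = int(np.sum(labels > 0))
        n_neg = len(labels) - n_pos
        # reweighting: each class's cost inversely proportional to its size
        w = weighted_linear_svm(Z, labels,
                                C_pos=1.0 / n_pos, C_neg=1.0 / n_neg)
        M += np.outer(w, w)
    # leading eigenvectors of the accumulated outer products
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, ::-1][:, :d]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X[:, 0] + 0.1 * rng.normal(size=300)  # true direction is e_1
B = reweighted_psvm_directions(X, y, n_slices=4, d=1)
print(abs(B[0, 0]))  # large first component indicates recovery of e_1
```

The inverse-class-size weights mirror the cost-sensitive SVM idea of Veropoulos et al., which is the motivation for the reweighting studied in this paper.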
References
Bache, K., Lichman, M.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine (2013). http://archive.ics.uci.edu/ml
Cook, R.D.: Principal Hessian directions revisited (with discussion). J. Am. Stat. Assoc. 93, 84–100 (1998a)
Cook, R.D.: Regression Graphics: Ideas for Studying Regressions through Graphics. Wiley, New York (1998b)
Cook, R.D., Weisberg, S.: Discussion of “Sliced inverse regression for dimension reduction”. J. Am. Stat. Assoc. 86, 316–342 (1991)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
Ein-Dor, P., Feldmesser, J.: Attributes of the performance of central processing units: a relative performance prediction model. Commun. ACM 30(4), 308–317 (1987)
Fukumizu, K., Bach, F.R., Jordan, M.I.: Kernel dimension reduction in regression. Ann. Stat. 37(4), 1871–1905 (2009)
Lee, K.K., Gunn, S.R., Harris, C.J., Reed, P.A.S.: Classification of imbalanced data with transparent kernels. In: Proceedings of International Joint Conference on Neural Networks (IJCNN ’01), vol. 4, pp. 2410–2415, Washington, D.C. (2001)
Li, K.-C.: Sliced inverse regression for dimension reduction (with discussion). J. Am. Stat. Assoc. 86, 316–342 (1991)
Li, K.-C.: On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma. J. Am. Stat. Assoc. 87, 1025–1039 (1992)
Li, B., Wang, S.: On directional regression for dimension reduction. J. Am. Stat. Assoc. 102, 997–1008 (2007)
Li, B., Zha, H., Chiaromonte, F.: Contour regression: a general approach to dimension reduction. Ann. Stat. 33, 1580–1616 (2005)
Li, B., Artemiou, A., Li, L.: Principal support vector machine for linear and nonlinear sufficient dimension reduction. Ann. Stat. 39, 3182–3210 (2011)
Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
Veropoulos, K., Campbell, C., Cristianini, N.: Controlling the sensitivity of support vector machines. In: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI ’99), Workshop ML3, Stockholm, pp. 55–60 (1999)
Weisberg, S.: Dimension reduction regression in R. J. Stat. Softw. 7(1) (2002) (Online)
Wu, H.M.: Kernel sliced inverse regression with applications on classification. J. Comput. Graph. Stat. 17, 590–610 (2008)
Yeh, Y.-R., Huang, S.-Y., Lee, Y.-Y.: Nonlinear dimension reduction with kernel sliced inverse regression. IEEE Trans. Knowl. Data Eng. 21, 1590–1603 (2009)
Zhu, L.X., Miao, B., Peng, H.: On sliced inverse regression with large dimensional covariates. J. Am. Stat. Assoc. 101, 630–643 (2006)
Acknowledgements
Andreas Artemiou is supported in part by NSF grant DMS-12-07651. The authors would like to thank the editors and the referees for their valuable comments.
Appendix
Proof of Theorem 1.
Without loss of generality, assume that \(E\boldsymbol{X} = \boldsymbol{0}\). First we note that for \(i = 1,-1\)
where the last equality holds because \(\lambda_{\tilde{Y}}\) is positive. Since the function \(a \mapsto a^{+}\) is convex, by Jensen’s inequality we have
where the equality follows from the model assumption (1.1). Thus
For the first term we have:
By the linearity condition in the theorem, \(E(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}\vert \boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{X}) = \boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{P}_{\boldsymbol{\beta}}^{\mathsf{T}}(\boldsymbol{\varSigma})\boldsymbol{X}\), and therefore the right-hand side of the inequality equals \(L_{R}(\boldsymbol{P}_{\boldsymbol{\beta}}(\boldsymbol{\varSigma})\boldsymbol{\psi},t)\). If \(\boldsymbol{\psi}\) is not in the CDRS, the inequality is strict, which implies that \(\boldsymbol{\psi}\) is not the minimizer. □
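For clarity, the Jensen step invoked in the proof above is the textbook conditional Jensen inequality for the convex map \(a \mapsto a^{+} = \max(a,0)\), stated here generically with \(\mathcal{F}\) standing in for the conditioning \(\sigma\)-field used in the proof:

```latex
E\left( A^{+} \,\middle|\, \mathcal{F} \right)
  \;\ge\; \left( E\left( A \,\middle|\, \mathcal{F} \right) \right)^{+},
\qquad a^{+} := \max(a, 0),
```

and taking expectations of both sides preserves the inequality.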
Proof of Theorem 2.
Following the same argument as in Vapnik [14], it can be shown that minimizing (1.9) is equivalent to
The Lagrangian function of this problem is
where \(\boldsymbol{\xi } = (\xi _{1},\ldots,\xi _{n})\). Let \((\boldsymbol{\zeta }^{{\ast}},\boldsymbol{\xi }^{{\ast}},t^{{\ast}})\) be a solution to problem (1.14). Using the Kuhn–Tucker Theorem, one can show that minimizing over \((\boldsymbol{\zeta },t,\boldsymbol{\xi })\) is equivalent to maximizing over \((\boldsymbol{\alpha },\boldsymbol{\beta })\). Differentiating with respect to \(\boldsymbol{\zeta }\), t, and \(\boldsymbol{\xi }\), we obtain the system of equations:
Substitute the last two equations above into (1.15) to obtain
Now substitute the first equation in (1.16), \(\boldsymbol{\zeta} = \frac{1}{2}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha}\odot\tilde{y})\), into the above:
Thus to minimize (1.15) we need to maximize (1.18) over the constraints
which are equivalent to the constraints in (1.10). □
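For comparison only (this is the classical unweighted problem, not (1.14)), the same Kuhn–Tucker computation applied to the standard soft-margin SVM of Cortes and Vapnik [5] gives the familiar Lagrangian

```latex
L(\zeta, t, \xi; \alpha, \beta)
  = \tfrac{1}{2}\|\zeta\|^{2} + C\sum_{i}\xi_{i}
    - \sum_{i}\alpha_{i}\left[ y_{i}(\zeta^{\mathsf{T}} x_{i} - t) - 1 + \xi_{i} \right]
    - \sum_{i}\beta_{i}\xi_{i},
\qquad \alpha_{i}, \beta_{i} \ge 0,
```

and setting the derivatives with respect to \(\zeta\), \(t\), and \(\xi_{i}\) to zero yields

```latex
\zeta = \sum_{i}\alpha_{i} y_{i} x_{i}, \qquad
\sum_{i}\alpha_{i} y_{i} = 0, \qquad
\alpha_{i} + \beta_{i} = C \;\Rightarrow\; 0 \le \alpha_{i} \le C,
```

so the single cost \(C\) caps every dual variable uniformly; the reweighted scheme replaces this uniform cap with class-dependent costs.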
Copyright information
© 2014 Springer Science+Business Media New York
Cite this paper
Artemiou, A., Shu, M. (2014). A Cost Based Reweighted Scheme of Principal Support Vector Machine. In: Akritas, M., Lahiri, S., Politis, D. (eds) Topics in Nonparametric Statistics. Springer Proceedings in Mathematics & Statistics, vol 74. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0569-0_1
Print ISBN: 978-1-4939-0568-3
Online ISBN: 978-1-4939-0569-0