Summary
A new method for nonparametric regression data analysis, which analyzes the sensitivity to abnormally large perturbations with the Principal Hessian Directions (PHD) method (Li 1992), is introduced, combining the merits of effective dimension reduction and visualization. We develop techniques for detecting perturbed points without knowledge of the functional form of the regression model when a small percentage of observations takes abnormally large values. The main feature of the proposed method is the estimation of the deviation angle of the PHD direction. The basic idea is to recursively trim out the perturbed points that cause the larger directional deviations. Our multiple-trimming method consistently reduces the pattern ambiguity of the geometric shape information about the regression surface. Several simulations with empirical results are reported.
References
Chaudhuri, P., Huang, M. C., Loh, W. Y. and Yao, R. (1994), “Piecewise-polynomial regression trees,” Statistica Sinica, 4, 143–167.
Cheng, C. S. and Li, K. C. (1995), “A study of the method of Principal Hessian Direction for analysis of data from designed experiments,” Statistica Sinica, 5, 617–639.
Cook, R. D. (1998), “Principal Hessian Directions Revisited,” (with discussion), J. Amer. Stat. Assoc., 93, 84–100.
Cook, R. D. and Weisberg, S. (1989), “Regression diagnostics with dynamic graphics,” (with discussion), Technometrics, 31, 277–308.
Duan, N. and Li, K. C. (1991), “Slicing regression: A link-free regression method,” Ann. Statist., 19, 505–530.
Filliben, J. J. and Li, K. C. (1997), “A systematic approach to the analysis of complex interaction patterns in two-level factorial designs,” Technometrics, 39, 286–297.
Hall, P. and Li, K. C. (1993), “On almost linearity of low dimensional projections from high dimensional data,” Ann. Statist., 21, 867–889.
Hsing, T. and Carroll, R. J. (1992), “An asymptotic theory for sliced inverse regression,” Ann. Statist., 20, 1040–1061.
Kurt, E., Anthony, R. and Herbert, S. W. (1979), Statistical Methods for Digital Computers, Vol. 3, John Wiley.
Li, K. C. (1991), “Sliced inverse regression for dimension reduction,” (with discussion), J. Amer. Stat. Assoc., 86, 316–342.
Li, K. C. (1992), “On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma,” J. Amer. Stat. Assoc., 87, 1025–1039.
Li, K. C., Lue, H. H. and Chen, C. H. (2000), “Interactive Tree-structured Regression via Principal Hessian Directions,” J. Amer. Stat. Assoc., 95, 547–560.
Lue, H. H. (1994), “Principal-Hessian-direction-based regression trees,” unpublished Ph.D. dissertation, Department of Math., University of California, Los Angeles.
Tierney, L. (1990), LISP-STAT: An Object-Oriented Environment for Statistical Computing and Dynamic Graphics, New York: John Wiley & Sons.
Weisberg, S. (1985), Applied Linear Regression, John Wiley.
Acknowledgment
This research was supported in part by the National Science Council, R.O.C. grant #NSC86-2115-M-130-002.
Appendix
A. Proof of Lemma 3.1
To proceed with this proof, a straightforward expression for the residuals with case i deleted leads to \(\widehat{r}_j^{\left( i \right)} = {y_j} - x_j^\prime {\left( {{\rm{X}}_{\left( i \right)}^\prime {{\rm{X}}_{\left( i \right)}}} \right)^{ - 1}}{\rm{X}}_{\left( i \right)}^\prime {y_{\left( i \right)}}\), where X is a full-rank matrix with n rows and (p+1) columns (a column of ones included), X(i) is the (n−1) by (p+1) matrix obtained from X by deleting the ith row, and \(x_i^\prime \) is the ith row of X. To simplify this expression, we apply the equality
$$\left( {\rm{X}}_{\left( i \right)}^\prime {\rm{X}}_{\left( i \right)} \right)^{-1} = \left( {\rm{X}}^\prime {\rm{X}} \right)^{-1} + {{\left( {\rm{X}}^\prime {\rm{X}} \right)^{-1} x_i x_i^\prime \left( {\rm{X}}^\prime {\rm{X}} \right)^{-1}} \over {1 - h_{ii}}}$$
(see Weisberg 1985), where \(h_{ii} = x_i^\prime ({\rm{X}}^\prime {\rm{X}})^{-1} x_i\). As a result, we have
$$\widehat{r}_j^{\left( i \right)} = {\widehat{r}_j} + {{{h_{ji}}} \over {1 - {h_{ii}}}}{\widehat{r}_i},$$
thus completing the proof.
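The identity in Lemma 3.1 can be checked numerically. The sketch below (all names are ours, not the paper's code) fits a leave-one-out regression directly and compares its residuals with the formula \(\widehat{r}_j^{(i)} = \widehat{r}_j + h_{ji}\widehat{r}_i/(1-h_{ii})\):

```python
# Numerical check of Lemma 3.1 on simulated data (illustrative names).
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design with intercept
y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
r = y - H @ y                               # full-sample residuals

i = 5                                       # case to delete
Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
beta_i = np.linalg.lstsq(Xi, yi, rcond=None)[0]
r_del = y - X @ beta_i                      # residuals with case i deleted

# Lemma 3.1: the same residuals from full-sample quantities only
r_lemma = r + H[:, i] * r[i] / (1.0 - H[i, i])
print(np.max(np.abs(r_del - r_lemma)))      # difference at floating-point noise level
```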
B. Proof of Corollary 3.1
To obtain this result, without loss of generality we assume \(\overline {\bf{x}} = 0\). It is straightforward to express \({\widehat{\rm{\Sigma }}_{{r^{\left( i \right)}}{{\bf{x}}_{\left( i \right)}}{{\bf{x}}_{\left( i \right)}}}} = {1 \over {n - 1}}\left( {\sum\nolimits_{j = 1}^n {\widehat{r}_j^{\left( i \right)}{{\bf{x}}_j}{\bf{x}}_j^\prime - \widehat{r}_i^{\left( i \right)}{{\bf{x}}_i}{\bf{x}}_i^\prime } } \right)\), where xj is a p-dimensional random vector for j = 1, ⋯, n. To simplify this expression, we apply Lemma 3.1. Then we have
$$\widehat{\rm{\Sigma}}_{r^{\left( i \right)}{\bf{x}}_{\left( i \right)}{\bf{x}}_{\left( i \right)}} = {n \over {n - 1}}\left( \widehat{\rm{\Sigma}}_{r{\bf{xx}}} + \widehat{r}_i^{\left( i \right)}\,\widehat{\rm{\Sigma}}_{\widetilde{h}_i{\bf{xx}}} \right),$$
thus completing the proof.
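The corollary's matrix identity, \(\widehat\Sigma_{r^{(i)}{\bf x}_{(i)}{\bf x}_{(i)}} = \frac{n}{n-1}(\widehat\Sigma_{r{\bf xx}} + \widehat{r}_i^{(i)}\widehat\Sigma_{\widetilde{h}_i{\bf xx}})\), is exact and can also be verified numerically. In this sketch (illustrative names; the 1/n weighted-covariance convention follows the definitions in the appendix) the left-hand side is built from a genuine leave-one-out fit:

```python
# Numerical check of the identity behind Corollary 3.1 (illustrative names).
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3
Z = rng.normal(size=(n, p))
X = np.column_stack([np.ones(n), Z])
y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
r = y - H @ y
i = 7
Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
beta_i = np.linalg.lstsq(Xi, yi, rcond=None)[0]
r_del = y - X @ beta_i                      # residuals with case i deleted
ri_del = r_del[i]                           # deleted residual r_i^(i)

xc = Z - Z.mean(axis=0)                     # centered predictors (x-bar = 0 WLOG)
lhs = (np.einsum('j,jk,jl->kl', r_del, xc, xc)
       - ri_del * np.outer(xc[i], xc[i])) / (n - 1)

Sig_rxx = np.einsum('j,jk,jl->kl', r, xc, xc) / n
h_tilde = H[:, i] - np.eye(n)[:, i]         # i-th column of H - I
Sig_hxx = np.einsum('j,jk,jl->kl', h_tilde, xc, xc) / n
rhs = n / (n - 1) * (Sig_rxx + ri_del * Sig_hxx)
print(np.max(np.abs(lhs - rhs)))            # floating-point noise only
```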
C. Proof of Theorem 3.1
To obtain this result, without loss of generality we assume that \({\widehat{\rm{\Sigma }}_{\bf{x}}} = {\rm{I}}\) and \(\left\| {{{\widehat{b}}_{rj}}} \right\| = 1\). Let \({\rm{\Delta }}{\widehat{b}_{rj}}\) be the vector component of \(\widehat{b}_{rj}^{\left( i \right)}\) orthogonal to \({{{\widehat{b}}_{rj}}}\); then there exists a constant c such that \(c\widehat{b}_{rj}^{\left( i \right)} = {\widehat{b}_{rj}} + {\rm{\Delta }}{\widehat{b}_{rj}}\) and \(\cos \widehat\theta _{rj}^{(i)} = (c\widehat{b}_{rj}^{(i)},{\widehat{b}_{rj}})/\left\| {c\widehat{b}_{rj}^{(i)}} \right\|\left\| {{{\widehat{b}}_{rj}}} \right\|\), where (·, ·) denotes the inner product. Suppose that \(\theta _{rj}^{\left( i \right)}\) is positive; then it can be shown that \(\cos \widehat\theta _{rj}^{\left( i \right)} = 1/{\left( {1 + {{\left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\|}^2}} \right)^{{1 \over 2}}}\), \(\sin \widehat\theta _{rj}^{\left( i \right)} = \left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\|/{\left( {1 + {{\left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\|}^2}} \right)^{{1 \over 2}}}\), and \(\widehat\theta _{rj}^{\left( i \right)} = \left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\| + O\left( {{{\left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\|}^3}} \right)\). Thus we derive \(\widehat\theta _{rj}^{\left( i \right)} \approx \left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\|\).
We proceed with this proof by evaluating the differential \({d{{\widehat{b}}_{rj}}}\) of the eigenvector \({{{\widehat{b}}_{rj}}}\) of the sample weighted covariance matrix \({\widehat{\rm{\Sigma }}_{r{\bf{xx}}}}\) with eigenvalue \({\widehat\lambda _j}\), which is defined as
$$d{\widehat{b}_{rj}} = \sum\limits_{k \ne j} {{{\widehat{b}_{rk}^\prime \left( d{{\widehat{\rm{\Sigma}}}_{r{\bf{xx}}}} \right){{\widehat{b}}_{rj}}} \over {{{\widehat\lambda }_j} - {{\widehat\lambda }_k}}}\,{{\widehat{b}}_{rk}}}$$
for j, k = 1, ⋯, p (see Kurt et al. 1979). To asymptotically obtain the deviation angle estimate \(\widehat\theta _{rj}^{\left( i \right)}\) by applying \({\rm{\Delta }}{\widehat\Sigma _{r{\bf{xx}}}} \approx \widehat{r}_i^{\left( i \right)}{\widehat\Sigma _{\widetilde{{h_i}}{\bf{xx}}}}\) from Corollary 3.1 as n is sufficiently large, we can verify that
\(\widehat\theta _{rj}^{\left( i \right)} \approx \left| {\widehat{r}_i^{\left( i \right)}} \right|{c_j}\left( {\bf{x}} \right)\), where the scalar function \({c_j}\left( {\bf{x}} \right) = {\left( {\sum\nolimits_{k \ne j} {{{\left( {{{\widehat{b}_{rj}^\prime {{\widehat{\rm{\Sigma }}}_{{{\widetilde{h}}_i}{\bf{xx}}}}{{\widehat{b}}_{rk}}} \over {{{\widehat\lambda }_j} - {{\widehat\lambda }_k}}}} \right)}^2}} } \right)^{{1 \over 2}}}\). Similarly, suppose that \(\theta _{rj}^{\left( i \right)}\) is negative; then \(\widehat\theta _{rj}^{\left( i \right)} \approx - \left| {\widehat{r}_i^{\left( i \right)}} \right|{c_j}\left( {\bf{x}} \right)\). As a result, we obtain
$$\widehat\theta _{rj}^{\left( i \right)} \approx {\rm{sign}}\left( \theta _{rj}^{\left( i \right)} \right)\left| {\widehat{r}_i^{\left( i \right)}} \right|{c_j}\left( {\bf{x}} \right),$$
where sign(z) = 1 if z > 0, −1 if z < 0, and 0 otherwise, thus completing the proof of this theorem.
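The first-order eigenvector expansion underlying this proof can be checked against a finite perturbation. The sketch below (our own illustrative code, not the paper's) compares the predicted angle \(\|\Delta\widehat{b}_{rj}\|\) with the actual angle between the jth eigenvectors of a symmetric matrix and a slightly perturbed copy:

```python
# Finite-perturbation check of the first-order eigenvector expansion
# d b_j = sum_{k!=j} (b_k' dA b_j)/(lam_j - lam_k) b_k  (illustrative names).
import numpy as np

rng = np.random.default_rng(2)
p = 4
A = rng.normal(size=(p, p)); A = (A + A.T) / 2   # symmetric matrix
E = rng.normal(size=(p, p)); E = (E + E.T) / 2   # symmetric perturbation direction
eps = 1e-4

lam, B = np.linalg.eigh(A)
lam2, B2 = np.linalg.eigh(A + eps * E)

j = 1
# predicted first-order deviation angle ||Delta b_j||
coef = np.array([B[:, k] @ E @ B[:, j] / (lam[j] - lam[k])
                 for k in range(p) if k != j])
theta_pred = eps * np.linalg.norm(coef)

# actual angle between the j-th eigenvectors (sign-aligned)
c = abs(B[:, j] @ B2[:, j])
theta_act = np.arccos(min(c, 1.0))
print(theta_pred, theta_act)                     # agree to first order in eps
```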
D. Algorithmic Description
We present the technical details of the algorithm for the absolute deviation angle estimate \(\left| {\widehat\theta _{rj}^{\left( i \right)}} \right|\) for fixed j. Suppose that the data consist of n observations, yi and xi, i = 1, ⋯, n.
1.
We begin by fitting a multiple linear regression of y against x and then constructing the eigenvalue decomposition
$${\widehat{\rm{\Sigma }}_{r{\bf{xx}}}}{\widehat{b}_{rj}} = {\widehat\lambda _j}{\widehat{\rm{\Sigma }}_{\bf{x}}}{\widehat{b}_{rj}}$$
to obtain the estimated r-based PHD directions \({\widehat{b}_{rj}}\) and eigenvalues \({\widehat\lambda _j}\), j = 1, ⋯, p. The hat matrix H = X(X′X)−1X′ can be obtained from the fitted values for the multiple linear regression of each column vector of the identity matrix I against x. The vector \({\widetilde{h}_i}\) is the ith column of the difference between H and I. The deleted residual \(\widehat{r}_i^{\left( i \right)}\) is equal to \({\widehat{r}_i}{\rm{/}}\left( {1 - {h_{ii}}} \right)\).
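Step 1 above can be sketched in NumPy as follows. This is our own illustrative code (function and variable names are not the paper's); the generalized eigenproblem is solved by whitening with \(\widehat\Sigma_{\bf x}^{-1/2}\), one standard way to implement it:

```python
# Sketch of Step 1: residuals, weighted covariance, and the generalized
# eigenproblem  Sig_rxx b = lam Sig_x b  via whitening (illustrative names).
import numpy as np

def phd_directions(x, y):
    n, p = x.shape
    X = np.column_stack([np.ones(n), x])       # design with intercept
    H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
    r = y - H @ y                              # OLS residuals
    r_del = r / (1.0 - np.diag(H))             # deleted residuals r_i^(i)

    xc = x - x.mean(axis=0)
    Sig_x = xc.T @ xc / n
    Sig_rxx = np.einsum('j,jk,jl->kl', r, xc, xc) / n

    # whiten: symmetric inverse square root of Sig_x
    w, U = np.linalg.eigh(Sig_x)
    S = U @ np.diag(w ** -0.5) @ U.T
    lam, V = np.linalg.eigh(S @ Sig_rxx @ S)
    B = S @ V                                  # columns: r-based PHD directions
    return lam, B, H, r, r_del

rng = np.random.default_rng(3)
x = rng.normal(size=(100, 3))
y = x[:, 0] ** 2 + 0.1 * rng.normal(size=100)
lam, B, H, r, r_del = phd_directions(x, y)
# each column b of B satisfies Sig_rxx b = lam Sig_x b up to numerical error
```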
2.
For each i, we compute the scalar function cj(x) using the results in Step 1:
$${c_j}({\bf{x}}) = {\left( {\sum\limits_{k \ne j} {{{\left( {{{\widehat{b}_{rj}^\prime {{\widehat{\rm{\Sigma }}}_{{{\tilde h}_i}{\bf{xx}}}}{{\widehat{b}}_{rk}}} \over {{{\widehat\lambda }_j} - {{\widehat\lambda }_k}}}} \right)}^2}} } \right)^{{1 \over 2}}}$$where \({\widehat{\rm{\Sigma }}_{{{\tilde h}_i}{\bf{xx}}}} = {1 \over n}{\sum\nolimits_{j = 1}^n {{{\widetilde{h}}_{ij}}\left( {{{\bf{x}}_j} - \overline {\bf{x}} } \right)\left( {{{\bf{x}}_j} - \overline {\bf{x}} } \right)} ^\prime }\), and \({{{\widetilde{h}}_{ij}}}\) is the jth component of \({{{\widetilde{h}}_i}}\). Therefore, we obtain the absolute deviation angle estimate
$$\left| {\widehat\theta _{rj}^{\left( i \right)}} \right| \approx \left| {\widehat{r}_i^{\left( i \right)}} \right|{c_j}\left( {\bf{x}} \right)$$
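Step 2 can be sketched likewise. The self-contained function below (illustrative names; it repeats the Step-1 computations so it runs on its own) returns one \(\left|\widehat\theta_{rj}^{(i)}\right|\) per case; cases with the largest angles are the candidates for trimming:

```python
# Sketch of Step 2: c_j(x) and |theta_rj^(i)| for every case i (illustrative names).
import numpy as np

def deviation_angles(x, y, j=0):
    n, p = x.shape
    X = np.column_stack([np.ones(n), x])
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    r = y - H @ y
    r_del = r / (1.0 - np.diag(H))             # deleted residuals r_i^(i)

    xc = x - x.mean(axis=0)
    Sig_x = xc.T @ xc / n
    Sig_rxx = np.einsum('m,mk,ml->kl', r, xc, xc) / n
    w, U = np.linalg.eigh(Sig_x)
    S = U @ np.diag(w ** -0.5) @ U.T
    lam, V = np.linalg.eigh(S @ Sig_rxx @ S)   # Step 1: directions and eigenvalues
    B = S @ V

    theta = np.empty(n)
    for i in range(n):
        h_tilde = H[:, i] - np.eye(n)[:, i]    # i-th column of H - I
        Sig_h = np.einsum('m,mk,ml->kl', h_tilde, xc, xc) / n
        c_j = np.sqrt(sum(((B[:, j] @ Sig_h @ B[:, k]) / (lam[j] - lam[k])) ** 2
                          for k in range(p) if k != j))
        theta[i] = abs(r_del[i]) * c_j         # |theta_rj^(i)|
    return theta

rng = np.random.default_rng(4)
x = rng.normal(size=(60, 3))
y = x[:, 0] ** 2 + 0.1 * rng.normal(size=60)
theta = deviation_angles(x, y)   # one absolute deviation angle estimate per case
```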
Heng-Hui, L. A study of sensitivity analysis on the method of Principal Hessian Directions. Computational Statistics 16, 109–130 (2001). https://doi.org/10.1007/s001800100054