
A study of sensitivity analysis on the method of Principal Hessian Directions


Summary

A new method for nonparametric regression data analysis is introduced: we analyze the sensitivity of the Principal Hessian Directions (PHD) method (Li 1992) to abnormally large perturbations, combining the merits of effective dimension reduction and visualization. We develop techniques for detecting perturbed points without knowledge of the functional form of the regression model when a small percentage of observations is subject to abnormally large values. The main feature of our proposed method is the estimation of the deviation angle of the PHD direction. The basic idea is to recursively trim out the perturbed points that cause larger directional deviations. Our multiple trimming method reduces the pattern ambiguity of the geometric shape information about the regression surface. Several simulations with empirical results are reported.


References

  • Chaudhuri, P., Huang, M. C., Loh, W. Y. and Yao, R. (1994), “Piecewise-polynomial regression trees,” Statistica Sinica, 4, 143–167.

  • Cheng, C. S. and Li, K. C. (1995), “A study of the method of Principal Hessian Direction for analysis of data from designed experiments,” Statistica Sinica, 5, 617–639.

  • Cook, R. D. (1998), “Principal Hessian Directions Revisited,” (with discussion), J. Amer. Stat. Assoc., 93, 84–100.

  • Cook, R. D. and Weisberg, S. (1989), “Regression diagnostics with dynamic graphics,” (with discussion), Technometrics, 31, 277–308.

  • Duan, N. and Li, K. C. (1991), “Slicing regression: A link-free regression method,” Ann. Statist., 19, 505–530.

  • Filliben, J. J. and Li, K. C. (1997), “A systematic approach to the analysis of complex interaction patterns in two-level factorial designs,” Technometrics, 39, 286–297.

  • Hall, P. and Li, K. C. (1993), “On almost linearity of low dimensional projections from high dimensional data,” Ann. Statist., 21, 867–889.

  • Hsing, T. and Carroll, R. J. (1992), “An asymptotic theory for sliced inverse regression,” Ann. Statist., 20, 1040–1061.

  • Kurt, E., Anthony, R. and Herbert, S. W. (1979), Statistical Methods for Digital Computers, Vol. 3, John Wiley.

  • Li, K. C. (1991), “Sliced inverse regression for dimension reduction,” (with discussion), J. Amer. Stat. Assoc., 86, 316–342.

  • Li, K. C. (1992), “On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma,” J. Amer. Stat. Assoc., 87, 1025–1039.

  • Li, K. C., Lue, H. H. and Chen, C. H. (2000), “Interactive Tree-structured Regression via Principal Hessian Directions,” J. Amer. Stat. Assoc., 95, 547–560.

  • Lue, H. H. (1994), “Principal-Hessian-direction-based regression trees,” unpublished Ph.D. dissertation, Department of Math., University of California, Los Angeles.

  • Weisberg, S. (1985), Applied Linear Regression, John Wiley.

  • Tierney, L. (1990), LISP-STAT: an object-oriented environment for statistical computing and dynamic graphics, New York: John Wiley & Sons.

Acknowledgment

This research was supported in part by the National Science Council, R.O.C. grant #NSC86-2115-M-130-002.

Appendix

A. Proof of Lemma 3.1

To proceed with this proof, a straightforward expression for the residuals with case i deleted leads to \(\widehat{r}_j^{\left( i \right)} = {y_j} - x_j^\prime {\left( {{\rm{X}}_{\left( i \right)}^\prime {{\rm{X}}_{\left( i \right)}}} \right)^{ - 1}}{\rm{X}}_{\left( i \right)}^\prime {y_{\left( i \right)}}\), where X is a full-rank matrix with n rows and (p+1) columns (a column of ones included), X(i) is the (n−1) by (p+1) matrix obtained from X by deleting the ith row, and \(x_i^\prime \) is the ith row of X. To simplify this expression, we apply the equality

$${\left( {{\rm{X}}_{\left( i \right)}^\prime {{\rm{X}}_{\left( i \right)}}} \right)^{ - 1}} = {\left( {{{\rm{X}}^\prime }{\rm{X}}} \right)^{ - 1}} + {1 \over {1 - {h_{ii}}}}{\left( {{{\rm{X}}^\prime }{\rm{X}}} \right)^{ - 1}}{x_i}x_i^\prime {\left( {{{\rm{X}}^\prime }{\rm{X}}} \right)^{ - 1}}$$

(see Weisberg 1985). As a result, we have \(\widehat{r}_j^{\left( i \right)} = {\widehat{r}_j} + {{{h_{ji}}} \over {1 - {h_{ii}}}}{\widehat{r}_i}\), thus completing the proof.
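
The identity in Lemma 3.1 can be checked numerically. The following sketch is not from the original paper; the simulated data, seed, and variable names are illustrative. It compares the residuals obtained by explicitly refitting without case i against the closed-form update \(\widehat{r}_j^{\left( i \right)} = {\widehat{r}_j} + {h_{ji}}{\widehat{r}_i}/\left( {1 - {h_{ii}}} \right)\).

import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # full-rank design, column of ones included
y = X @ rng.normal(size=p + 1) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix X(X'X)^{-1}X'
r = y - H @ y                             # ordinary residuals r_j
i = 7                                     # deleted case (arbitrary choice)

# residuals at all n points after an explicit refit without case i
X_i, y_i = np.delete(X, i, axis=0), np.delete(y, i)
beta_i = np.linalg.solve(X_i.T @ X_i, X_i.T @ y_i)
r_refit = y - X @ beta_i

# closed-form update of Lemma 3.1
r_lemma = r + H[:, i] * r[i] / (1.0 - H[i, i])

print(np.allclose(r_refit, r_lemma))      # expect True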

B. Proof of Corollary 3.1

To obtain this result, without loss of generality we assume \(\overline {\bf{x}} = 0\). It is straightforward to express \({\widehat{\rm{\Sigma }}_{{r^{\left( i \right)}}{{\bf{x}}_{\left( i \right)}}{{\bf{x}}_{\left( i \right)}}}} = {1 \over {n - 1}}\left( {\sum\nolimits_{j = 1}^n {\widehat{r}_j^{\left( i \right)}{{\bf{x}}_j}{\bf{x}}_j^\prime - \widehat{r}_i^{\left( i \right)}{{\bf{x}}_i}{\bf{x}}_i^\prime } } \right)\), where xj is a p-dimensional random vector for j = 1, ⋯, n. To simplify this expression, we apply Lemma 3.1. Then we have

$$\begin{array}{*{20}{c}} {{\widehat{\rm{\Sigma }}_{{r^{\left( i \right)}}{{\bf{x}}_{\left( i \right)}}{{\bf{x}}_{\left( i \right)}}}}}& = &{{n \over {n - 1}}{{\widehat{\rm{\Sigma }}}_{r{\bf{xx}}}} + {{\widehat{r}_i^{\left( i \right)}} \over {n - 1}}\left( {\sum\limits_{j = 1}^n {{h_{ji}}{{\bf{x}}_j}{\bf{x}}_j^\prime } - {{\bf{x}}_i}{\bf{x}}_i^\prime } \right)} \\ {}& = &{{n \over {n - 1}}{{\widehat{\rm{\Sigma }}}_{r{\bf{xx}}}} + {n \over {n - 1}}\widehat{r}_i^{\left( i \right)}{{\widehat{\rm{\Sigma }}}_{{{\widetilde{h}}_i}{\bf{xx}}}}} \end{array}$$

thus completing the proof.
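
Corollary 3.1 can also be verified numerically. The sketch below is illustrative, not the author's code: it centers x so that \(\overline {\bf{x}} = 0\), forms \({\widehat{\rm{\Sigma }}_{r{\bf{xx}}}} = {1 \over n}\sum\nolimits_j {{{\widehat{r}}_j}{{\bf{x}}_j}{\bf{x}}_j^\prime } \) and \({\widehat{\rm{\Sigma }}_{{{\widetilde{h}}_i}{\bf{xx}}}}\) as defined in Appendix D, and checks the decomposition of \({\widehat{\rm{\Sigma }}_{{r^{\left( i \right)}}{{\bf{x}}_{\left( i \right)}}{{\bf{x}}_{\left( i \right)}}}}\).

import numpy as np

rng = np.random.default_rng(1)
n, p = 60, 3
x = rng.normal(size=(n, p))
x -= x.mean(axis=0)                              # enforce the x-bar = 0 assumption
X = np.column_stack([np.ones(n), x])
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
r = y - H @ y
i = 4                                            # deleted case (arbitrary choice)
r_del = r + H[:, i] * r[i] / (1.0 - H[i, i])     # Lemma 3.1
r_i_del = r[i] / (1.0 - H[i, i])                 # deleted residual r_i^{(i)}

Sigma_rxx = np.einsum('j,jk,jl->kl', r, x, x) / n
h_tilde_i = (H - np.eye(n))[:, i]                # i-th column of H - I
Sigma_hxx = np.einsum('j,jk,jl->kl', h_tilde_i, x, x) / n

keep = np.arange(n) != i                         # weighted covariance with case i deleted
lhs = np.einsum('j,jk,jl->kl', r_del[keep], x[keep], x[keep]) / (n - 1)
rhs = n / (n - 1) * (Sigma_rxx + r_i_del * Sigma_hxx)

print(np.allclose(lhs, rhs))                     # expect True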

C. Proof of Theorem 3.1

To obtain this result, without loss of generality we assume that \({\widehat{\rm{\Sigma }}_{\bf{x}}} = {\rm{I}}\) and \(\left\| {{{\widehat{b}}_{rj}}} \right\| = 1\). Let \({\rm{\Delta }}{\widehat{b}_{rj}}\) be the vector component of \(\widehat{b}_{rj}^{\left( i \right)}\) orthogonal to \({{{\widehat{b}}_{rj}}}\); then there exists a constant c such that \(c\widehat{b}_{rj}^{\left( i \right)} = {\widehat{b}_{rj}} + {\rm{\Delta }}{\widehat{b}_{rj}}\) and \(\cos \widehat\theta _{rj}^{(i)} = (c\widehat{b}_{rj}^{(i)},{\widehat{b}_{rj}})/\left\| {c\widehat{b}_{rj}^{(i)}} \right\|\left\| {{{\widehat{b}}_{rj}}} \right\|\), where (·, ·) denotes an inner product. Suppose that \(\theta _{rj}^{\left( i \right)}\) is positive; then it can be shown that \(\cos \widehat\theta _{rj}^{\left( i \right)} = 1/{\left( {1 + {{\left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\|}^2}} \right)^{{1 \over 2}}}\), \(\sin \widehat\theta _{rj}^{\left( i \right)} = \left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\|/{\left( {1 + {{\left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\|}^2}} \right)^{{1 \over 2}}}\), and \(\widehat\theta _{rj}^{\left( i \right)} = \left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\| + O\left( {{{\left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\|}^3}} \right)\). Thus we derive \(\widehat\theta _{rj}^{\left( i \right)} \approx \left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\|\).
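
The expansion step can be made explicit: since \({\rm{\Delta }}{\widehat{b}_{rj}}\) is orthogonal to \({\widehat{b}_{rj}}\) and \(\left\| {{{\widehat{b}}_{rj}}} \right\| = 1\), the two trigonometric expressions give \(\tan \widehat\theta _{rj}^{\left( i \right)} = \left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\|\), and a Taylor expansion of the arctangent yields

$$\widehat\theta _{rj}^{\left( i \right)} = \arctan \left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\| = \left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\| - {1 \over 3}{\left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\|^3} + \cdots = \left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\| + O\left( {{{\left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\|}^3}} \right).$$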

We proceed with this proof by evaluating the differential \({d{{\widehat{b}}_{rj}}}\) of eigenvector \({{{\widehat{b}}_{rj}}}\) for the sample weighted covariance matrix \({\widehat{\rm{\Sigma }}_{r{\bf{xx}}}}\) with eigenvalue \({\widehat\lambda _j}\), which is defined as

$$d{\widehat{b}_{rj}} = \sum\limits_{k \ne j} {{{\widehat{b}_{rj}^\prime d{{\widehat{\rm{\Sigma }}}_{r{\bf{xx}}}}{{\widehat{b}}_{rk}}} \over {{{\widehat\lambda }_j} - {{\widehat\lambda }_k}}}{{\widehat{b}}_{rk}}} $$

for j, k = 1, ⋯, p (see Kurt et al. 1979). To obtain the deviation angle estimate \(\widehat\theta _{rj}^{\left( i \right)}\) asymptotically, we apply \({\rm{\Delta }}{\widehat\Sigma _{r{\bf{xx}}}} \approx \widehat{r}_i^{\left( i \right)}{\widehat\Sigma _{{{\widetilde{h}}_i}{\bf{xx}}}}\) from Corollary 3.1 for sufficiently large n, and we can verify that

$$\begin{array}{*{20}{c}} {{{\left\| {{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right\|}^2}}& = &{\left( {{\rm{\Delta }}{{\widehat{b}}_{rj}},{\rm{\Delta }}{{\widehat{b}}_{rj}}} \right)} \\ {}& \approx &{\sum\limits_{k \ne j} {{{\left( {{{\widehat{b}_{rj}^\prime {\rm{\Delta }}{{\widehat{\rm{\Sigma }}}_{r{\bf{xx}}}}{{\widehat{b}}_{rk}}} \over {{{\widehat\lambda }_j} - {{\widehat\lambda }_k}}}} \right)}^2}} } \\ {}& \approx &{{{\left( {\widehat{r}_i^{\left( i \right)}} \right)}^2}\sum\limits_{k \ne j} {{{\left( {{{\widehat{b}_{rj}^\prime {{\widehat{\rm{\Sigma }}}_{{{\widetilde{h}}_i}{\bf{xx}}}}{{\widehat{b}}_{rk}}} \over {{{\widehat\lambda }_j} - {{\widehat\lambda }_k}}}} \right)}^2}} } \end{array}$$

Thus \(\widehat\theta _{rj}^{\left( i \right)} \approx \left| {\widehat{r}_i^{\left( i \right)}} \right|{c_j}\left( {\bf{x}} \right)\), where the scalar function \({c_j}\left( {\bf{x}} \right) = {\left( {\sum\nolimits_{k \ne j} {{{\left( {{{\widehat{b}_{rj}^\prime {{\widehat{\rm{\Sigma }}}_{{{\widetilde{h}}_i}{\bf{xx}}}}{{\widehat{b}}_{rk}}} \over {{{\widehat\lambda }_j} - {{\widehat\lambda }_k}}}} \right)}^2}} } \right)^{{1 \over 2}}}\). Similarly, suppose that \(\theta _{rj}^{\left( i \right)}\) is negative; then \(\widehat\theta _{rj}^{\left( i \right)} \approx - \left| {\widehat{r}_i^{\left( i \right)}} \right|{c_j}\left( {\bf{x}} \right)\). As a result, we obtain

$$\widehat\theta _{rj}^{\left( i \right)} \approx {\rm{sign}}\left( {\theta _{rj}^{\left( i \right)}} \right)\left| {\widehat{r}_i^{\left( i \right)}} \right|{c_j}\left( {\bf{x}} \right)$$

where sign(z) = 1 if z > 0, −1 if z < 0, and 0 otherwise, thus completing the proof of this theorem.
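
As an aside (not part of the original proof), the first-order eigenvector perturbation formula quoted from Kurt et al. (1979) can be illustrated numerically. In the sketch below, a symmetric matrix A and a small symmetric perturbation dA stand in for \({\widehat{\rm{\Sigma }}_{r{\bf{xx}}}}\) and \({\rm{\Delta }}{\widehat{\rm{\Sigma }}_{r{\bf{xx}}}}\) (with \({\widehat{\rm{\Sigma }}_{\bf{x}}} = {\rm{I}}\) as in the proof); all names are illustrative.

import numpy as np

rng = np.random.default_rng(2)
p = 4
A = rng.normal(size=(p, p)); A = (A + A.T) / 2             # symmetric stand-in for Sigma_rxx
dA = 1e-4 * rng.normal(size=(p, p)); dA = (dA + dA.T) / 2  # small symmetric perturbation

lam, B = np.linalg.eigh(A)                                 # eigenvalues and eigenvectors (columns)
j = 1                                                      # eigenvector of interest

# first-order differential: db_j = sum_{k != j} (b_j' dA b_k / (lam_j - lam_k)) b_k
db = np.zeros(p)
for k in range(p):
    if k != j:
        db += (B[:, j] @ dA @ B[:, k]) / (lam[j] - lam[k]) * B[:, k]

lam2, B2 = np.linalg.eigh(A + dA)                          # exact eigenvectors of the perturbed matrix
b2 = B2[:, j] * np.sign(B2[:, j] @ B[:, j])                # resolve the sign ambiguity

print(np.linalg.norm(b2 - (B[:, j] + db)))                 # second-order small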

D. Algorithmic Description

We present the technical details of the algorithm for the absolute deviation angle estimate \(\left| {\widehat\theta _{rj}^{\left( i \right)}} \right|\) for fixed j. Suppose that the data consist of n observations \(\left( {{y_i},{{\bf{x}}_i}} \right)\), i = 1, ⋯, n. A code sketch of the two steps is given after the description below.

  1. We begin by fitting a multiple linear regression of y against x and then constructing the eigenvalue decomposition

    $${\widehat{\rm{\Sigma }}_{r{\bf{xx}}}}{\widehat{b}_{rj}} = {\widehat\lambda _j}{\widehat{\rm{\Sigma }}_{\bf{x}}}{\widehat{b}_{rj}}$$

    to obtain the estimated r-based PHD directions \({\widehat{b}_{rj}}\) and eigenvalues \({\widehat\lambda _j}\), j = 1, ⋯, p. The hat matrix \({\rm{H}} = {\rm{X}}{\left( {{{\rm{X}}^\prime }{\rm{X}}} \right)^{ - 1}}{{\rm{X}}^\prime }\) can be obtained from the fitted values for the multiple linear regression of each column vector of the identity matrix I against x. The vector \({{\widetilde{h}}_i}\) is the ith column of the difference between H and I. The deleted residual \(\widehat{r}_i^{\left( i \right)}\) is equal to \({\widehat{r}_i}{\rm{/}}\left( {1 - {h_{ii}}} \right)\).

  2. For each i, we compute the scalar function \({c_j}\left( {\bf{x}} \right)\) using the results in Step 1:

    $${c_j}({\bf{x}}) = {\left( {\sum\limits_{k \ne j} {{{\left( {{{\widehat{b}_{rj}^\prime {{\widehat{\rm{\Sigma }}}_{{{\tilde h}_i}{\bf{xx}}}}{{\widehat{b}}_{rk}}} \over {{{\widehat\lambda }_j} - {{\widehat\lambda }_k}}}} \right)}^2}} } \right)^{{1 \over 2}}}$$

    where \({\widehat{\rm{\Sigma }}_{{{\tilde h}_i}{\bf{xx}}}} = {1 \over n}{\sum\nolimits_{j = 1}^n {{{\widetilde{h}}_{ij}}\left( {{{\bf{x}}_j} - \overline {\bf{x}} } \right)\left( {{{\bf{x}}_j} - \overline {\bf{x}} } \right)} ^\prime }\), and \({{{\widetilde{h}}_{ij}}}\) is the jth component of \({{{\widetilde{h}}_i}}\). Therefore, we obtain the absolute deviation angle estimate

    $$\left| {\widehat\theta _{rj}^{\left( i \right)}} \right| \approx \left| {\widehat{r}_i^{\left( i \right)}} \right|{c_j}\left( {\bf{x}} \right)$$
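
A compact computational sketch of the two steps above follows. It is not the author's original implementation; it is a minimal NumPy translation under the simplification \({\widehat{\rm{\Sigma }}_{\bf{x}}} = {\rm{I}}\) (obtained by centering and whitening x first), with simulated data and illustrative names, and it assumes distinct eigenvalues.

import numpy as np

def deviation_angles(x, y, j=0):
    # |theta_hat_{rj}^{(i)}| of Appendix D for every case i, for the j-th r-based PHD direction
    n, p = x.shape
    xc = x - x.mean(axis=0)
    L = np.linalg.cholesky(np.cov(xc, rowvar=False, bias=True))
    z = xc @ np.linalg.inv(L).T                     # whitened predictors, so Sigma_z = I

    # Step 1: OLS residuals, hat matrix, deleted residuals, r-based PHD eigen-decomposition
    X = np.column_stack([np.ones(n), z])
    H = X @ np.linalg.solve(X.T @ X, X.T)
    r = y - H @ y
    r_del = r / (1.0 - np.diag(H))                  # deleted residuals r_i^{(i)}
    Sigma_rzz = np.einsum('j,jk,jl->kl', r, z, z) / n
    lam, B = np.linalg.eigh(Sigma_rzz)
    order = np.argsort(-np.abs(lam))                # order directions by |eigenvalue|
    lam, B = lam[order], B[:, order]

    # Step 2: c_j(x) and |theta_hat_{rj}^{(i)}| for each deleted case i
    Ht = H - np.eye(n)                              # columns are h_tilde_i
    angles = np.empty(n)
    for i in range(n):
        Sigma_hzz = np.einsum('j,jk,jl->kl', Ht[:, i], z, z) / n
        num = B[:, j] @ Sigma_hzz @ B               # b_j' Sigma b_k for every k
        c_j = np.sqrt(sum((num[k] / (lam[j] - lam[k])) ** 2 for k in range(p) if k != j))
        angles[i] = abs(r_del[i]) * c_j
    return angles

# illustrative use on simulated data
rng = np.random.default_rng(3)
x = rng.normal(size=(100, 3))
y = x[:, 0] ** 2 + x[:, 1] + 0.5 * rng.normal(size=100)
print(deviation_angles(x, y, j=0)[:5])              # |theta_hat| for the first five cases

The multiple-trimming procedure described in the paper would then remove the cases with the largest estimated deviation angles and repeat these two steps on the reduced data.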

Cite this article

Lue, H.H. A study of sensitivity analysis on the method of Principal Hessian Directions. Computational Statistics 16, 109–130 (2001). https://doi.org/10.1007/s001800100054
