Optimization of Generalized \(C_p\) Criterion for Selecting Ridge Parameters in Generalized Ridge Regression

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 193)

Abstract

In generalized ridge (GR) regression, a GR estimator (GRE) depends on ridge parameters, so it is important to select those parameters appropriately. Ridge parameters selected by minimizing the generalized \(C_p\) (\(GC_p\)) criterion can be obtained in closed form. However, because the ridge parameters selected by this criterion depend on a nonnegative value \(\alpha \) expressing the strength of the penalty for model complexity, \(\alpha \) must be optimized to obtain a better GRE. In this paper, we propose a method for optimizing \(\alpha \) in the \(GC_p\) criterion based on [12], in a manner similar to [7].

References

  1. Atkinson, A.C.: A note on the generalized information criterion for choice of a model. Biometrika 67, 413–418 (1980). https://doi.org/10.1093/biomet/67.2.413

  2. Craven, P., Wahba, G.: Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31, 377–403 (1979). https://doi.org/10.1007/BF01404567

  3. Efron, B.: The estimation of prediction error: covariance penalties and cross-validation. J. Amer. Statist. Assoc. 99, 619–632 (2004). https://doi.org/10.1198/016214504000000692

  4. Fujikoshi, Y., Satoh, K.: Modified AIC and \(C_p\) in multivariate linear regression. Biometrika 84, 707–716 (1997). https://doi.org/10.1093/biomet/84.3.707

  5. Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970). https://doi.org/10.2307/1267351

  6. Mallows, C.L.: Some comments on \(C_p\). Technometrics 15, 661–675 (1973). https://doi.org/10.1080/00401706.1973.10489103

  7. Nagai, I., Fukui, K., Yanagihara, H.: Choosing the number of repetitions in the multiple plug-in optimization method for the ridge parameters in multivariate generalized ridge regression. Bull. Inform. Cybernet. 45, 25–35 (2013)

  8. Nagai, I., Yanagihara, H., Satoh, K.: Optimization of ridge parameters in multivariate generalized ridge regression by plug-in methods. Hiroshima Math. J. 42, 301–324 (2012). https://doi.org/10.32917/hmj/1355238371

  9. Nishii, R.: Asymptotic properties of criteria for selection of variables in multiple regression. Ann. Statist. 12, 758–765 (1984). https://doi.org/10.1214/aos/1176346522

  10. Ohishi, M., Yanagihara, H., Fujikoshi, Y.: A fast algorithm for optimizing ridge parameters in a generalized ridge regression by minimizing a model selection criterion. J. Statist. Plann. Infer. 204, 187–205 (2020). https://doi.org/10.1016/j.jspi.2019.04.010

  11. Shor, N.Z.: A class of almost-differentiable functions and a minimization method for functions of this class. Cybernet. Syst. Anal. 8, 599–606 (1972). https://doi.org/10.1007/BF01068281

  12. Stein, C.M.: Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9, 1135–1151 (1981). https://doi.org/10.1214/aos/1176345632

  13. Yanagihara, H.: Explicit solution to the minimization problem of generalized cross-validation criterion for selecting ridge parameters in generalized ridge regression. Hiroshima Math. J. 48, 203–222 (2018). https://doi.org/10.32917/hmj/1533088835

  14. Yanagihara, H., Nagai, I., Satoh, K.: A bias-corrected \(C_p\) criterion for optimizing ridge parameters in multivariate generalized ridge regression. J. Appl. Statist. 38, 151–172 (2009). https://doi.org/10.5023/jappstat.38.151. (in Japanese)

  15. Ye, J.: On measuring and correcting the effects of data mining and model selection. J. Amer. Statist. Assoc. 93, 120–131 (1998). https://doi.org/10.1080/01621459.1998.10474094

Acknowledgements

The authors thank the associate editor and the two reviewers for their valuable comments.

Author information

Correspondence to Mineaki Ohishi.

Appendices

Appendix 1:   Derivation of Eq. (12)

Notice that \({\varvec{y}}\) and \({\varvec{y}}^\star \) are independently and identically distributed according to \(N_n ({\varvec{\eta }}, \sigma ^2 {\varvec{I}}_n)\). Hence, from [3], the PMSE in (11) can be expressed as

$$\begin{aligned} \mathop {\mathrm {PMSE}}[\hat{{\varvec{y}}} (\alpha )]&= \mathop {\mathrm {E}}\nolimits _{{\varvec{y}}} \left[ \Vert {\varvec{y}}- \hat{{\varvec{y}}} (\alpha ) \Vert ^2 / \sigma ^2 \right] + 2 \mathop {\mathrm {E}}\nolimits _{{\varvec{y}}} \left[ ({\varvec{y}}- {\varvec{\eta }})' \hat{{\varvec{y}}} (\alpha ) / \sigma ^2 \right] . \end{aligned}$$
(15)
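The identity (15) can be checked numerically. The following minimal Python sketch is an illustration only: the dimensions, the mean vector, and the simple shrinkage rule standing in for \(\hat{{\varvec{y}}} (\alpha )\) are arbitrary assumptions, not the paper's estimator, and the PMSE in (11) is taken to be \(\mathop {\mathrm {E}}[\Vert {\varvec{y}}^\star - \hat{{\varvec{y}}} (\alpha ) \Vert ^2]/\sigma ^2\), as (15) suggests.

    import numpy as np

    rng = np.random.default_rng(0)
    n, sigma, alpha = 20, 1.5, 0.7
    eta = rng.normal(size=n)                     # hypothetical mean vector eta

    def y_hat(y, alpha):
        # a simple shrinkage rule standing in for the GR predictor (illustration only)
        return y * np.maximum(1.0 - alpha / np.maximum(y**2, 1e-12), 0.0)

    B = 50_000
    y = eta + sigma * rng.normal(size=(B, n))    # y   ~ N_n(eta, sigma^2 I_n)
    ys = eta + sigma * rng.normal(size=(B, n))   # y*  ~ N_n(eta, sigma^2 I_n), independent copy
    f = y_hat(y, alpha)

    lhs = np.mean(np.sum((ys - f)**2, axis=1)) / sigma**2                 # PMSE
    rhs = (np.mean(np.sum((y - f)**2, axis=1))
           + 2.0 * np.mean(np.sum((y - eta) * f, axis=1))) / sigma**2     # right-hand side of (15)
    print(lhs, rhs)                              # the two values agree up to Monte Carlo error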

It is desirable to obtain an unbiased estimator of (15). However, doing so is very difficult because the hat matrix of \(\hat{{\varvec{y}}} (\alpha )\) depends on \({\varvec{y}}\). For the first term of (15), we therefore use the naive estimator \(q \Vert {\varvec{y}}- \hat{{\varvec{y}}} (\alpha ) \Vert ^2 / s^2 + 2\), which is unbiased both when \(\hat{{\varvec{\theta }}} = {\varvec{0}}_k\) and when \(\hat{{\varvec{\theta }}} = (\infty , \ldots , \infty )'\); the factor q and the additive constant 2 remove the bias in these two cases (e.g., see [4] and [14]). For the second term of (15), we consider orthogonal transformations. Let \({\varvec{u}}\), \(\hat{{\varvec{u}}} (\alpha )\), and \({\varvec{\zeta }}\) be n-dimensional vectors defined by \({\varvec{u}}= (u_1, \ldots , u_n)' = {\varvec{P}}' {\varvec{y}}/ \sigma \), \(\hat{{\varvec{u}}} (\alpha ) = (\hat{u}_1 (\alpha ), \ldots , \hat{u}_n (\alpha ))' = {\varvec{P}}' \hat{{\varvec{y}}} (\alpha ) / \sigma \), and \({\varvec{\zeta }}= (\zeta _1, \ldots , \zeta _n)' = {\varvec{P}}' {\varvec{\eta }}/ \sigma \), respectively, and let \({\varvec{u}}_2\), \(\hat{{\varvec{u}}}_2 (\alpha )\), and \({\varvec{\zeta }}_2\) be \((n-k)\)-dimensional vectors defined by \({\varvec{u}}_2 = {\varvec{P}}_2' {\varvec{y}}/ \sigma \), \(\hat{{\varvec{u}}}_2 (\alpha ) = {\varvec{P}}_2' \hat{{\varvec{y}}} (\alpha ) / \sigma \), and \({\varvec{\zeta }}_2 = {\varvec{P}}_2' {\varvec{\eta }}/ \sigma \), respectively. Then, the second term of (15) can be expressed as follows:

$$\begin{aligned} \mathop {\mathrm {E}}\nolimits _{{\varvec{y}}} \left[ ({\varvec{y}}- {\varvec{\eta }})' \hat{{\varvec{y}}} (\alpha ) / \sigma ^2 \right]= & {} \mathop {\mathrm {E}}\nolimits _{{\varvec{u}}} \left[ ({\varvec{u}}- {\varvec{\zeta }})' \hat{{\varvec{u}}} (\alpha ) \right] \nonumber \\= & {} \sum _{j=1}^k \mathop {\mathrm {E}}\nolimits _{{\varvec{u}}} \left[ (u_j - \zeta _j) \hat{u}_j (\alpha ) \right] + \mathop {\mathrm {E}}\nolimits _{{\varvec{u}}} \left[ ({\varvec{u}}_2 - {\varvec{\zeta }}_2)' \hat{{\varvec{u}}}_2 (\alpha ) \right] . \end{aligned}$$
(16)

We can obtain the following lemma about \(\hat{u}_j (\alpha )\) (the proof is given in Appendix 4).

Lemma 2

For \(j \in \{1, \ldots , k\}\), \(\hat{u}_j (\alpha )\) is an almost differentiable function with respect to \(u_j\). Its partial derivative is given by

$$\begin{aligned} \dfrac{ \partial }{ \partial u_j } \hat{u}_j (\alpha ) = \mathop {\mathrm {I}}\left( \alpha < z_j^2 / s^2 \right) \left( 1 + \dfrac{\alpha s^2}{z_j^2} \right) \ (j \in \{1, \ldots , k\}), \end{aligned}$$
(17)

and \(\mathop {\mathrm {E}}\nolimits [|\partial \hat{u}_j (\alpha ) / \partial u_j|] < 2\).

Here, the definition of almost differentiability can be found in, e.g., [11]. Notice that \(u_j\) is distributed according to \(N (\zeta _j, 1)\) and that \(u_j\ (j \in \{1, \ldots , k\})\) is independent of \(w^2\). Hence, from Lemma 2, we can apply Stein’s lemma in [12] to the first term of (16) as follows:

$$\begin{aligned} \mathop {\mathrm {E}}\nolimits _{{\varvec{u}}} \left[ (u_j - \zeta _j) \hat{u}_j (\alpha ) \right] = \mathop {\mathrm {E}}\nolimits _{{\varvec{u}}} \left[ \dfrac{ \partial }{ \partial u_j } \hat{u}_j (\alpha ) \right] \ (j \in \{1, \ldots , k\}). \end{aligned}$$
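As a numerical illustration of this step, the following sketch compares the two sides of Stein’s identity for the thresholding rule \(\hat{u}_j (\alpha )\) of Appendix 4 by Monte Carlo; the values of \(\zeta _j\), \(\alpha \), and \(w^2\) are arbitrary assumptions, and \(w^2\) is held fixed for simplicity.

    import numpy as np

    rng = np.random.default_rng(1)
    zeta, alpha, w2 = 0.8, 0.5, 1.3        # hypothetical zeta_j, alpha, and w^2 (held fixed here)
    thr = alpha * w2

    u = zeta + rng.normal(size=2_000_000)  # u_j ~ N(zeta_j, 1)
    mask = u**2 > thr

    g = np.zeros_like(u)                   # g(u_j) = u_j I(alpha < u_j^2/w^2)(1 - alpha w^2/u_j^2)
    g[mask] = u[mask] * (1.0 - thr / u[mask]**2)
    dg = np.zeros_like(u)                  # derivative of g from (17)
    dg[mask] = 1.0 + thr / u[mask]**2

    print(np.mean((u - zeta) * g), np.mean(dg))   # both sides of Stein's identity agree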

On the other hand, we have \({\varvec{P}}_2 {\varvec{P}}_2' {\varvec{1}}_n = {\varvec{1}}_n\) since \({\varvec{P}}_1' {\varvec{1}}_n = {\varvec{0}}_k\). This result and (10) imply that \({\varvec{P}}_2 {\varvec{P}}_2' \hat{{\varvec{y}}} (\alpha ) = {\varvec{J}}_n {\varvec{y}}\). Hence, the second term of (16) can be expressed as

$$\begin{aligned} \mathop {\mathrm {E}}\nolimits _{{\varvec{u}}} \left[ ({\varvec{u}}_2 - {\varvec{\zeta }}_2)' \hat{{\varvec{u}}}_2 (\alpha ) \right] = \mathop {\mathrm {E}}\nolimits _{{\varvec{y}}} \left[ {\varvec{y}}' {\varvec{J}}_n {\varvec{y}}- {\varvec{\eta }}' {\varvec{J}}_n {\varvec{y}}\right] / \sigma ^2 = \mathop {\mathrm {E}}\nolimits _{{\varvec{\varepsilon }}} \left[ {\varvec{\varepsilon }}' {\varvec{J}}_n {\varvec{\varepsilon }}\right] / \sigma ^2 = 1. \end{aligned}$$
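For completeness, the last equality can be spelled out; the step below assumes, as the notation suggests, that \({\varvec{J}}_n = {\varvec{1}}_n {\varvec{1}}_n' / n\) is the rank-one projection onto the span of \({\varvec{1}}_n\):

$$\begin{aligned} \mathop {\mathrm {E}}\nolimits _{{\varvec{\varepsilon }}} \left[ {\varvec{\varepsilon }}' {\varvec{J}}_n {\varvec{\varepsilon }}\right] = \mathrm {tr}\left( {\varvec{J}}_n \mathop {\mathrm {E}}\nolimits _{{\varvec{\varepsilon }}} \left[ {\varvec{\varepsilon }}{\varvec{\varepsilon }}' \right] \right) = \sigma ^2 \mathrm {tr}\left( {\varvec{J}}_n \right) = \sigma ^2 \mathrm {tr}\left( {\varvec{1}}_n {\varvec{1}}_n' / n \right) = \sigma ^2. \end{aligned}$$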

Consequently, Eq. (12) is derived.

Appendix 2:   Proof of Lemma 1

Regarding the first term in (12), by using \({\varvec{P}}_1' {\varvec{J}}_n = {\varvec{O}}_{k, n}\), which follows from \({\varvec{X}}' {\varvec{1}}_n = {\varvec{0}}_k\), and \((n-k-1) s^2 = \Vert {\varvec{P}}_2' ({\varvec{I}}_n - {\varvec{J}}_n) {\varvec{y}}\Vert ^2\), the following equation holds (e.g., see [10] and [13]):

$$\begin{aligned} \Vert {\varvec{y}}- \hat{{\varvec{y}}} (\alpha ) \Vert ^2 = (n - k - 1) s^2 + \sum _{j=1}^k \left\{ \mathop {\mathrm {I}}\left( \alpha < z_j^2 / s^2 \right) \left( \dfrac{\alpha s^2}{z_j^2} - 1 \right) + 1 \right\} ^2 z_j^2, \end{aligned}$$

where \(s^2\) and \(z_j\) are given by (5) and (7), respectively. Assume that \(\alpha \in R_a\), where \(R_a\) is given by (14). Then, using \(c_{1, a}\) and \(c_{2, a}\) in (13), the terms including the indicator function can be expressed as

$$\begin{aligned} \sum _{j=1}^k \left\{ \mathop {\mathrm {I}}\left( \alpha< z_j^2 / s^2 \right) \left( \dfrac{\alpha s^2}{z_j^2} - 1 \right) + 1 \right\} ^2 z_j^2= & {} s^2 (c_{1, a} + c_{2, a} \alpha ^2), \\ \sum _{j=1}^k \mathop {\mathrm {I}}\left( \alpha < z_j^2 / s^2 \right) \left( 1 + \dfrac{ \alpha s^2 }{ z_j^2 } \right)= & {} k - a + c_{2, a} \alpha . \end{aligned}$$

Consequently, Lemma 1 is proved.
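The expressions above are easy to check by brute force: on each interval \(R_a\), the active set of the indicator is fixed, so the first sum is quadratic in \(\alpha \) and the second is affine in \(\alpha \). The minimal Python sketch below uses hypothetical values of \(z_j\) and \(s^2\) and evaluates both sums directly from their definitions (the closed-form coefficients \(c_{1, a}\) and \(c_{2, a}\) of (13) are not reproduced here); the closed forms in Lemma 1 replace exactly this brute-force evaluation.

    import numpy as np

    rng = np.random.default_rng(2)
    k, s2 = 5, 1.2
    z = rng.normal(scale=2.0, size=k)          # hypothetical z_1, ..., z_k
    ratios = z**2 / s2                         # the thresholds z_j^2 / s^2

    def sums(alpha):
        ind = alpha < ratios                                               # I(alpha < z_j^2 / s^2)
        first = np.sum((ind * (alpha * s2 / z**2 - 1.0) + 1.0)**2 * z**2)  # first sum
        second = np.sum(ind * (1.0 + alpha * s2 / z**2))                   # second sum
        return first, second

    # within one interval R_a the active set of the indicator is fixed, so the
    # first sum is quadratic in alpha and the second is affine in alpha
    for alpha in np.linspace(0.0, ratios.max() * 1.1, 5):
        print(round(alpha, 3), sums(alpha))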

Appendix 3:   Proof of Theorem 1

For any \(a = 0, \ldots , k-1\), \(\phi _a (\alpha )\ (\alpha \in R_a)\) is an increasing function because \(\phi _a (\alpha )\) is a quadratic function, \(c_{2, a}\) and q are positive, and the \(\alpha \)-coordinate of the vertex of \(\phi _a (\alpha )\) is negative, where \(c_{2, a}\) and \(R_a\) are given by (13) and (14), respectively. Moreover, \(\phi _a (\alpha )\) with \(a = k\) is constant since \(c_{2, k} = 0\), and \(\phi _a (t_{a+1}) > \phi _{a+1} (t_{a+1})\) holds for \(a = 0, \ldots , k-1\). Therefore, the candidate points for the minimizer of \(\mathop {\mathrm {C}}\nolimits (\alpha )\) are restricted to \(\{t_0, \ldots , t_k\}\), and consequently Theorem 1 is proved.
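Theorem 1 thus reduces the continuous minimization of \(\mathop {\mathrm {C}}\nolimits (\alpha )\) to a finite comparison over the knots \(t_0, \ldots , t_k\). The minimal Python sketch below illustrates that reduction; it assumes, consistently with the indicator functions above though not stated in this appendix, that the knots are \(t_0 = 0\) and the sorted values \(z_j^2 / s^2\), and `criterion` is a hypothetical placeholder for \(\mathop {\mathrm {C}}\nolimits (\alpha )\), whose explicit form is given in the body of the paper.

    import numpy as np

    def minimize_over_knots(criterion, z, s2):
        """Return the alpha among the candidate knots {t_0, ..., t_k} that minimizes C(alpha).

        criterion : callable alpha -> C(alpha); placeholder for the GC_p-based criterion
        z, s2     : the statistics z_1, ..., z_k and s^2 defining the knots
        """
        knots = np.concatenate(([0.0], np.sort(z**2 / s2)))   # assumed form of t_0, ..., t_k
        values = [criterion(t) for t in knots]
        return knots[int(np.argmin(values))]

Because the criterion is increasing on the interior of each \(R_a\) and constant beyond the largest knot, evaluating it at the knots alone suffices, which is exactly the content of the proof above.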

Appendix 4:   Proof of Lemma 2

For any \(j \in \{1, \ldots , k\}\), \(\hat{u}_j (\alpha )\) is given by

$$\begin{aligned} \hat{u}_j (\alpha ) = v_{\alpha , j} u_j = u_j \mathop {\mathrm {I}}\left( \alpha < u_j^2 / w^2 \right) \left\{ 1 - \alpha w^2 / u_j^2 \right\} , \end{aligned}$$

where \(w^2 = s^2 / \sigma ^2\). Almost differentiability of \(\hat{u}_j (\alpha )\ (j \in \{1, \ldots , k\})\) follows from the lemma below (the proof is given in Appendix 5).

Lemma 3

Let \(f (x) = x \mathop {\mathrm {I}}(c^2 < x^2) (1 - c^2 / x^2)\), where c is a positive constant. Then, f(x) is almost differentiable.

Next, we derive a partial derivative of \(\hat{u}_j (\alpha )\ (j \in \{1, \ldots , k\})\). Notice that \(\partial w^2 / \partial u_j = 0\) for any \(j \in \{1, \ldots , k\}\) since \(w^2\) is independent of \(u_1, \ldots , u_k\). Hence, we have

$$\begin{aligned} \dfrac{\partial }{\partial u_j} v_{\alpha , j} = 2 \alpha \mathop {\mathrm {I}}\left( \alpha < u_j^2 / w^2 \right) \dfrac{w^2}{u_j^3}. \end{aligned}$$

Since \(\partial \hat{u}_j (\alpha ) / \partial u_j = \{ \partial v_{\alpha , j} / \partial u_j \} u_j + v_{\alpha , j}\) and \(u_j^2 / w^2 = z_j^2 / s^2\), we obtain (17).

Moreover, the expectation of (17) is finite because \(0 \le \mathop {\mathrm {I}}(\alpha< z_j^2 / s^2) (1 + \alpha s^2 / z_j^2) < 2\). Consequently, Lemma 2 is proved.
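A quick finite-difference check of (17), at points away from the thresholds \(\pm \sqrt{\alpha w^2}\) where the derivative exists, can be carried out as follows; the values of \(\alpha \), \(w^2\), and the evaluation points are arbitrary choices for the illustration.

    import numpy as np

    alpha, w2 = 0.6, 1.4                  # hypothetical alpha and w^2
    thr = alpha * w2

    def u_hat(u):
        # u_hat_j(alpha) as a function of u_j, with w^2 held fixed
        return u * (1.0 - thr / u**2) if u**2 > thr else 0.0

    h = 1e-6
    for u in (-3.0, -0.5, 0.4, 2.0):                        # points away from +/- sqrt(alpha w^2)
        fd = (u_hat(u + h) - u_hat(u - h)) / (2.0 * h)      # central difference
        exact = (1.0 + thr / u**2) if u**2 > thr else 0.0   # formula (17)
        print(u, round(fd, 6), round(exact, 6))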

Appendix 5:   Proof of Lemma 3

It is clear that f(x) is differentiable on \(\mathcal {R}= \mathbb {R}\backslash \{-c, c\}\). If \(x^2 > c^2\), then \(f' (x) = 1 + c^2 / x^2\) and \(\sup _{x: x^2 > c^2} |f' (x)| = 2\), since \(f' (x)\) is decreasing as a function of \(x^2\). Hence, \(f' (x)\) is bounded on \(\mathcal {R}\) and we have \(| f(x) - f(y) | / |x-y| < 2\) when x and y both lie in \((-\infty , -c)\), \([-c, c]\), or \((c, \infty )\). In addition, f(x) is an odd function. Thus, for Lipschitz continuity, it is sufficient to show that \(\{ f(x) - f(y) \} / (x-y)\) is bounded for pairs (x, y) such that \(x > c\) and \(y \le c\). Under these conditions, the following holds:

$$\begin{aligned} 0< \dfrac{f(x) - f(y)}{x - y} = {\left\{ \begin{array}{ll} 1 + \dfrac{c^2}{xy}< 1 &{}(y< -c)\\ \dfrac{f(x) - 0}{x - y} \le \dfrac{f (x) - f(c)}{x - c} = f' (x^*) < 2 &{}(-c \le y \le c) \end{array}\right. }, \end{aligned}$$

where \(x^*\) is some point in (cx). Therefore, f(x) is Lipschitz continuous on \(\mathbb {R}\), and consequently Lemma 3 is proved.
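The Lipschitz bound just established can also be probed numerically; the minimal sketch below samples random pairs and confirms that the difference quotients of f stay below 2 (the value of c is an arbitrary choice for the illustration).

    import numpy as np

    rng = np.random.default_rng(3)
    c = 1.5                                    # arbitrary positive constant

    def f(x):
        # f(x) = x I(c^2 < x^2)(1 - c^2 / x^2)
        out = np.zeros_like(x)
        inside = x**2 > c**2
        out[inside] = x[inside] - c**2 / x[inside]
        return out

    x = rng.uniform(-10.0, 10.0, size=1_000_000)
    y = rng.uniform(-10.0, 10.0, size=1_000_000)
    q = np.abs(f(x) - f(y)) / np.maximum(np.abs(x - y), 1e-12)
    print(q.max())                             # stays below the Lipschitz constant 2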

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Ohishi, M., Yanagihara, H., Wakaki, H. (2020). Optimization of Generalized \(C_p\) Criterion for Selecting Ridge Parameters in Generalized Ridge Regression. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies. IDT 2020. Smart Innovation, Systems and Technologies, vol 193. Springer, Singapore. https://doi.org/10.1007/978-981-15-5925-9_23
