Abstract
In generalized ridge (GR) regression, a GR estimator (GRE) depends on ridge parameters, so it is important to select those parameters appropriately. Ridge parameters selected by minimizing the generalized \(C_p\) (\(GC_p\)) criterion can be obtained in closed form. However, because the ridge parameters selected by this criterion depend on a nonnegative value \(\alpha \) expressing the strength of the penalty for model complexity, \(\alpha \) must itself be optimized to obtain a better GRE. In this paper, we propose a method for optimizing \(\alpha \) in the \(GC_p\) criterion by applying [12], in a manner similar to [7].
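As an illustration of the object being tuned, the GRE implied by the \(GC_p\)-minimizing ridge parameters shrinks each singular-direction component of the response by a soft-thresholding-type factor depending on \(\alpha \) (cf. the form derived in Appendix 4). A minimal numpy sketch, assuming that shrinkage form; the function name and interface are ours, not the paper's:

```python
import numpy as np

def gcp_shrinkage(z, s2, alpha):
    """Shrinkage factors implied by GC_p-selected ridge parameters.

    z     : length-k array of projections of y onto the singular
            directions of the centered design matrix
    s2    : residual variance estimate s^2
    alpha : nonnegative penalty-strength parameter in GC_p

    Each factor is I(alpha * s2 < z_j^2) * (1 - alpha * s2 / z_j^2),
    so components with z_j^2 <= alpha * s2 are shrunk fully to zero.
    """
    z = np.asarray(z, dtype=float)
    v = np.zeros_like(z)
    keep = alpha * s2 < z ** 2
    v[keep] = 1.0 - alpha * s2 / z[keep] ** 2
    return v
```

With \(\alpha = 0\) every factor equals 1 (the least-squares fit); as \(\alpha \) grows, more components are zeroed out, which is why the choice of \(\alpha \) matters.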
References
Atkinson, A.C.: A note on the generalized information criterion for choice of a model. Biometrika 67, 413–418 (1980). https://doi.org/10.1093/biomet/67.2.413
Craven, P., Wahba, G.: Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31, 377–403 (1979). https://doi.org/10.1007/BF01404567
Efron, B.: The estimation of prediction error: covariance penalties and cross-validation. J. Amer. Statist. Assoc. 99, 619–632 (2004). https://doi.org/10.1198/016214504000000692
Fujikoshi, Y., Satoh, K.: Modified AIC and \(C_p\) in multivariate linear regression. Biometrika 84, 707–716 (1997). https://doi.org/10.1093/biomet/84.3.707
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970). https://doi.org/10.2307/1267351
Mallows, C.L.: Some comments on \(C_p\). Technometrics 15, 661–675 (1973). https://doi.org/10.1080/00401706.1973.10489103
Nagai, I., Fukui, K., Yanagihara, H.: Choosing the number of repetitions in the multiple plug-in optimization method for the ridge parameters in multivariate generalized ridge regression. Bull. Inform. Cybernet. 45, 25–35 (2013)
Nagai, I., Yanagihara, H., Satoh, K.: Optimization of ridge parameters in multivariate generalized ridge regression by plug-in methods. Hiroshima Math. J. 42, 301–324 (2012). https://doi.org/10.32917/hmj/1355238371
Nishii, R.: Asymptotic properties of criteria for selection of variables in multiple regression. Ann. Statist. 12, 758–765 (1984). https://doi.org/10.1214/aos/1176346522
Ohishi, M., Yanagihara, H., Fujikoshi, Y.: A fast algorithm for optimizing ridge parameters in a generalized ridge regression by minimizing a model selection criterion. J. Statist. Plann. Infer. 204, 187–205 (2020). https://doi.org/10.1016/j.jspi.2019.04.010
Shor, N.Z.: A class of almost-differentiable functions and a minimization method for functions of this class. Cybernet. Syst. Anal. 8, 599–606 (1972). https://doi.org/10.1007/BF01068281
Stein, C.M.: Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9, 1135–1151 (1981). https://doi.org/10.1214/aos/1176345632
Yanagihara, H.: Explicit solution to the minimization problem of generalized cross-validation criterion for selecting ridge parameters in generalized ridge regression. Hiroshima Math. J. 48, 203–222 (2018). https://doi.org/10.32917/hmj/1533088835
Yanagihara, H., Nagai, I., Satoh, K.: A bias-corrected \(C_p\) criterion for optimizing ridge parameters in multivariate generalized ridge regression. J. Appl. Statist. 38, 151–172 (2009). https://doi.org/10.5023/jappstat.38.151. (in Japanese)
Ye, J.: On measuring and correcting the effects of data mining and model selection. J. Amer. Statist. Assoc. 93, 120–131 (1998). https://doi.org/10.1080/01621459.1998.10474094
Acknowledgements
The authors thank the associate editor and the two reviewers for their valuable comments.
Appendices
Appendix 1: Derivation of Eq. (12)
Notice that \({\varvec{y}}\) and \({\varvec{y}}^\star \) are independently and identically distributed according to \(N_n ({\varvec{\eta }},\) \(\sigma ^2 {\varvec{I}}_n)\). Hence, from [3], the PMSE in (11) can be expressed as
It is desirable to obtain an unbiased estimator of (15). However, doing so is very difficult because the hat matrix of \(\hat{{\varvec{y}}} (\alpha )\) depends on \({\varvec{y}}\). Regarding the first term of (15), we use a naive estimator that is unbiased when \(\hat{{\varvec{\theta }}} = {\varvec{0}}_k\) and \((\infty , \ldots , \infty )'\). The naive estimator is given as \(q \Vert {\varvec{y}}- \hat{{\varvec{y}}} (\alpha ) \Vert ^2 / s^2 + 2\). The values q and 2 remove bias in the estimator when \(\hat{{\varvec{\theta }}} = {\varvec{0}}_k\) and \((\infty , \ldots , \infty )'\) (e.g., see [4] and [14]). On the other hand, regarding the second term of (15), we consider orthogonal transformations. Let \({\varvec{u}}\), \(\hat{{\varvec{u}}} (\alpha )\), and \({\varvec{\zeta }}\) be n-dimensional vectors defined by \({\varvec{u}}= (u_1, \ldots , u_n)' = {\varvec{P}}' {\varvec{y}}/ \sigma \), \(\hat{{\varvec{u}}} (\alpha ) = (\hat{u}_1 (\alpha ), \ldots , \hat{u}_n (\alpha ))' = {\varvec{P}}' \hat{{\varvec{y}}} (\alpha ) / \sigma \), and \({\varvec{\zeta }}= (\zeta _1, \ldots , \zeta _n)' = {\varvec{P}}' {\varvec{\eta }}/ \sigma \), respectively, and let \({\varvec{u}}_2\), \(\hat{{\varvec{u}}}_2 (\alpha )\), and \({\varvec{\zeta }}_2\) be \((n-k)\)-dimensional vectors defined by \({\varvec{u}}_2 = {\varvec{P}}_2' {\varvec{y}}/ \sigma \), \(\hat{{\varvec{u}}}_2 (\alpha ) = {\varvec{P}}_2' \hat{{\varvec{y}}} (\alpha ) / \sigma \), and \({\varvec{\zeta }}_2 = {\varvec{P}}_2' {\varvec{\eta }}/ \sigma \), respectively. Then, the second term of (15) can be expressed as follows:
We can obtain the following lemma about \(\hat{u}_j (\alpha )\) (the proof is given in Appendix 4).
Lemma 2
The \(\hat{u}_j (\alpha )\ (j \in \{1, \ldots , k\})\) is an almost differentiable function with respect to \(u_j\). The partial derivative of \(\hat{u}_j (\alpha )\) is given by
$$ \frac{\partial \hat{u}_j (\alpha )}{\partial u_j} = \mathop {\mathrm {I}}\left( \alpha < \frac{z_j^2}{s^2} \right) \left( 1 + \frac{\alpha s^2}{z_j^2} \right), \qquad (17) $$
and \(\mathop {\mathrm {E}}\nolimits [|\partial \hat{u}_j (\alpha ) / \partial u_j|] < 2\).
Here, the definition of almost differentiability can be found in, e.g., [11]. Notice that \(u_j\) is distributed according to \(N (\zeta _j, 1)\) and that \(u_j\ (j \in \{1, \ldots , k\})\) is independent of \(w^2\). Hence, from Lemma 2, we can apply Stein’s lemma [12] to the first term of (16) as follows:
On the other hand, we have \({\varvec{P}}_2 {\varvec{P}}_2' {\varvec{1}}_n = {\varvec{1}}_n\) since \({\varvec{P}}_1' {\varvec{1}}_n = {\varvec{0}}_k\). This result and (10) imply that \({\varvec{P}}_2 {\varvec{P}}_2' \hat{{\varvec{y}}} (\alpha ) = {\varvec{J}}_n {\varvec{y}}\). Hence, the second term of (16) can be expressed as
Consequently, Eq. (12) is derived.
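The key step above is Stein’s lemma [12]: if \(X \sim N(\mu , 1)\) and g is almost differentiable with \(\mathop {\mathrm {E}}\nolimits [|g'(X)|] < \infty \), then \(\mathop {\mathrm {E}}\nolimits [(X - \mu ) g(X)] = \mathop {\mathrm {E}}\nolimits [g'(X)]\). A Monte Carlo illustration using the shrinkage function of Lemma 3 (sample size, seed, and constants are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, c, n = 1.5, 1.0, 2_000_000
x = rng.normal(mu, 1.0, size=n)

# g(x) = x * I(c^2 < x^2) * (1 - c^2 / x^2)  (Lemma 3), with
# g'(x) = I(c^2 < x^2) * (1 + c^2 / x^2) away from x = +-c.
inside = c ** 2 < x ** 2
g = np.zeros(n)
g[inside] = x[inside] - c ** 2 / x[inside]
g_prime = np.zeros(n)
g_prime[inside] = 1.0 + c ** 2 / x[inside] ** 2

lhs = np.mean((x - mu) * g)   # estimates E[(X - mu) g(X)]
rhs = np.mean(g_prime)        # estimates E[g'(X)]
```

The two estimates agree up to Monte Carlo error, and `rhs` stays below 2, matching the bound in Lemma 2.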
Appendix 2: Proof of Lemma 1
Regarding the first term in (12), by using \({\varvec{P}}_1' {\varvec{J}}_n = {\varvec{O}}_{k, n}\) from \({\varvec{X}}' {\varvec{1}}_n = {\varvec{0}}_k\), and \((n-k-1) s^2 = \Vert {\varvec{P}}_2' ({\varvec{I}}_n - {\varvec{J}}_n) {\varvec{y}}\Vert ^2\), the following equation holds (e.g., see [10] and [13]).
where \(s^2\) and \(z_j\) are given by (5) and (7), respectively. Assume that \(\alpha \in R_a\), where \(R_a\) is given by (14). Then, using \(c_{1, a}\) and \(c_{2, a}\) in (13), terms including the indicator function can be expressed as
Consequently, Lemma 1 is proved.
Appendix 3: Proof of Theorem 1
For any \(a = 0, \ldots , k-1\), \(\phi _a (\alpha )\ (\alpha \in R_a)\) is an increasing function because \(\phi _a (\alpha )\) is a quadratic function, \(c_{2, a}\) and q are positive, and the \(\alpha \)-coordinate of the vertex of \(\phi _a (\alpha )\) is negative, where \(c_{2, a}\) and \(R_a\) are given by (13) and (14), respectively. Moreover, \(\phi _a (\alpha )\) with \(a = k\) is constant since \(c_{2, k} = 0\), and \(\phi _a (t_{a+1}) > \phi _{a+1} (t_{a+1})\) holds for \(a = 0, \ldots , k-1\). Therefore, candidate points of the minimizer of \(\mathop {\mathrm {C}}\nolimits (\alpha )\) are restricted to \(\{t_0, \ldots , t_k\}\), and consequently Theorem 1 is proved.
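Computationally, Theorem 1 reduces minimizing \(\mathop {\mathrm {C}}\nolimits (\alpha )\) over a continuum to evaluating it at the finitely many knots \(t_0, \ldots , t_k\). A generic sketch under that assumption (the criterion and knots below are placeholders, not the paper's definitions):

```python
import numpy as np

def minimize_at_knots(criterion, knots):
    """Minimize a piecewise-defined criterion whose minimizer is known
    to lie at one of finitely many knot points, as in Theorem 1.

    criterion : callable evaluating C(alpha) at a scalar alpha
    knots     : candidate points t_0, ..., t_k
    Returns the minimizing knot and the attained criterion value.
    """
    knots = np.asarray(knots, dtype=float)
    values = np.array([criterion(t) for t in knots])
    j = int(np.argmin(values))
    return knots[j], values[j]
```

For example, `minimize_at_knots(lambda a: (a - 2.0) ** 2, [0.0, 1.0, 2.5, 4.0])` returns the knot 2.5, the closest candidate to the unconstrained minimizer.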
Appendix 4: Proof of Lemma 2
For any \(j \in \{1, \ldots , k\}\), \(\hat{u}_j (\alpha )\) is given by
$$ \hat{u}_j (\alpha ) = v_{\alpha , j} u_j, \qquad v_{\alpha , j} = \mathop {\mathrm {I}}(\alpha w^2 < u_j^2) \left( 1 - \frac{\alpha w^2}{u_j^2} \right), $$
where \(w^2 = s^2 / \sigma ^2\). Almost differentiability of \(\hat{u}_j (\alpha )\ (j \in \{1, \ldots , k\})\) follows from the following lemma (the proof is given in Appendix 5).
Lemma 3
Let \(f (x) = x \mathop {\mathrm {I}}(c^2 < x^2) (1 - c^2 / x^2)\), where c is a positive constant. Then, f(x) is almost differentiable.
Next, we derive the partial derivative of \(\hat{u}_j (\alpha )\ (j \in \{1, \ldots , k\})\). Notice that \(\partial w^2 / \partial u_j = 0\) for any \(j \in \{1, \ldots , k\}\) since \(w^2\) is independent of \(u_1, \ldots , u_k\). Hence, we have
$$ \frac{\partial v_{\alpha , j}}{\partial u_j} = \mathop {\mathrm {I}}(\alpha w^2 < u_j^2)\, \frac{2 \alpha w^2}{u_j^3}. $$
Since \(\partial \hat{u}_j (\alpha ) / \partial u_j = \{ \partial v_{\alpha , j} / \partial u_j \} u_j + v_{\alpha , j}\) and \(u_j^2 / w^2 = z_j^2 / s^2\), we obtain (17).
Moreover, the expectation of (17) is finite because \(0 \le \mathop {\mathrm {I}}(\alpha< z_j^2 / s^2) (1 + \alpha s^2 / z_j^2) < 2\). Consequently, Lemma 2 is proved.
Appendix 5: Proof of Lemma 3
It is clear that f(x) is differentiable on \(\mathcal {R}= \mathbb {R}\backslash \{-c, c\}\). If \(x^2 > c^2\), then \(f' (x) = 1 + c^2 / x^2\) and \(\sup _{x: x^2 > c^2} |f' (x)| = 2\) since \(f' (x)\) is decreasing as a function of \(x^2\). Hence, \(f' (x)\) is bounded on \(\mathcal {R}\), and we have \(| f(x) - f(y) | / |x-y| < 2\) when x and y both lie in \((-\infty , -c)\), \([-c, c]\), or \((c, \infty )\). In addition, f(x) is an odd function. Thus, for Lipschitz continuity it suffices to show that \(\{ f(x) - f(y) \} / (x-y)\) is bounded for (x, y) such that \(x > c\) and \(y \le c\). Under these conditions, the following equation holds.
where \(x^*\) is some point in (c, x). Therefore, f(x) is Lipschitz continuous on \(\mathbb {R}\), and consequently Lemma 3 is proved.
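The Lipschitz bound can be spot-checked numerically: on a grid, every secant slope of f stays below the constant 2 while approaching it near the kink at \(x = c\). A small sketch (grid range, spacing, and c are arbitrary choices of ours):

```python
import numpy as np

def f(x, c=1.0):
    """f(x) = x * I(c^2 < x^2) * (1 - c^2 / x^2), as in Lemma 3."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = c ** 2 < x ** 2
    # Inside the indicator, x * (1 - c^2 / x^2) simplifies to x - c^2 / x.
    out[inside] = x[inside] - c ** 2 / x[inside]
    return out

grid = np.linspace(-5.0, 5.0, 801)
fx = f(grid)
# All pairwise secant slopes |f(x) - f(y)| / |x - y| on the grid.
num = np.abs(fx[:, None] - fx[None, :])
den = np.abs(grid[:, None] - grid[None, :])
max_slope = (num[den > 0] / den[den > 0]).max()
```

The maximum secant slope on this grid is strictly below 2, consistent with the strict inequality in the proof.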
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Ohishi, M., Yanagihara, H., Wakaki, H. (2020). Optimization of Generalized \(C_p\) Criterion for Selecting Ridge Parameters in Generalized Ridge Regression. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies. IDT 2020. Smart Innovation, Systems and Technologies, vol 193. Springer, Singapore. https://doi.org/10.1007/978-981-15-5925-9_23