Abstract
In generalized ridge (GR) regression, a GR estimator (GRE) depends on ridge parameters, so it is important to select those parameters appropriately. Ridge parameters selected by minimizing the generalized \(C_p\) (\(GC_p\)) criterion can be obtained in closed form. However, because the ridge parameters selected by this criterion depend on a nonnegative value \(\alpha \) expressing the strength of the penalty for model complexity, \(\alpha \) must itself be optimized to obtain a better GRE. In this paper, we propose a method for optimizing \(\alpha \) in the \(GC_p\) criterion by applying [12], in a manner similar to [7].
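As an illustration of the object being tuned, the GRE implied by the \(GC_p\)-minimizing ridge parameters shrinks each singular-direction component of the response by a soft-thresholding-type factor depending on \(\alpha \) (cf. the form derived in Appendix 4). A minimal numpy sketch, assuming that shrinkage form; the function name and interface are ours, not the paper's:

```python
import numpy as np

def gcp_shrinkage(z, s2, alpha):
    """Shrinkage factors implied by GC_p-selected ridge parameters.

    z     : length-k array of projections of y onto the singular
            directions of the centered design matrix
    s2    : residual variance estimate s^2
    alpha : nonnegative penalty-strength parameter in GC_p

    Each factor is I(alpha * s2 < z_j^2) * (1 - alpha * s2 / z_j^2),
    so components with z_j^2 <= alpha * s2 are shrunk fully to zero.
    """
    z = np.asarray(z, dtype=float)
    v = np.zeros_like(z)
    keep = alpha * s2 < z ** 2
    v[keep] = 1.0 - alpha * s2 / z[keep] ** 2
    return v
```

With \(\alpha = 0\) every factor equals 1 (the least-squares fit); as \(\alpha \) grows, more components are zeroed out, which is why the choice of \(\alpha \) matters.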
References
Atkinson, A.C.: A note on the generalized information criterion for choice of a model. Biometrika 67, 413–418 (1980). https://doi.org/10.1093/biomet/67.2.413
Craven, P., Wahba, G.: Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31, 377–403 (1979). https://doi.org/10.1007/BF01404567
Efron, B.: The estimation of prediction error: covariance penalties and cross-validation. J. Amer. Statist. Assoc. 99, 619–632 (2004). https://doi.org/10.1198/016214504000000692
Fujikoshi, Y., Satoh, K.: Modified AIC and \(C_p\) in multivariate linear regression. Biometrika 84, 707–716 (1997). https://doi.org/10.1093/biomet/84.3.707
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970). https://doi.org/10.2307/1267351
Mallows, C.L.: Some comments on \(C_p\). Technometrics 15, 661–675 (1973). https://doi.org/10.1080/00401706.1973.10489103
Nagai, I., Fukui, K., Yanagihara, H.: Choosing the number of repetitions in the multiple plug-in optimization method for the ridge parameters in multivariate generalized ridge regression. Bull. Inform. Cybernet. 45, 25–35 (2013)
Nagai, I., Yanagihara, H., Satoh, K.: Optimization of ridge parameters in multivariate generalized ridge regression by plug-in methods. Hiroshima Math. J. 42, 301–324 (2012). https://doi.org/10.32917/hmj/1355238371
Nishii, R.: Asymptotic properties of criteria for selection of variables in multiple regression. Ann. Statist. 12, 758–765 (1984). https://doi.org/10.1214/aos/1176346522
Ohishi, M., Yanagihara, H., Fujikoshi, Y.: A fast algorithm for optimizing ridge parameters in a generalized ridge regression by minimizing a model selection criterion. J. Statist. Plann. Infer. 204, 187–205 (2020). https://doi.org/10.1016/j.jspi.2019.04.010
Shor, N.Z.: A class of almost-differentiable functions and a minimization method for functions of this class. Cybernet. Syst. Anal. 8, 599–606 (1972). https://doi.org/10.1007/BF01068281
Stein, C.M.: Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9, 1135–1151 (1981). https://doi.org/10.1214/aos/1176345632
Yanagihara, H.: Explicit solution to the minimization problem of generalized cross-validation criterion for selecting ridge parameters in generalized ridge regression. Hiroshima Math. J. 48, 203–222 (2018). https://doi.org/10.32917/hmj/1533088835
Yanagihara, H., Nagai, I., Satoh, K.: A bias-corrected \(C_p\) criterion for optimizing ridge parameters in multivariate generalized ridge regression. J. Appl. Statist. 38, 151–172 (2009). https://doi.org/10.5023/jappstat.38.151. (in Japanese)
Ye, J.: On measuring and correcting the effects of data mining and model selection. J. Amer. Statist. Assoc. 93, 120–131 (1998). https://doi.org/10.1080/01621459.1998.10474094
Acknowledgements
The authors thank the associate editor and the two reviewers for their valuable comments.
Appendices
Appendix 1: Derivation of Eq. (12)
Notice that \({\varvec{y}}\) and \({\varvec{y}}^\star \) are independently and identically distributed according to \(N_n ({\varvec{\eta }},\) \(\sigma ^2 {\varvec{I}}_n)\). Hence, from [3], the PMSE in (11) can be expressed as
It is desirable to obtain an unbiased estimator of (15). However, doing so is very difficult because the hat matrix of \(\hat{{\varvec{y}}} (\alpha )\) depends on \({\varvec{y}}\). Regarding the first term of (15), we use a naive estimator that is unbiased when \(\hat{{\varvec{\theta }}} = {\varvec{0}}_k\) and \((\infty , \ldots , \infty )'\). The naive estimator is given as \(q \Vert {\varvec{y}}- \hat{{\varvec{y}}} (\alpha ) \Vert ^2 / s^2 + 2\). The values q and 2 remove bias in the estimator when \(\hat{{\varvec{\theta }}} = {\varvec{0}}_k\) and \((\infty , \ldots , \infty )'\) (e.g., see [4] and [14]). On the other hand, regarding the second term of (15), we consider orthogonal transformations. Let \({\varvec{u}}\), \(\hat{{\varvec{u}}} (\alpha )\), and \({\varvec{\zeta }}\) be n-dimensional vectors defined by \({\varvec{u}}= (u_1, \ldots , u_n)' = {\varvec{P}}' {\varvec{y}}/ \sigma \), \(\hat{{\varvec{u}}} (\alpha ) = (\hat{u}_1 (\alpha ), \ldots , \hat{u}_n (\alpha ))' = {\varvec{P}}' \hat{{\varvec{y}}} (\alpha ) / \sigma \), and \({\varvec{\zeta }}= (\zeta _1, \ldots , \zeta _n)' = {\varvec{P}}' {\varvec{\eta }}/ \sigma \), respectively, and let \({\varvec{u}}_2\), \(\hat{{\varvec{u}}}_2 (\alpha )\), and \({\varvec{\zeta }}_2\) be \((n-k)\)-dimensional vectors defined by \({\varvec{u}}_2 = {\varvec{P}}_2' {\varvec{y}}/ \sigma \), \(\hat{{\varvec{u}}}_2 (\alpha ) = {\varvec{P}}_2' \hat{{\varvec{y}}} (\alpha ) / \sigma \), and \({\varvec{\zeta }}_2 = {\varvec{P}}_2' {\varvec{\eta }}/ \sigma \), respectively. Then, the second term of (15) can be expressed as follows:
We can obtain the following lemma about \(\hat{u}_j (\alpha )\) (the proof is given in Appendix 4).
Lemma 2
The \(\hat{u}_j (\alpha )\ (j \in \{1, \ldots , k\})\) is an almost differentiable function with respect to \(u_j\). The partial derivative of \(\hat{u}_j (\alpha )\) is given by
$$ \frac{\partial \hat{u}_j (\alpha )}{\partial u_j} = \mathop {\mathrm {I}}\left( \alpha < \frac{z_j^2}{s^2} \right) \left( 1 + \frac{\alpha s^2}{z_j^2} \right), \qquad (17) $$
and \(\mathop {\mathrm {E}}\nolimits [|\partial \hat{u}_j (\alpha ) / \partial u_j|] < 2\).
Here, the definition of almost differentiability can be found in, e.g., [11]. Notice that \(u_j\) is distributed according to \(N (\zeta _j, 1)\) and that \(u_j\ (j \in \{1, \ldots , k\})\) is independent of \(w^2\). Hence, from Lemma 2, we can apply Stein’s lemma [12] to the first term of (16) as follows:
On the other hand, we have \({\varvec{P}}_2 {\varvec{P}}_2' {\varvec{1}}_n = {\varvec{1}}_n\) since \({\varvec{P}}_1' {\varvec{1}}_n = {\varvec{0}}_k\). This result and (10) imply that \({\varvec{P}}_2 {\varvec{P}}_2' \hat{{\varvec{y}}} (\alpha ) = {\varvec{J}}_n {\varvec{y}}\). Hence, the second term of (16) can be expressed as
Consequently, Eq. (12) is derived.
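The key step above is Stein’s lemma [12]: if \(X \sim N(\mu , 1)\) and g is almost differentiable with \(\mathop {\mathrm {E}}\nolimits [|g'(X)|] < \infty \), then \(\mathop {\mathrm {E}}\nolimits [(X - \mu ) g(X)] = \mathop {\mathrm {E}}\nolimits [g'(X)]\). A Monte Carlo illustration using the shrinkage function of Lemma 3 (sample size, seed, and constants are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, c, n = 1.5, 1.0, 2_000_000
x = rng.normal(mu, 1.0, size=n)

# g(x) = x * I(c^2 < x^2) * (1 - c^2 / x^2)  (Lemma 3), with
# g'(x) = I(c^2 < x^2) * (1 + c^2 / x^2) away from x = +-c.
inside = c ** 2 < x ** 2
g = np.zeros(n)
g[inside] = x[inside] - c ** 2 / x[inside]
g_prime = np.zeros(n)
g_prime[inside] = 1.0 + c ** 2 / x[inside] ** 2

lhs = np.mean((x - mu) * g)   # estimates E[(X - mu) g(X)]
rhs = np.mean(g_prime)        # estimates E[g'(X)]
```

The two estimates agree up to Monte Carlo error, and `rhs` stays below 2, matching the bound in Lemma 2.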
Appendix 2: Proof of Lemma 1
Regarding the first term in (12), by using \({\varvec{P}}_1' {\varvec{J}}_n = {\varvec{O}}_{k, n}\) from \({\varvec{X}}' {\varvec{1}}_n = {\varvec{0}}_k\), and \((n-k-1) s^2 = \Vert {\varvec{P}}_2' ({\varvec{I}}_n - {\varvec{J}}_n) {\varvec{y}}\Vert ^2\), the following equation holds (e.g., see [10] and [13]).
where \(s^2\) and \(z_j\) are given by (5) and (7), respectively. Assume that \(\alpha \in R_a\), where \(R_a\) is given by (14). Then, using \(c_{1, a}\) and \(c_{2, a}\) in (13), terms including the indicator function can be expressed as
Consequently, Lemma 1 is proved.
Appendix 3: Proof of Theorem 1
For any \(a = 0, \ldots , k-1\), \(\phi _a (\alpha )\ (\alpha \in R_a)\) is an increasing function because \(\phi _a (\alpha )\) is a quadratic function, \(c_{2, a}\) and q are positive, and the \(\alpha \)-coordinate of the vertex of \(\phi _a (\alpha )\) is negative, where \(c_{2, a}\) and \(R_a\) are given by (13) and (14), respectively. Moreover, \(\phi _a (\alpha )\) with \(a = k\) is constant since \(c_{2, k} = 0\), and \(\phi _a (t_{a+1}) > \phi _{a+1} (t_{a+1})\) holds for \(a = 0, \ldots , k-1\). Therefore, candidate points of the minimizer of \(\mathop {\mathrm {C}}\nolimits (\alpha )\) are restricted to \(\{t_0, \ldots , t_k\}\), and consequently Theorem 1 is proved.
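Computationally, Theorem 1 reduces minimizing \(\mathop {\mathrm {C}}\nolimits (\alpha )\) over a continuum to evaluating it at the finitely many knots \(t_0, \ldots , t_k\). A generic sketch under that assumption (the criterion and knots below are placeholders, not the paper's definitions):

```python
import numpy as np

def minimize_at_knots(criterion, knots):
    """Minimize a piecewise-defined criterion whose minimizer is known
    to lie at one of finitely many knot points, as in Theorem 1.

    criterion : callable evaluating C(alpha) at a scalar alpha
    knots     : candidate points t_0, ..., t_k
    Returns the minimizing knot and the attained criterion value.
    """
    knots = np.asarray(knots, dtype=float)
    values = np.array([criterion(t) for t in knots])
    j = int(np.argmin(values))
    return knots[j], values[j]
```

For example, `minimize_at_knots(lambda a: (a - 2.0) ** 2, [0.0, 1.0, 2.5, 4.0])` returns the knot 2.5, the closest candidate to the unconstrained minimizer.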
Appendix 4: Proof of Lemma 2
For any \(j \in \{1, \ldots , k\}\), \(\hat{u}_j (\alpha )\) is given by
$$ \hat{u}_j (\alpha ) = v_{\alpha , j} u_j, \qquad v_{\alpha , j} = \mathop {\mathrm {I}}(\alpha w^2 < u_j^2) \left( 1 - \frac{\alpha w^2}{u_j^2} \right), $$
where \(w^2 = s^2 / \sigma ^2\). Almost differentiability of \(\hat{u}_j (\alpha )\ (j \in \{1, \ldots , k\})\) follows from the following lemma (the proof is given in Appendix 5).
Lemma 3
Let \(f (x) = x \mathop {\mathrm {I}}(c^2 < x^2) (1 - c^2 / x^2)\), where c is a positive constant. Then, f(x) is almost differentiable.
Next, we derive the partial derivative of \(\hat{u}_j (\alpha )\ (j \in \{1, \ldots , k\})\). Notice that \(\partial w^2 / \partial u_j = 0\) for any \(j \in \{1, \ldots , k\}\) since \(w^2\) is independent of \(u_1, \ldots , u_k\). Hence, we have
$$ \frac{\partial v_{\alpha , j}}{\partial u_j} = \mathop {\mathrm {I}}(\alpha w^2 < u_j^2)\, \frac{2 \alpha w^2}{u_j^3}. $$
Since \(\partial \hat{u}_j (\alpha ) / \partial u_j = \{ \partial v_{\alpha , j} / \partial u_j \} u_j + v_{\alpha , j}\) and \(u_j^2 / w^2 = z_j^2 / s^2\), we obtain (17).
Moreover, the expectation of (17) is finite because \(0 \le \mathop {\mathrm {I}}(\alpha< z_j^2 / s^2) (1 + \alpha s^2 / z_j^2) < 2\). Consequently, Lemma 2 is proved.
Appendix 5: Proof of Lemma 3
It is clear that f(x) is differentiable on \(\mathcal {R}= \mathbb {R}\backslash \{-c, c\}\). If \(x^2 > c^2\), then \(f' (x) = 1 + c^2 / x^2\) and \(\sup _{x: x^2 > c^2} |f' (x)| = 2\) since \(f' (x)\) is decreasing as a function of \(x^2\). Hence, \(f' (x)\) is bounded on \(\mathcal {R}\), and we have \(| f(x) - f(y) | / |x-y| < 2\) when x and y both lie in \((-\infty , -c)\), \([-c, c]\), or \((c, \infty )\). In addition, f(x) is an odd function. Thus, for Lipschitz continuity it suffices to show that \(\{ f(x) - f(y) \} / (x-y)\) is bounded for (x, y) such that \(x > c\) and \(y \le c\). Under these conditions, the following equation holds.
where \(x^*\) is some point in (c, x). Therefore, f(x) is Lipschitz continuous on \(\mathbb {R}\), and consequently Lemma 3 is proved.
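The Lipschitz bound can be spot-checked numerically: on a grid, every secant slope of f stays below the constant 2 while approaching it near the kink at \(x = c\). A small sketch (grid range, spacing, and c are arbitrary choices of ours):

```python
import numpy as np

def f(x, c=1.0):
    """f(x) = x * I(c^2 < x^2) * (1 - c^2 / x^2), as in Lemma 3."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = c ** 2 < x ** 2
    # Inside the indicator, x * (1 - c^2 / x^2) simplifies to x - c^2 / x.
    out[inside] = x[inside] - c ** 2 / x[inside]
    return out

grid = np.linspace(-5.0, 5.0, 801)
fx = f(grid)
# All pairwise secant slopes |f(x) - f(y)| / |x - y| on the grid.
num = np.abs(fx[:, None] - fx[None, :])
den = np.abs(grid[:, None] - grid[None, :])
max_slope = (num[den > 0] / den[den > 0]).max()
```

The maximum secant slope on this grid is strictly below 2, consistent with the strict inequality in the proof.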
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Ohishi, M., Yanagihara, H., Wakaki, H. (2020). Optimization of Generalized \(C_p\) Criterion for Selecting Ridge Parameters in Generalized Ridge Regression. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies. IDT 2020. Smart Innovation, Systems and Technologies, vol 193. Springer, Singapore. https://doi.org/10.1007/978-981-15-5925-9_23