
Robust and sparse regression in generalized linear model by stochastic optimization

  • Original Paper
  • Information Theory and Statistics
Japanese Journal of Statistics and Data Science

Abstract

The generalized linear model (GLM) plays a key role in regression analyses. For high-dimensional data, the sparse GLM has been used, but it is not robust against outliers. Recently, robust methods have been proposed for specific examples of the sparse GLM. Among them, we focus on the robust and sparse linear regression based on the \(\gamma\)-divergence. The estimator based on the \(\gamma\)-divergence has strong robustness under heavy contamination. In this paper, we extend the robust and sparse linear regression based on the \(\gamma\)-divergence to the robust and sparse GLM based on the \(\gamma\)-divergence, together with a stochastic optimization approach to obtain the estimate. We adopt the randomized stochastic projected gradient descent as the stochastic optimization approach and extend the established convergence property to the classical first-order necessary condition. By virtue of the stochastic optimization approach, we can efficiently estimate parameters for very large problems. In particular, we describe linear regression, logistic regression, and Poisson regression with \(L_1\) regularization in detail as specific examples of the robust and sparse GLM. In numerical experiments and real data analysis, the proposed method outperformed comparative methods.
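As a rough sketch of the type of estimation procedure summarized above, the following Python fragment combines a stochastic gradient step on a smooth robust loss with the \(L_1\) proximal map (soft-thresholding) and returns a randomly chosen iterate, in the spirit of randomized stochastic projected gradient methods. The helper name `grad_gamma_loss`, the constant step size, and the Euclidean proximal map are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1 (componentwise soft-thresholding).
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def stochastic_prox_grad(grad_gamma_loss, theta0, X, y, lam, eta,
                         n_iters=1000, batch_size=32, seed=0):
    # Minimal sketch: grad_gamma_loss(theta, X_batch, y_batch) is assumed to
    # return a stochastic gradient of the smooth (gamma-divergence type) loss
    # on the mini-batch; it is a placeholder, not the paper's implementation.
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    iterates = []
    for _ in range(n_iters):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        g = grad_gamma_loss(theta, X[idx], y[idx])
        theta = soft_threshold(theta - eta * g, eta * lam)  # gradient step + L1 prox
        iterates.append(theta.copy())
    # Randomized output: return an iterate chosen uniformly at random,
    # as in randomized stochastic projected gradient type methods.
    return iterates[rng.integers(len(iterates))]
```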



Acknowledgements

This work was partially supported by JSPS KAKENHI Grant Number 17K00065.

Author information

Corresponding author

Correspondence to Takayuki Kawashima.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Here, we prove the convergence of \(\sum _{y=0}^\infty f(y|x_{t,i};\theta ^{(t)})^{1+\gamma }\) and \(\sum _{y=0}^\infty (y- y_{t,i} ) f(y|x_{t,i};\theta ^{(t)})^{1+\gamma }\). First, consider \(\sum _{y=0}^\infty f(y|x_{t,i};\theta ^{(t)})^{1+\gamma }\) and denote the n-th term by \(S_n = f(n|x_{t,i};\theta ^{(t)})^{1+\gamma }\). Then, we apply the d'Alembert ratio test to \(S_n\):

$$\begin{aligned}&\lim _{n \rightarrow \infty } \left| \frac{ S_{n+1}}{S_n} \right| \\&\quad =\lim _{n \rightarrow \infty } \left| \frac{ f(n+1|x_{t,i};\theta ^{(t)})^{1+\gamma } }{f(n|x_{t,i};\theta ^{(t)})^{1+\gamma }} \right| \\&\quad =\lim _{n \rightarrow \infty } \left| \frac{ \frac{\exp (-\mu _{x_{t,i}}(\beta _0^{(t)}, \beta ^{(t)}) ) }{(n+1)!} \mu _{x_{t,i}}(\beta _0^{(t)}, \beta ^{(t)})^{n+1} }{ \frac{\exp (-\mu _{x_{t,i}}(\beta _0^{(t)}, \beta ^{(t)}) ) }{n!} \mu _{x_{t,i}}(\beta _0^{(t)}, \beta ^{(t)})^n } \right| ^{1+\gamma } \\&\quad =\lim _{n \rightarrow \infty } \left| \frac{ \mu _{x_{t,i}}(\beta _0^{(t)}, \beta ^{(t)}) }{ n+1 } \right| ^{1+\gamma } \\&\quad = 0 , \end{aligned}$$

where the last equality holds because the term \(\mu _{x_{t,i}}(\beta _0^{(t)}, \beta ^{(t)})\) is bounded.

Therefore, \(\sum _{y=0}^\infty f(y|x_{t,i};\theta ^{(t)})^{1+\gamma }\) converges.

Next, consider \(\sum _{y=0}^\infty (y - y_{t,i} ) f(y|x_{t,i};\theta ^{(t)})^{1+\gamma }\) and denote the n-th term by \(S^{'}_n =(n - y_{t,i} ) f(n|x_{t,i};\theta ^{(t)})^{1+\gamma }\). Then, we apply the d'Alembert ratio test to \(S^{'}_n\):

$$\begin{aligned}&\lim _{n \rightarrow \infty } \left| \frac{ S^{'}_{n+1}}{S^{'}_n} \right| \\&\quad = \lim _{n \rightarrow \infty } \left| \frac{ (n+1 - y_{t,i} )f(n+1|x_{t,i};\theta ^{(t)})^{1+\gamma } }{ (n - y_{t,i} )f(n|x_{t,i};\theta ^{(t)})^{1+\gamma } } \right| \\&\quad = \lim _{n \rightarrow \infty } \left| \frac{ 1+\frac{1}{n} - \frac{y_{t,i}}{n} }{ 1 - \frac{y_{t,i}}{n} } \right| \left| \frac{ f(n+1|x_{t,i};\theta ^{(t)})^{1+\gamma } }{ f(n|x_{t,i};\theta ^{(t)})^{1+\gamma } } \right| \\&\quad = 1 \times 0 = 0 , \end{aligned}$$

where the first factor tends to 1 and the second factor tends to 0 by the previous calculation.

Therefore, \(\sum _{y=0}^\infty (y-y_{t,i}) f(y|x_{t,i};\theta ^{(t)})^{1+\gamma }\) converges.
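As a quick numerical check of this convergence, the following Python sketch evaluates truncated partial sums of \(\sum _{y=0}^\infty f(y|x_{t,i};\theta ^{(t)})^{1+\gamma }\) and \(\sum _{y=0}^\infty (y - y_{t,i}) f(y|x_{t,i};\theta ^{(t)})^{1+\gamma }\) for a Poisson density; the mean value `mu` and the observed count `y_obs` are arbitrary illustrative choices.

```python
import math

def poisson_pmf(y, mu):
    # Poisson probability mass function f(y | mu) = exp(-mu) * mu^y / y!
    return math.exp(-mu) * mu**y / math.factorial(y)

def truncated_sums(mu, y_obs, gamma, n_max):
    # Partial sums of sum_y f(y)^{1+gamma} and sum_y (y - y_obs) * f(y)^{1+gamma}.
    s1 = sum(poisson_pmf(y, mu) ** (1.0 + gamma) for y in range(n_max + 1))
    s2 = sum((y - y_obs) * poisson_pmf(y, mu) ** (1.0 + gamma)
             for y in range(n_max + 1))
    return s1, s2

# The partial sums stabilize once n_max is a few times mu, consistent with the
# ratio test above (the ratio behaves like mu / (n + 1)).
for n_max in (10, 20, 50, 100):
    print(n_max, truncated_sums(mu=3.0, y_obs=2, gamma=0.5, n_max=n_max))
```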

Appendix 2

Here, we give the proof of Theorem 2. The directional derivative of the objective \(\varPsi (\theta ) = E_{(x,y)}\left[ l((x,y);\theta ) \right] + \lambda P(\theta )\) at \(\theta ^{(R)}\) in a direction \(\delta\) is decomposed as

$$\begin{aligned}&\lim _{ k \downarrow 0} \frac{ \varPsi (\theta ^{(R)} + k\delta ) - \varPsi (\theta ^{(R)}) }{k} \nonumber \\&\quad = \lim _{ k \downarrow 0} \frac{ E_{(x,y)} \left[ l( (x,y);\theta ^{(R)} + k\delta ) \right] - E_{(x,y)} \left[ l( (x,y);\theta ^{(R)} ) \right] + \lambda P(\theta ^{(R)} + k\delta ) - \lambda P(\theta ^{(R)}) }{k} \nonumber \\&\quad = \lim _{ k \downarrow 0} \frac{ E_{(x,y)} \left[ l( (x,y);\theta ^{(R)} + k\delta ) \right] - E_{(x,y)} \left[ l( (x,y);\theta ^{(R)} ) \right] }{k} \nonumber \\&\qquad + \lim _{ k \downarrow 0} \frac{ \lambda P(\theta ^{(R)} + k\delta ) - \lambda P(\theta ^{(R)}) }{k} . \end{aligned}$$
(22)

The directional derivative of a differentiable function always exists and equals the inner product of its gradient with the direction:

$$\begin{aligned}&\lim _{ k \downarrow 0} \frac{ E_{(x,y)} \left[ l( (x,y);\theta ^{(R)} + k\delta ) \right] - E_{(x,y)} \left[ l( (x,y);\theta ^{(R)} ) \right] }{k} \nonumber \\&\quad = \left\langle \nabla E_{(x,y)} \left[ l( (x,y);\theta ^{(R)}) \right] , \delta \right\rangle . \end{aligned}$$
(23)

Moreover, the directional derivative of a (proper) convex function exists at any relative interior point of its domain and is greater than or equal to the inner product of any subgradient with the direction (Rockafellar 1970):

$$\begin{aligned} \lim _{ k \downarrow 0} \frac{ \lambda P(\theta ^{(R)} + k\delta ) - \lambda P(\theta ^{(R)}) }{k}&= \sup _{g \in \partial P(\theta ^{(R)}) } \lambda \left\langle g , \delta \right\rangle \nonumber \\&\ge \lambda \left\langle g , \delta \right\rangle \quad \text{for any } g \in \partial P(\theta ^{(R)}). \end{aligned}$$
(24)
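For a concrete one-dimensional illustration of (24), take the \(L_1\) penalty \(P(\theta ) = |\theta |\) at the nondifferentiable point \(\theta ^{(R)} = 0\), where \(\partial P(0) = [-1, 1]\) (this particular point is an illustrative choice, not a step of the proof):

$$\begin{aligned} \lim _{ k \downarrow 0} \frac{ \lambda |0 + k\delta | - \lambda |0| }{k} = \lambda |\delta | = \sup _{g \in [-1,1]} \lambda g \delta \ge \lambda g \delta \quad \text{for any } g \in [-1,1] . \end{aligned}$$

The supremum over subgradients recovers the directional derivative exactly, while any fixed subgradient gives only a lower bound.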

Then, by the optimality condition of (16), we have

$$\begin{aligned}&0 \in \nabla E_{(x,y)} \left[ l((x,y); \theta ^{(R)} ) \right] + \lambda \partial P(\theta ^{+}) +\frac{1}{\eta _R} \left\{ \nabla w \left( \theta ^{+} \right) - \nabla w \left( \theta ^{(R)} \right) \right\} , \nonumber \\&\quad \text{that is,} \quad \frac{1}{\eta _R} \left\{ \nabla w \left( \theta ^{(R)} \right) - \nabla w \left( \theta ^{+} \right) \right\} \in \nabla E_{(x,y)} \left[ l((x,y); \theta ^{(R)} ) \right] + \lambda \partial P(\theta ^{+}) . \end{aligned}$$
(25)

Therefore, we can obtain (21) from \(P_{X,R} \approx 0\) and (22), (23), (24), and (25) as follows:

$$\begin{aligned}&\lim _{ k \downarrow 0} \frac{ E_{(x,y)} \left[ l( (x,y);\theta ^{(R)} + k\delta ) \right] - E_{(x,y)} \left[ l( (x,y);\theta ^{(R)} ) \right] }{k} \\&\qquad + \lim _{ k \downarrow 0} \frac{ \lambda P(\theta ^{(R)} + k\delta ) - \lambda P(\theta ^{(R)}) }{k} \\&\quad \ge \left\langle \nabla E_{(x,y)} \left[ l( (x,y);\theta ^{(R)}) \right] , \delta \right\rangle +\lambda \left\langle g , \delta \right\rangle \quad \text{for any } g \in \partial P(\theta ^{(R)}) \\&\quad =\left\langle \nabla E_{(x,y)} \left[ l( (x,y);\theta ^{(R)}) \right] +\lambda g ,\delta \right\rangle \quad \text{for any } g \in \partial P(\theta ^{(R)}) . \end{aligned}$$

By (25) together with \(P_{X,R} \approx 0\) (so that \(\theta ^{+} \approx \theta ^{(R)}\) and \(\nabla w ( \theta ^{(R)} ) - \nabla w ( \theta ^{+} ) \approx 0\)), we have \(0 \in \nabla E_{(x,y)} \left[ l((x,y); \theta ^{(R)} ) \right] + \lambda \partial P(\theta ^{(R)})\); that is, there exists \(g \in \partial P(\theta ^{(R)})\) such that \(\nabla E_{(x,y)} \left[ l( (x,y);\theta ^{(R)}) \right] +\lambda g = 0\). Taking this \(g\) in the last line shows that the directional derivative of \(\varPsi\) at \(\theta ^{(R)}\) is nonnegative in every direction \(\delta\), which is (21).
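For intuition about (25), the following Python sketch takes \(w(\theta ) = \tfrac{1}{2}\Vert \theta \Vert _2^2\) (so \(\nabla w(\theta ) = \theta\)) and \(P(\theta ) = \Vert \theta \Vert _1\), in which case the update characterized by (25) reduces to componentwise soft-thresholding, and then verifies numerically that \(\frac{1}{\eta _R}\{\nabla w (\theta ^{(R)}) - \nabla w (\theta ^{+})\} - \nabla E_{(x,y)}[l]\) lies in \(\lambda \partial \Vert \theta ^{+}\Vert _1\). The vector `grad` is a stand-in for \(\nabla E_{(x,y)}\left[ l((x,y);\theta ^{(R)}) \right]\), and all numeric values are arbitrary illustrative choices.

```python
import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def check_inclusion(theta_R, grad, eta, lam, tol=1e-10):
    # With w(theta) = 0.5 * ||theta||_2^2 and P = L1 norm, the proximal update is
    # theta_plus = soft_threshold(theta_R - eta * grad, eta * lam).
    # Check that v = (theta_R - theta_plus) / eta - grad lies in lam * subdiff ||theta_plus||_1.
    theta_plus = soft_threshold(theta_R - eta * grad, eta * lam)
    v = (theta_R - theta_plus) / eta - grad
    ok = True
    for vj, tj in zip(v, theta_plus):
        if tj != 0.0:
            ok &= abs(vj - lam * np.sign(tj)) < tol   # subgradient is sign(tj)
        else:
            ok &= abs(vj) <= lam + tol                # subgradient lies in [-lam, lam]
    return ok

# Arbitrary illustrative values; prints True.
print(check_inclusion(theta_R=np.array([0.8, -0.05, 0.0]),
                      grad=np.array([0.3, -0.2, 0.1]),
                      eta=0.5, lam=0.25))
```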

Cite this article

Kawashima, T., Fujisawa, H. Robust and sparse regression in generalized linear model by stochastic optimization. Jpn J Stat Data Sci 2, 465–489 (2019). https://doi.org/10.1007/s42081-019-00049-9
