Abstract
Neuroimaging data often take the form of high-dimensional arrays, also known as tensors. Addressing scientific questions arising from such data demands new regression models that take multidimensional arrays as covariates. Simply turning an image array into a vector would both cause extremely high dimensionality and destroy the inherent spatial structure of the array. In a recent work, Zhou et al. (J Am Stat Assoc, 108(502):540–552, 2013) proposed a family of generalized linear tensor regression models based upon the CP (CANDECOMP/PARAFAC) decomposition of the regression coefficient array. Low-rank approximation brings the ultrahigh dimensionality down to a manageable level and leads to efficient estimation. In this article, we propose a tensor regression model based on the more flexible Tucker decomposition. Compared to the CP model, the Tucker regression model allows a different number of factors along each mode. Such flexibility leads to several advantages that are particularly suited to neuroimaging analysis, including further reduction of the number of free parameters, accommodation of images with skewed dimensions, explicit modeling of interactions, and a principled way of image downsizing. We also compare the Tucker model with CP numerically on both simulated data and real magnetic resonance imaging data, and demonstrate its effectiveness in finite samples.
References
ADHD (2017) The ADHD-200 sample. http://fcon_1000.projects.nitrc.org/indi/adhd200/. Accessed Mar 2017
ADNI (2017) Alzheimer’s disease neuroimaging initiative. http://adni.loni.ucla.edu. Accessed Mar 2017
Caffo B, Crainiceanu C, Verduzco G, Joel S, Mostofsky SH, Bassett S, Pekar J (2010) Two-stage decompositions for the analysis of functional connectivity for fMRI with application to Alzheimer’s disease risk. Neuroimage 51(3):1140–1149
Chen SS, Donoho DL, Saunders MA (2001) Atomic decomposition by basis pursuit. SIAM Rev 43(1):129–159
de Leeuw J (1994) Block-relaxation algorithms in statistics. In: Information systems and data analysis. Springer, Berlin, pp 308–325
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
Frank IE, Friedman JH (1993) A statistical view of some chemometrics regression tools. Technometrics 35(2):109–135
Friston K, Ashburner J, Kiebel S, Nichols T, Penny W (eds) (2007) Statistical parametric mapping: the analysis of functional brain images. Academic Press, London
Goldsmith J, Huang L, Crainiceanu C (2014) Smooth scalar-on-image regression via spatial bayesian variable selection. J Comput Graph Stat 23:46–64
Kolda TG, Bader BW (2009) Tensor decompositions and applications. SIAM Rev 51(3):455–500
Lange K (2004) Optimization. Springer texts in statistics. Springer, New York
Lange K (2010) Numerical analysis for statisticians. Statistics and computing, second edn. Springer, New York
Lehmann EL, Romano JP (2005) Testing statistical hypotheses. Springer texts in statistics, third edn. Springer, New York
Li Y, Zhu H, Shen D, Lin W, Gilmore JH, Ibrahim JG (2011) Multiscale adaptive regression models for neuroimaging data. J R Stat Soc 73:559–578
Li F, Zhang T, Wang Q, Gonzalez M, Maresh E, Coan J (2015) Spatial Bayesian variable selection and grouping in high-dimensional scalar-on-image regressions. Ann Appl Stat (in press)
McCullagh P, Nelder JA (1983) Generalized linear models. Monographs on statistics and applied probability. Chapman & Hall, London
Reiss P, Ogden R (2010) Functional generalized linear models with images as predictors. Biometrics 66:61–69
Rothenberg TJ (1971) Identification in parametric models. Econometrica 39(3):577–591
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc 58(1):267–288
Tucker LR (1966) Some mathematical notes on three-mode factor analysis. Psychometrika 31:279–311
van der Vaart AW (1998) Asymptotic statistics, volume 3 of Cambridge series in statistical and probabilistic mathematics. Cambridge University Press, Cambridge
Wang X, Nan B, Zhu J, Koeppe R (2014) Regularized 3D functional regression for brain image data via haar wavelets. Ann Appl Stat 8:1045–1064
Yue Y, Loh JM, Lindquist MA (2010) Adaptive spatial smoothing of fMRI images. Stat Interface 3:3–14
Zhou H, Li L (2014) Regularized matrix regression. J R Stat Soc 76:463–483
Zhou H, Li L, Zhu H (2013) Tensor regression with applications in neuroimaging data analysis. J Am Stat Assoc 108(502):540–552
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc 67(2):301–320
Appendix
1.1 Proof of Lemma 1
We rewrite the array inner product
where the second and fourth equalities follow from (4) and the third follows from the invariance of the trace function under cyclic permutation.
1.2 Proof of Proposition 1
It is easy to see that the block relaxation algorithm monotonically increases the objective values, i.e., \(\ell (\varvec{\theta }^{(t+1)}) \ge \ell (\varvec{\theta }^{(t)})\) for all \(t \ge 0\). Therefore its global convergence property follows from the standard theory for monotone algorithms [5, 11, 12]. Specifically, global convergence is guaranteed under the following conditions: (i) \(\ell \) is coercive, (ii) the stationary points of \(\ell \) are isolated, (iii) the algorithmic mapping is continuous, (iv) \(\varvec{\theta }\) is a fixed point of the algorithm if and only if it is a stationary point of \(\ell \), and (v) \(\ell (\varvec{\theta }^{(t+1)}) \ge \ell (\varvec{\theta }^{(t)})\) with equality if and only if \(\varvec{\theta }^{(t)}\) is a fixed point of the algorithm. Condition (i) is guaranteed by the compactness of the set \(\{\varvec{\theta }: \ell (\varvec{\theta }) \ge \ell (\varvec{\theta }^{(0)})\}\). Condition (ii) is assumed. Condition (iii) follows from the strict concavity assumption and the implicit function theorem. By Fermat's principle, \(\varvec{\theta }= ({\varvec{G}}, {\varvec{B}}_1, \ldots , {\varvec{B}}_D)\) is a fixed point of the block relaxation algorithm if and only if \(D\ell ({\varvec{G}}) = \mathbf{0}\) and \(D\ell ({\varvec{B}}_d)=\mathbf{0}\) for all d. Thus \(\varvec{\theta }\) is a fixed point if and only if it is a stationary point of \(\ell \), i.e., condition (iv) is satisfied. Condition (v) follows from the monotonicity of the block relaxation algorithm. Local convergence follows from the classical Ostrowski theorem, which states that the algorithmic sequence \(\varvec{\theta }^{(t)}\) is locally attracted to a strict local maximum \(\varvec{\theta }^{(\infty )}\) if the spectral radius of the differential of the algorithmic map, \(\rho [dM(\varvec{\theta }^{(\infty )})]\), is strictly less than one. This follows from the strict concavity assumption on the block updates. See Zhou et al. [25] for more details.
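The monotone-ascent property in condition (v) is easy to see concretely. Below is a minimal numeric sketch, not the authors' implementation: a rank-1 matrix (two-mode tensor) regression fit by block relaxation, where each block update is an exact least-squares solve because the model is linear in each factor with the other held fixed, so the loss can never increase. All names and dimensions are illustrative.

```python
import numpy as np

# Toy model: y_i = <u v^T, X_i> + noise. Holding v fixed, the model is
# linear in u (and vice versa), so each block update is an ordinary
# least-squares solve and the squared-error loss decreases monotonically.
rng = np.random.default_rng(1)
n, p1, p2 = 200, 6, 7
Xs = rng.standard_normal((n, p1, p2))
u_true, v_true = rng.standard_normal(p1), rng.standard_normal(p2)
y = np.einsum("nij,i,j->n", Xs, u_true, v_true) + 0.1 * rng.standard_normal(n)

def loss(u, v):
    return np.mean((y - np.einsum("nij,i,j->n", Xs, u, v)) ** 2)

u, v = rng.standard_normal(p1), rng.standard_normal(p2)
losses = [loss(u, v)]
for _ in range(20):
    A = np.einsum("nij,j->ni", Xs, v)          # design matrix for the u-block
    u = np.linalg.lstsq(A, y, rcond=None)[0]   # exact block update for u
    A = np.einsum("nij,i->nj", Xs, u)          # design matrix for the v-block
    v = np.linalg.lstsq(A, y, rcond=None)[0]   # exact block update for v
    losses.append(loss(u, v))

# Monotone descent of the loss (i.e., ascent of the Gaussian log-likelihood),
# exactly as condition (v) requires.
assert all(l1 <= l0 + 1e-12 for l0, l1 in zip(losses, losses[1:]))
```

Maximizing the Gaussian log-likelihood is equivalent to minimizing this loss, so descent here matches the ascent of \(\ell\) in the proof.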
1.3 Proof of Lemma 2
Assume \({\varvec{B}}\) admits the Tucker decomposition (3). By (4),
Using the well-known fact that \(\mathrm {vec}({\varvec{X}}{\varvec{Y}}{\varvec{Z}}) = ({\varvec{Z}}^{\tiny {\text{ T }}}\otimes {\varvec{X}}) \mathrm {vec}({\varvec{Y}})\),
Thus by the chain rule we have
Again by the chain rule, \(D\eta ({\varvec{B}}_d) = D\eta ({\varvec{B}}) \cdot D{\varvec{B}}({\varvec{B}}_d) = (\mathrm {vec}{\varvec{X}}) ^{\tiny {\text{ T }}}{\varvec{J}}_d\). For the derivative in \({\varvec{G}}\), the duality Lemma 1 implies \(\langle {\varvec{B}}, {\varvec{X}}\rangle = \langle {\varvec{G}}, \tilde{\varvec{X}}\rangle \) for \(\tilde{\varvec{X}}= \llbracket {\varvec{X}}; {\varvec{B}}_1 ^{\tiny {\text{ T }}}, \ldots , {\varvec{B}}_D ^{\tiny {\text{ T }}}\rrbracket \). Then, by (4), we have
Combining these results gives the gradient displayed in Lemma 2.
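The two identities the derivation rests on can both be checked numerically. The sketch below, with arbitrary illustrative shapes, verifies (a) the vec–Kronecker identity \(\mathrm{vec}({\varvec{X}}{\varvec{Y}}{\varvec{Z}}) = ({\varvec{Z}}^{\text{T}} \otimes {\varvec{X}})\,\mathrm{vec}({\varvec{Y}})\) with column-major vectorization, and (b) the duality of Lemma 1, \(\langle {\varvec{B}}, {\varvec{X}}\rangle = \langle {\varvec{G}}, \tilde{\varvec{X}}\rangle\), for a three-way Tucker tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
vec = lambda A: A.flatten(order="F")  # column-major vec, matching the identity

# (a) vec(X Y Z) = (Z^T kron X) vec(Y)
X, Y, Z = (rng.standard_normal(s) for s in [(3, 4), (4, 5), (5, 2)])
assert np.allclose(vec(X @ Y @ Z), np.kron(Z.T, X) @ vec(Y))

# (b) Duality <B, X> = <G, X~> with B = [[G; B1, B2, B3]] and
#     X~ = [[X; B1^T, B2^T, B3^T]] (D = 3, illustrative dimensions).
p, R = (4, 5, 6), (2, 3, 2)
G = rng.standard_normal(R)
B1, B2, B3 = (rng.standard_normal((p[d], R[d])) for d in range(3))
T = rng.standard_normal(p)                               # the covariate array
B = np.einsum("abc,ia,jb,kc->ijk", G, B1, B2, B3)        # Tucker coefficient tensor
T_tilde = np.einsum("ijk,ia,jb,kc->abc", T, B1, B2, B3)  # X multiplied by B_d^T in each mode
assert np.isclose(np.sum(B * T), np.sum(G * T_tilde))
```

The `einsum` calls implement the Tucker multilinear multiplication directly, which keeps the mode bookkeeping explicit.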
Next we consider the Hessian \(d^2 \eta \). Because \({\varvec{B}}\) is linear in \({\varvec{G}}\), the block \({\varvec{H}}_{{\varvec{G}},{\varvec{G}}}\) vanishes. For the block \({\varvec{H}}_{{\varvec{B}},{\varvec{B}}}\), the \((i_d,r_d,i_{d'},r_{d'})\)-entry is
The second derivative in the summand is nonzero only if \(j_d=i_d\), \(j_{d'}=i_{d'}\), \(s_d=r_d\), \(s_{d'}=r_{d'}\), and \(d \ne d'\). Therefore
The first sum is over \(\prod _{d''\ne d,d'}p_{d''}\) terms and the second term is over \(\prod _{d''\ne d,d'} R_{d''}\) terms. A careful inspection reveals that the sub-block \({\varvec{H}}_{dd'}\) shares the same entries as the matrix
Finally, for the \({\varvec{H}}_{{\varvec{G}},{\varvec{B}}}\) block, the \(\{(r_1,\ldots ,r_D),(i_d,r_d)\}\)-entry is
where the sum is over \(\prod _{d' \ne d} p_{d'}\) terms. The sub-block \({\varvec{H}}_d \in \mathrm {I \! R} ^{\prod _d R_d \times p_dR_d} \) has at most \(p_d \prod _d R_d\) nonzero entries. A close inspection suggests that the nonzero entries coincide with those in the matrix
1.4 Proof of Proposition 2
Since \(\mu = b'(\theta )\), \(d\mu /d\theta = b''(\theta ) = \sigma ^2 /a(\phi )\) and
by Lemma 2. Further differentiating shows
It is easy to see that \(\mathbf {E}[\nabla \ell ({\varvec{G}}, {\varvec{B}}_1,\ldots ,{\varvec{B}}_D)] = \mathbf{0}\). Moreover, \(\mathbf {E}[-d^2\ell ({\varvec{G}}, {\varvec{B}}_1,\ldots ,{\varvec{B}}_D)] = {\varvec{I}}({\varvec{G}}, {\varvec{B}}_1,\ldots ,{\varvec{B}}_D)\). Then (8) follows.
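The two moment identities used here, \(\mathbf {E}[\nabla \ell ] = \mathbf{0}\) and \(\mathbf {E}[-d^2\ell ] = \mathbf {E}[\nabla \ell \, \nabla \ell ^{\text{T}}] = {\varvec{I}}\), can be checked exactly in the simplest special case: a logistic GLM with a fixed covariate vector (the tensor structure only changes what plays the role of the covariate, not these identities). The sketch below computes the expectation over \(y \in \{0,1\}\) in closed form; all numbers are illustrative.

```python
import numpy as np

x = np.array([1.0, -2.0, 0.5])     # fixed covariate vector (illustrative)
beta = np.array([0.3, 0.1, -0.7])  # illustrative parameter value
mu = 1.0 / (1.0 + np.exp(-x @ beta))  # P(y = 1 | x) under the canonical link

def score(y):
    return (y - mu) * x            # gradient of the log-likelihood at beta

# Exact expectations over y in {0, 1}.
E_score = (1 - mu) * score(0) + mu * score(1)
E_outer = (1 - mu) * np.outer(score(0), score(0)) + mu * np.outer(score(1), score(1))
neg_hessian = mu * (1 - mu) * np.outer(x, x)  # -d^2 l, which is free of y here

assert np.allclose(E_score, 0)        # E[score] = 0
assert np.allclose(E_outer, neg_hessian)  # information matrix equality
```

For the Tucker model the same computation goes through with `x` replaced by the gradient factors of Lemma 2.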
1.5 Proof of Proposition 3
The proof follows from a classical result [18] which states that, if \(\theta _0\) is a regular point of the information matrix \(I(\theta )\), then \(\theta _0\) is locally identifiable if and only if \(I(\theta _0)\) is nonsingular. The regularity assumptions are satisfied by the Tucker regression model: (1) the parameter space \(\mathcal {{\varvec{B}}}\) is open, (2) the density \(p(y,{\varvec{x}}|{\varvec{B}})\) is proper for all \({\varvec{B}}\in \mathcal {{\varvec{B}}}\), (3) the support of the density \(p(y,{\varvec{x}}|{\varvec{B}})\) is the same for all \({\varvec{B}}\in \mathcal {{\varvec{B}}}\), (4) the log-density \(\ell ({\varvec{B}}|y,{\varvec{x}}) = \ln p(y,{\varvec{x}}|{\varvec{B}})\) is continuously differentiable, and (5) the information matrix
is continuous in \({\varvec{B}}\) by Proposition 2. Therefore \({\varvec{B}}\in \mathcal {{\varvec{B}}}\) is locally identifiable if and only if \({\varvec{I}}({\varvec{B}})\) is nonsingular.
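The role of the nonsingularity condition can be illustrated numerically. In the unconstrained Tucker parameterization the map is invariant under \(({\varvec{B}}_1, {\varvec{G}}, {\varvec{B}}_2) \mapsto ({\varvec{B}}_1 {\varvec{O}}_1, {\varvec{O}}_1^{-1} {\varvec{G}} {\varvec{O}}_2^{-\text{T}}, {\varvec{B}}_2 {\varvec{O}}_2)\), so the Jacobian of the parameterization, and hence the information matrix, loses \(R_1^2 + R_2^2\) rank directions. The sketch below checks this for the matrix case \(D = 2\), \({\varvec{B}} = {\varvec{B}}_1 {\varvec{G}} {\varvec{B}}_2^{\text{T}}\), with arbitrary illustrative dimensions; it is an illustration of the indeterminacy, not part of the paper's proof.

```python
import numpy as np

rng = np.random.default_rng(2)
p1, p2, R1, R2 = 4, 5, 2, 2
B1 = rng.standard_normal((p1, R1))
G = rng.standard_normal((R1, R2))
B2 = rng.standard_normal((p2, R2))

# Exact Jacobian of vec(B1 G B2^T): the map is multilinear, so each column
# is the derivative in one coordinate direction, computed by the product rule.
cols = []
for i in range(p1):          # perturb B1 entrywise
    for r in range(R1):
        E = np.zeros((p1, R1)); E[i, r] = 1.0
        cols.append((E @ G @ B2.T).ravel())
for r in range(R1):          # perturb G entrywise
    for s in range(R2):
        E = np.zeros((R1, R2)); E[r, s] = 1.0
        cols.append((B1 @ E @ B2.T).ravel())
for j in range(p2):          # perturb B2 entrywise
    for s in range(R2):
        E = np.zeros((p2, R2)); E[j, s] = 1.0
        cols.append((B1 @ G @ E.T).ravel())

J = np.column_stack(cols)
n_params = p1 * R1 + R1 * R2 + p2 * R2        # 22 raw parameters
# Generic rank = raw parameters minus the R1^2 + R2^2 indeterminacy: 22 - 8 = 14.
assert np.linalg.matrix_rank(J) == n_params - R1**2 - R2**2
```

The deficit matches the dimension of the group \(\mathrm{GL}(R_1) \times \mathrm{GL}(R_2)\) acting on the factors, which is why identifiability requires restricting the parameterization.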
1.6 Proof of Theorem 1
The asymptotics for tensor regression follow from the standard theory of M-estimation. The key observation is that the nonlinear part of the tensor model (4) is a degree-\((D+1)\) polynomial in the parameters \({\varvec{G}}\) and \({\varvec{B}}_d\), and the collection of polynomials \(\{\langle {\varvec{B}}, {\varvec{X}}\rangle , {\varvec{B}}\in \mathcal {{\varvec{B}}}\}\) forms a Vapnik–Červonenkis (VC) class. Then the classical uniform convergence theory applies [21]. The arguments in [25] extend the classical argument for GLM [21, Example 5.40] to the CP tensor regression model. The same proof applies to the Tucker model with minor changes and is thus omitted here. For the asymptotic normality, we need to establish that the log-likelihood function of the Tucker regression model is quadratic mean differentiable (q.m.d.) [13]. By a well-known result [13, Theorem 12.2.2] or [21, Lemma 7.6], it suffices to verify that the density is continuously differentiable in the parameter for \(\mu \)-almost all x and that the Fisher information matrix exists and is continuous. The derivative of the density is
which is well defined and continuous by Proposition 2. The same proposition shows that the information matrix exists and is continuous. Therefore the Tucker regression model is q.m.d. and the asymptotic normality follows from the classical result for q.m.d. families [21, Theorem 5.39].
Li, X., Xu, D., Zhou, H. et al. Tucker Tensor Regression and Neuroimaging Analysis. Stat Biosci 10, 520–545 (2018). https://doi.org/10.1007/s12561-018-9215-6