Abstract
High-order data are modeled using matrices whose entries are numerical arrays of a fixed size. These arrays, called t-scalars, form a commutative ring under the convolution product. Matrices with elements in the ring of t-scalars are referred to as t-matrices. The t-matrices can be scaled, added and multiplied in the usual way. There are t-matrix generalizations of positive matrices, orthogonal matrices and Hermitian symmetric matrices. With the t-matrix model, it is possible to generalize many well-known matrix algorithms. In particular, the t-matrices are used to generalize the singular value decomposition (SVD), high-order SVD (HOSVD), principal component analysis (PCA), two-dimensional PCA (2DPCA) and Grassmannian component analysis (GCA). The generalized t-matrix algorithms, namely TSVD, THOSVD, TPCA, T2DPCA and TGCA, are applied to low-rank approximation, reconstruction and supervised classification of images. Experiments show that the t-matrix algorithms compare favorably with standard matrix algorithms.
Introduction
In data analysis, machine learning and computer vision, the data are often given in the form of multidimensional arrays of numbers. For example, an RGB image has three dimensions, namely two for the pixel array and a third dimension for the values of the pixels. An RGB image is said to be an array of order three. Alternatively, the RGB image is said to have three modes or to be three-way. A video sequence of images is of order four, with two dimensions for the pixel array, one dimension for time and a fourth dimension for the pixel values.
One way of analyzing multidimensional data is to remove the array structure by flattening, to obtain a vector. A set of vectors obtained in this way can be analyzed using standard matrix-vector algorithms such as the singular value decomposition (SVD) and principal component analysis (PCA). An alternative to flattening is to use algorithms that preserve the multidimensional structure. In these algorithms, the elements of matrices and vectors are entire arrays rather than real numbers in \({\mathbb {R}}\) or complex numbers in \({\mathbb {C}}\). Multidimensional arrays with the same dimensions can be added in the usual way, but there is no definition of multiplication which satisfies the requirements for a field such as \({\mathbb {R}}\) or \({\mathbb {C}}\). However, multiplication based on the convolution product has many but not all of the properties of a field. Convolution multiplication differs from the multiplication in a field in that many elements have no multiplicative inverse. The multidimensional arrays with given dimensions form a commutative ring under the convolution product. The elements of this ring are referred to as t-scalars.
An application of the Fourier transform shows that each ring of t-scalars under the convolution product is isomorphic to a ring of arrays in which the Hadamard product defines the multiplication. In effect, the ring obtained by applying the Fourier transform splits into a product of copies of \({\mathbb {C}}\). It is this splitting which allows the construction of new algorithms for analyzing tensorial data without flattening. The so-called t-matrices with t-scalar entries have many of the properties of matrices with elements in \({\mathbb {R}}\) or \({\mathbb {C}}\). In particular, t-matrices can be scaled, added and multiplied. There is an additive identity and a multiplicative identity. The determinant of a t-matrix is defined, and a given t-matrix is invertible if and only if it has an invertible determinant. The t-matrices include generalizations of positive matrices, orthogonal matrices and symmetric matrices.
A tensorial version, TSVD, of the SVD is described in [18] and [41]. The TSVD expresses a t-matrix as the product of three t-matrices, of which two are generalizations of the orthogonal matrices and one is a diagonal t-matrix with positive t-scalars on the diagonal. The TSVD is used to define tensorial versions of principal component analysis (PCA) and two-dimensional PCA (2DPCA). A tensorial version of Grassmannian component analysis is also defined. These tensorial algorithms are tested by experiments that include low-rank approximations to tensors, reconstruction of tensors and terrain classification using hyperspectral images. The different algorithms are compared using the peak signal-to-noise ratio and Cohen's kappa.
The t-scalars are described in Sect. 2, and the t-matrices are described in Sect. 3. The TSVD is described in Sect. 4. A tensorial version of principal component analysis (TPCA) is obtained from the TSVD in Sect. 5 and then generalized to tensorial two-dimensional PCA (T2DPCA). A tensorial version of Grassmannian component analysis is also defined. The tensorial algorithms are tested experimentally in Sect. 6. Some concluding remarks are made in Sect. 7.
Related Work
A tensor of order two or more can be simplified using the so-called N-mode singular value decomposition (SVD). The three-mode case is described by Tucker in [30]. The multimodal case is discussed in detail by De Lathauwer et al. [6]. Each mode of the tensor has an associated set of vectors, each one of which is obtained by varying the index for the given mode while keeping the indices of the other modes fixed. In the N-mode SVD, an orthonormal basis is obtained for the space spanned by these vectors. In the two-mode case, the result is the usual SVD. The resulting decomposition of a tensor is referred to as the higher-order SVD (HOSVD). Surveys of tensor decompositions can be found in Kolda and Bader [19] and Sidiropoulos et al. [27]. De Lathauwer et al. [6] describe a higher-order eigenvalue decomposition. Vasilescu and Terzopoulos [34] use the N-mode SVD to simplify a fifth-order tensor constructed from face images taken under varying conditions and with varying expressions. A tensor version of the singular value decomposition is described in [18, 41], and [17].
He et al. [15] sample a hyperspectral data cube to yield tensors of order three, of which two modes are for the pixel array and one mode is for the hyperspectral bands. A training set of samples is used to produce a dictionary for sparse classification. Lu et al. [23] use N-mode analysis to obtain projections of tensors to a lower-dimensional space. The resulting multilinear PCA is applied to the classification of gait images. Vannieuwenhoven et al. [32] describe a new method for truncating the higher-order SVD, to obtain low-rank multilinear approximations to tensors. The method is tested on the classification of handwritten digits and the compression of a database of face images.
Many authors have studied algebras of matrices in which the elements are tensors of order one, equipped with a convolution multiplication, under which they form a commutative ring R with a multiplicative identity. In particular, Gleich et al. [11] describe the generalized eigenvalues and eigenvectors of matrices with elements in R and show how the standard power method for finding an eigenvector and the standard Arnoldi method for constructing an orthogonal basis for a Krylov subspace can both be generalized. Braman [3] shows that the t-vectors with a given dimension form a free module over R. Kilmer and Martin [18] show that many of the properties and structures of canonical matrices and vectors can be generalized. Their examples include transposition, orthogonality and the singular value decomposition (SVD). The tensor SVD is used to compress tensors. A tensor-based method for image deblurring is also described. Kilmer et al. [17] generalize the inner product of two vectors, suggest a notion of the angle between two vectors with elements in R and define a notion of orthogonality for two vectors. A generalization of the Gram–Schmidt method for generating an orthonormal set of vectors is also described in [17].
Zhang et al. [41] use the tensor SVD to store video sequences efficiently and also to fill in missing entries in video sequences. Zhang et al. [39] use a randomized version of the tensor SVD to produce low-rank approximations to matrices. Ren et al. [28] define a tensor version of principal component analysis and use it to extract features from hyperspectral images. The features are classified using standard methods such as support vector machines and nearest neighbors. Liao et al. [20] generalize a sparse representation classifier to tensor data and apply the generalized classifier to image data such as numerals and faces. Chen et al. [4] use a four-dimensional HOSVD to detect changes in a time sequence of hyperspectral images. The K-means clustering algorithm is used to classify the pixel values as changed or unchanged. Fan et al. [8] model a hyperspectral image as the sum of an ideal image, a sparse noise term and a Gaussian noise term. A product of two low-rank tensors models the ideal image. The low-rank tensors are estimated by minimizing a penalty function obtained by adding the squared errors in a fit of the hyperspectral image to penalty terms for the sparse noise and the sizes of the two low-rank tensors. Lu et al. [22] approximate a third-order tensor using the sum of a low-rank tensor and a sparse tensor. Under suitable conditions, the low-rank tensor and the sparse tensor are recovered exactly.
T-Scalars
The notations for t-scalars are summarized in Sect. 2.1. Basic definitions are given in Sect. 2.2. The Fourier transform of a t-scalar is defined in Sect. 2.3. Properties of t-scalars and the Fourier transform of a t-scalar are described in Sect. 2.4. A generalization of the t-scalars is described in Sect. 3.5.
Notations and Preliminaries
An array of order N over the complex numbers \({\mathbb {C}}\) is an element of the set C defined by \(C\equiv {\mathbb {C}}^{I_{1}\times \cdots \times I_{N}}\), where the \(I_{n}\) for \(1\le n \le N\) are strictly positive integers. Similarly, an array of order N over the real numbers is an element of the set R defined by \(R\equiv {\mathbb {R}}^{I_{1}\times \cdots \times I_{N}}\). The sets R and C have the structure of commutative rings, in which the product is defined by circular convolution. The elements of C and R are referred to as t-scalars.
Elements of \({\mathbb {R}}\) and \({\mathbb {C}}\) are denoted by lowercase letters and tensorial data are denoted by uppercase letters. The t-scalars are identified using the subscript T, for example \(X_{T}\). Lowercase subscripts such as i, j, \(\alpha \), \(\beta \) are indices or lists of indices.
All indices begin from 1 rather than 0. Given an array of any order N, namely \(X \in {\mathbb {C}}^{I_1\times I_2 \times \cdots \times I_N}\) (\(N \geqslant 1 \)), \(X_{i_1, i_2, \ldots , i_N}\) or \((X)_{i_1, i_2, \ldots , i_N}\) denotes its \((i_1, i_2, \ldots , i_N)\)th entry in \({\mathbb {C}}\). The notation \(X_{i}\), or \((X)_{i}\), is also used, where i is a multi-index defined by \(i = (i_{1},\ldots , i_{N})\). Let \(I=(I_{1},I_{2}, \ldots , I_{N})\) and let i be a multi-index. The notation \(1\le i\le I\) specifies the range of values of i such that \(1\le i_{n}\le I_{n}\) for \(1\le n\le N\). It is often convenient to extend the indexing beyond the range specified by I. Let j be a general multi-index. Then, \(X_{j}\) is defined by \(X_{j} = X_{i}\), where i is the multi-index such that each component \(i_{n}\) is in the range \(1\le i_{n}\le I_{n}\) and \(i_{n}-j_{n}\) is divisible by \(I_{n}\). A multi-index such as \(i-j+1\) has components \(i_{n}-j_{n}+1\) for \(1\le n\le N\). The sum \(\sum \nolimits _{i=1}^{I} (\cdot )\) is an abbreviation for \( \sum \nolimits _{i_{1}=1}^{I_{1}}\cdots \sum \nolimits _{i_{N}=1}^{I_{N}} (\cdot ). \)
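The circular extension of the indexing can be illustrated with a short NumPy sketch (the helper `wrap` is a hypothetical name introduced here; 0-based NumPy indices stand in for the paper's 1-based multi-indices):

```python
import numpy as np

def wrap(j, I):
    """Map a general 1-based multi-index j to the base range 1..I_n,
    returned as a 0-based tuple: component n becomes (j_n - 1) mod I_n,
    so that i_n - j_n is divisible by I_n."""
    return tuple((jn - 1) % In for jn, In in zip(j, I))

I = (3, 4)
X = np.arange(12).reshape(I)
# X_{(4, 5)} equals X_{(1, 1)}: 4 - 1 is divisible by 3 and 5 - 1 by 4
assert wrap((4, 5), I) == wrap((1, 1), I) == (0, 0)
assert X[wrap((4, 5), I)] == X[(0, 0)]
```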
Definitions
The following definitions are for t-scalars in C. Similar definitions can be made for t-scalars in R.
Definition 1
T-scalar addition. Given t-scalars \(X_{T}\) and \(Y_{T} \) in C, the addition of \(X_{T}\) and \(Y_{T} \), denoted by \(D_{T} \doteq X_{T} + Y_{T} \), is elementwise:
$$\begin{aligned} D_{T,i} = X_{T,i} + Y_{T,i}, \quad 1\le i\le I. \end{aligned}$$
Definition 2
T-scalar multiplication. Given t-scalars \(X_{T}\) and \(Y_{T}\) in C, their product, denoted by \(D_{T}= X_{T} \circ Y_{T} \), is a t-scalar in C defined by the circular convolution
$$\begin{aligned} D_{T,i} = \sum \nolimits _{j=1}^{I} X_{T,j} \cdot Y_{T,\,i-j+1}, \quad 1\le i\le I, \end{aligned}$$
where the multi-index \(i-j+1\) is interpreted using the extended indexing of Sect. 2.1.
Definitions 1 and 2 reduce to complex number addition and multiplication when \(N = 1\) and \(I_{1} = 1\).
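As a concrete sketch (not part of the original text; the helper name `tscalar_mul` is hypothetical), Definitions 1 and 2 can be realized in NumPy, with the product computed via the FFT, anticipating the convolution theorem of Sect. 2.3:

```python
import numpy as np

def tscalar_mul(X, Y):
    """Circular convolution of two equally shaped arrays (Definition 2),
    computed via the FFT."""
    return np.fft.ifftn(np.fft.fftn(X) * np.fft.fftn(Y))

X, Y = np.random.rand(3, 3), np.random.rand(3, 3)
D = tscalar_mul(X, Y)

# Check one entry against the defining convolution sum (0-based indices)
i = (1, 2)
s = sum(X[j1, j2] * Y[(i[0] - j1) % 3, (i[1] - j2) % 3]
        for j1 in range(3) for j2 in range(3))
assert np.isclose(D[i], s)

# With N = 1 and I_1 = 1, the product reduces to ordinary multiplication
assert np.isclose(tscalar_mul(np.array([2.0]), np.array([3.5]))[0], 7.0)
```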
Definition 3
Zero t-scalar. The zero t-scalar \(Z_{T}\) is the array in C defined by \(Z_{T,i} = 0\) for all \(1\le i\le I\).
For all t-scalars \(X_{T}\), \(X_{T} + Z_{T} = X_{T} \) and \(X_{T} \circ Z_{T} = Z_{T}\).
Definition 4
Identity t-scalar. The identity t-scalar \(E_{T}\) in C has the first entry equal to 1 and all other entries equal to 0, namely \(E_{T,i} = 1\) if \(i = (1, \ldots , 1)\) and \(E_{T,i} = 0\) otherwise.
For all t-scalars \(X_{T} \in C\), \(X_{T} \circ E_{T} \equiv X_{T}\).
The set of t-scalars satisfies the axioms of a commutative ring with \(Z_{T}\) as an additive identity and \(E_{T}\) as a multiplicative identity. This ring of t-scalars is denoted by \((C, +, \circ )\). The ring \((C, +, \circ )\) is a generalization of the field \(({\mathbb {C}}, +, \cdot )\) of complex numbers. If the t-scalars are restricted to have real number elements, then the ring \((R, +, \circ )\) is obtained.
Fourier Transform of a T-Scalar
Let \(\zeta _{n}\) be a primitive \(I_{n}\)th root of unity, for example, \(\zeta _{n} = \exp (2\pi \sqrt{-1}/I_{n})\).
Let \({\overline{\zeta }}_{n}\) be the complex conjugate of \(\zeta _{n}\), and let \(X_{T}\) be a t-scalar in the ring C. The Fourier transform \(F(X_{T})\) of \(X_{T}\) is defined by
$$\begin{aligned} F(X_{T})_{i} = \sum \nolimits _{j=1}^{I} X_{T,j}\, \prod \nolimits _{n=1}^{N} {\overline{\zeta }}_{n}^{\,(i_{n}-1)(j_{n}-1)} \end{aligned}$$
for all indices \(1\le i\le I\).
The inverse of the Fourier transform is defined by
$$\begin{aligned} F^{-1}(X_{T})_{i} = \frac{1}{I_{1}\cdots I_{N}} \sum \nolimits _{j=1}^{I} X_{T,j}\, \prod \nolimits _{n=1}^{N} \zeta _{n}^{\,(i_{n}-1)(j_{n}-1)} \end{aligned}$$
for all indices \(1\le i\le I\).
Given t-scalars \(X_T \in C\) and \(Y_{T} \in C\) and their t-scalar product \(D_{T} = X_T \circ Y_T \), it follows that
$$\begin{aligned} F(D_{T}) = F(X_{T}) * F(Y_{T}), \end{aligned}$$
where \(*\) denotes the Hadamard product in C. Equation (4) is an extension of the convolution theorem [2]. The equation can be equivalently rewritten as
$$\begin{aligned} F(D_{T})_{i} = F(X_{T})_{i} \cdot F(Y_{T})_{i}, \quad 1\le i\le I, \end{aligned}$$
where \(\cdot \) is multiplication in \({\mathbb {C}}\).
An equivalent definition of the Fourier transform of a high-order array, in the form of multi-mode tensor multiplication, and a diagram of the multiplication of two t-scalars, computed in the Fourier domain, are given in a supplementary file.
It is not difficult to prove that C is a commutative ring, \((C,+, *)\), under the Hadamard product. The Fourier transform is a ring isomorphism from \((C, +, \circ )\) to \((C, +, *)\). The identity element of \((C, +, *)\) is \(J_{T} = F(E_{T})\). All the entries of \(J_{T}\) are equal to 1.
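A numerical check of the isomorphism, assuming NumPy's `fftn` as the Fourier transform F (an illustrative sketch, not part of the original text): the convolution product maps to the Hadamard product, and \(J_{T} = F(E_{T})\) is the all-ones array.

```python
import numpy as np

I = (3, 4)
X = np.random.rand(*I) + 1j * np.random.rand(*I)
Y = np.random.rand(*I) + 1j * np.random.rand(*I)

# X ∘ Y by the defining circular-convolution sum
conv = np.zeros(I, dtype=complex)
for i in np.ndindex(I):
    for j in np.ndindex(I):
        conv[i] += X[j] * Y[(i[0] - j[0]) % I[0], (i[1] - j[1]) % I[1]]

# F(X ∘ Y) = F(X) * F(Y): the convolution becomes the Hadamard product
assert np.allclose(np.fft.fftn(conv), np.fft.fftn(X) * np.fft.fftn(Y))

# The multiplicative identity E_T maps to the all-ones array J_T
E = np.zeros(I); E[0, 0] = 1.0
assert np.allclose(np.fft.fftn(E), np.ones(I))
```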
Properties of T-Scalars
The invertible t-scalars are defined as follows.
Definition 5
Invertible t-scalar: Given a t-scalar \(X_{T}\), if there exists a t-scalar \(Y_{T}\) satisfying \(X_{T} \circ Y_{T} = E_{T}\), then \(X_{T}\) is said to be invertible. The t-scalar \(Y_{T}\) is the inverse of \(X_{T}\) and is denoted by \(Y_{T} \doteq X_{T}^{-1} \doteq E_{T} / X_{T}\;. \)
The zero t-scalar \(Z_{T}\) is non-invertible. In addition, there are infinitely many t-scalars that are non-invertible. For example, given a t-scalar \(X_{T} \in C\), if the entries of \(X_{T}\) are all equal, then \(X_{T}\) is non-invertible. The existence of more than one non-invertible element shows that C is not a field.
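In the Fourier domain, invertibility is easy to test and the inverse easy to compute; the sketch below (hypothetical helper `tscalar_inverse`, our own name) inverts each Fourier coefficient and fails exactly when some coefficient is zero, as happens for a constant array:

```python
import numpy as np

def tscalar_inverse(X):
    """Inverse of a t-scalar, when it exists: invert each Fourier
    coefficient.  X_T is invertible iff F(X_T) has no zero entries."""
    FX = np.fft.fftn(X)
    if np.any(np.isclose(FX, 0.0)):
        raise ValueError("t-scalar is not invertible")
    return np.fft.ifftn(1.0 / FX)

X = np.random.rand(3, 3) + 1j * np.random.rand(3, 3)
E = np.zeros((3, 3)); E[0, 0] = 1.0                  # identity t-scalar E_T
prod = np.fft.ifftn(np.fft.fftn(X) * np.fft.fftn(tscalar_inverse(X)))
assert np.allclose(prod, E)                          # X ∘ X^{-1} = E_T

# A constant array is non-invertible: all its Fourier coefficients
# except the first are zero.
try:
    tscalar_inverse(np.full((3, 3), 2.0))
    raise AssertionError("expected non-invertible")
except ValueError:
    pass
```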
Definition 6
Scalar multiplication of a t-scalar. Given a scalar \(\lambda \in {\mathbb {C}}\) and a t-scalar \(X_{T} \in C\), their product, denoted by \(Y_{T} = \lambda \cdot X_{T} \equiv X_{T} \cdot \lambda \), is the t-scalar given by
$$\begin{aligned} Y_{T,i} = \lambda \cdot X_{T,i}, \quad 1\le i\le I. \end{aligned}$$
It can be shown that the set of t-scalars is a vector space over \({\mathbb {C}}\).
The following definition of the conjugate of a t-scalar generalizes the conjugate of a complex number.
Definition 7
Conjugate of a t-scalar. Given a t-scalar \(X_{T}\) in C, its conjugate, denoted by \({\text {conj}}(X_{T}) \), is the t-scalar in C such that
$$\begin{aligned} {\text {conj}}(X_{T})_{i} = \overline{X_{T,\, 2-i}}, \quad 1\le i\le I, \end{aligned}$$
where \(\overline{X_{T, 2-i}}\) is the complex conjugate of \(X_{T, 2-i}\) in \({\mathbb {C}}\) and the multi-index \(2-i\), with components \(2-i_{n}\), is interpreted using the extended indexing of Sect. 2.1.
The conjugate of a t-scalar reduces to the conjugate of a complex number when \(N = 1\), \(I_{1} = 1\). The relationship between \({\text {conj}}(X_{T})\) and \(X_{T}\) is much clearer if they are mapped to the Fourier domain: each entry of \(F({\text {conj}}(X_{T}))\) is the complex conjugate of the corresponding entry of \(F(X_{T})\), namely
$$\begin{aligned} F({\text {conj}}(X_{T}))_{i} = \overline{F(X_{T})_{i}}, \quad 1\le i\le I. \end{aligned}$$
It follows from Eq. (7) that \({\text {conj}}({\text {conj}}(X_{T}) ) = X_{T} \) for any \(X_{T} \in C\).
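Eq. (7) gives a direct recipe for the conjugate, sketched here in NumPy (the helper name `tconj` is ours): conjugate the Fourier coefficients and transform back; in the spatial domain this amounts to a circular reversal of the indices followed by entrywise conjugation.

```python
import numpy as np

def tconj(X):
    """Conjugate of a t-scalar via Eq. (7): conjugate each Fourier
    coefficient, then transform back."""
    return np.fft.ifftn(np.conj(np.fft.fftn(X)))

X = np.random.rand(3, 4) + 1j * np.random.rand(3, 4)
assert np.allclose(tconj(tconj(X)), X)               # conj(conj(X_T)) = X_T

# Spatial-domain view of Definition 7: conj(X_T)_i = conj(X_{T, 2-i}),
# i.e. a circular index reversal plus entrywise conjugation.
direct = np.conj(np.roll(X[::-1, ::-1], (1, 1), axis=(0, 1)))
assert np.allclose(tconj(X), direct)

# X_T + conj(X_T) is self-conjugate: its Fourier transform is real.
assert np.allclose(np.fft.fftn(X + tconj(X)).imag, 0.0)
```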
Definition 8
Self-conjugate t-scalar: Given a t-scalar \(X_{T} \in C\), if \(X_{T} = {\text {conj}}(X_{T}) \), then \(X_{T}\) is said to be a self-conjugate t-scalar.
If \(X_{T}\) is self-conjugate, then
$$\begin{aligned} F(X_{T})_{i} = \overline{F(X_{T})_{i}}, \quad 1\le i\le I. \end{aligned}$$
It follows from Eq. (9) that \(X_{T}\) is self-conjugate if and only if all the elements of \(F(X_{T})\) are real numbers.
The t-scalars \(Z_{T}\) and \(E_{T}\) are both self-conjugate. Furthermore, the self-conjugate t-scalars form a ring denoted by \(C^{sc}\). This ring is a subring of C.
Given any t-scalar \({X}_{T} \in C\), let \(\mathfrak {R}({X}_{T})\) and \(\mathfrak {I}({X}_{T})\) be defined by
$$\begin{aligned} \mathfrak {R}(X_{T}) = \frac{1}{2}\left( X_{T} + {\text {conj}}(X_{T})\right) , \quad \mathfrak {I}(X_{T}) = \frac{-\sqrt{-1}}{2}\left( X_{T} - {\text {conj}}(X_{T})\right) . \end{aligned}$$
It follows from Eq. (9) that \(\mathfrak {R}(X_{{T}})\) and \(\mathfrak {I}(X_{{T}})\) are self-conjugate. The t-scalars \(X_{T} \in C\) and \({\text {conj}}(X_{T}) \in C\) can be expressed in the form
$$\begin{aligned} X_{T} = \mathfrak {R}(X_{T}) + \sqrt{-1}\cdot \mathfrak {I}(X_{T}), \quad {\text {conj}}(X_{T}) = \mathfrak {R}(X_{T}) - \sqrt{-1}\cdot \mathfrak {I}(X_{T}). \end{aligned}$$
In analogy with the real and imaginary parts of a complex number, \(\mathfrak {R}(X_{T})\) is called the real part of \(X_{T}\) and \(\mathfrak {I}(X_{{T}})\) is called the imaginary part of \(X_{T}\).
Given two t-scalars \(X_{T}\) and \(Y_{T}\), Eq. (14) holds true and is backward compatible with the corresponding equations for complex numbers.
Definition 9
Nonnegative t-scalar: The t-scalar \(X_{T}\) is said to be nonnegative if there exists a self-conjugate t-scalar \(Y_{T}\) such that \(X_{T} = Y_{T} \circ Y_{T} \doteq Y_{T}^{2}\).
If a t-scalar \(X_{T}\) is nonnegative, it is also self-conjugate, because the product of any two self-conjugate t-scalars is self-conjugate. Both \(Z_{T}\) and \(E_{T}\) are nonnegative, since \(Z_{T}\) and \(E_{T}\) are self-conjugate t-scalars and satisfy \(Z_{T} = Z_{T}^{2}\) and \(E_{T} = E_{T}^{2}\). Furthermore, for all \(X_{T} \in C\), the ring element \(\mathfrak {R}(X_{T})^{2} + \mathfrak {I}(X_{T})^{2}\) is nonnegative.
The set \(S^{\mathrm{nonneg}}\) of nonnegative t-scalars is closed under t-scalar addition and multiplication. Since a nonnegative t-scalar is also a self-conjugate t-scalar, \(S^{\mathrm{nonneg}} \subset C^{sc} \subset C\).
Theorem 1
For all t-scalars \(X_{T} \in S^{\mathrm{nonneg}}\), there exists a unique t-scalar \(S_{T} \in S^{\mathrm{nonneg}}\) satisfying \(X_{T} = S_{T} \circ S_{T} \doteq S_{T}^{2}\). We call the nonnegative t-scalar \(S_{T}\) the arithmetic square root of the nonnegative t-scalar \(X_{T}\) and denote it by \(S_{T} \doteq \sqrt{X_{T}} \equiv X_{T}^{1/2}\;.\)
Proof
Let \(X_{{T}}=Y_{{T}} \circ Y_{{T}}\), where \(Y_{{T}}\) is self-conjugate. On applying the Fourier transform, it follows that
$$\begin{aligned} F(X_{T})_{i} = F(Y_{T})_{i}^{2} \ge 0, \quad 1\le i\le I, \end{aligned}$$
since each \(F(Y_{T})_{i}\) is a real number.
Let \(S_{T}\) be defined such that
$$\begin{aligned} F(S_{T})_{i} = F(X_{T})_{i}^{1/2}, \quad 1\le i\le I, \end{aligned}$$
where the nonnegative square root is chosen for each value of i. The Fourier components \(F(S_{{T}})_{i}\) are real-valued; thus, \(S_{{T}}\) is self-conjugate. The equation \(X_{{T}} = S_{{T}} \circ S_{{T}}\) holds because the Fourier transform is injective. \(\square \)
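The proof is constructive and translates directly into a NumPy sketch (the helper name `tsqrt` is ours): take the entrywise nonnegative square root in the Fourier domain.

```python
import numpy as np

def tsqrt(X):
    """Arithmetic square root of a nonnegative t-scalar: entrywise
    nonnegative square roots of the (real) Fourier coefficients."""
    FX = np.fft.fftn(X).real
    return np.fft.ifftn(np.sqrt(np.maximum(FX, 0.0))).real

# Build a nonnegative t-scalar X_T = Y_T ∘ Y_T from a self-conjugate Y_T.
# Taking the real part below symmetrizes the spectrum, so F(Y) is real
# and Y is self-conjugate.
Y = np.fft.ifftn(np.random.rand(3, 3)).real
X = np.fft.ifftn(np.fft.fftn(Y) ** 2).real           # X_T = Y_T ∘ Y_T

S = tsqrt(X)
assert np.allclose(np.fft.ifftn(np.fft.fftn(S) ** 2).real, X)  # S ∘ S = X
assert np.all(np.fft.fftn(S).real >= -1e-8)          # S is nonnegative
```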
Definition 10
A nonnegative t-scalar that is invertible under multiplication is called a positive t-scalar. The set of positive t-scalars is denoted by \(S^{{\mathrm{pos}}}\).
The following inclusions are strict: \(S^{{\mathrm{pos}}} \subset S^{{\mathrm{nonneg}}} \subset C^{{\mathrm{sc}}} \subset C\). The inverse and the arithmetic square root of a positive t-scalar are positive.
The absolute t-value \(r(X_{{T}})\) of \(X_{{T}}\) is defined by
$$\begin{aligned} r(X_{T}) \doteq \sqrt{\mathfrak {R}(X_{T})^{2} + \mathfrak {I}(X_{T})^{2}}\;\;. \end{aligned}$$
The t-scalars \(\mathfrak {R}(X_{{T}})\) and \(\mathfrak {I}(X_{{T}})\) are both self-conjugate; therefore, \(\mathfrak {R}(X_{{T}})^{2}\) and \(\mathfrak {I}(X_{{T}})^{2}\) are both nonnegative. The sum \(\mathfrak {R}(X_{{T}})^{2}+\mathfrak {I}(X_{{T}})^{2}\) is nonnegative, and it has a nonnegative arithmetic square root, namely \(r(X_T)\).
If \(r(X_{{T}})\) is invertible, then let \(\phi (X_{T})\) be defined by
$$\begin{aligned} \phi (X_{T}) \doteq r(X_{T})^{-1} \circ X_{T}\;\;. \end{aligned}$$
The ring element \(\phi (X_{{T}})\) is a generalized angle. The order 1 version of \(\phi (X_{{T}})\) is obtained by Gleich et al. [11]. Equation (17) generalizes the polar form of a complex number. It can be shown that
$$\begin{aligned} r(\phi (X_{T})) = E_{T}\;\;. \end{aligned}$$
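Both \(r(X_{T})\) and \(\phi (X_{T})\) are computed entrywise in the Fourier domain, as this sketch shows (the helper names `tabs` and `tangle` are ours); the assertions check the polar form \(X_{T} = r(X_{T})\circ \phi (X_{T})\) and \(r(\phi (X_{T})) = E_{T}\):

```python
import numpy as np

def tabs(X):
    """Absolute t-value r(X_T): entrywise modulus |F(X_T)| in the
    Fourier domain."""
    return np.fft.ifftn(np.abs(np.fft.fftn(X)))

def tangle(X):
    """Generalized angle phi(X_T), defined when r(X_T) is invertible,
    i.e. when no Fourier coefficient of X_T is zero."""
    FX = np.fft.fftn(X)
    return np.fft.ifftn(FX / np.abs(FX))

X = np.random.rand(3, 3) + 1j * np.random.rand(3, 3)
r, phi = tabs(X), tangle(X)
E = np.zeros((3, 3)); E[0, 0] = 1.0                  # identity t-scalar E_T

# Polar form: X_T = r(X_T) ∘ phi(X_T), and r(phi(X_T)) = E_T
assert np.allclose(np.fft.ifftn(np.fft.fftn(r) * np.fft.fftn(phi)), X)
assert np.allclose(tabs(phi), E)
```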
The absolute t-value \(r(X_{{T}})\) is used in Sect. 3 to define a generalization of the Frobenius norm for t-matrices.
Matrices with T-Scalar Elements
It is shown that t-matrices, i.e., matrices with elements in the rings C or R, are in many ways analogous to matrices with elements in \({\mathbb {C}}\) or \({\mathbb {R}}\).
Indexing
The t-matrices are order-two arrays of t-scalars. Since the t-scalars are arrays of complex numbers, it is convenient to organize t-matrices as hierarchical arrays of complex numbers.
Let \(X_{\mathrm{TM}}\) be a t-matrix with \(D_1\) rows and \(D_2\) columns. Then, \(X_{\mathrm{TM}}\) is an element of \(C^{D_1\times D_2}\). The \((\alpha , \beta )\) entry of \(X_{\mathrm{TM}}\) is the element of C denoted by \(X_{{\mathrm{TM}},\alpha ,\beta }\) for \(1\le \alpha \le D_1\) and \(1\le \beta \le D_2\). Let i be a multi-index for elements of C. Then, \(X_{{\mathrm{TM}}, i, \alpha ,\beta }\) is the element of \({\mathbb {C}}\) given by the ith entry of the ring element \(X_{{\mathrm{TM}},\alpha ,\beta }\).
The t-matrix \(X_{\mathrm{TM}}\) can be interpreted as an element of \({\mathbb {C}}^{I_1\times \cdots \times I_N\times D_1\times D_2}\), or alternatively as an element of \({\mathbb {C}}^{D_1\times D_2\times I_1\times \cdots \times I_N}\). Switching from one data structure to the other requires only a permutation of indices. The data structure \({\mathbb {C}}^{I_1\times \cdots \times I_N\times D_1\times D_2}\) is chosen unless otherwise indicated.
Properties of T-Matrices

(1)
T-matrix addition: Given any t-matrices \({A}_{\mathrm{TM}} \in C^{D_1 \times D_2}\) and \({B}_{\mathrm{TM}} \in C^{D_1\times D_2}\), the addition, denoted by \({C}_{\mathrm{TM}} \doteq {A}_{\mathrm{TM}} + {B}_{\mathrm{TM}} \in C^{D_1\times D_2}\), is entrywise, such that \( C_{\mathrm{TM},\alpha ,\beta } = A_{\mathrm{TM},\alpha ,\beta } + B_{\mathrm{TM},\alpha ,\beta }\), for \(1\le \alpha \le D_1\) and \(1\le \beta \le D_2\).

(2)
T-matrix multiplication: Given any t-matrices \({A}_{\mathrm{TM}} \in C^{D_1\times Q}\) and \({B}_{\mathrm{TM}} \in C^{Q\times D_2}\), their product, denoted by \({C}_{\mathrm{TM}} \doteq {A}_{\mathrm{TM}} \circ {B}_{\mathrm{TM}} \), is the t-matrix in \(C^{D_1\times D_2}\) defined by
$$\begin{aligned} \begin{matrix} C_{\mathrm{TM},\alpha ,\beta } = \sum \nolimits _{\gamma =1}^{Q} A_{\mathrm{TM},\alpha ,\gamma } \circ B_{\mathrm{TM},\gamma , \beta } \end{matrix} \end{aligned}$$
for all indices \(1\le \alpha \le D_1, 1\le \beta \le D_2\).
An example of t-matrix multiplication \(C_{\mathrm{TM}} = A_{\mathrm{TM}} \circ B_{\mathrm{TM}} \in C^{2\times 1} \equiv {\mathbb {C}}^{3\times 3\times 2\times 1}\), where \(A_{\mathrm{TM}} \in C^{2\times 2} \equiv {\mathbb {C}}^{3\times 3\times 2\times 2}\) and \(B_{\mathrm{TM}} \in C^{2\times 1} \equiv {\mathbb {C}}^{3\times 3\times 2\times 1}\), is given in a supplementary file.

(3)
Identity t-matrix: The identity t-matrix is the diagonal t-matrix in which each diagonal entry is equal to the identity t-scalar \(E_{T}\) in Definition 4. The \(D\times D\) identity t-matrix is denoted by \(I_{\mathrm{TM}}^{(D)} \doteq {\text {diag}}(\underset{D}{\underbrace{E_{T},\ldots ,E_{T}}}) \).
Given any \(X_{\mathrm{TM}} \in C^{D_1\times D_2}\), it follows that \(I_{\mathrm{TM}}^{(D_1)} \circ X_{\mathrm{TM}} = X_{\mathrm{TM}} \circ I_{\mathrm{TM}}^{(D_2)} = X_{\mathrm{TM}}\). The identity t-matrix \(I_{\mathrm{TM}}^{(D)}\) is also denoted by \(I_{\mathrm{TM}}\) if the value of D can be inferred from context.

(4)
Scalar multiplication: Given any \({A}_{\mathrm{TM}} \in C^{D_1\times D_2}\) and \(\lambda \in {\mathbb {C}}\), their product, denoted by \({B}_{\mathrm{TM}} \doteq \lambda \cdot {A}_{\mathrm{TM}} \), is the t-matrix in \(C^{D_1\times D_2}\) defined by
$$\begin{aligned} B_{{\mathrm{TM}},\alpha ,\beta } = \lambda \cdot A_{{\mathrm{TM}},\alpha ,\beta },~~1\le \alpha \le D_1, 1\le \beta \le D_2, \end{aligned}$$
where the products with \(\lambda \) are computed as in Definition 6.

(5)
T-scalar multiplication: Given any \({A}_{\mathrm{TM}} \in C^{D_1\times D_2}\) and \(\lambda _{T} \in C\), their product, denoted by \({B}_{\mathrm{TM}} \doteq \lambda _{T} \circ {A}_{\mathrm{TM}} \), is the t-matrix in \(C^{D_1\times D_2}\) defined by
$$\begin{aligned} B_{{\mathrm{TM}},\alpha ,\beta } = \lambda _{T} \circ A_{{\mathrm{TM}},\alpha ,\beta },~~1\le \alpha \le D_1, 1\le \beta \le D_2. \end{aligned}$$ 
(6)
Conjugate transpose of a t-matrix: Given any t-matrix \(X_{\mathrm{TM}} \in C^{D_1\times D_2}\), its conjugate transpose, denoted by \(X_{\mathrm{TM}}^{{\mathcal {H}}}\), is the t-matrix in \(C^{D_2\times D_1}\) given by
$$\begin{aligned}&X_{{\mathrm{TM}},\beta ,\alpha }^{{\mathcal {H}}} = {\text {conj}}( X_{{\mathrm{TM}},\alpha ,\beta })\in C,\\&\quad ~~1\le \alpha \le D_1, 1\le \beta \le D_2. \end{aligned}$$
A square t-matrix \(U_{\mathrm{TM}}\) is said to be orthogonal if \(U_{\mathrm{TM}}^{{\mathcal {H}}}\) is the inverse of \(U_{\mathrm{TM}}\), i.e., \(U_{\mathrm{TM}}^{{\mathcal {H}}} \circ U_{\mathrm{TM}} = U_{\mathrm{TM}} \circ U_{\mathrm{TM}}^{{\mathcal {H}}} = I_{\mathrm{TM}}\). The Fourier transform F is extended to t-matrices elementwise, i.e., \(F(X_{\mathrm{TM}})\) is the \(D_1\times D_2\) t-matrix defined by
$$\begin{aligned} F(X_{\mathrm{TM}})_{\alpha , \beta } = F(X_{{\mathrm{TM}}, \alpha , \beta })\;\; \end{aligned}$$(18)
for all indices \(1 \le \alpha \le D_1\) and \(1 \le \beta \le D_2\).
It is not difficult to prove that
$$\begin{aligned} F(X_{\mathrm{TM}}^{{\mathcal {H}}})_{i,\beta ,\alpha } = \overline{F(X_{\mathrm{TM}})_{i, \alpha ,\beta }} \in {\mathbb {C}}~, \end{aligned}$$for all indices \(1\le i\le I, 1\le \alpha \le D_1, 1\le \beta \le D_2\).

(7)
T-vector dot product and the Frobenius norm: Given any two t-vectors (i.e., two t-matrices, each having only one column) \(X_{\mathrm{TV}} \) and \(Y_{\mathrm{TV}} \) of the same length D, their dot product is the t-scalar defined by
$$\begin{aligned} \langle X_{\mathrm{TV}}, Y_{\mathrm{TV}} \rangle \doteq \sum _{\alpha =1}^{D} {\text {conj}}(X_{\mathrm{TV},\alpha }) \circ Y_{{\mathrm{TV}},\alpha }\;\;. \end{aligned}$$
If \(\langle X_{\mathrm{TV}}, Y_{\mathrm{TV}} \rangle = Z_{T}\), then \(X_{\mathrm{TV}}\) and \(Y_{\mathrm{TV}}\) are said to be orthogonal. The nonnegative t-scalar \(\sqrt{\langle X_{\mathrm{TV}}, X_{\mathrm{TV}} \rangle }\) is called the generalized norm of \(X_{\mathrm{TV}}\) and denoted by
$$\begin{aligned} \Vert X_{\mathrm{TV}}\Vert _{F} \doteq \sqrt{\langle X_{\mathrm{TV}}, X_{\mathrm{TV}} \rangle } \equiv \left( {\sum \limits _{\alpha =1}^{D}r(X_{{\mathrm{TV}},\alpha })^{2}}\right) ^{1/2}, \end{aligned}$$(19)
where \(r(\cdot )\) is the absolute t-value as defined by Eq. (16). The generalized Frobenius norm of a \(D_{1}\times D_{2}\) t-matrix \(W_{\mathrm{TM}}\) is defined by
$$\begin{aligned} \Vert W_{\mathrm{TM}}\Vert _{F} \doteq \left( {\sum \limits _{\alpha =1}^{D_{1}}\sum \limits _{\beta =1}^{D_{2}}r(W_{{\mathrm{TM}}, \alpha ,\beta })^{2}}\right) ^{1/2}. \end{aligned}$$(20)
In order to connect t-matrices with matrices with elements in \({\mathbb {C}}\) or \({\mathbb {R}}\), the slices of a t-matrix are defined as follows.

(8)
Slice of a t-matrix: Any t-matrix \(X_{\mathrm{TM}} \in C^{D_1\times D_2}\), organized as an array in \({\mathbb {C}}^{I_1\times \cdots \times I_N\times D_1\times D_2}\), can be sliced into \(\prod \nolimits _{n=1}^{N}I_{n}\) matrices in \({\mathbb {C}}^{D_1\times D_2}\), indexed by the multi-index i. Let \(X_{\mathrm{TM}}(i) \in {\mathbb {C}}^{D_1\times D_2}\) be the ith slice. The entries of \(X_{\mathrm{TM}}(i)\) are complex numbers in \({\mathbb {C}}\) given by
$$\begin{aligned} (X_{\mathrm{TM}}(i))_{\alpha ,\beta } = X_{{\mathrm{TM}},i, \alpha ,\beta } \in {\mathbb {C}}~~ \end{aligned}$$for all indices \(1\le i\le I, 1\le \alpha \le D_1, 1\le \beta \le D_2\).
The t-vectors with a given dimension form an algebraic structure called a module over the ring C [16]. Modules are generalizations of vector spaces [17]. The t-vector whose entries are all equal to \(Z_{T}\) is denoted by \(Z_{\mathrm{TV}}\) and called the zero t-vector. The next step is to define what is meant by a set of linearly independent t-vectors and what is meant by a full column rank t-matrix.

(9)
Linear independence in a t-vector module: The t-vectors in a subset \(\{X_{{\mathrm{TV}}, 1}, X_{{\mathrm{TV}}, 2},\ldots , X_{{\mathrm{TV}}, K} \}\) of a t-vector module are said to be linearly independent if the equation \( \sum \nolimits _{k=1}^{K}\lambda _{T,k} \circ X_{{\mathrm{TV}},k} = Z_{\mathrm{TV}} \) holds if and only if \(\lambda _{T,k} = Z_{T}\) for \(1\le k\le K\).
If the t-vectors \(X_{{\mathrm{TV}},i}\), \(1\le i\le K\), are linearly independent, then they are said to have a rank of K. If the t-vectors \(Y_{{\mathrm{TV}},i}\) for \(1\le i\le K'\) are linearly independent and span the same submodule as the \(X_{{\mathrm{TV}},i}\), then \(K = K'\). For further information, see [16].

(10)
Full column rank t-matrix: A t-matrix is said to be of full column rank if its column t-vectors are linearly independent.
T-Matrix Analysis Via the Fourier Transform
The Fourier transform of a t-matrix \(X_{\mathrm{TM}} \in {C}^{D_1\times D_2}\) is the t-matrix in \(C^{D_1\times D_2}\) given by Eq. (18).
Many t-matrix computations can be carried out efficiently using the Fourier transform. For example, any multiplication \(C_{\mathrm{TM}} = X_{\mathrm{TM}} \circ Y_{\mathrm{TM}} \in {\mathbb {C}}^{I_1\times \cdots \times I_N\times D_1\times D_2}\), where \(X_{\mathrm{TM}} \in {\mathbb {C}}^{I_1\times \cdots \times I_N\times D_1\times Q}\) and \(Y_{\mathrm{TM}} \in {\mathbb {C}}^{I_1\times \cdots \times I_N\times Q\times D_2}\), can be decomposed into \(\prod \nolimits _{n=1}^{N}I_{n}\) matrix multiplications over the complex numbers, namely
$$\begin{aligned} F(C_{\mathrm{TM}})_{i,\alpha ,\beta } = \sum \nolimits _{\gamma =1}^{Q} F(X_{\mathrm{TM}})_{i,\alpha ,\gamma } \cdot F(Y_{\mathrm{TM}})_{i,\gamma ,\beta } \end{aligned}$$
for all indices \(1\le i\le I, 1\le \alpha \le D_1, 1\le \beta \le D_2\).
The conjugate transpose \(X_{\mathrm{TM}}^{{\mathcal {H}}} \in {\mathbb {C}}^{I_1\times \cdots \times I_N\times D_2\times D_1}\) of any t-matrix \(X_{\mathrm{TM}} \in {\mathbb {C}}^{I_1\times \cdots \times I_N\times D_1\times D_2}\) can be decomposed into \(\prod \nolimits _{n=1}^{N}I_{n}\) canonical conjugate transposes of matrices:
$$\begin{aligned} F(X_{\mathrm{TM}}^{{\mathcal {H}}})_{i,\beta ,\alpha } = \overline{F(X_{\mathrm{TM}})_{i,\alpha ,\beta }} \end{aligned}$$
for all indices \(1\le i\le I,1\le \alpha \le D_1,1\le \beta \le D_2\). Each slice of \(F\left( I_{\mathrm{TM}}^{(D)}\right) \) is the canonical identity matrix with elements in \({\mathbb {C}}\).
The Fourier transform decomposes a t-matrix computation such as multiplication into \(\prod \nolimits _{n=1}^{N}I_{n}\) independent complex matrix computations in the Fourier domain. The ith (\(1 \le i \le I\)) computation involves only the ith slices of the associated t-matrices. This fact underlies an approach to speeding up t-matrix algorithms using parallel computation. The independence of the data in the Fourier domain makes it possible to implement parallel computing using so-called vectorization (also known as array programming), which is supported by many programming languages including MATLAB, R, NumPy, Julia and Fortran.
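The slice-wise decomposition can be sketched in NumPy (the helper name `tmat_mul` is ours): transform the t-scalar axes, perform one complex matrix product per Fourier slice as a batched `matmul`, and transform back.

```python
import numpy as np

def tmat_mul(A, B):
    """t-matrix product computed slice-wise in the Fourier domain.
    A t-matrix in C^{D1 x D2} with t-scalar shape (I1, I2) is stored
    as an array of shape I1 x I2 x D1 x D2."""
    FA = np.fft.fftn(A, axes=(0, 1))       # transform the t-scalar axes
    FB = np.fft.fftn(B, axes=(0, 1))
    FC = FA @ FB                           # one matrix product per slice
    return np.fft.ifftn(FC, axes=(0, 1))

I1, I2, D1, Q, D2 = 3, 3, 2, 4, 2
A = np.random.rand(I1, I2, D1, Q)
B = np.random.rand(I1, I2, Q, D2)
C = tmat_mul(A, B)

# Entry (alpha, beta) of C agrees with the convolution-based definition:
# the sum over gamma of the circular convolutions
# A[:, :, alpha, gamma] ∘ B[:, :, gamma, beta].
alpha, beta = 0, 1
direct = sum(np.fft.ifftn(np.fft.fftn(A[:, :, alpha, g]) *
                          np.fft.fftn(B[:, :, g, beta])) for g in range(Q))
assert np.allclose(C[:, :, alpha, beta], direct)
```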
Pooling
Sometimes, it is necessary to have a pooling mechanism to transform t-scalars to scalars in \({\mathbb {R}}\) or \({\mathbb {C}}\). Given any t-scalar \(X_{T} \in C\), its pooling result \(P(X_{T}) \in {\mathbb {C}}\) is defined by
The pooling operation for t-matrices transforms each t-scalar entry to a scalar. More formally, given any t-matrix \(Y_{\mathrm{TM}} \in C^{D_1\times D_2}\), its pooling result \(P(Y_{\mathrm{TM}})\) is by definition the matrix in \({\mathbb {C}}^{D_1\times D_2}\) given by
The pooling of t-vectors is a special case of Eq. (24).
Generalized Tensors
Generalized tensors, called g-tensors, generalize t-matrices and canonical tensors. The generalized tensors defined in this section are used to construct the higher-order TSVD in Sect. 4.2. A g-tensor, denoted by \(X_{GT} \in C^{D_1\times D_2 \times \cdots \times D_M}\), is a generalized tensor with t-scalar entries (i.e., an order-M array of t-scalars). Its t-scalar entries are indexed by \((X_{GT})_{\alpha _1,\ldots ,\alpha _M}\). Then, a generalized mode-k multiplication of \(X_{GT}\), denoted by \(M_{GT} \doteq X_{GT} ~ \circ _{k}~ Y_{\mathrm{TM}}\), where \(Y_{\mathrm{TM}} \in C^{J\times D_k}\) and \(1 \le k \le M\), is a g-tensor in \(C^{D_1\times \cdots \times D_{k-1} \times J \times D_{k+1} \times \cdots \times D_{M} }\) defined as follows:
$$\begin{aligned} (M_{GT})_{\alpha _1,\ldots ,\alpha _{k-1},\, j,\, \alpha _{k+1},\ldots ,\alpha _M} = \sum \nolimits _{\alpha _k=1}^{D_k} (Y_{\mathrm{TM}})_{j,\alpha _k} \circ (X_{GT})_{\alpha _1,\ldots ,\alpha _M} \end{aligned}$$
for all \(1\le j\le J\).
The generalized mode-k flattening of a gtensor \(X_{GT} \in C^{D_1\times D_2 \times \cdots \times D_M}\) is a \((K_1, K_2)\)-reshaping where \(K_1 = \{k\}\) and \(K_2 = \{1,\ldots ,M\} \setminus \{k\}\). The result is a tmatrix in \(C^{D_{k} \times D_k^{-1}{\prod \nolimits _{m=1}^{M} D_{m}}}\). Each column of the tmatrix is obtained by holding the indices in \(K_{2}\) fixed and varying the index in \(K_{1}\).
The generalized mode-k multiplication defined in Eq. (25) can also be expressed in terms of unfolded gtensors:
where \(M_{{GT}(k)} \in C^{J\times (D_1 \cdots D_{k-1} D_{k+1} \cdots D_M)}\) and \(X_{{GT}(k)} \in C^{D_k\times (D_1 \cdots D_{k-1} D_{k+1} \cdots D_M)}\) are, respectively, the generalized mode-k flattenings of the gtensors \(M_{{GT}}\) and \(X_{{GT}}\).
An example of a generalized tensor (gtensor) \(X_{GT} \in C^{2\times 3\times 2}\equiv {\mathbb {C}}^{3\times 3\times 2\times 3\times 2}\), its mode-k flattening and its mode-2 multiplication with a tmatrix \(Y_{\mathrm{TM}} \in C^{2\times 3}\equiv {\mathbb {C}}^{3\times 3\times 2\times 3}\) are given in a supplementary file.
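A minimal sketch of the generalized mode-k multiplication of Eq. (25), again computed slice-wise in the Fourier domain. The storage convention (tscalar axes first) and the function name are our assumptions; with \(I_1 = I_2 = 1\) the function reduces to the canonical mode-k product.

```python
import numpy as np

def gtensor_modek_mul(x, y, k):
    """Generalized mode-k product of a g-tensor x with a t-matrix y.

    x has shape (I1, I2, D1, ..., DM); y has shape (I1, I2, J, Dk);
    k is 0-based over the tensor modes. The result replaces Dk by J.
    """
    fx = np.fft.fft2(x, axes=(0, 1))
    fy = np.fft.fft2(y, axes=(0, 1))
    # move mode k to the last axis so each Fourier slice is a plain
    # contraction: result[..., j] = sum_d fy[j, d] * fx[..., d]
    fx = np.moveaxis(fx, 2 + k, -1)
    fm = np.einsum('ab...d,abjd->ab...j', fx, fy)
    fm = np.moveaxis(fm, -1, 2 + k)
    return np.fft.ifft2(fm, axes=(0, 1))
```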
Tensor Singular Value Decomposition
The singular value decomposition (SVD) is a wellknown factorization of real or complex matrices [12]. It generalizes the eigendecomposition of positive semidefinite normal matrices to nonsquare and nonnormal matrices. The SVD has a wide range of applications in data analytics, including computing the pseudoinverse of a matrix, solving linear least squares problems, lowrank approximation and linear and multilinear component analysis. A tensor version, TSVD, of the SVD is described in Sect. 4.1 and then applied in Sect. 4.2 to obtain a tensor version, THOSVD, of the higherorder SVD (HOSVD). Further information about the TSVD can be found in [18] and [41].
TSVD: Tensorial SVD
Algorithm
A tensor version, TSVD, of the singular value decomposition is described in this section and then applied in Sect. 4.2 to obtain a tensor version of the highorder SVD (HOSVD). See [41] and [18].
Given a tmatrix \(X_{\mathrm{TM}} \in C^{D_1\times D_2}\), let \(Q \doteq \min (D_1, D_2)\). The TSVD of \(X_{\mathrm{TM}}\) yields the following three tmatrices \(U_{\mathrm{TM}} \in C^{D_1 \times Q}\), \(S_{\mathrm{TM}} \in C^{Q\times Q}\) and \(V_{\mathrm{TM}} \in C^{D_2\times Q}\), such that
where \(U_{\mathrm{TM}}^{{\mathcal {H}}} \circ U_{\mathrm{TM}} = V_{\mathrm{TM}}^{{\mathcal {H}}} \circ V_{\mathrm{TM}} = I_{\mathrm{TM}}^{(Q)}\), \(S_{\mathrm{TM}} = {\text {diag}}(\lambda _{T, 1},\ldots ,\lambda _{T, Q})\) and \(\lambda _{T, 1}, \ldots , \lambda _{T, Q} \in C\) are nonnegative and satisfy \(F(\lambda _{T, 1})_{i} \ge \cdots \ge F(\lambda _{T, Q})_{i}\ge 0\;,~1\le i\le I. \) The tmatrices \(U_{{\mathrm{TM}}}\) and \(V_{{\mathrm{TM}}}\) are generalizations of the orthogonal matrices in the SVD of a matrix with elements in \({\mathbb {R}}\) or \({\mathbb {C}}\).
Although it is possible to compute \(U_{\mathrm{TM}}\), \(S_{\mathrm{TM}}\) and \(V_{\mathrm{TM}}\) in the spatial domain, it is preferable to organize the TSVD algorithm in the Fourier domain, because of the observation in Sect. 2.3 that the Fourier transform converts the convolution product to the Hadamard product. The TSVD of \(X_{\mathrm{TM}}\) can be decomposed into \(\prod \nolimits _{n=1}^{N}I_{n}\) SVDs of complex number matrices given by the slices of the Fourier transform \(F(X_{{\mathrm{TM}}})\). The tmatrices \(U_{\mathrm{TM}}\), \(S_{\mathrm{TM}}\) and \(V_{\mathrm{TM}}\) in Eq. (26) are obtained in Algorithm 1.
If \(X_{{\mathrm{TM}}}\) is defined over \({\mathbb {R}}\), then \(U_{{\mathrm{TM}}}\), \(S_{{\mathrm{TM}}}\) and \(V_{{\mathrm{TM}}}\) can be chosen such that they are defined over \({\mathbb {R}}\). It is sufficient to choose the slices \({\tilde{U}}_{{\mathrm{TM}}}(i)\), \({\tilde{S}}_{{\mathrm{TM}}}(i)\) and \({\tilde{V}}_{{\mathrm{TM}}}(i)\) such that \({\tilde{U}}_{{\mathrm{TM}}}(i) = \overline{{\tilde{U}}}_{{\mathrm{TM}}}(2-i)\), \({\tilde{S}}_{{\mathrm{TM}}}(i) = \overline{{\tilde{S}}}_{{\mathrm{TM}}}(2-i)\) and \({\tilde{V}}_{{\mathrm{TM}}}(i) = \overline{{\tilde{V}}}_{{\mathrm{TM}}}(2-i)\), where the multi-index \(2-i\) is interpreted elementwise modulo the tscalar dimensions. When the tscalar dimensions are given by \(N = 1\), \(I_{1}=1\), the TSVD reduces to the canonical SVD of a matrix in \({\mathbb {C}}^{D_1\times D_2}\). The properties of the SVD can be used to show that the tmatrix \(S_{\mathrm{TM}}\) in Algorithm 1 is unique. The tmatrices \(U_{\mathrm{TM}}\) and \(V_{\mathrm{TM}}\) are not unique.
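The Fourier-domain organization of Algorithm 1 can be sketched in NumPy as follows. This is a hedged sketch under our storage convention (tscalar axes first); note that `np.linalg.svd` batches over the leading slice axes and returns singular values in descending order per slice, matching the ordering condition above.

```python
import numpy as np

def tsvd(x):
    """TSVD of a t-matrix x, shape (I1, I2, D1, D2): a slice-wise SVD in
    the Fourier domain, then an inverse transform back."""
    fx = np.fft.fft2(x, axes=(0, 1))
    u, s, vh = np.linalg.svd(fx, full_matrices=False)   # per-slice SVDs
    Q = s.shape[-1]
    # embed the singular values of each slice on a diagonal t-matrix
    fs = np.zeros(s.shape[:-1] + (Q, Q), dtype=complex)
    idx = np.arange(Q)
    fs[..., idx, idx] = s
    U = np.fft.ifft2(u, axes=(0, 1))
    S = np.fft.ifft2(fs, axes=(0, 1))
    V = np.fft.ifft2(np.conj(np.swapaxes(vh, -1, -2)), axes=(0, 1))
    return U, S, V
```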
TSVD Approximation
TSVD can be used to approximate data. Given a tmatrix \(X_{\mathrm{TM}} \in {C}^{D_1\times D_2}\), let \(Q \doteq \min (D_1, D_2)\) and let the TSVD of \(X_{{\mathrm{TM}}}\) be computed as in Eq. (26). The lowrank approximation \({\hat{X}}_{{\mathrm{TM}}}\) of \(X_{\mathrm{TM}}\) with rank r (\(1 \le r \le Q\)) is defined by
where \({\hat{S}}_{\mathrm{TM}} = {\text {diag}}(\lambda _{T, 1}, \ldots ,\lambda _{T, r}, \underset{Q-r}{\underbrace{Z_{T}, \ldots , Z_{T}}} )\) and \(\lambda _{T, 1}, \ldots , \lambda _{T, r} \ne Z_T\).
When the tscalar dimensions are given by \(N = 1\), \(I_{1} = 1\), Eq. (27) reduces to the SVD lowrank approximation to a matrix in \({\mathbb {C}}^{D_1\times D_2}\).
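In the Fourier domain, Eq. (27) amounts to an ordinary rank-r truncation of every slice. A sketch under our storage convention (the function name is hypothetical):

```python
import numpy as np

def tsvd_lowrank(x, r):
    """Rank-r TSVD approximation of a t-matrix x, shape (I1, I2, D1, D2):
    keep the r leading singular values of every Fourier slice, zero the rest."""
    fx = np.fft.fft2(x, axes=(0, 1))
    u, s, vh = np.linalg.svd(fx, full_matrices=False)
    s[..., r:] = 0.0                      # truncate each slice to rank r
    fr = (u * s[..., None, :]) @ vh       # u @ diag(s) @ vh, batched
    return np.fft.ifft2(fr, axes=(0, 1))
```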
Furthermore, we contend that the approximation \({\hat{X}}_{\mathrm{TM}}\) computed as in Eq. (27) is the solution of the following optimization problem:
where \(\Vert \cdot \Vert _F\) denotes the generalized Frobenius norm of a tmatrix, which is a nonnegative tscalar, as defined in Eq. (20). The result \(X_{\mathrm{TM}}^{\mathrm{approx}}\) generalizes the Eckart–Young–Mirsky theorem [7].
To pose an optimization problem in the form of (28), three notions need to be defined: the rank \({\text {rank}}(\cdot )\) of a tmatrix, the minimization \(\min (\cdot )\) of a nonnegative tscalar variable belonging to a subset of \(S^{\mathrm{nonneg}}\), and the ordering relationship \(\le \) between two nonnegative tscalars.
These definitions generalize their canonical counterparts. The definitions and the generalized Eckart–Young–Mirsky theorem are discussed in the Appendix.
THOSVD: Tensor HigherOrder SVD
In multilinear algebra, the higherorder singular value decomposition (HOSVD), also known as the orthogonal Tucker decomposition of a tensor, is a generalization of the SVD. It is commonly used to extract directional information from multiway arrays [6, 30]. The applications of HOSVD include data analytics [29, 32], machine learning [23, 33, 34], DNA and RNA analysis [25, 26] and texture mapping in computer graphics [35].
Using the tscalar algebra, the HOSVD can be generalized further to obtain a tensorial HOSVD, called THOSVD. The THOSVD is obtained by replacing the complex number elements of each multiway array by tscalar elements. Based on the definitions of gtensors in Sect. 3.5, the THOSVD of \(X_{GT} \in C^{D_1\times D_2 \times \cdots \times D_M}\) is given by the following generalized mode-k multiplications:
where \(S_{GT} \in C^{Q_1\times Q_2 \times \cdots \times Q_M}\) is called the core gtensor, \(U_{{\mathrm{TM}}, k} \in C^{D_k\times Q_k}\) is the mode-k factor tmatrix and \(Q_k \doteq \min (D_k, D_{k}^{-1}\prod \nolimits _{m=1}^{M}D_m)\) for \(1\le k\le M\).
Given a gtensor \(X_{GT} \in C^{D_1\times D_2 \times \cdots \times D_M}\), the THOSVD of \(X_{GT}\), as in Eq. (29), is obtained in Algorithm 2, using a strategy analogous to that of Tucker [30] and De Lathauwer et al. [6] for computing the HOSVD of a tensor with elements in \({\mathbb {R}}\) or \({\mathbb {C}}\).
Note that THOSVD generalizes the HOSVD for canonical tensors, the TSVD for tmatrices and the SVD for canonical matrices. Many SVD-based and HOSVD-based algorithms can be generalized using TSVD and THOSVD, respectively.
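For concreteness, a sketch of Algorithm 2 for an order-three gtensor: each mode-k factor comes from the SVDs of the mode-k flattenings of the Fourier slices, and the core is obtained by contracting each mode with the conjugate transpose of its factor. This is a hedged illustration under our storage convention, not the paper's implementation.

```python
import numpy as np

def thosvd3(x):
    """THOSVD sketch for an order-three g-tensor x, shape (I1, I2, D1, D2, D3).
    Per Fourier slice this is a canonical HOSVD (orthogonal Tucker)."""
    fx = np.fft.fft2(x, axes=(0, 1))
    us = []
    for k in range(3):
        # mode-k flattening of every slice: shape (I1, I2, Dk, prod(others))
        unf = np.moveaxis(fx, 2 + k, 2).reshape(
            fx.shape[:2] + (fx.shape[2 + k], -1))
        u, _, _ = np.linalg.svd(unf, full_matrices=False)
        us.append(u)
    u1, u2, u3 = us
    # core: contract each mode with the conjugate transpose of its factor
    core = np.einsum('abpqr,abpi,abqj,abrk->abijk',
                     fx, np.conj(u1), np.conj(u2), np.conj(u3))
    S = np.fft.ifft2(core, axes=(0, 1))
    U = [np.fft.ifft2(u, axes=(0, 1)) for u in us]
    return S, U
```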
TensorBased Algorithms
Three tensorbased algorithms are proposed. They are tensorial principal component analysis (TPCA), tensorial twodimensional principal component analysis (T2DPCA) and tensorial Grassmannian component analysis (TGCA). TPCA and T2DPCA are generalizations of the wellknown algorithms PCA and 2DPCA [37]. TGCA is a generalization of the recent GCA algorithm [13, 14]. It is possible to generalize many other linear or multilinear algorithms using similar methods.
TPCA: Tensorial Principal Component Analysis
Principal component analysis (PCA) is a wellknown algorithm for extracting the prominent components of observed vectors. PCA is generalized to TPCA in a straightforward manner. Let \(X_{{\mathrm{TV}}, 1}, \ldots , X_{{\mathrm{TV}}, K} \in C^{D}\) be K given tvectors. Then, the covariancelike tmatrix \(G_{\mathrm{TM}} \in C^{D\times D}\) is defined by
where \({\bar{X}}_{\mathrm{TV}} = (1/K)~ \sum \nolimits _{k=1}^{K} X_{{\mathrm{TV}}, k}\). It is not difficult to verify that \(G_{\mathrm{TM}}\) is Hermitian, namely \(G_{\mathrm{TM}}^{{\mathcal {H}}} = G_{\mathrm{TM}}\).
The tmatrix \(U_{\mathrm{TM}} \in C^{D\times D}\) is computed from the TSVD of \(G_{\mathrm{TM}}\) as in Algorithm 1. Then, given any tvector \(Y_{\mathrm{TV}} \in C^{D}\), its feature tvector \(Y^{\mathrm{feat}}_{\mathrm{TV}} \in C^{D}\) is defined by
To reduce \(Y^{\mathrm{feat}}_{\mathrm{TV}}\) from a tvector in \(C^{D}\) to a tvector in \(C^{d}\) (\(D>d\)), simply discard the last \((D-d)\) tscalar entries of \(Y^{\mathrm{feat}}_{\mathrm{TV}}\).
In algebraic terminology, the column tvectors of \(U_{\mathrm{TM}}\) span a linear submodule of tvectors, which is a generalization of a vector subspace [3]. In this sense, each tscalar entry of \(Y^{\mathrm{feat}}_{\mathrm{TV}}\) is a generalized coordinate of the projection of the tvector \((Y_{\mathrm{TV}} - {\bar{X}}_{\mathrm{TV}})\) onto the submodule. The lowrank reconstruction \(Y_{\mathrm{TV}}^{\mathrm{rec}}\in C^{D}\) with the parameter d is given by
where \( (U_{\mathrm{TM}}) _{:, 1:d} \in C^{D\times d}\) denotes the tmatrix containing the first d tvector columns of \(U_{\mathrm{TM}} \in C^{D\times D}\) and \( (Y^{\mathrm{feat}}_{\mathrm{TV}}) _{1:d} \in C^{d}\) denotes the tvector containing the first d tscalar entries of \(Y^{\mathrm{feat}}_{\mathrm{TV}} \in C^{D}\).
Note that PCA is a special case of TPCA. When the tscalar dimensions are given by \(N = 1\), \(I_{1}=1\), TPCA reduces to PCA.
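The TPCA steps above can be sketched as follows, keeping everything in the Fourier domain where the covariance-like tmatrix and the projection are slice-wise matrix operations. The function names are ours; the \(1/K\) scaling of the covariance-like tmatrix is our assumption and does not affect the eigentvectors.

```python
import numpy as np

def tpca_fit(X):
    """TPCA sketch. X: K observed t-vectors, shape (K, I1, I2, D).
    Returns the mean t-vector and the Fourier-domain eigen-t-vectors U,
    shape (I1, I2, D, D); each slice of G is an ordinary Hermitian matrix."""
    mean = X.mean(axis=0)
    C = np.fft.fft2(X - mean, axes=(1, 2))
    # covariance-like t-matrix, slice-wise: G = (1/K) sum_k c_k c_k^H
    G = np.einsum('kabi,kabj->abij', C, np.conj(C)) / X.shape[0]
    U, _, _ = np.linalg.svd(G)        # unitary per slice (G is Hermitian PSD)
    return mean, U

def tpca_features(x, mean, U, d):
    """Feature t-vector of a query t-vector x, truncated to d entries."""
    c = np.fft.fft2(x - mean, axes=(0, 1))
    f = np.einsum('abij,abi->abj', np.conj(U), c)   # U^H (x - mean), per slice
    return np.fft.ifft2(f[..., :d], axes=(0, 1))
```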
T2DPCA: Tensorial TwoDimensional Principal Component Analysis
The algorithm 2DPCA is an extension of PCA proposed by Yang et al. [37] for analyzing the principal components of matrices. Although 2DPCA is written in a non-centered, row-vector-oriented form in the original paper [37], it is rewritten here in a centered, column-vector-oriented form, which is consistent with the formulation of PCA. The centered, column-vector-oriented form of 2DPCA is chosen for discussing its generalization to T2DPCA (tensorial 2DPCA).
Similar to TPCA, T2DPCA also finds submodules, but they are obtained by analyzing tmatrices. Let \(X_{{\mathrm{TM}}, 1}, \ldots , X_{{\mathrm{TM}}, K} \in C^{D_1\times D_2}\) be the K observed tmatrices. Then, the Hermitian covariancelike tmatrix \(G_{\mathrm{TM}}\in C^{D_1\times D_1}\) is given by
where \({\bar{X}}_{\mathrm{TM}} = (1/K)~ \sum \nolimits _{k=1}^{K} X_{{\mathrm{TM}}, k}\).
Then, the tmatrix \(U_{\mathrm{TM}} \in C^{D_1\times D_1}\) is computed from the TSVD of \(G_{\mathrm{TM}}\) as in Algorithm 1. Given any tmatrix \(Y_{\mathrm{TM}} \in C^{D_1\times D_2}\), its feature tmatrix \(Y^{\mathrm{feat}}_{\mathrm{TM}} \in C^{D_1\times D_2}\) is a centered tmatrix projection (i.e., a collection of centered column tvector projections) on the module spanned by \(U_{\mathrm{TM}}\), namely
To reduce \(Y^{\mathrm{feat}}_{\mathrm{TM}}\) from a tmatrix in \(C^{D_1\times D_2}\) to a tmatrix in \(C^{d\times D_2}\) (\(D_1>d\)), simply discard the last \((D_1-d)\) row tvectors of \(Y^{\mathrm{feat}}_{\mathrm{TM}}\).
The T2DPCA reconstruction with the parameter d is given by \(Y_{\mathrm{TM}}^{\mathrm{rec}} \in C^{D_1\times D_2}\) as follows:
where \( (U_{\mathrm{TM}}) _{:, 1:d} \in C^{D_1\times d}\) denotes the tmatrix containing the first d column tvectors of \(U_{\mathrm{TM}}\) and \( (Y_{\mathrm{TM}}^{\mathrm{feat}}) _{1:d,:}\)\(\in \)\(C^{d\times D_2}\) denotes the tmatrix containing the first d row tvectors of \(Y^{\mathrm{feat}}_{\mathrm{TM}}\).
When the tscalar dimensions are given by \(N = 1\), \(I_{1}=1\), T2DPCA reduces to 2DPCA. In addition, TPCA is a special case of T2DPCA. When \(D_2 = 1\), T2DPCA reduces to TPCA. Furthermore, when \(N=1, I_1 = 1\) and \(D_2 = 1\), T2DPCA reduces to PCA.
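T2DPCA admits the same slice-wise sketch as TPCA, with t-matrix observations in place of tvectors. As before, the names and the \(1/K\) scaling are our assumptions.

```python
import numpy as np

def t2dpca_fit(X):
    """T2DPCA sketch. X: K observed t-matrices, shape (K, I1, I2, D1, D2).
    Returns the mean t-matrix and the Fourier-domain factor U,
    shape (I1, I2, D1, D1)."""
    mean = X.mean(axis=0)
    C = np.fft.fft2(X - mean, axes=(1, 2))
    # covariance-like t-matrix, slice-wise: G = (1/K) sum_k C_k C_k^H
    G = np.einsum('kabij,kablj->abil', C, np.conj(C)) / X.shape[0]
    U, _, _ = np.linalg.svd(G)
    return mean, U

def t2dpca_features(x, mean, U, d):
    """Feature t-matrix of a query t-matrix x, truncated to d row t-vectors."""
    c = np.fft.fft2(x - mean, axes=(0, 1))
    f = np.conj(np.swapaxes(U, -1, -2)) @ c      # U^H (x - mean), per slice
    return np.fft.ifft2(f[:, :, :d, :], axes=(0, 1))
```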
TGCA: Tensorial Grassmannian Component Analysis
A tmatrix algorithm which generalizes the recent algorithm for Grassmannian component analysis (GCA) is proposed. An example of GCA can be found in [13], where it forms part of an algorithm for sparse coding on Grassmannian manifolds. In this section, GCA is extended to its generalized version called TGCA (tensorial GCA).
In TGCA, each measurement is a set of tvectors organized into a “thin” tmatrix, with the number of rows larger than the number of columns. Let \(X_{{\mathrm{TM}}, 1},\ldots ,X_{{\mathrm{TM}}, K}\)\(\in \)\(C^{D\times d}\) (\(D > d\)) be the observed tmatrices. Then, the tvector columns of each tmatrix are first orthogonalized. Using the tscalar algebra, it is straightforward to generalize the classical Gram–Schmidt orthogonalization process for tvectors. The TSVD can also be used to orthogonalize a set of tvectors. In GCA and TGCA, the choice of orthogonalization algorithm does not matter as long as the algorithm is consistent for all sets of vectors and tvectors.
Given a tmatrix \(Y_{\mathrm{TM}} \in C^{D\times d}\), let \({\dot{Y}}_{\mathrm{TM}} \in C^{D\times d}\) be the corresponding unitary orthogonalized tmatrix (namely, \({\dot{Y}}_{\mathrm{TM}}^{{\mathcal {H}}} \circ {\dot{Y}}_{\mathrm{TM}} = I_{\mathrm{TM}}^{(d)}\) ) computed from \(Y_{\mathrm{TM}}\). Let \((Y_{\mathrm{TM}})_{:, k}\) be the kth column tvector of \(Y_{\mathrm{TM}}\), and let \(({\dot{Y}}_{\mathrm{TM}})_{:, k}\) be the kth column tvector of \({\dot{Y}}_{\mathrm{TM}}\) for \(1\le k\le d\). The generalized Gram–Schmidt orthogonalization is given by Algorithm 3.
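Since the Fourier transform decouples the slices, the orthogonalization of Algorithm 3 can be realized slice by slice as an ordinary orthogonalization of complex matrices. The sketch below uses a thin QR per slice, which is an equivalent orthogonalization rather than necessarily the paper's exact procedure.

```python
import numpy as np

def torthogonalize(X):
    """Orthonormalize the d column t-vectors of a t-matrix X, shape
    (I1, I2, D, d) with D >= d, so that Q^H o Q is the identity t-matrix.
    Each Fourier slice is handled independently by a thin QR (slice-wise
    equivalent to the classical Gram-Schmidt process)."""
    I1, I2, D, d = X.shape
    fx = np.fft.fft2(X, axes=(0, 1))
    q = np.zeros_like(fx)
    for i in range(I1):                 # the slices are independent,
        for j in range(I2):             # so this loop parallelizes trivially
            q[i, j], _ = np.linalg.qr(fx[i, j])
    return np.fft.ifft2(q, axes=(0, 1))
```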
Let \({\dot{X}}_{{\mathrm{TM}},k} \in C^{D\times d}\) be the unitary orthogonalized tmatrices computed from \(X_{{\mathrm{TM}}, k}\) for \(1\le k\le K\). Then, for \(1 \le k,k' \le K\), the \((k,k')\) tscalar entry of the symmetric tmatrix \(G_{\mathrm{TM}} \in C^{K\times K}\) is nonnegative and given by
where \(\Vert \cdot \Vert _{F}\) is the generalized Frobenius norm of a tmatrix, as defined by Eq. (20).
Given any query tmatrix sample \(Y_{\mathrm{TM}} \in C^{D\times d}\), let \({\dot{Y}}_{\mathrm{TM}} \in C^{D\times d}\) be the corresponding unitary orthogonalized tmatrix computed from \(Y_{\mathrm{TM}}\). Then, the kth tscalar entry of \(K_{\mathrm{TV}} \in C^{K}\) is computed as follows:
Since \(G_{\mathrm{TM}}\), computed as in Eq. (36), is symmetric, the TSVD of \(G_{\mathrm{TM}}\) has the following form:
Furthermore, if it is assumed that the diagonal entries of \(S_{\mathrm{TM}} \doteq {\text {diag}}(\lambda _{T, 1},\ldots ,\lambda _{T, K}) \) are all strictly positive, then the multiplicative inverse of \(\lambda _{T,k}\) exists for \(1\le k\le K\). The tmatrix \(S_{\mathrm{TM}}^{1/2} \doteq {\text {diag}}(\sqrt{\lambda _{T, 1}},\ldots ,\sqrt{\lambda _{T, K}}) \) is called the tmatrix square root of \(S_{\mathrm{TM}}\), and the tmatrix \(S_{\mathrm{TM}}^{-1/2} \doteq {\text {diag}}(\frac{E_T}{\sqrt{\lambda _{T, 1}}},\ldots , \frac{E_T}{\sqrt{\lambda _{T, K}}}) \) is called the inverse of \(S_{\mathrm{TM}}^{1/2}\).
Thus, the features of the tmatrix sample \(Y_{\mathrm{TM}} \in C^{D\times d}\) are given by the tvector \(Y_{\mathrm{TV}}^{\mathrm{feat}} \in C^{K}\) as
and the features of the kth measurement \(X_{{\mathrm{TM}}, k}\) are given by the tvector \(X_{{\mathrm{TV}}, k}^{\mathrm{feat}}\) as follows:
where \( (G_{\mathrm{TM}}) _{:, k}\) denotes the kth tvector column of \(G_{\mathrm{TM}}\). It is not difficult to verify that \(S_{\mathrm{TM}}^{-{1}/{2}} \circ U_{\mathrm{TM}}^{{\mathcal {H}}} \circ G_{\mathrm{TM}} \equiv S_{\mathrm{TM}}^{{1}/{2}} \circ U_{\mathrm{TM}}^{{\mathcal {H}}}\). This yields the following compact form for \(X_{{\mathrm{TV}},k}^{\mathrm{feat}}\):
where \( (S_{\mathrm{TM}}^{{1}/{2}} \circ U_{\mathrm{TM}}^{{\mathcal {H}}}) _{:, k}\) denotes the kth tvector column of the tmatrix \((S_{\mathrm{TM}}^{{1}/{2}} \circ U_{\mathrm{TM}}^{{\mathcal {H}}})\). Equation (41) is computationally more efficient than Eq. (40).
The dimension of a TGCA feature tvector is reduced from K to \(K'\) (\(K > K'\)) by discarding the last \((K-K')\) tscalar entries. It is noted that GCA is a special case of TGCA: when the tscalar dimensions are given by \(N = 1\), \(I_{1} = 1\), TGCA reduces to GCA.
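The feature computations of Eqs. (40) and (41) can be sketched as follows, working on the Fourier-domain slices of \(G_{\mathrm{TM}}\). Shapes and function names are our assumptions; the slices of \(G_{\mathrm{TM}}\) are assumed symmetric positive definite, as in the text.

```python
import numpy as np

def tgca_train_features(G):
    """Columns of S^{1/2} o U^H (the compact form of Eq. (41)): column k is
    the feature t-vector of the k-th training sample. G: Fourier-domain Gram
    t-matrix, shape (I1, I2, K, K), symmetric positive-definite slices."""
    U, s, _ = np.linalg.svd(G)
    return np.sqrt(s)[..., :, None] * np.conj(np.swapaxes(U, -1, -2))

def tgca_query_features(G, k_tv):
    """S^{-1/2} o U^H o K_TV for a query kernel t-vector k_tv of shape
    (I1, I2, K), in the Fourier domain."""
    U, s, _ = np.linalg.svd(G)
    f = np.einsum('abij,abi->abj', np.conj(U), k_tv)   # U^H k_tv, per slice
    return f / np.sqrt(s)
```

The test below checks the identity behind the compact form: the Gram tmatrix is recovered from the training features, and a query equal to the kth kernel column reproduces the kth training feature.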
Experiments
The results obtained from TSVD, THOSVD, TPCA, T2DPCA, TGCA and their precursors are compared in applications to lowrank approximation in Sect. 6.1, reconstruction in Sect. 6.2 and supervised classification of images in Sect. 6.3.
In these experiments, “vertical” and “horizontal” comparisons between generalized algorithms and the corresponding canonical algorithms are made.
In a “vertical” experiment, tensorized data are obtained from the canonical data in \(3\times 3\) neighborhoods. The associated tscalar is a \(3\times 3\) array. To make the vertical comparison fair, the central slices of a generalized result are put into the original canonical form and then compared with the result of the associated canonical algorithm.
In a “horizontal” comparison, a generalized orderN array of ordertwo tscalars is equivalent to a canonical order\((N+2)\) array of scalars. Therefore, a generalized algorithm based on orderN arrays of ordertwo tscalars is compared with a canonical algorithm based on order\((N+2)\) arrays of scalars.
LowRank Approximation
TSVD approximation is computed as in Eq. (27). THOSVD approximation generalizes lowrank approximation by TSVD and lowrank approximation by HOSVD. To simplify the calculations, the approximation is obtained for a gtensor \(X_{{GT}}\) in \(C^{D_{1}\times D_{2}\times D_{3}}\). Let \(Q_k \doteq \min (D_k, D_{k}^{-1}D_1D_2D_3)\) for \(k = 1, 2, 3\). The THOSVD of \(X_{GT}\) yields
where \(U_{{\mathrm{TM}}, k} \in C^{D_k \times Q_k}\) for \(k=1,2,3\) and \(S_{GT} \in C^{Q_1\times Q_2\times Q_3}\).
The lowrank approximation \({\hat{X}}_{GT} \in C^{D_1\times D_2\times D_3}\) to \(X_{{GT}}\) with multilinear rank tuple \((r_1, r_2, r_3)\) (\(1 \le r_k \le Q_k\) for \(k = 1, 2, 3\)) is computed as in Eq. (43), where \((U_{{\mathrm{TM}}, k})_{:, 1: r_k}\) denotes the tmatrix containing the first \(r_k\) tvector columns of \(U_{{\mathrm{TM}}, k}\) for \(k = 1,2,3\) and \((S_{GT})_{1: r_1, 1: r_2, 1: r_3} \in C^{r_1\times r_2\times r_3}\) denotes the gtensor containing the first \(r_1\times r_2 \times r_3\) tscalar entries of \(S_{GT}\).
When the tscalar dimensions are given by \(N = 1\), \(I_{1} = 1\), Eq. (43) reduces to the HOSVD lowrank approximation of a tensor in \({\mathbb {C}}^{D_1 \times D_2 \times D_3}\). When the gtensor dimension \(D_3 = 1\), Eq. (43) reduces to the SVD lowrank approximation of a canonical matrix in \({\mathbb {C}}^{D_1\times D_2}\).
TSVD Versus SVD—A “Vertical” Comparison
The lowrank approximation performances of TSVD and SVD are compared. In the experiment, the test sample is the \(512\times 512\times 3\) RGB Lena image downloaded from Wikipedia.^{Footnote 1}
For the SVD lowrank approximations, the RGB Lena image is split into three \(512\times 512\) monochrome images. Each monochrome image is analyzed using the SVD. The three extracted monochrome Lena images are ordertwo arrays in \({\mathbb {R}}^{512\times 512}\). Each monochrome Lena image is tensorized to produce a timage (a generalized monochrome image) in \(R^{512\times 512} \equiv {\mathbb {R}}^{3 \times 3\times 512\times 512}\). In the tensorized version of the image, each pixel value is replaced by a \(3\times 3\) square of values obtained from the \(3\times 3\) neighborhood of the pixel. Padding with 0 is used where necessary at the boundary of the image.
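The tensorization step just described can be sketched as follows; this is a straightforward implementation of the \(3\times 3\)-neighborhood construction with zero padding, and the function name is ours.

```python
import numpy as np

def tensorize(img):
    """Tensorize a 2-D image (D1, D2) into a t-image (3, 3, D1, D2):
    each pixel is replaced by its 3x3 neighborhood, zero-padded at the
    image border."""
    D1, D2 = img.shape
    pad = np.pad(img, 1)                       # zero padding
    out = np.empty((3, 3, D1, D2), dtype=img.dtype)
    for a in range(3):
        for b in range(3):
            out[a, b] = pad[a:a + D1, b:b + D2]
    return out
```

By construction, the central slice `out[1, 1]` is the original image, which is why the "vertical" comparisons below read off the central slice of a generalized result.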
To evaluate the TSVD approximations in a manner comparable to the SVD approximations, upon obtaining a timage approximation \({\hat{X}}_{\mathrm{TM}} \in {\mathbb {R}}^{3\times 3\times 512\times 512}\), the central slice \({\hat{X}}_{\mathrm{TM}}(i)_{i = (2,2)} \in {\mathbb {R}}^{512\times 512}\) of the TSVD approximation is used for the comparisons.
Given an array X of any order over the real numbers \({\mathbb {R}}\), let \({\hat{X}}\) be an approximation to X. Then, the PSNR (peak signaltonoise ratio) for \({\hat{X}}\) is defined as in [1] by
where \(N^{\mathrm{entry}}\) denotes the number of real number entries of X, \(\Vert X - {\hat{X}}\Vert _F\) is the canonical Frobenius norm of the array \((X - {\hat{X}})\) and MAX is the maximum possible value of the entries of X. In all the experiments, \(\hbox {MAX} = 255\). In this experiment comparing TSVD and SVD, \(N^{\mathrm{entry}} = 512\times 512 = 262144\).
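Equation (44) translates directly into code; the helper below is a hypothetical convenience with MAX passed as a parameter.

```python
import numpy as np

def psnr(x, xhat, max_val=255.0):
    """PSNR of an approximation xhat to x, as in Eq. (44):
    10 * log10(MAX^2 * N_entry / ||x - xhat||_F^2)."""
    x = np.asarray(x, dtype=float)
    err = np.sum((x - np.asarray(xhat, dtype=float)) ** 2)
    return 10.0 * np.log10(max_val ** 2 * x.size / err)
```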
Figure 1 shows the PSNR curves of the SVD and TSVD approximations as functions of the rank of \({\hat{X}}\). The PSNR of the TSVD approximation is consistently higher than that of the SVD approximation. When the rank \(r = 500\), the PSNRs of TSVD and SVD differ by more than 37 dB.
TSVD Versus HOSVD—A “Horizontal” Comparison
Given a monochrome Lena image as an ordertwo array in \({\mathbb {R}}^{512\times 512}\) and its tensorized form as an orderfour array in \({\mathbb {R}}^{3\times 3\times 512\times 512}\), TSVD yields an approximation array in \({\mathbb {R}}^{3\times 3\times 512\times 512}\). Since the HOSVD is applicable to orderfour arrays in \({\mathbb {R}}^{3\times 3\times 512\times 512}\), we give a “horizontal” comparison of the performances of TSVD and HOSVD.
More specifically, given a generalized monochrome Lena image \(X_{\mathrm{TM}} \equiv X \in C^{512\times 512} \equiv \)\({\mathbb {R}}^{3\times 3 \times 512\times 512}\) and a specified rank r, the TSVD approximation yields a tmatrix \({\hat{X}}_{\mathrm{TM}} \in C^{512\times 512} \equiv {\mathbb {R}}^{3\times 3 \times 512\times 512}\), which is computed as in Eq. (27) with \(D_1 = 512\) and \(D_2 = 512\).
Let the HOSVD of \(X \in {\mathbb {R}}^{3\times 3\times 512\times 512}\) be \( X = S ~\times _1~ U_1 ~\times _2~ U_2 ~\times _3~ U_3 ~\times _4~ U_4 \) where \(S \in {\mathbb {R}}^{3\times 3 \times 512\times 512} \) denotes the core tensor, and \(U_1 \in {\mathbb {R}}^{3\times 3}\), \(U_2 \in {\mathbb {R}}^{3\times 3}\), \(U_3 \in {\mathbb {R}}^{512\times 512}\), \(U_4 \in {\mathbb {R}}^{512\times 512}\) are all orthogonal matrices. Then, to give a “horizontal” comparison with the TSVD approximation \({\hat{X}}_{\mathrm{TM}}\) with rank r, the HOSVD approximation \({\hat{X}} \in {\mathbb {R}}^{3\times 3\times 512\times 512}\) is given by the multimode product
The PSNRs of the TSVD and HOSVD approximations are computed as in Eq. (44) with \(\hbox {MAX} = 255\) and \(N^{\mathrm{entry}} = 3\times 3\times 512\times 512 = 2359296\).
For each of the generalized monochrome Lena images (respectively marked by the channel type “red,” “green” and “blue”), as a \({3\times 3\times 512\times 512}\) real number array, the PSNRs of TSVD and HOSVD are shown in Fig. 2.
As the rank r is varied, the PSNR of the TSVD approximation is always higher than that of the corresponding HOSVD approximation. When the rank r is equal to 500, the PSNRs of the TSVD and HOSVD approximations differ significantly.
THOSVD Versus HOSVD—A “Vertical” Comparison
The lowrank approximation performances of THOSVD and HOSVD are compared. For the HOSVD approximations, the RGB Lena image, which is a tensor in \({\mathbb {R}}^{512\times 512\times 3}\), is used as the test sample. For the THOSVD, the \(3\times 3\) neighborhood (with zeropadding) strategy is used to tensorize each real number entry of the RGB Lena image. The obtained timage \(X_{GT}\) is a gtensor in \({R}^{512\times 512\times 3}\), i.e., an orderfive array in \({\mathbb {R}}^{3\times 3\times 512\times 512\times 3}\).
To give a “vertical” comparison, on obtaining an approximation \({\hat{X}}_{GT}\)\(\in \)\({\mathbb {R}}^{3\times 3\times 512\times 512\times 3}\), \({\hat{X}}_{GT}(i)_{i = (2, 2)}\)\(\in \)\({\mathbb {R}}^{512\times 512\times 3}\), i.e., the central slice of the THOSVD approximation, is compared with the HOSVD approximation on the RGB Lena image.
Figure 3 shows a “vertical” comparison of the PSNR maps of THOSVD and HOSVD approximations and the tabulated PSNRs for some representative multilinear rank tuples \((r_1, r_2, r_3)\). It shows that the PSNR of the THOSVD approximation is consistently higher than the PSNR of the HOSVD approximation. When \((r_1, r_2, r_3) = (500, 500, 3)\), the approximations obtained by THOSVD and HOSVD differ by 30.29 dB in their PSNR values.
THOSVD Versus HOSVD—A “Horizontal” Comparison
Given a fifthorder array \(X \in {\mathbb {R}}^{3\times 3\times 512\times 512\times 3}\) tensorized from the RGB Lena image, which is a thirdorder array in \({\mathbb {R}}^{512 \times 512\times 3}\), both THOSVD and HOSVD can be applied to the same data X.
THOSVD takes X as a gtensor \(X_{GT} \in C^{512\times 512\times 3} \equiv {\mathbb {R}}^{3 \times 3\times 512\times 512\times 3}\), while HOSVD takes X merely as a canonical fifthorder array in \({\mathbb {R}}^{3\times 3\times 512\times 512\times 3}\).
Then, given a rank tuple \((r_1, r_2, r_3)\) subject to \(1\le r_1 \le 512\), \(1\le r_2 \le 512\) and \(1\le r_3 \le 3\), the THOSVD approximation \({\hat{X}}_{GT} \in C^{512 \times 512\times 3} \) is computed as in Eq. (43).
Let the HOSVD of \(X \in {\mathbb {R}}^{3\times 3\times 512\times 512\times 3}\) be \( X = S ~\times _1~ U_1 ~\times _2~ U_2 ~\times _3~ U_3 ~\times _4~ U_4 ~\times _5~ U_5 \) where \(S \in {\mathbb {R}}^{3\times 3\times 512\times 512\times 3}\) is the core tensor and \(U_1 \in {\mathbb {R}}^{3\times 3} \), \(U_2 \in {\mathbb {R}}^{3\times 3} \), \(U_3 \in {\mathbb {R}}^{512\times 512} \), \(U_4 \in {\mathbb {R}}^{512\times 512} \), \(U_5 \in {\mathbb {R}}^{3\times 3} \) are all orthogonal matrices.
Then, to give a “horizontal” comparison with the THOSVD approximation \({\hat{X}}_{GT} \in C^{512\times 512\times 3}\) with a rank tuple \((r_1, r_2, r_3)\), the HOSVD approximation \({\hat{X}} \in {\mathbb {R}}^{3\times 3\times 512\times 512\times 3}\) is given by the following multimode product:
Figure 4 shows the “horizontal” comparison of THOSVD approximations and HOSVD approximations on the same array with different rank tuples \((r_1, r_2, r_3)\). Although the PSNRs are somewhat smaller, the results in Fig. 4 are similar to the results in Fig. 3 (a “vertical” comparison), corroborating the claim that a THOSVD approximation outperforms, in terms of PSNR, the corresponding HOSVD approximation on the same data.
Reconstruction
The qualities of the lowrank reconstructions produced by TPCA and PCA and by T2DPCA and 2DPCA, as described by Eqs. (32) and (35), are compared.
The effectiveness of PCA, 2DPCA, TPCA and T2DPCA for reconstruction is assessed using the ORL dataset. The dataset contains 400 face images in 40 classes, i.e., 10 images/class \(\times \) 40 classes. Each image has \(112\times 92\) pixels.^{Footnote 2} The first 200 images (5 images/class \(\times \) 40 classes) are used as the observed images, and the remaining 200 images are the query images.
For the experiments with TPCA/T2DPCA, all ORL images are tensorized to timages in \(R^{112\times 92}\), namely orderfour arrays in \({\mathbb {R}}^{3\times 3\times 112\times 92 }\). Eigendecompositions and teigendecompositions are computed on the observed images and timages, respectively. Reconstructions are computed for the query images and timages, respectively. There are 200 PSNRs for the reconstructed images and 200 for the reconstructed timages. It is convenient to summarize them by the average of the PSNRs (denoted by A), the standard deviation of the PSNRs (denoted by S) and the ratio A/S. A larger value of A with a smaller value of S indicates a better quality of reconstruction.
TPCA Versus PCA—A “Vertical” Comparison
To make the TPCA and PCA reconstructions computationally tractable, each image is resized to \(56 \times 46\) pixels by bicubic interpolation. The resized images are also tensorized to timages, i.e., orderfour arrays in \({\mathbb {R}}^{3\times 3\times 56\times 46}\). The obtained images and timages are then transformed to vectors and tvectors, respectively, by stacking their columns. The central slices of the TPCA reconstructions are compared with the PCA reconstructions.
Figure 5 shows graphs and some tabulated values of A, S and A/S for a number of eigenvectors and eigentvectors. Note that K linearly independent observed vectors or tvectors yield at most \((K-1)\) eigenvectors or eigentvectors. Thus, the maximum number of eigenvectors and eigentvectors in Fig. 5 is 199 (\(K = 200\)).
The average PSNR for TPCA is consistently higher than the average PSNR for PCA. The PSNR standard deviation for TPCA is slightly larger than the PSNR standard deviation for PCA, but the ratio A/S for TPCA is generally larger than the ratio A/S for PCA. This indicates that TPCA outperforms PCA in terms of reconstruction quality.
T2DPCA Versus 2DPCA—A “Vertical” Comparison
The same observed samples from the ORL dataset (the first 200 images, 5 images/class \(\times \) 40 classes) and query samples (the remaining 200 images) are used to compare the reconstruction performances of T2DPCA and 2DPCA. The central slices of the T2DPCA reconstructions are compared with the 2DPCA reconstructions.
Figure 6 shows the reconstruction curves and some tabulated values yielded by T2DPCA and 2DPCA as functions of the number d of eigenvectors or eigentvectors. The average PSNR obtained by T2DPCA is consistently higher than that obtained by 2DPCA. When the parameter d equals 111, the gap between the two average PSNRs is 31.98 dB. Furthermore, the PSNR standard deviation for T2DPCA is generally smaller than that for 2DPCA. In terms of reconstruction quality, T2DPCA outperforms 2DPCA.
Classification
TGCA and GCA are applied to the classification of the pixel values in hyperspectral images. Hyperspectral images have hundreds of spectral bands, in contrast with RGB images which have only three spectral bands. The multiple spectral bands and high resolution make hyperspectral imagery essential in remote sensing, target analysis, classification and identification [10, 15, 21, 24, 36, 38, 40]. Two publicly available datasets are used to evaluate the effectiveness of TGCA and GCA for supervised classification.
Datasets
The first hyperspectral image dataset is the Indian Pines cube (Indian cube for short), which consists of \(145 \times 145\) hyperspectral pixels (hyperpixels for short) and has 220 spectral bands, yielding an array of order three in \({\mathbb {R}}^{145 \times 145 \times 220}\). The Indian cube comes with groundtruth labels for 16 classes [31]. The second hyperspectral image dataset is the Pavia University cube (Pavia cube for short), which consists of \(610 \times 340\) hyperpixels with 103 spectral bands, yielding an array of order three in \({\mathbb {R}}^{610 \times 340 \times 103}\). The ground truth contains 9 classes [31].
Tensorization
Given a hyperspectral cube, let \(D_1\) be the number of rows, \(D_2\) the number of columns and D the number of spectral bands. A hyperpixel is represented by a vector in \({\mathbb {R}}^{D}\). Each pixel is tensorized by its \(3\times 3\) neighborhood. The tensorized hyperspectral cube is represented by an array in \({\mathbb {R}}^{3\times 3\times D_1\times D_2\times D}\). Each tensorized hyperpixel, called thyperpixel in this paper, is represented by a tvector in \(R^{D}\), i.e., an array in \({\mathbb {R}}^{ 3\times 3\times D}\).
Figure 7 shows the tensorization of a canonical vector extracted from a hyperspectral cube. The tensorization of all vectors yields a tensorized hyperspectral cube in \({\mathbb {R}}^{3\times 3\times D_1\times D_2\times D}\).
Input Matrices and TMatrices
To classify a query hyperpixel, it is necessary to extract features from the hyperpixel. A thyperpixel in TGCA is represented by a set of tvectors in the \(5\times 5\) neighborhood of the thyperpixel. These tvectors are used to construct a tmatrix. A similar construction is used for GCA.
In GCA for example, let the vectors in the \(5\times 5\) neighborhood of a hyperpixel be \(X_{\mathrm{vec}, 1}, \ldots , X_{\mathrm{vec}, 25}\). The ordering of the vectors should be the same for all hyperpixels. The raw matrix \(X_{\mathrm{mat}}\) representing the hyperpixel is given by marshalling these vectors as the columns of \(X_{\mathrm{mat}}\), namely \(X_{\mathrm{mat}} \doteq [X_{\mathrm{vec}, 1}, \ldots , X_{\mathrm{vec}, 25}] \in {\mathbb {R}}^{D\times 25}\). The associated tmatrix \(X_{\mathrm{TM}} \in C^{D\times 25}\) in TGCA is obtained by marshalling the associated 25 tvectors.
After each matrix and tmatrix is obtained, its columns are orthogonalized. The resulting matrices and tmatrices are the input samples for GCA and TGCA, respectively.
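A minimal sketch of this construction for GCA, assuming a fixed raster ordering of the \(5\times 5\) neighborhood and thin-QR orthogonalization (the specific orthogonalization routine is not prescribed above):

```python
import numpy as np

def neighborhood_matrix(cube, row, col, w=5):
    """Collect the hyperpixel vectors in the w x w neighborhood of
    (row, col), in a fixed raster order, as columns of a D x w^2 matrix."""
    D1, D2, D = cube.shape
    r = w // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="edge")
    cols = [padded[row + i, col + j, :] for i in range(w) for j in range(w)]
    return np.stack(cols, axis=1)   # shape (D, 25)

def orthogonalize(X):
    """Orthonormalize the columns of X via thin QR."""
    Q, _ = np.linalg.qr(X)
    return Q

cube = np.random.rand(10, 10, 7)
X = neighborhood_matrix(cube, 4, 4)   # (7, 25)
Q = orthogonalize(X)
print(Q.shape)   # (7, 7): thin QR keeps min(D, 25) columns
```

For TGCA, the same marshalling is carried out on tvectors, with the tmatrix orthogonalized slice-wise in the Fourier domain.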
Classification
To evaluate GCA, TGCA and the competing methods, the overall accuracy (OA) and Cohen's \(\kappa \) index of the supervised classification of hyperpixels (i.e., prediction of the class labels of hyperpixels) are used. The overall accuracies and \(\kappa \) indices are obtained for different component analyzers and classifiers. Higher values of OA or \(\kappa \) indicate better component analyzer performance [9]. Let K be the number of query samples, and let \(K'\) be the number of correctly classified samples. The overall accuracy is simply defined by \({OA} = K' / K \). The \(\kappa \) index is defined by [5] \(\kappa = \big (K K' - \sum _{j=1}^{N^{\mathrm{class}}} a_{j} b_{j}\big ) \big / \big (K^{2} - \sum _{j=1}^{N^{\mathrm{class}}} a_{j} b_{j}\big )\),
where \(N^{\mathrm{class}}\) is the number of classes, \(a_{j}\) is the number of samples belonging to the jth class and \(b_{j}\) is the number of samples classified to the jth class.
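The two measures can be computed directly from predicted and ground-truth labels; the sketch below uses Cohen's \(\kappa \) in its standard observed-versus-chance agreement form [5].

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    """OA = (correctly classified samples) / (query samples)."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    K = y_true.size
    classes = np.unique(np.concatenate([y_true, y_pred]))
    po = np.mean(y_true == y_pred)                        # observed agreement, = OA
    a = np.array([np.sum(y_true == c) for c in classes])  # samples belonging to class j
    b = np.array([np.sum(y_pred == c) for c in classes])  # samples classified to class j
    pe = np.sum(a * b) / K**2                             # chance agreement
    return (po - pe) / (1 - pe)

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(overall_accuracy(y_true, y_pred))  # 0.8333...
print(cohen_kappa(y_true, y_pred))       # 0.75
```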
Two classical component analyzers, namely PCA and LDA, and four state-of-the-art component analyzers, namely TDLA [40], LTDA [42], GCA [13] and TPCA (ours), are evaluated against TGCA. As an evaluation baseline, the results obtained with the original raw canonical vectors of hyperpixels are given. These raw vectors are denoted as the “original” (ORI for short) vectors. Three vector-oriented classifiers, NN (nearest neighbor), SVM (support vector machine) and RF (random forest), are employed to evaluate the effectiveness of the features extracted by these component analyzers.
In the experiments, the background hyperpixels are excluded, because they do not have labels in the ground truth. A total of 10% of the foreground hyperpixels are randomly and uniformly chosen without replacement as the observed samples (i.e., samples whose class labels are known in advance). The rest of the foreground hyperpixels are chosen as the query samples, that is, samples with the class labels to be determined.
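The observed/query split described above can be sketched as follows; the labels and random seed are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)    # toy ground-truth labels of foreground pixels
foreground = np.arange(labels.size)      # indices of labeled (non-background) hyperpixels

# 10% of foreground hyperpixels, chosen uniformly without replacement, are observed
n_obs = int(0.10 * foreground.size)
observed = rng.choice(foreground, size=n_obs, replace=False)
# the remaining foreground hyperpixels are the query samples
query = np.setdiff1d(foreground, observed)

print(observed.size, query.size)         # 20 180
```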
In order to use the vector-oriented classifiers NN, SVM and RF, the tvector features generated by TGCA or TPCA are pooled to yield canonical vectors. For TGCA, the canonical vectors obtained by pooling are referred to as the TGCAI features, and the tvectors without pooling are referred to as the TGCAII features.
To assess the effectiveness of the TGCAII features, a generalized classifier which deals with tvectors is needed. Many canonical classifiers can be generalized from vector-oriented to tvector-oriented; however, a comprehensive discussion of these generalizations is outside the scope of this paper. Nevertheless, it is straightforward to generalize NN. The d-dimensional tvectors are not only elements of the module \(C^{d}\), but also elements of the vector space \({\mathbb {C}}^{3\times 3 \times d}\). This allows the canonical Frobenius norm to be used to measure the distance between two tvectors, viewed as elements of \({\mathbb {C}}^{3\times 3 \times d}\). The canonical Frobenius norm should not be confused with the generalized Frobenius norm defined in Eq. (20).
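Under this view, the generalized NN classifier reduces to nearest neighbor under the canonical Frobenius norm of the tvector arrays. A minimal sketch (the feature shapes and data are illustrative):

```python
import numpy as np

def tnn_classify(query, train_feats, train_labels):
    """1-NN over tvector features: each feature is an array in
    C^(3 x 3 x d); distances use the canonical Frobenius norm."""
    dists = [np.linalg.norm(query - f) for f in train_feats]
    return train_labels[int(np.argmin(dists))]

rng = np.random.default_rng(1)
train = [rng.standard_normal((3, 3, 4)) for _ in range(6)]
labels = [0, 0, 0, 1, 1, 1]
q = train[4] + 0.01 * rng.standard_normal((3, 3, 4))  # a query near a class-1 sample
print(tnn_classify(q, train, labels))                 # 1
```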
Figure 8 shows the highest classification accuracies obtained by each pair of component analyzer and classifier on the two hyperspectral cubes. The highest accuracies are obtained by traversing the set of feature dimensions \(d \in \{5, 10,\ldots , D_m\}\), where \(D_{m}\) is the maximum dimension valid for the associated component analyzer. Figure 8 shows that the results obtained by the algorithms TPCA, TGCAI and TGCAII are consistently better than those obtained by their canonical counterparts. Even with the relatively weak classifier NN, TGCA achieves the highest accuracies and highest \(\kappa \) indices in the experiments. Further results are shown in Figs. 9 and 10. It is clear that the pair TGCA and NN yields the best results, outperforming every other pair of analyzer and classifier.
TGCA Versus GCA
Note that the maximum dimension of the TGCA and GCA features is equal to the number of observed training samples and is therefore much higher than the original dimension, which equals the number of spectral bands. Thus, taking the original dimension as the baseline, one can employ TGCA or GCA either for dimension reduction or for dimension increase. When the so-called curse of dimensionality is the concern, the insignificant entries of the TGCA and GCA features can be discarded. When accuracy is the primary concern, higher-dimensional features can be used.
The performances of TGCA and GCA for varying feature dimension are compared using accuracy curves generated by TGCA (i.e., TGCAI and TGCAII) and GCA, as shown in Fig. 11. The results are obtained for low feature dimensions and for high feature dimensions. It is clear that the classification accuracies obtained using TGCA and TGCAII are consistently higher than the accuracies obtained using GCA.
TPCA Versus PCA
The classification accuracies of TPCA and PCA are compared, even though the highest classification accuracies overall are not obtained by TPCA or PCA. The classification accuracy curves obtained by TPCA and PCA (with the classifiers NN, SVM and RF) are shown in Fig. 12. It is clear that, no matter which classifier and feature dimension are chosen, the accuracy using TPCA is consistently higher than the accuracy using PCA (Footnote 3).
Computational Cost
The run times of tmatrix manipulations with different tscalar sizes \(I_1 \times I_2\) are shown in Fig. 13. The tscalar sizes range over \(1 \le I_1, I_2\le 32\). The evaluated tmatrix manipulations include addition, conjugate transposition, multiplication and TSVD. The run time is evaluated using MATLAB R2018b on a notebook PC with an Intel i7-4700MQ CPU at 2.40 GHz and 16 GB memory.
Each time point in the figure is obtained by averaging 100 manipulations of random tmatrices in \({\mathbb {R}}^{I_1\times I_2\times 64 \times 64}\). Each tmatrix with \((I_1, I_2) \ne (1, 1)\) is transformed to the Fourier domain and manipulated via its \(I_1\cdot I_2\) slices. The results are transformed back to the original domain by the inverse Fourier transform. Note that when \((I_1, I_2) = (1, 1)\), a tmatrix manipulation reduces to a canonical matrix manipulation. The reported run time of a canonical matrix manipulation does not include the time spent on the Fourier transform and its inverse.
From Fig. 13, it can be seen that the run time is essentially an increasing linear function of the number of slices, i.e., \(I_1\cdot I_2\).
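The slice-wise Fourier-domain computation can be sketched with tmatrix multiplication as the example. This is a minimal NumPy illustration, not the authors' MATLAB code.

```python
import numpy as np

def tmatrix_multiply(A, B):
    """Multiply tmatrices A in R^(I1 x I2 x m x n) and B in R^(I1 x I2 x n x p).
    Entries are tscalars of size I1 x I2 multiplied by circular convolution,
    so the product is computed slice-wise in the Fourier domain."""
    Af = np.fft.fft2(A, axes=(0, 1))            # FFT over the tscalar axes
    Bf = np.fft.fft2(B, axes=(0, 1))
    Cf = np.einsum('ijmn,ijnp->ijmp', Af, Bf)   # one matrix product per slice
    return np.real(np.fft.ifft2(Cf, axes=(0, 1)))

# when I1 = I2 = 1, the tmatrix product reduces to the canonical matrix product
A = np.random.rand(1, 1, 3, 4)
B = np.random.rand(1, 1, 4, 2)
C = tmatrix_multiply(A, B)
assert np.allclose(C[0, 0], A[0, 0] @ B[0, 0])
```

The cost is one canonical matrix product per slice, which is consistent with the observed run time growing linearly in \(I_1\cdot I_2\).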
Conclusion
An algebraic framework of tensorial matrices is proposed for generalized visual information analysis. The algebraic framework generalizes the canonical matrix algebra, combining the “multi-way” merits of high-order arrays and the “two-way” intuition of matrices. In the algebraic framework, scalars are extended to tscalars, which are implemented as high-order numerical arrays of a fixed size. With appropriate operations, the tscalars are trinitarian in the following sense. (1) Tscalars are generalized complex numbers. (2) Tscalars are elements of an algebraic ring. (3) Tscalars are elements of a linear space.
Tensorial matrices, called tmatrices, are constructed with tscalar elements. The resulting tmatrix algebra is backward compatible with the canonical matrix algebra. Using this talgebra framework, it is possible to generalize many canonical matrix and vector constructions and algorithms.
To demonstrate the “multi-way” merits and “two-way” matrix intuition of the proposed tensorial algebra and its applications to generalized visual information analysis, the canonical matrix algorithms SVD, HOSVD, PCA, 2DPCA and GCA are generalized. Experiments with low-rank approximation, reconstruction and supervised classification show that the generalized algorithms compare favorably with their canonical counterparts on visual information analysis.
Notes
 1.
 2.
 3.
To use the same classifiers, pooling is used to transform the tvectors produced by TPCA into canonical vectors.
 4.
The partial order “<” is defined between nonnegative tscalars. The inequality \(Z_T< {\text {rank}}(X_T) <E_T\) means \(Z_T \le {\text {rank}}(X_T) \le E_T\) and \({\text {rank}}(X_T) \ne Z_T\) and \({\text {rank}}(X_T) \ne E_T\).
References
 1.
Almohammad, A., Ghinea, G.: Stego image quality and the reliability of PSNR. In: 2010 2nd International Conference on Image Processing Theory, Tools and Applications, pp. 215–220 (2010)
 2.
Bracewell, R.N.: The Fourier Transform and Its Applications, 3rd edn, pp. 108–112. McGraw-Hill, New York (1999)
 3.
Braman, K.: Third-order tensors as linear operators on a space of matrices. Linear Algebra Appl. 433(7), 1241–1253 (2010)
 4.
Chen, Z., Wang, B., Niu, Y., Xia, W., Zhang, J.Q., Hu, B.: Change detection for hyperspectral images based on tensor analysis. In: Geoscience and Remote Sensing Symposium, pp. 1662–1665 (2015)
 5.
Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1), 37–46 (1960)
 6.
De Lathauwer, L., De Moor, B., Vandewalle, J.: A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21(4), 1253–1278 (2000)
 7.
Eckart, C., Young, G.: The approximation of one matrix by another of lower rank. Psychometrika 1(3), 211–218 (1936)
 8.
Fan, H., Li, C., Guo, Y., Kuang, G., Ma, J.: Spatial-spectral total variation regularized low-rank tensor decomposition for hyperspectral image denoising. IEEE Trans. Geosci. Remote Sens. 56(10), 6196–6213 (2018)
 9.
Fitzgerald, R.W., Lees, B.G.: Assessing the classification accuracy of multisource remote sensing data. Remote Sens. Environ. 47(3), 362–368 (1994)
 10.
Fu, W., Li, S., Fang, L., Kang, X., Benediktsson, J.A.: Hyperspectral image classification via shape-adaptive joint sparse representation. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 9(2), 556–567 (2016)
 11.
Gleich, D.F., Chen, G., Varah, J.M.: The power and Arnoldi methods in an algebra of circulants. Numer. Linear Algebra Appl. 20(5), 809–831 (2013)
 12.
Golub, G.H., Van Loan, C.F.: Matrix Computations, Chap. 2. North Oxford Academic, Oxford (1983)
 13.
Harandi, M., Hartley, R., Shen, C., Lovell, B., Sanderson, C.: Extrinsic methods for coding and dictionary learning on Grassmann manifolds. Int. J. Comput. Vis. 114(2), 113–136 (2015)
 14.
Harandi, M.T., Hartley, R., Lovell, B., Sanderson, C.: Sparse coding on symmetric positive definite manifolds using Bregman divergences. IEEE Trans. Neural Netw. Learn. Syst. 27(6), 1294–1306 (2015)
 15.
He, Z., Li, J., Liu, L.: Tensor block-sparsity based representation for spectral-spatial hyperspectral image classification. Remote Sens. 8(8), 636 (2016)
 16.
Hungerford, T.: Algebra, Graduate Texts in Mathematics, Chap. IV, vol. 73. Springer, New York (1974)
 17.
Kilmer, M.E., Braman, K., Hao, N., Hoover, R.C.: Third-order tensors as operators on matrices: a theoretical and computational framework with applications in imaging. SIAM J. Matrix Anal. Appl. 34(1), 148–172 (2013)
 18.
Kilmer, M.E., Martin, C.D.: Factorization strategies for thirdorder tensors. Linear Algebra Appl. 435(3), 641–658 (2011)
 19.
Kolda, T., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009)
 20.
Liao, L., Maybank, S.J., Zhang, Y., Liu, X.: Supervised classification via constrained subspace and tensor sparse representation. In: International Joint Conference on Neural Networks, pp. 2306–2313 (2017)
 21.
Liu, Z., Tang, B., He, X., Qiu, Q., Wang, H.: Sparse tensor-based dimensionality reduction for hyperspectral spectral-spatial discriminant feature extraction. IEEE Geosci. Remote Sens. Lett. 1775–1779(99), 1–5 (2017)
 22.
Lu, C., Feng, Y., Liu, W., Lin, Z., Yan, S.: Tensor robust principal component analysis: exact recovery of corrupted low rank tensors via convex optimization. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5249–5257 (2016)
 23.
Lu, H., Plataniotis, K.N., Venetsanopoulos, A.N.: MPCA: multilinear principal component analysis of tensor objects. IEEE Trans. Neural Netw. 19(1), 18–39 (2008)
 24.
Ma, X., Wang, H., Geng, J.: Spectral-spatial classification of hyperspectral image based on deep auto-encoder. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 9(9), 4073–4085 (2016)
 25.
Muralidhara, C., Gross, A.M., Gutell, R.R., Alter, O.: Tensor decomposition reveals concurrent evolutionary convergences and divergences and correlations with structural motifs in ribosomal RNA. PLoS One 6(4), e18768 (2011)
 26.
Omberg, L., Golub, G.H., Alter, O.: A tensor higher-order singular value decomposition for integrative analysis of DNA microarray data from different studies. Proc. Natl. Acad. Sci. USA 104, 18371–18376 (2007)
 27.
Sidiropoulos, N.D., De Lathauwer, L., Fu, X., Huang, K., Papalexakis, E.E., Faloutsos, C.: Tensor decomposition for signal processing and machine learning. IEEE Trans. Signal Process. 65(13), 3551–3582 (2017)
 28.
Ren, Y., Liao, L., Maybank, S.J., Zhang, Y., Liu, X.: Hyperspectral image spectralspatial feature extraction via tensor principal component analysis. IEEE Geosci. Remote Sens. Lett. 14(9), 1431–1435 (2017)
 29.
Taguchi, Y.H.: Tensor decomposition-based unsupervised feature extraction applied to matrix products for multi-view data processing. PLoS One 12(8), e0183933 (2017)
 30.
Tucker, L.R.: Some mathematical notes on three-mode factor analysis. Psychometrika 31(3), 279–311 (1966)
 31.
University of the Basque Country: Hyperspectral remote sensing scenes. http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes
 32.
Vannieuwenhoven, N., Vandebril, R., Meerbergen, K.: A new truncation strategy for the higher-order singular value decomposition. SIAM J. Sci. Comput. 34(2), A1027–A1052 (2012)
 33.
Vasilescu, M.A.O.: Human motion signatures: analysis, synthesis, recognition. Proc. Int. Conf. Pattern Recognit. 3, 456–460 (2002)
 34.
Vasilescu, M.A.O., Terzopoulos, D.: Multilinear analysis of image ensembles: TensorFaces. In: European Conference on Computer Vision, pp. 447–460. Springer (2002)
 35.
Vasilescu, M.A.O., Terzopoulos, D.: TensorTextures: multilinear image-based rendering. ACM Trans. Graph. 23(3), 336–342 (2004)
 36.
Wei, Y., Zhou, Y., Li, H.: Spectral-spatial response for hyperspectral image classification. Remote Sens. 9(3), 203–233 (2017)
 37.
Yang, J., Zhang, D., Frangi, A.F., Yang, J.Y.: Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 26(1), 131–137 (2004)
 38.
Zhang, E., Zhang, X., Jiao, L., Li, L., Hou, B.: Spectral-spatial hyperspectral image ensemble classification via joint sparse representation. Pattern Recognit. 59, 42–54 (2016)
 39.
Zhang, J., Saibaba, A.K., Kilmer, M., Aeron, S.: A randomized tensor singular value decomposition based on the t-product. Numer. Linear Algebra Appl. 25(5), e2179 (2018)
 40.
Zhang, L., Zhang, L., Tao, D., Huang, X.: Tensor discriminative locality alignment for hyperspectral image spectral-spatial feature extraction. IEEE Trans. Geosci. Remote Sens. 51(1), 242–256 (2013)
 41.
Zhang, Z., Ely, G., Aeron, S., Hao, N., Kilmer, M.: Novel methods for multilinear data completion and de-noising based on tensor-SVD. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3842–3849 (2014)
 42.
Zhong, Z., Fan, B., Duan, J., Wang, L., Ding, K., Xiang, S., Pan, C.: Discriminant tensor spectral-spatial feature extraction for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 12(5), 1028–1032 (2015)
Acknowledgements
Liang Liao would like to thank Professor Pinzhi Fan (Southwest Jiaotong University, China) for his support and insightful suggestions on this work. Liang Liao also thanks Yuemei Ren, Chengkai Yang, Haichang Ye, Jie Yang and Xuechun Zhang for their support of some early-stage experiments of this work. All prospective support of and collaboration on this research are welcome. Contact email: liaolangis@126.com or liaoliang2018@gmail.com.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was supported by the National Natural Science Foundation of China (No. U1404607) and the High-end Foreign Experts Program (No. GDW20186300351) of the State Administration of Foreign Experts Affairs.
Appendix
Before giving a proof of the equivalence of Eqs. (27) and (28), namely the generalized Eckart–Young–Mirsky theorem, some notation needs to be defined.
First, \({\text {rank}}(\cdot )\) denotes the rank of a tmatrix, which generalizes the rank of a canonical matrix and is defined as follows.
Definition I, rank of a tmatrix. Given a tmatrix \(X_{\mathrm{TM}}\), the rank \(Y_{T} \doteq {\text {rank}}(X_{\mathrm{TM}})\) is a nonnegative tscalar such that \(F(Y_{T})(i) = {\text {rank}}\big (F(X_{\mathrm{TM}})(i)\big )\) for all i,
where \(F(X_{\mathrm{TM}})(i)\) denotes the ith slice of the Fourier transform \(F(X_{\mathrm{TM}})\).
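Definition I can be transcribed directly slice by slice. The sketch below is a NumPy illustration; `matrix_rank` uses a default numerical tolerance, which is an implementation choice.

```python
import numpy as np

def trank(X):
    """Rank of a tmatrix X in R^(I1 x I2 x D1 x D2), following Definition I:
    the Fourier transform of the rank tscalar holds the rank of each
    Fourier-domain slice of X."""
    Xf = np.fft.fft2(X, axes=(0, 1))
    I1, I2 = X.shape[:2]
    Rf = np.empty((I1, I2), dtype=complex)
    for i in range(I1):
        for j in range(I2):
            Rf[i, j] = np.linalg.matrix_rank(Xf[i, j])
    return np.real(np.fft.ifft2(Rf))   # the rank tscalar in the original domain

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 2, 4, 3))
R = trank(X)
# a generic tmatrix has full-rank slices, so F(rank) is constantly 3 here
assert np.allclose(np.fft.fft2(R), 3.0)
```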
Definition II, partial ordering of nonnegative tscalars. Given two nonnegative tscalars \(X_{T}\) and \(Y_{T}\), the notation \(X_T \le Y_T\) is equivalent to the following condition: \(F(X_{T})(i) \le F(Y_{T})(i)\) for all i.
Definition III, minimization of a nonnegative tscalar variable. For a nonnegative tscalar variable \(X_T\) varying in a subset of \( S^{\mathrm{nonneg}}\), \(Y_T \doteq \min (X_T)\) is the nonnegative tscalar infimum of the subset, satisfying the following condition: \(F(Y_{T})(i) = \min _{X_{T}} F(X_{T})(i)\) for all i,
where \(F(Y_T)\) and \(F(X_T)\), respectively, denote the Fourier transforms of \(Y_T\) and \(X_T\).
Given two nonnegative tscalars \(X_{T}\) and \(Y_{T}\), let \(M_{T}\) be the nonnegative tscalar defined by \(M_{T} = \min (X_{T}, Y_{T})\), namely \(F(M_{T})(i) = \min \big (F(X_{T})(i), F(Y_{T})(i)\big )\) for all i.
These definitions are chosen so that many generalized rank properties hold in a form analogous to their canonical counterparts.
For example, given any tmatrices \( X_{\mathrm{TM}} \in C^{D_1\times D_2}\) and \(Y_{\mathrm{TM}} \in C^{D_1\times D_2}\) or \(Y_{\mathrm{TM}} \in C^{D_2\times D_3}\), the following inequalities hold:
Since a tscalar is a tmatrix of one row and one column, the rank of a tscalar can be obtained.
Given any tscalar \(X_T\), let \(G_T \doteq {\text {rank}}(X_T)\) be the rank of \(X_T\). Then, following Eq. (48), it is not difficult to show that the ith entry of the Fourier transform \(F(G_T)\) is given as follows: \(F(G_T)(i) = 1\) if \(F(X_T)(i) \ne 0\), and \(F(G_T)(i) = 0\) otherwise.
Following the partial ordering given in (49) and Eq. (48), it is not difficult to prove that the following propositions hold:
It follows from (56) that \(Z_T< {\text {rank}}(X_T) < E_T \) iff the tscalar \(X_T\) is nonzero and noninvertible.^{Footnote 4}
Generalized rank from a TSVD perspective. Given any tmatrix \(X_{\mathrm{TM}} = U_{\mathrm{TM}} \circ S_{\mathrm{TM}} \circ V_{\mathrm{TM}}^{{\mathcal {H}}} \), where \(S_{\mathrm{TM}} \doteq {\text {diag}}(\lambda _{T, 1}, \ldots , \lambda _{T, k}, \ldots , \lambda _{T, Q} )\) and \(\lambda _{T, k} \in C\) is a tscalar for all k, the following equation holds and generalizes its canonical counterpart: \({\text {rank}}(X_{\mathrm{TM}}) = \sum _{k=1}^{Q} {\text {rank}}(\lambda _{T, k})\).
Let the approximation be \( {\hat{X}}_{\mathrm{TM}} = U_{\mathrm{TM}} \circ {\hat{S}}_{\mathrm{TM}} \circ V_{\mathrm{TM}}^{{\mathcal {H}}} \), where \({\hat{S}}_{\mathrm{TM}} = {\text {diag}}(\lambda _{T, 1}, \ldots ,\lambda _{T, r}, \underset{Q-r}{\underbrace{Z_{T}, \ldots , Z_{T}}} )\). Then, \({\hat{X}}_{\mathrm{TM}}\) is a low-rank approximation to \(X_{\mathrm{TM}}\), since the following rank inequality holds:
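A slice-wise sketch of this truncation, keeping the r leading singular values of every Fourier-domain slice and replacing the rest by the zero tscalar. This is an illustration of the construction, not the authors' implementation.

```python
import numpy as np

def tsvd_truncate(X, r):
    """Rank-r truncation of a tmatrix X in R^(I1 x I2 x D1 x D2):
    keep the r leading singular values of every Fourier-domain slice
    and zero the rest, then return to the original domain."""
    Xf = np.fft.fft2(X, axes=(0, 1))
    out = np.empty_like(Xf)
    I1, I2 = X.shape[:2]
    for i in range(I1):
        for j in range(I2):
            U, s, Vh = np.linalg.svd(Xf[i, j], full_matrices=False)
            s[r:] = 0.0                 # truncate the trailing singular values
            out[i, j] = (U * s) @ Vh    # rank-r slice approximation
    return np.real(np.fft.ifft2(out, axes=(0, 1)))

X = np.random.default_rng(0).standard_normal((2, 2, 4, 3))
# keeping all min(D1, D2) = 3 singular values reproduces X
assert np.allclose(tsvd_truncate(X, 3), X)
```

By the canonical Eckart–Young–Mirsky theorem applied per slice, the approximation error is non-increasing in r.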
Furthermore, it is not difficult to verify that Eq. (28) is equivalent to the following equation in the form of canonical matrices (i.e., slices of Fourier-transformed tmatrices):
where \({\tilde{X}}_{\mathrm{TM}}^{\mathrm{approx}}\) and \({\tilde{X}}_{\mathrm{TM}}\), respectively, denote the Fourier transform of \({X}_{\mathrm{TM}}^{\mathrm{approx}}\) and \({X}_{\mathrm{TM}}\) in Eq. (28), and \({\text {rank}}(Y_{\mathrm{mat}})\) is the rank of a complex matrix \(Y_{\mathrm{mat}}\) in \({\mathbb {C}}^{D_1\times D_2}\).
On the other hand, by applying the Fourier transforms to both sides of Eq. (27), Eq. (27) is transformed to the following equation in the form of canonical matrices (i.e., slices of Fourier-transformed tmatrices):
where \({\tilde{X}}_{\mathrm{TM}}^{\mathrm{svd}} \), \({\tilde{U}}_{\mathrm{TM}}\), \({\tilde{S}}_{\mathrm{TM}}^{\mathrm{approx}}\) and \({\tilde{V}}_{\mathrm{TM}}^{\mathrm{approx}}\), respectively, denote the Fourier transforms of \({\hat{X}}_{\mathrm{TM}} \), \(U_{\mathrm{TM}} \), \({\hat{S}}_{\mathrm{TM}} \) and \(V_{\mathrm{TM}} \) in Eq. (27).
The canonical Eckart–Young–Mirsky theorem guarantees the equivalence of Eqs. (59) and (60).
Liao, L., Maybank, S.J.: Generalized visual information analysis via tensorial algebra. J. Math. Imaging Vis. (2020). https://doi.org/10.1007/s10851-020-00946-9
Keywords
 Commutative ring
 Grassmannian manifold
 Image classification
 Tensor singular value decomposition
 Tensors