
Tensor Decomposition


Part of the book series: Unsupervised and Semi-Supervised Learning (UNSESUL)

Abstract

Tensor decomposition (TD) is a natural extension of matrix factorization (MF), introduced for matrices in the previous chapter, to the case where tensors rather than matrices are considered. In contrast to MF, which is usually represented as a product of two matrices, TD takes various forms. Whereas matrices have been studied extensively over a long period, tensors have a much shorter history of intensive investigation, especially from the application point of view; thus, there is no de facto standard form for a specific application. As with MF, the aim of TD is to reduce the degrees of freedom, but how the degrees of freedom are reduced has many variations. In this chapter, we introduce three principal realizations of TD: a sum of outer products of vectors, a product summation of a (smaller) tensor and matrices, and a product summation of (smaller) tensors. These three methods have their own unique pros and cons. In addition to the algorithm for performing each TD, we will also discuss the pros and cons of the three methods introduced.

I painted her as an unapproachable enigma and never even tried to see her for who she was.
(Ichigo, Darling in the FranXX, Season 1, Episode 16)
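The three realizations named in the abstract are often called the CP, Tucker, and tensor-train decompositions in the literature (cf. [2, 3]). The following is a minimal NumPy sketch of the three forms, with arbitrary small dimensions and random factors; it is not code from this chapter (see Appendix A for the implementations the book actually refers to).

    import numpy as np

    I, J, K, R = 4, 5, 6, 2          # tensor sides and a small rank R
    rng = np.random.default_rng(0)

    # (1) Sum of outer products of vectors (CP form):
    #     x_ijk = sum_r a_ir * b_jr * c_kr
    a = rng.normal(size=(I, R))
    b = rng.normal(size=(J, R))
    c = rng.normal(size=(K, R))
    x_cp = np.einsum('ir,jr,kr->ijk', a, b, c)

    # (2) Product summation of a smaller core tensor and matrices (Tucker form):
    #     x_ijk = sum_{pqs} g_pqs * u_ip * v_jq * w_ks
    P, Q, S = 2, 3, 2
    g = rng.normal(size=(P, Q, S))
    u = rng.normal(size=(I, P))
    v = rng.normal(size=(J, Q))
    w = rng.normal(size=(K, S))
    x_tucker = np.einsum('pqs,ip,jq,ks->ijk', g, u, v, w)

    # (3) Product summation of smaller tensors (tensor-train form):
    #     x_ijk = sum_{r1,r2} g1_{i r1} * g2_{r1 j r2} * g3_{r2 k}
    r1, r2 = 2, 2
    g1 = rng.normal(size=(I, r1))
    g2 = rng.normal(size=(r1, J, r2))
    g3 = rng.normal(size=(r2, K))
    x_tt = np.einsum('ia,ajb,bk->ijk', g1, g2, g3)

    print(x_cp.shape, x_tucker.shape, x_tt.shape)  # each is a 4 x 5 x 6 tensor

In each case the full tensor has I*J*K entries, while the factors on the right-hand side have far fewer; this is the reduction of degrees of freedom discussed in the chapter.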


Notes

  1. Although the detailed algorithms of the individual TDs will be presented in later sections, readers might want to try them in advance while reading the earlier sections that demonstrate examples. In that case, see Appendix A, where I list some implementations on various platforms.

  2. See the Appendix for more details about the Moore–Penrose pseudoinverse. Alternatively, one can simply perform linear regression analysis, Eq. (3.35).

References

  1. Barata, J.C.A., Hussein, M.S.: The Moore–Penrose pseudoinverse: a tutorial review of the theory. Braz. J. Phys. 42(1), 146–165 (2012). https://doi.org/10.1007/s13538-011-0052-z


  2. Kolda, T., Bader, B.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009). https://doi.org/10.1137/07070111X


  3. Oseledets, I.: Tensor-train decomposition. SIAM J. Sci. Comput. 33(5), 2295–2317 (2011). https://doi.org/10.1137/090752286



Appendix


I. Moore–Penrose Pseudoinverse

The Moore–Penrose pseudoinverse [1] of a matrix A, denoted A†, satisfies the following four conditions:

  • A A† A = A

  • A† A A† = A†

  • (A† A)^T = A† A

  • (A A†)^T = A A†
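As a quick numerical sanity check (a sketch, not from the book; it assumes NumPy and uses np.linalg.pinv), one can verify that the pseudoinverse returned by NumPy satisfies these four conditions:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(6, 4))      # N = 6, M = 4
    Ad = np.linalg.pinv(A)           # Moore-Penrose pseudoinverse A†

    print(np.allclose(A @ Ad @ A, A))        # A A† A   = A
    print(np.allclose(Ad @ A @ Ad, Ad))      # A† A A†  = A†
    print(np.allclose((Ad @ A).T, Ad @ A))   # (A† A)^T = A† A
    print(np.allclose((A @ Ad).T, A @ Ad))   # (A A†)^T = A A†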

Suppose we need to find \(\boldsymbol {x} \in \mathbb {R}^M\) that satisfies

$$\displaystyle \begin{aligned} A \boldsymbol{x} = \boldsymbol{b} {} \end{aligned} $$
(3.72)

where \(A \in \mathbb {R}^{N \times M}\) and \(\boldsymbol {b} \in \mathbb {R}^N\). In general, a unique solution exists only when N = M (and A is invertible).

The Moore–Penrose pseudoinverse can be used to solve Eq. (3.72), because

$$\displaystyle \begin{aligned} \boldsymbol{x} = A^\dagger \boldsymbol{b} \end{aligned} $$
(3.73)

gives

  • the unique solution of Eq. (3.72) when N = M,

  • the x satisfying Eq. (3.72) with the minimum |x| when N < M (i.e., when no unique solution is available), and

  • the x with the minimum |A x − b| when N > M (equivalent to the so-called linear regression analysis).

When N < M, infinitely many solutions satisfy Eq. (3.72); the Moore–Penrose pseudoinverse allows us to select the one with the minimum |x|. On the other hand, when N > M, a solution of Eq. (3.72) does not always exist; the Moore–Penrose pseudoinverse allows us to select the x with the minimum |A x − b|, i.e., the smallest residuals. Thus, by computing the Moore–Penrose pseudoinverse, we can always obtain an x that satisfies Eq. (3.72) as well as possible, in some sense.
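These three cases can be illustrated numerically as follows (a sketch with arbitrary sizes, assuming NumPy; np.linalg.lstsq is used only as an independent reference for the least-squares case):

    import numpy as np

    rng = np.random.default_rng(2)

    # N < M: infinitely many solutions; A† b is the one with the smallest |x|.
    A = rng.normal(size=(3, 5))
    b = rng.normal(size=3)
    Ad = np.linalg.pinv(A)
    x = Ad @ b
    print(np.allclose(A @ x, b))             # x solves A x = b exactly
    w = rng.normal(size=5)
    x_other = x + (np.eye(5) - Ad @ A) @ w   # another solution, cf. Eq. (3.86)
    print(np.linalg.norm(x) <= np.linalg.norm(x_other))

    # N > M: generally no exact solution; A† b minimizes |A x - b|
    # (this is ordinary least squares, i.e., linear regression).
    A = rng.normal(size=(7, 4))
    b = rng.normal(size=7)
    x = np.linalg.pinv(A) @ b
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.allclose(x, x_lstsq))           # agrees with NumPy's least squares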

A† can be computed as follows. Apply SVD to A as

$$\displaystyle \begin{aligned} A = U \varSigma V^T \end{aligned} $$
(3.74)

where \(U \in \mathbb {R}^{N \times M}, \varSigma , V \in \mathbb {R}^{M \times M}\) for N > M and \(U, \varSigma \in \mathbb {R}^{N \times N}, V \in \mathbb {R}^{M \times N}\) for N < M. When U or V is not a square matrix, U^T U = V^T V = I, but U U^T ≠ I and V V^T ≠ I. When U and V are square matrices, U^T U = U U^T = V^T V = V V^T = I.

Then A can be defined as

$$\displaystyle \begin{aligned} A^\dagger = V \varSigma^{-1} U^T \end{aligned} $$
(3.75)

It is not difficult to show that A† = V Σ^{-1} U^T satisfies the required conditions, because

$$\displaystyle \begin{aligned} A A^\dagger = \left ( U \varSigma V^T \right)\left (V \varSigma^{-1} U^T \right)= U \varSigma \varSigma^{-1} U^T = U I U^T = \left \lbrace \begin{array}{cc} U U^T, N > M \\ I , N \leq M \end{array} \right. \end{aligned} $$
(3.76)

and

$$\displaystyle \begin{aligned} A^\dagger A = \left (V \varSigma^{-1} U^T \right) \left( U \varSigma V^T \right) = V \varSigma^{-1} \varSigma V^T = V I V^T = \left \lbrace \begin{array}{cc} I, N \geq M \\ VV^T, N<M \end{array} \right. \end{aligned} $$
(3.77)

where V^T V = I for N > M and U^T U = I for N < M are used.

Then when N > M,

$$\displaystyle \begin{aligned} \begin{array}{rcl} A A^\dagger A &\displaystyle =&\displaystyle A\left (A^\dagger A \right ) = A I = A , \end{array} \end{aligned} $$
(3.78)
$$\displaystyle \begin{aligned} \begin{array}{rcl} A^\dagger A A^\dagger &\displaystyle = &\displaystyle \left (A^\dagger A \right ) A^\dagger = I A^\dagger = A^\dagger \end{array} \end{aligned} $$
(3.79)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \left (A A^\dagger \right )^T &\displaystyle = &\displaystyle \left (UU^T \right)^T = \left (U^T \right)^T U^T = U U^T = A A^\dagger \end{array} \end{aligned} $$
(3.80)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \left (A^\dagger A \right)^T &\displaystyle =&\displaystyle I^T = I = A^\dagger A \end{array} \end{aligned} $$
(3.81)

On the other hand, when N < M,

$$\displaystyle \begin{aligned} \begin{array}{rcl} A A^\dagger A &\displaystyle =&\displaystyle \left (AA^\dagger \right) A = I A = A , \end{array} \end{aligned} $$
(3.82)
$$\displaystyle \begin{aligned} \begin{array}{rcl} A^\dagger A A^\dagger &\displaystyle = &\displaystyle A^\dagger \left (A A^\dagger \right) = A^\dagger I = A^\dagger \end{array} \end{aligned} $$
(3.83)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \left (A A^\dagger \right )^T &\displaystyle = &\displaystyle I^T = I = A A^\dagger \end{array} \end{aligned} $$
(3.84)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \left(A^\dagger A \right)^T &\displaystyle =&\displaystyle \left (VV^T \right )^T= \left (V^T \right )^T V^T = VV^T = A^\dagger A \end{array} \end{aligned} $$
(3.85)

When N = M, these relations are obvious because A A† = A† A = I.
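Eqs. (3.74) and (3.75) translate directly into code. The following sketch (assuming NumPy and a full-rank A, so that Σ is invertible) builds A† from the thin SVD and compares it with np.linalg.pinv:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.normal(size=(6, 4))                        # N > M here

    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: A = U Σ V^T
    A_dagger = Vt.T @ np.diag(1.0 / s) @ U.T           # Eq. (3.75): A† = V Σ^{-1} U^T

    print(np.allclose(A_dagger, np.linalg.pinv(A)))    # matches NumPy's pseudoinverse
    print(np.allclose(U.T @ U, np.eye(4)))             # U^T U = I although U is not square
    print(np.allclose(A_dagger @ A, np.eye(4)))        # A† A = I when N > M, cf. Eq. (3.77)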

The reason why the Moore–Penrose pseudoinverse handles Eq. (3.72) as described above is as follows. Define

$$\displaystyle \begin{aligned} \boldsymbol{x}_0 = A^\dagger \boldsymbol{b} + \left (I - A^\dagger A \right ) \boldsymbol{w} {} \end{aligned} $$
(3.86)

with an arbitrary vector w. Then, because

$$\displaystyle \begin{aligned} A \boldsymbol{x}_0 = A A^\dagger \boldsymbol{b} + \left ( A - A A^\dagger A \right) \boldsymbol{w} =A A^\dagger \boldsymbol{b} \end{aligned} $$
(3.87)

when A A† = I, i.e., N ≤ M, we have A x_0 = b, so x_0 is a solution of Eq. (3.72). This corresponds to the case where there is no unique solution, because the number of variables, M, is larger than the number of equations, N. Because of Eq. (3.86), x_0 can be the unique solution only when A† A = I as well, i.e., when N = M; this corresponds to the case where the number of variables, M, equals the number of equations, N, and a unique solution exists.

Here one should notice that \(A^\dagger \boldsymbol {b} \perp \left (I - A^\dagger A \right ) \boldsymbol {w}\) because

$$\displaystyle \begin{aligned} \begin{array}{rcl} \left (I - A^\dagger A \right ) \boldsymbol{w} \cdot A^\dagger \boldsymbol{b} &\displaystyle = &\displaystyle \left ( \left (I - A^\dagger A \right ) \boldsymbol{w} \right)^T A^\dagger \boldsymbol{b} = \boldsymbol{w}^T \left (I- A^\dagger A \right)^T A^\dagger \boldsymbol{b} \\&\displaystyle =&\displaystyle \boldsymbol{w}^T \left(I- A^\dagger A \right)A^\dagger \boldsymbol{b} =\boldsymbol{w}^T ( A^\dagger - A^\dagger A A^\dagger ) \boldsymbol{b}\\ &\displaystyle = &\displaystyle \boldsymbol{w}^T 0 \boldsymbol{b}=0. \end{array} \end{aligned} $$
(3.88)

Thus from Eq. (3.86)

$$\displaystyle \begin{aligned} \left |\boldsymbol{x}_0 \right |{}^2 = \left |A^\dagger \boldsymbol{b} \right |{}^2 + \left |\left (I - A^\dagger A \right) \boldsymbol{w} \right |{}^2 \end{aligned} $$
(3.89)

This means \(|\boldsymbol {x}_0| \geq \left |A^\dagger \boldsymbol {b} \right |\). Therefore, A† b is the solution of Eq. (3.72) with the smallest |x_0| (in other words, the solution selected by an L2 regularization term).
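The orthogonality in Eq. (3.88), and hence the norm decomposition in Eq. (3.89), can be checked numerically (a sketch assuming NumPy; the sizes and the vector w are arbitrary):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.normal(size=(3, 5))      # N < M
    b = rng.normal(size=3)
    w = rng.normal(size=5)
    Ad = np.linalg.pinv(A)

    p = Ad @ b                       # A† b
    q = (np.eye(5) - Ad @ A) @ w     # (I - A† A) w
    print(np.isclose(p @ q, 0.0))                                       # Eq. (3.88)
    print(np.isclose(np.linalg.norm(p + q) ** 2,
                     np.linalg.norm(p) ** 2 + np.linalg.norm(q) ** 2))  # Eq. (3.89)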

When A A† ≠ I, i.e., N > M, Eq. (3.72) generally has no solution, because the number of variables, M, is smaller than the number of equations, N. In this case, x = A† b is known to be optimal (i.e., it minimizes |A x − b|). To prove this, first we compute A^T (A A† b − b) as

$$\displaystyle \begin{aligned} \begin{array}{rcl} A^T \left (AA^\dagger \boldsymbol{b}-\boldsymbol{b} \right ) &\displaystyle = &\displaystyle A^T \left ( \left (AA^\dagger \right)^T \boldsymbol{b}-\boldsymbol{b}\right) =\left(\left(AA^\dagger A\right)^T -A^T\right)\boldsymbol{b}\\ &\displaystyle = &\displaystyle \left(AA^\dagger A-A \right)^T \boldsymbol{b}= 0 \boldsymbol{b}=0 \end{array} \end{aligned} $$
(3.90)

Taking the transpose of the above, we also get

$$\displaystyle \begin{aligned} \left (AA^\dagger \boldsymbol{b}-\boldsymbol{b} \right)^T A =0 \end{aligned} $$
(3.91)

Using these, we can show

$$\displaystyle \begin{aligned} \begin{array}{rcl} \left | A \boldsymbol{x}- \boldsymbol{b} \right |{}^2 &\displaystyle = &\displaystyle \left | \left (A\boldsymbol{x} - AA^\dagger \boldsymbol{b} \right )+ \left(AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right)\right|{}^2 \end{array} \end{aligned} $$
(3.92)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \left |A\boldsymbol{x} - AA^\dagger \boldsymbol{b} \right |{}^2 + \left (A\boldsymbol{x} - AA^\dagger \boldsymbol{b} \right)^T\left(AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right) \\ &\displaystyle &\displaystyle +\left(AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right)^T\left(A\boldsymbol{x} - AA^\dagger \boldsymbol{b}\right) +\left|AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right|{}^2 \end{array} \end{aligned} $$
(3.93)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \left |A\boldsymbol{x} - AA^\dagger \boldsymbol{b} \right|{}^2 + \left (\boldsymbol{x} - A^\dagger \boldsymbol{b} \right)^TA^T\left(AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right) \\ &\displaystyle &\displaystyle +\left(AA^\dagger \boldsymbol{b}- \boldsymbol{b} \right)^TA\left(\boldsymbol{x} - A^\dagger \boldsymbol{b}\right) +\left|AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right|{}^2 \end{array} \end{aligned} $$
(3.94)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \left |A\boldsymbol{x} - AA^\dagger \boldsymbol{b}\right|{}^2 + \left(\boldsymbol{x} - A^\dagger \boldsymbol{b}\right)^T 0 \\ &\displaystyle &\displaystyle +0\left(\boldsymbol{x} - A^\dagger \boldsymbol{b}\right) +\left|AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right|{}^2 \end{array} \end{aligned} $$
(3.95)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \left|A\boldsymbol{x} - AA^\dagger \boldsymbol{b}\right |{}^2 +\left|AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right|{}^2 \end{array} \end{aligned} $$
(3.96)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle \geq &\displaystyle \left|AA^\dagger \boldsymbol{b}- \boldsymbol{b} \right|{}^2 \end{array} \end{aligned} $$
(3.97)

This means that x = A† b minimizes |A x − b| and is thus the optimal approximate solution of Eq. (3.72).
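Eq. (3.90), i.e., the orthogonality of the residual A A† b − b to the columns of A, and the resulting optimality of x = A† b, can also be verified numerically (again a NumPy sketch with arbitrary sizes, not code from the book):

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.normal(size=(7, 4))      # N > M
    b = rng.normal(size=7)
    Ad = np.linalg.pinv(A)

    residual = A @ Ad @ b - b
    print(np.allclose(A.T @ residual, 0.0))   # Eq. (3.90): A^T (A A† b - b) = 0

    # Consequently no other x gives a smaller residual than x = A† b:
    x_star = Ad @ b
    x_try = x_star + rng.normal(size=4)
    print(np.linalg.norm(A @ x_star - b) <= np.linalg.norm(A @ x_try - b))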


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Taguchi, Yh. (2020). Tensor Decomposition. In: Unsupervised Feature Extraction Applied to Bioinformatics. Unsupervised and Semi-Supervised Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-22456-1_3


  • DOI: https://doi.org/10.1007/978-3-030-22456-1_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22455-4

  • Online ISBN: 978-3-030-22456-1

