
Tensor Decomposition


Part of the book series: Unsupervised and Semi-Supervised Learning (UNSESUL)

Abstract

Tensor decomposition (TD) is a natural extension of matrix factorization (MF), introduced for matrices in the previous chapter, to the case where tensors rather than matrices are considered. In contrast to MF, which is usually represented as a product of two matrices, TD takes various forms. Whereas matrices have been studied extensively over a long period, tensors have a much shorter history of intensive investigation, especially from the application point of view; thus, there is no de facto standard form for a specific application. As with MF, the aim of TD is to reduce the degrees of freedom, but how the degrees of freedom are reduced has many variations. In this chapter, we introduce three principal realizations of TD: a sum of outer products of vectors, a product summation of a (smaller) tensor and matrices, and a product summation of (smaller) tensors. These three methods have their own unique pros and cons. In addition to the algorithm for performing each TD, we will also discuss the pros and cons of the three methods introduced.

I painted her as an unapproachable enigma and never even tried to see her for who she was.
(Ichigo, Darling in the FranXX, Season 1, Episode 16)
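The three realizations named in the abstract are often called the CP, Tucker, and tensor-train decompositions in the literature (cf. [2, 3]). The following is a minimal NumPy sketch of the three forms, with arbitrary small dimensions and random factors; it is not code from this chapter (see Appendix A for the implementations the book actually refers to).

    import numpy as np

    I, J, K, R = 4, 5, 6, 2          # tensor sides and a small rank R
    rng = np.random.default_rng(0)

    # (1) Sum of outer products of vectors (CP form):
    #     x_ijk = sum_r a_ir * b_jr * c_kr
    a = rng.normal(size=(I, R))
    b = rng.normal(size=(J, R))
    c = rng.normal(size=(K, R))
    x_cp = np.einsum('ir,jr,kr->ijk', a, b, c)

    # (2) Product summation of a smaller core tensor and matrices (Tucker form):
    #     x_ijk = sum_{pqs} g_pqs * u_ip * v_jq * w_ks
    P, Q, S = 2, 3, 2
    g = rng.normal(size=(P, Q, S))
    u = rng.normal(size=(I, P))
    v = rng.normal(size=(J, Q))
    w = rng.normal(size=(K, S))
    x_tucker = np.einsum('pqs,ip,jq,ks->ijk', g, u, v, w)

    # (3) Product summation of smaller tensors (tensor-train form):
    #     x_ijk = sum_{r1,r2} g1_{i r1} * g2_{r1 j r2} * g3_{r2 k}
    r1, r2 = 2, 2
    g1 = rng.normal(size=(I, r1))
    g2 = rng.normal(size=(r1, J, r2))
    g3 = rng.normal(size=(r2, K))
    x_tt = np.einsum('ia,ajb,bk->ijk', g1, g2, g3)

    print(x_cp.shape, x_tucker.shape, x_tt.shape)  # each is a 4 x 5 x 6 tensor

In each case the full tensor has I*J*K entries, while the factors on the right-hand side have far fewer; this is the reduction of degrees of freedom discussed in the chapter.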


Notes

  1. Although the detailed algorithms of the individual TDs will be presented in later sections, readers might want to try them in advance while reading the earlier sections that demonstrate examples. In that case, see Appendix A, where I list some implementations on various platforms.

  2. See the Appendix for more details about the Moore–Penrose pseudoinverse. Alternatively, one can simply perform linear regression analysis, Eq. (3.35).

References

  1. Barata, J.C.A., Hussein, M.S.: The Moore–Penrose pseudoinverse: a tutorial review of the theory. Braz. J. Phys. 42(1), 146–165 (2012). https://doi.org/10.1007/s13538-011-0052-z


  2. Kolda, T., Bader, B.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009). https://doi.org/10.1137/07070111X


  3. Oseledets, I.: Tensor-train decomposition. SIAM J. Sci. Comput. 33(5), 2295–2317 (2011). https://doi.org/10.1137/090752286



Appendix


I. Moore–Penrose Pseudoinverse

The Moore–Penrose pseudoinverse [1] of a matrix A, denoted A†, satisfies the following four conditions:

  • A A† A = A

  • A† A A† = A†

  • (A† A)^T = A† A

  • (A A†)^T = A A†
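As a quick numerical sanity check (a sketch, not from the book; it assumes NumPy and uses np.linalg.pinv), one can verify that the pseudoinverse returned by NumPy satisfies these four conditions:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(6, 4))      # N = 6, M = 4
    Ad = np.linalg.pinv(A)           # Moore-Penrose pseudoinverse A†

    print(np.allclose(A @ Ad @ A, A))        # A A† A   = A
    print(np.allclose(Ad @ A @ Ad, Ad))      # A† A A†  = A†
    print(np.allclose((Ad @ A).T, Ad @ A))   # (A† A)^T = A† A
    print(np.allclose((A @ Ad).T, A @ Ad))   # (A A†)^T = A A†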

Suppose we need to find \(\boldsymbol {x} \in \mathbb {R}^M\) that satisfies

$$\displaystyle \begin{aligned} A \boldsymbol{x} = \boldsymbol{b} {} \end{aligned} $$
(3.72)

where \(A \in \mathbb {R}^{N \times M}\) and \(\boldsymbol {b} \in \mathbb {R}^N\). In general, a unique solution exists only when N = M (and A is invertible).

The Moore–Penrose pseudoinverse can be used to solve Eq. (3.72), because

$$\displaystyle \begin{aligned} \boldsymbol{x} = A^\dagger \boldsymbol{b} \end{aligned} $$
(3.73)

gives

  • the unique solution of Eq. (3.72) when N = M,

  • the x satisfying Eq. (3.72) with the minimum |x| when N < M (i.e., when no unique solution is available), and

  • the x with the minimum |A x − b| when N > M (equivalent to the so-called linear regression analysis).

When N < M, infinitely many solutions satisfy Eq. (3.72); the Moore–Penrose pseudoinverse allows us to select the one with the minimum |x|. On the other hand, when N > M, a solution of Eq. (3.72) does not always exist; the Moore–Penrose pseudoinverse allows us to select the x with the minimum |A x − b|, i.e., the smallest residuals. Thus, by computing the Moore–Penrose pseudoinverse, we can always obtain an x that satisfies Eq. (3.72) as well as possible, in some sense.
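These three cases can be illustrated numerically as follows (a sketch with arbitrary sizes, assuming NumPy; np.linalg.lstsq is used only as an independent reference for the least-squares case):

    import numpy as np

    rng = np.random.default_rng(2)

    # N < M: infinitely many solutions; A† b is the one with the smallest |x|.
    A = rng.normal(size=(3, 5))
    b = rng.normal(size=3)
    Ad = np.linalg.pinv(A)
    x = Ad @ b
    print(np.allclose(A @ x, b))             # x solves A x = b exactly
    w = rng.normal(size=5)
    x_other = x + (np.eye(5) - Ad @ A) @ w   # another solution, cf. Eq. (3.86)
    print(np.linalg.norm(x) <= np.linalg.norm(x_other))

    # N > M: generally no exact solution; A† b minimizes |A x - b|
    # (this is ordinary least squares, i.e., linear regression).
    A = rng.normal(size=(7, 4))
    b = rng.normal(size=7)
    x = np.linalg.pinv(A) @ b
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.allclose(x, x_lstsq))           # agrees with NumPy's least squares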

A† can be computed as follows. Apply SVD to A as

$$\displaystyle \begin{aligned} A = U \varSigma V^T \end{aligned} $$
(3.74)

where \(U \in \mathbb {R}^{N \times M}, \varSigma , V \in \mathbb {R}^{M \times M}\) for N > M and \(U, \varSigma \in \mathbb {R}^{N \times N}, V \in \mathbb {R}^{M \times N}\) for N < M. When U or V is not a square matrix, U^T U = V^T V = I, but U U^T ≠ I and V V^T ≠ I. When U and V are square matrices, U^T U = U U^T = V^T V = V V^T = I.

Then A can be defined as

$$\displaystyle \begin{aligned} A^\dagger = V \varSigma^{-1} U^T \end{aligned} $$
(3.75)

It is not difficult to show that A† = V Σ^{-1} U^T satisfies the required conditions, because

$$\displaystyle \begin{aligned} A A^\dagger = \left ( U \varSigma V^T \right)\left (V \varSigma^{-1} U^T \right)= U \varSigma \varSigma^{-1} U^T = U I U^T = \left \lbrace \begin{array}{cc} U U^T, N > M \\ I , N \leq M \end{array} \right. \end{aligned} $$
(3.76)

and

$$\displaystyle \begin{aligned} A^\dagger A = \left (V \varSigma^{-1} U^T \right) \left( U \varSigma V^T \right) = V \varSigma^{-1} \varSigma V^T = V I V^T = \left \lbrace \begin{array}{cc} I, N \geq M \\ VV^T, N<M \end{array} \right. \end{aligned} $$
(3.77)

where V^T V = I for N > M and U^T U = I for N < M are used.

Then when N > M,

$$\displaystyle \begin{aligned} \begin{array}{rcl} A A^\dagger A &\displaystyle =&\displaystyle A\left (A^\dagger A \right ) = A I = A , \end{array} \end{aligned} $$
(3.78)
$$\displaystyle \begin{aligned} \begin{array}{rcl} A^\dagger A A^\dagger &\displaystyle = &\displaystyle \left (A^\dagger A \right ) A^\dagger = I A^\dagger = A^\dagger \end{array} \end{aligned} $$
(3.79)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \left (A A^\dagger \right )^T &\displaystyle = &\displaystyle \left (UU^T \right)^T = \left (U^T \right)^T U^T = U U^T = A A^\dagger \end{array} \end{aligned} $$
(3.80)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \left (A^\dagger A \right)^T &\displaystyle =&\displaystyle I^T = I = A^\dagger A \end{array} \end{aligned} $$
(3.81)

On the other hand, when N < M,

$$\displaystyle \begin{aligned} \begin{array}{rcl} A A^\dagger A &\displaystyle =&\displaystyle \left (AA^\dagger \right) A = I A = A , \end{array} \end{aligned} $$
(3.82)
$$\displaystyle \begin{aligned} \begin{array}{rcl} A^\dagger A A^\dagger &\displaystyle = &\displaystyle A^\dagger \left (A A^\dagger \right) = A^\dagger I = A^\dagger \end{array} \end{aligned} $$
(3.83)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \left (A A^\dagger \right )^T &\displaystyle = &\displaystyle I^T = I = A A^\dagger \end{array} \end{aligned} $$
(3.84)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \left(A^\dagger A \right)^T &\displaystyle =&\displaystyle \left (VV^T \right )^T= \left (V^T \right )^T V^T = VV^T = A^\dagger A \end{array} \end{aligned} $$
(3.85)

When N = M, these relations are obvious because A A† = A† A = I.
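Eqs. (3.74) and (3.75) translate directly into code. The following sketch (assuming NumPy and a full-rank A, so that Σ is invertible) builds A† from the thin SVD and compares it with np.linalg.pinv:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.normal(size=(6, 4))                        # N > M here

    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: A = U Σ V^T
    A_dagger = Vt.T @ np.diag(1.0 / s) @ U.T           # Eq. (3.75): A† = V Σ^{-1} U^T

    print(np.allclose(A_dagger, np.linalg.pinv(A)))    # matches NumPy's pseudoinverse
    print(np.allclose(U.T @ U, np.eye(4)))             # U^T U = I although U is not square
    print(np.allclose(A_dagger @ A, np.eye(4)))        # A† A = I when N > M, cf. Eq. (3.77)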

The reason why the Moore–Penrose pseudoinverse handles Eq. (3.72) as described above is as follows. Define

$$\displaystyle \begin{aligned} \boldsymbol{x}_0 = A^\dagger \boldsymbol{b} + \left (I - A^\dagger A \right ) \boldsymbol{w} {} \end{aligned} $$
(3.86)

with an arbitrary vector w. Then, because

$$\displaystyle \begin{aligned} A \boldsymbol{x}_0 = A A^\dagger \boldsymbol{b} + \left ( A - A A^\dagger A \right) \boldsymbol{w} =A A^\dagger \boldsymbol{b} \end{aligned} $$
(3.87)

when A A† = I, i.e., N ≤ M, we have A x_0 = b, so x_0 is a solution of Eq. (3.72). This corresponds to the case where there is no unique solution, because the number of variables, M, is larger than the number of equations, N. Because of Eq. (3.86), x_0 can be the unique solution only when A† A = I as well, i.e., when N = M; this corresponds to the case where the number of variables, M, equals the number of equations, N, and a unique solution exists.

Here one should notice that \(A^\dagger \boldsymbol {b} \perp \left (I - A^\dagger A \right ) \boldsymbol {w}\) because

$$\displaystyle \begin{aligned} \begin{array}{rcl} \left (I - A^\dagger A \right ) \boldsymbol{w} \cdot A^\dagger \boldsymbol{b} &\displaystyle = &\displaystyle \left ( \left (I - A^\dagger A \right ) \boldsymbol{w} \right)^T A^\dagger \boldsymbol{b} = \boldsymbol{w}^T \left (I- A^\dagger A \right)^T A^\dagger \boldsymbol{b} \\&\displaystyle =&\displaystyle \boldsymbol{w}^T \left(I- A^\dagger A \right)A^\dagger \boldsymbol{b} =\boldsymbol{w}^T ( A^\dagger - A^\dagger A A^\dagger ) \boldsymbol{b}\\ &\displaystyle = &\displaystyle \boldsymbol{w}^T 0 \boldsymbol{b}=0. \end{array} \end{aligned} $$
(3.88)

Thus from Eq. (3.86)

$$\displaystyle \begin{aligned} \left |\boldsymbol{x}_0 \right |{}^2 = \left |A^\dagger \boldsymbol{b} \right |{}^2 + \left |\left (I - A^\dagger A \right) \boldsymbol{w} \right |{}^2 \end{aligned} $$
(3.89)

This means \(|\boldsymbol {x}_0| \geq \left |A^\dagger \boldsymbol {b} \right |\). Therefore, A† b is the solution of Eq. (3.72) with the smallest |x_0| (in other words, the solution selected by an L2 regularization term).
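The orthogonality in Eq. (3.88), and hence the norm decomposition in Eq. (3.89), can be checked numerically (a sketch assuming NumPy; the sizes and the vector w are arbitrary):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.normal(size=(3, 5))      # N < M
    b = rng.normal(size=3)
    w = rng.normal(size=5)
    Ad = np.linalg.pinv(A)

    p = Ad @ b                       # A† b
    q = (np.eye(5) - Ad @ A) @ w     # (I - A† A) w
    print(np.isclose(p @ q, 0.0))                                       # Eq. (3.88)
    print(np.isclose(np.linalg.norm(p + q) ** 2,
                     np.linalg.norm(p) ** 2 + np.linalg.norm(q) ** 2))  # Eq. (3.89)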

When A A† ≠ I, i.e., N > M, Eq. (3.72) generally has no solution, because the number of variables, M, is smaller than the number of equations, N. In this case, x = A† b is known to be optimal (i.e., it minimizes |A x − b|). To prove this, first we compute A^T (A A† b − b) as

$$\displaystyle \begin{aligned} \begin{array}{rcl} A^T \left (AA^\dagger \boldsymbol{b}-\boldsymbol{b} \right ) &\displaystyle = &\displaystyle A^T \left ( \left (AA^\dagger \right)^T \boldsymbol{b}-\boldsymbol{b}\right) =\left(\left(AA^\dagger A\right)^T -A^T\right)\boldsymbol{b}\\ &\displaystyle = &\displaystyle \left(AA^\dagger A-A \right)^T \boldsymbol{b}= 0 \boldsymbol{b}=0 \end{array} \end{aligned} $$
(3.90)

Taking the transpose of the above, we also get

$$\displaystyle \begin{aligned} \left (AA^\dagger \boldsymbol{b}-\boldsymbol{b} \right)^T A =0 \end{aligned} $$
(3.91)

Using these, we can show

$$\displaystyle \begin{aligned} \begin{array}{rcl} \left | A \boldsymbol{x}- \boldsymbol{b} \right |{}^2 &\displaystyle = &\displaystyle \left | \left (A\boldsymbol{x} - AA^\dagger \boldsymbol{b} \right )+ \left(AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right)\right|{}^2 \end{array} \end{aligned} $$
(3.92)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \left |A\boldsymbol{x} - AA^\dagger \boldsymbol{b} \right |{}^2 + \left (A\boldsymbol{x} - AA^\dagger \boldsymbol{b} \right)^T\left(AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right) \\ &\displaystyle &\displaystyle +\left(AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right)^T\left(A\boldsymbol{x} - AA^\dagger \boldsymbol{b}\right) +\left|AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right|{}^2 \end{array} \end{aligned} $$
(3.93)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \left |A\boldsymbol{x} - AA^\dagger \boldsymbol{b} \right|{}^2 + \left (\boldsymbol{x} - A^\dagger \boldsymbol{b} \right)^TA^T\left(AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right) \\ &\displaystyle &\displaystyle +\left(AA^\dagger \boldsymbol{b}- \boldsymbol{b} \right)^TA\left(\boldsymbol{x} - A^\dagger \boldsymbol{b}\right) +\left|AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right|{}^2 \end{array} \end{aligned} $$
(3.94)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \left |A\boldsymbol{x} - AA^\dagger \boldsymbol{b}\right|{}^2 + \left(\boldsymbol{x} - A^\dagger \boldsymbol{b}\right)^T 0 \\ &\displaystyle &\displaystyle +0\left(\boldsymbol{x} - A^\dagger \boldsymbol{b}\right) +\left|AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right|{}^2 \end{array} \end{aligned} $$
(3.95)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \left|A\boldsymbol{x} - AA^\dagger \boldsymbol{b}\right |{}^2 +\left|AA^\dagger \boldsymbol{b}- \boldsymbol{b}\right|{}^2 \end{array} \end{aligned} $$
(3.96)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle \geq &\displaystyle \left|AA^\dagger \boldsymbol{b}- \boldsymbol{b} \right|{}^2 \end{array} \end{aligned} $$
(3.97)

This means that x = A† b minimizes |A x − b| and is thus the optimal approximate solution of Eq. (3.72).
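Eq. (3.90), i.e., the orthogonality of the residual A A† b − b to the columns of A, and the resulting optimality of x = A† b, can also be verified numerically (again a NumPy sketch with arbitrary sizes, not code from the book):

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.normal(size=(7, 4))      # N > M
    b = rng.normal(size=7)
    Ad = np.linalg.pinv(A)

    residual = A @ Ad @ b - b
    print(np.allclose(A.T @ residual, 0.0))   # Eq. (3.90): A^T (A A† b - b) = 0

    # Consequently no other x gives a smaller residual than x = A† b:
    x_star = Ad @ b
    x_try = x_star + rng.normal(size=4)
    print(np.linalg.norm(A @ x_star - b) <= np.linalg.norm(A @ x_try - b))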


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Taguchi, Yh. (2020). Tensor Decomposition. In: Unsupervised Feature Extraction Applied to Bioinformatics. Unsupervised and Semi-Supervised Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-22456-1_3


  • DOI: https://doi.org/10.1007/978-3-030-22456-1_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22455-4

  • Online ISBN: 978-3-030-22456-1

