# A Note on Nonclosed Tensor Formats

## Abstract

Various tensor formats exist which allow a data-sparse representation of tensors. Some of these formats are not closed. The consequences are (i) possible non-existence of best approximations and (ii) divergence of the representing parameters when a tensor within the format tends to a border tensor outside. The paper tries to describe the nature of this divergence. A particular question is whether the divergence is uniform for all border tensors.

## Introduction

Given (finite-dimensional) vector spaces $$V_{j}$$ we denote the corresponding tensor space by

$$\mathbf{V}={\bigotimes}_{j=1}^{d}V_{j}.$$

Since the dimension $${\prod }_{j=1}^{d}\dim (V_{j})$$ of $$\mathbf{V}$$ is usually huge, the numerical treatment of tensors needs special data-sparse representation techniques. The oldest one is the r-term format: given a representation rank r we form all tensors which can be written as a sum of r elementary tensors, where $$r\in \mathbb {N}_{0}:=\mathbb {N} \cup \{0\}$$. This yields the subset

$$\mathcal{R}_{r}=\left\{\mathbf{v} \in \mathbf{V}:~\mathbf{v}={\sum}_{\nu=1}^{r}{\bigotimes}_{j=1}^{d}v_{\nu}^{(j)}~\text{ with }~v_{\nu}^{(j)}\in V_{j}\right\}$$

of V. Under certain conditions a tensor in $$\mathcal {R}_{r}$$ might have an essentially unique representation, i.e., different representations only differ by the order of the terms and the scaling of the vectors $$\{v_{\nu }^{(j)}:1\leq j\leq d\}$$ (cf. Section 3.2).

Although this approach may be very successful for certain problems, it also has an unpleasant property which we are going to explain.

The tensor rank of $$\mathbf{v}\in\mathbf{V}$$ is defined as the smallest r with $$\mathbf {v}\in \mathcal {R}_{r}$$:

$$\text{rank}(\mathbf{v}):=\min\left\{r\in\mathbb{N}_{0}:~\mathbf{v}\in\mathcal{R}_{r}\right\}.$$

This allows us to describe $$\mathcal {R}_{r}$$ by $$\{\mathbf{v}\in\mathbf{V}:~\text{rank}(\mathbf{v})\leq r\}$$. In the case of d = 2, tensor spaces are isomorphic to matrix spaces, and the tensor rank coincides with the usual matrix rank. For matrices it is well known that a convergent sequence $$\{M_{k}\}$$ with $$\text{rank}(M_{k})\leq r$$ has a limit M with rank(M) ≤ r, i.e., the set of matrices of rank ≤ r is closed. This is not true for tensors of order d ≥ 3. As an example consider the tensor-valued function

$$\mathbf{w}(t):=(a+tb) \otimes (a+tb) \otimes (a+tb) \in \otimes^{3}V,$$

where $$a,b\in V$$ are linearly independent vectors (i.e., $$V=V_{j}$$ and $$\dim (V)\geq 2$$). The derivative $$\mathbf {v}:=\mathbf {w}^{\prime }(0)$$ is the symmetric tensor

$$\mathbf{v}=b\otimes a\otimes a+a\otimes b\otimes a+a\otimes a\otimes b.$$
(1)

The derivative can be approximated by Newton’s divided difference quotient

$${\tilde{\mathbf{v}}}(h):=\frac{1}{h}(\mathbf{w}(h)-\mathbf{w}(0)) \rightarrow \mathbf{v}\qquad \text{for }h\rightarrow0.$$
(2)

### Remark 1

(a) It can be proved that rank(v) = 3 for v in (1).

(b) Since w(t) is of rank 1, the approximation satisfies $$\text {rank}({\tilde {\mathbf {v}}}(h))\leq 2,$$ i.e., $${\tilde {\mathbf {v}}}(h)\in \mathcal {R}_{2}$$.

(c) From (a) and (b) we conclude that $$\mathcal {R}_{2}$$ is not closed.
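Remark 1(b) and the convergence in (2) can be observed numerically. The following is a minimal sketch, assuming $$V=\mathbb{R}^{2}$$ with a, b the canonical unit vectors and the Frobenius norm; the helper names `elementary`, `w`, and `border_demo` are illustrative and not part of the paper.

```python
import numpy as np

def elementary(x, y, z):
    """Elementary tensor x ⊗ y ⊗ z as a 3-way array."""
    return np.einsum("i,j,k->ijk", x, y, z)

# Assumption: V = R^2 with a, b the canonical basis vectors.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# Limit tensor v from (1).
v = elementary(b, a, a) + elementary(a, b, a) + elementary(a, a, b)

def w(t):
    """Rank-1 tensor w(t) = (a + t b) ⊗ (a + t b) ⊗ (a + t b)."""
    x = a + t * b
    return elementary(x, x, x)

def border_demo(h):
    """Return (error of the quotient (2), norm of term 1, norm of term 2)."""
    term1 = w(h) / h       # first elementary term of the difference quotient
    term2 = -w(0.0) / h    # second elementary term
    err = np.linalg.norm(term1 + term2 - v)
    return err, np.linalg.norm(term1), np.linalg.norm(term2)

for h in (1e-1, 1e-2, 1e-3):
    err, n1, n2 = border_demo(h)
    print(f"h={h:.0e}  error={err:.3e}  term norms={n1:.3e}, {n2:.3e}")
```

In this sketch the error shrinks proportionally to h, while both elementary terms grow like 1/h, which is the cancellation effect behind the nonclosedness of $$\mathcal{R}_{2}$$.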

Part (c) states that, in general, the r-term representation is not closed. This leads to the notion of the border rank

$$\underline{\text{rank}}(\mathbf{v}):=\min\left\{r:\mathbf{v}\in \overline{\mathcal{R}}_{r}\right\}.$$

It is related to the usual rank by $$\underline {\text {rank}}(\mathbf {v})\leq \text {rank}(\mathbf {v})\leq \left (\underline {\text {rank}}(\mathbf {v})\right )^{d-1}$$. The first inequality is trivial. For the second one, let $$\mathbf {v}_{i}\in \mathcal {R}_{r}$$ ($$r:=\underline {\text {rank}}(\mathbf {v})$$) be tensors converging to v. Then $$\mathbf {v}_{i}\in {\bigotimes }_{j=1}^{d}U_{j}^{\min }(\mathbf {v}_{i})$$ holds for the minimal subspaces $$U_{j}^{\min }(\mathbf {v}_{i})$$ (cf. Hackbusch [9, Section 6]). $$\mathbf {v}_{i}\in \mathcal {R}_{r}$$ implies $$\dim U_{j}^{\min }(\mathbf {v}_{i})\leq r$$. [9, Theorem 6.24] proves $$\dim U_{j}^{\min }(\mathbf {v})\leq r$$ for the corresponding subspaces in $$\mathbf {v}\in \mathbf {U}:={\bigotimes }_{j=1}^{d}U_{j}^{\min }(\mathbf {v})$$. The maximal rank in $$\mathbf{U}$$ is bounded by $$r^{d-1}$$, proving the second inequality (cf. [9, Section 3.2.6.4]).
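As a quick check of these inequalities (an illustration added here, using only Remark 1 and the closedness of $$\mathcal{R}_{1}$$), take the tensor $$\mathbf{v}$$ from (1) with d = 3. It lies in $$\overline{\mathcal{R}}_{2}$$ by (2), but not in $$\mathcal{R}_{1}=\overline{\mathcal{R}}_{1}$$ because its rank is 3, so that

$$\underline{\text{rank}}(\mathbf{v})=2\leq\text{rank}(\mathbf{v})=3\leq 2^{3-1}=4.$$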

The nonclosedness implies that a typical approximation problem such as

$$\text{find the minimiser of }\inf\left\{\|\mathbf{v}-\mathbf{w}\|:~\mathbf{w}\in\mathcal{R}_{r}\right\} \quad\text{for some }\mathbf{v}\in\mathbf{V}$$
(3)

might be unsolvable since any convergent sequence $$\mathbf {w}_{k}\in \mathcal {R}_{r}$$ with $$\| \mathbf {v}-\mathbf {w}_{k}\| \rightarrow \inf \|\mathbf {v}-\mathbf {w}\|$$ may tend to a tensor w of larger rank (but with $$\underline {\text {rank}}(\mathbf {w})\leq r$$) outside of $$\mathcal {R}_{r}$$. Since the border set $$\overline {\mathcal {R}_{r}}\backslash \mathcal {R}_{r}$$ is of measure zero, one might consider this a marginal problem. However, De Silva–Lim [4] prove that the set of those $$\mathbf{v}\in\mathbf{V}$$ for which problem (3) is unsolvable has positive measure. To be precise, this result holds for $$\mathbb {R}$$ as the underlying field. A new result by Qi–Michałek–Lim [13] states that in the complex case this exceptional set is of measure zero.

Many numerical approaches lead to optimisation problems within the set $$\mathcal {R}_{r}$$. If the minimiser does not exist, any numerical method is in trouble. As in the case of (2), the individual terms grow to infinity although their sum remains bounded. This leads to the typical numerical cancellation. The instability of divided difference quotients is well known in numerical mathematics.

The paper is not restricted to the format $$\mathcal {R}_{r}$$ but treats rather general nonclosed formats. An example of another nonclosed tensor format is the cyclic matrix product representation. A special variant is the site-independent cyclic matrix product representation for the case of $$V_{j}=V$$ (cf. Perez–Garcia et al. [12, Section 3.2.1]). Let $$n:=\dim (V)$$. For tuples $$(M[i]:1\leq i\leq n)$$ of r × r matrices define the corresponding tensor $$\mathbf{v}\in\otimes^{d}V$$ componentwise by

$$\mathbf{v}[i_{1},i_{2},\ldots,i_{d}]=\text{trace}\left( M[i_{1}]M[i_{2}]\cdot\ldots\cdot M[i_{d}]\right).$$

Such tensors define the set $$\mathcal {C}_{\text {ind}}(d,r,n)$$. Already $$\mathcal {C}_{\text {ind}}(3,2,3)$$ and $$\mathcal {C}_{\text {ind}}(4,2,2)$$ are nonclosed (cf. [5]). The general conclusion is that graph-based tensor formats are, in general, nonclosed if the graph does not degenerate to a tree (cf. Landsberg [11, Theorem 14.1.2.2]).
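The componentwise definition of the site-independent cyclic matrix product representation can be sketched as follows. This is only an illustration under stated assumptions: real matrices, a brute-force loop over all index tuples, and the hypothetical function name `cyclic_mps_tensor`; the random input data are not taken from the paper.

```python
import itertools
import numpy as np

def cyclic_mps_tensor(M, d):
    """Tensor v[i1,...,id] = trace(M[i1] M[i2] ... M[id]).

    M has shape (n, r, r): n matrices of size r x r.
    The result has shape (n,) * d.
    """
    n, r, _ = M.shape
    v = np.empty((n,) * d)
    for idx in itertools.product(range(n), repeat=d):
        prod = np.eye(r)
        for i in idx:
            prod = prod @ M[i]   # accumulate the matrix product in index order
        v[idx] = np.trace(prod)
    return v

# Illustrative sizes d = 3, r = 2, n = 3 (as in C_ind(3,2,3)); random matrices.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 2, 2))
v = cyclic_mps_tensor(M, d=3)
print(v.shape)
```

Because the trace is invariant under cyclic permutations of the factors, the resulting tensor is invariant under cyclic shifts of its indices, which the sketch can be used to confirm.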

In the following we characterise the divergence of the coefficients of a sequence $$\mathbf {w}_{k}\in \mathcal {F}$$ converging to a border tensor (i.e., a tensor in $$\overline {\mathcal {F}}\backslash \mathcal {F}$$). In particular it is interesting to know how strong the divergence is and whether it is uniform for all such tensors. If the divergence could be arbitrarily weak, the numerical instability would be negligible.

We also study the order of divergence in a neighbourhood of a border tensor and whether this quantity behaves continuously.

## Notations and Definitions

### Tensor Representation

$$\mathbb {K}\in \{\mathbb {R},\mathbb {C}\}$$ denotes the underlying field of the following vector spaces. In general, a tensor representation is given by a map ρ from a parameter set into the tensor space. We suppose that

$$\begin{array}{@{}rcl@{}} P\text{ is a vector space with }\dim(P) & <& \infty,\quad\mathcal{D} \subset P\text{ a closed subset}, \end{array}$$
(4a)
$$\begin{array}{@{}rcl@{}} \mathbf{V}\text{ is a tensor space with }\dim(\mathbf{V}) & <&\infty,\quad\mathcal{F}\subset\mathbf{V}\text{ a subset of two-sided cone structure}, \end{array}$$
(4b)
$$\begin{array}{@{}rcl@{}} \rho &:&\mathcal{D}\rightarrow \mathcal{F}\quad \text{ continuous and surjective}, \end{array}$$
(4c)
$$\begin{array}{@{}rcl@{}} 0 &\in&\mathcal{D}\quad\text{ and }\rho(0)=0. \end{array}$$
(4d)

In the sequel, (4a)–(4d) are assumed to be valid.

By definition, the tensor subset $$\mathcal {F}$$ is the range of ρ. The cone structure ensures that with $$\mathbf{v}$$ also $$\lambda\mathbf{v}$$ belongs to $$\mathcal {F}$$ for all $$\lambda \in \mathbb {K}$$. In most of the examples, ρ is not injective, and $$\mathcal {D}=P$$ holds (cf. Section 2.2). The standard representations ρ are multilinear so that (4c) and (4d) are easy consequences. In the following, we choose some norms on P and $$\mathbf{V}$$, both denoted by ∥⋅∥. Because of the finite dimensions, the choice of the norm is not essential for the following considerations.

Let v = ρ(p). The ratio ∥p∥/∥v∥ = ∥p∥/∥ρ(p)∥ may be considered as a stability measure for the representation of v by p (cf. Section 1). Since ρ is not necessarily injective, there might be many p with v = ρ(p). Therefore, we define

$$\sigma(\mathbf{v}):=\inf\{\| p\| :~\mathbf{v}=\rho(p),~p\in\mathcal{D}\}.$$

By compactness, the infimum may be replaced by a minimum:

### Remark 2

Each $$\mathbf {v}\in \mathcal {F}$$ has at least one $$p_{\mathbf {v}}\in \mathcal {D}$$ with $$\mathbf{v}=\rho(p_{\mathbf{v}})$$ and $$\sigma(\mathbf{v})=\|p_{\mathbf{v}}\|$$.

Since, in general, there is no scale invariance (cf. Section 2.2), we shall consider the ratio ∥p∥/∥v∥ only for normalised tensors so that ∥p∥/∥v∥ = ∥p∥.

In numerical applications we often work with approximations instead of the exact tensor. Therefore, the quantity $$\sigma ({\tilde {\mathbf {v}}})$$ is of interest for $${\tilde {\mathbf {v}}}$$ in a neighbourhood of v. For this purpose we define the ε-neighbourhood of some $$\mathbf {v}\in \overline {\mathcal {F}}$$ by

$$U_{\mathcal{F},\varepsilon}(\mathbf{v}):=\{\mathbf{w}\in\mathcal{F}:~\| \mathbf{v}-\mathbf{w}\| <\varepsilon\}$$

and the stability quantity by

$$\sigma_{\varepsilon}(\mathbf{v}):=\sup\{\sigma(\mathbf{w}):~\mathbf{w}\in U_{\mathcal{F},\varepsilon}(\mathbf{v})\} \qquad\text{for }~\varepsilon>0.$$

Note that $$\sigma_{\varepsilon}$$ is defined for all $$\mathbf {v}\in \overline {\mathcal {F}}$$ and that $$\sigma _{\varepsilon } (\mathbf {v})=\infty$$ may happen.

Since $$\sigma_{\varepsilon}(\mathbf{v})$$ is weakly decreasing as ε ↘ 0 and bounded from below by zero, the improper limit

$$\sigma_{0}(\mathbf{v}):=\lim\limits_{\varepsilon\searrow0}\sigma_{\varepsilon }(\mathbf{v})$$

exists ($$\sigma _{0}(\mathbf {v})=\infty$$ holds if $$\sigma _{\varepsilon }(\mathbf {v})=\infty$$ for all ε > 0).

### Standard Example $$\mathcal {F}=\mathcal {R}_{r}$$

Let $$\mathbf {V}={\bigotimes }_{j=1}^{d}V_{j}$$. In the case of the r-term format $$\mathcal {F}=\mathcal {R}_{r}$$ we may choose the vector space $$P=(V_{1}\times\cdots\times V_{d})^{r}$$, the domain $$\mathcal {D}=P$$, and the mapping

$$\rho\left( \left( (v_{i}^{(j)})_{i=1,\ldots,r}\right)_{j=1,\ldots,d}\right) =\sum\limits_{i=1}^{r}{\bigotimes}_{j=1}^{d}v_{i}^{(j)}.$$

A natural choice of the norm on P is $$\|(v_{i}^{(j)})\|=\sqrt {{\sum }_{i=1}^{r}{\sum }_{j=1}^{d}\|v_{i}^{(j)}\|^{2}}$$. Since $$\rho(\lambda p)=\lambda^{d}\rho(p)$$, the ratio ∥p∥/∥ρ(p)∥ is not invariant with respect to scaling.

To ensure scale invariance we may instead choose $$P:=\mathbf{V}^{r}$$ and restrict to the domain of r-tuples of elementary tensors:

$$\mathcal{D}=\{\mathbf{v}\in\mathbf{V}:~\text{rank}(\mathbf{v})\leq1\}^{r}=(\mathcal{R}_{1})^{r}.$$

Since $$\mathcal {R}_{1}$$ is closed (cf. Hackbusch [9, Lemma 9.11]), $$\mathcal {D}\subset P$$ satisfies (4a). Now the tensor representation is

$$\rho\left( (e_{i})_{i=1,\ldots,r}\right) = \sum\limits_{i=1}^{r}e_{i}\qquad(e_{i}\in\mathcal{R}_{1})$$

with the norm $$\|(e_{i})_{i=1,\ldots ,r}\| = \sqrt {{\sum }_{i=1}^{r}\|e_{i}\|^{2}}$$. As ρ(λp) = λρ(p), the ratio ∥p∥/∥ρ(p)∥ is scale invariant.
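The different scaling behaviour of the two parametrisations can be illustrated numerically. The sketch below assumes d = 3, r = 2, $$V_{j}=\mathbb{R}^{2}$$, Euclidean and Frobenius norms, and random parameters; the function names are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def rho_factors(factors):
    """r-term map for d = 3: factors[i] = (x, y, z), result = sum_i x ⊗ y ⊗ z."""
    return sum(np.einsum("i,j,k->ijk", x, y, z) for x, y, z in factors)

def norm_factors(factors):
    """Norm on P = (V_1 x V_2 x V_3)^r: root of the sum of squared vector norms."""
    return np.sqrt(sum(np.linalg.norm(u) ** 2 for triple in factors for u in triple))

def norm_terms(factors):
    """Norm on (R_1)^r: root of the sum of squared norms of the elementary tensors."""
    return np.sqrt(sum(np.linalg.norm(np.einsum("i,j,k->ijk", x, y, z)) ** 2
                       for x, y, z in factors))

rng = np.random.default_rng(1)
p = [tuple(rng.standard_normal(2) for _ in range(3)) for _ in range(2)]
lam = 2.0

# First parametrisation: scaling every vector by lam scales rho by lam**3.
p_scaled = [tuple(lam * u for u in triple) for triple in p]
ratio = norm_factors(p) / np.linalg.norm(rho_factors(p))
ratio_scaled = norm_factors(p_scaled) / np.linalg.norm(rho_factors(p_scaled))
print(ratio_scaled / ratio)      # close to lam / lam**3 = 0.25

# Second parametrisation: scaling every elementary tensor e_i by lam
# (realised here by scaling the first vector of each term).
p_terms = [(lam * x, y, z) for x, y, z in p]
ratio2 = norm_terms(p) / np.linalg.norm(rho_factors(p))
ratio2_scaled = norm_terms(p_terms) / np.linalg.norm(rho_factors(p_terms))
print(ratio2_scaled / ratio2)    # close to 1
```

The first quotient changes by the factor $$\lambda^{1-d}$$, whereas the second one is unchanged, in line with the two homogeneity relations stated above.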

## Nonclosed Formats

As mentioned above, we assume throughout the article that (4a)–(4d) hold.

### Instability Properties

Now we suppose that the format $$\mathcal {F}$$ is not closed. Then there is a nonempty set $${\mathscr{B}}$$ such that the closure of $$\mathcal {F}$$ can be split into

$$\overline{\mathcal{F}}=\mathcal{F}\cup \mathcal{B}\qquad\text{(disjoint union).}$$

We call $${\mathscr{B}}$$ the border set since in the case of $$\mathcal {F}=\mathcal {R}_{r}$$ it consists of tensors with border rank ≤ r, while the usual rank is > r.

Any tensor $$\mathbf {v}\in {\mathscr{B}}$$ is the limit of tensors $$\mathbf{v}_{i}$$ in $$\mathcal {F}$$. The next statement is a simple observation, but fundamental for the following.

### Lemma 1

Let $$\mathbf {v}_{i}\in \mathcal {F}$$ with $$\mathbf{v}_{i}:=\rho(p_{i})$$ be a convergent sequence with the limit $$\mathbf {v}=\lim \mathbf {v}_{i}$$. Then $$\sup _{i}\|p_{i}\| <\infty$$ implies $$\mathbf {v}\in \mathcal {F}$$. The condition $$\sup _{i}\|p_{i}\| <\infty$$ can be replaced by $$\sup _{i}\sigma (\mathbf {v}_{i})<\infty$$.

### Proof

Since the set $$\{p\in \mathcal {D}:~\|p\| \leq C\}$$ with $$C:=\sup _{i}\|p_{i}\|$$ is compact (cf. (4a)), there is a subsequence, again denoted by $$(p_{i})$$, with $$p_{i}\rightarrow p\in \mathcal {D}$$. Continuity of ρ (cf. (4c)) implies that $$\mathbf {v}=\lim \rho (p_{i})=\rho (p)$$. Hence v belongs to the range $$\mathcal {F}$$ of ρ, i.e., $$\mathbf {v}\in \mathcal {F}$$. If only $$\sup _{i}\sigma (\mathbf {v}_{i})<\infty$$ is assumed, replace $$p_{i}$$ by the minimising parameters from Remark 2. □

By negation we conclude the following result.

### Lemma 2

Let $$\mathbf {v}_{i}:=\rho (p_{i})\in \mathcal {F}$$ converge to $$\mathbf {v}\in {\mathscr{B}}$$. Then $$\|p_{i}\| \rightarrow \infty$$.

Note that $$\sigma_{\varepsilon}(\mathbf{v})\geq0$$ is well-defined for $$\mathbf {v}\in {\mathscr{B}}$$ and ε > 0 since $$U_{\mathcal {F},\varepsilon }(\mathbf {v})$$ is a nonempty subset of $$\mathcal {F}$$. A consequence of Lemma 2 is

### Conclusion 1

If $$\mathbf {v}\in {\mathscr{B}}$$, then $$\sigma _{\varepsilon }(\mathbf {v})=\infty$$ holds for all ε > 0 and leads to $$\sigma _{0}(\mathbf {v})=\infty$$.

### Proof

If $$\sigma _{\varepsilon }(\mathbf {v})=:C<\infty$$ we can choose $$\mathbf {v}_{i}\in U_{\mathcal {F},\varepsilon }(\mathbf {v})$$ with $$\mathbf {v}_{i}\rightarrow \mathbf {v}$$ and parameters $$p_{i}\in \mathcal {D}$$ with $$\mathbf{v}_{i}=\rho(p_{i})$$ and $$\|p_{i}\|\leq C$$ (cf. Remark 2). Lemma 2 yields the contradiction $$\mathbf {v}\notin {\mathscr{B}}$$. □

In Section 3.2 we shall comment on the continuity of σ. A general negative result follows.

### Remark 3

If $${\mathscr{B}}\neq \emptyset$$ (i.e., if the format is nonclosed), σ is discontinuous at $$0\in\mathbf{V}$$.

### Proof

(4d) implies σ(0) = 0. Assume that σ is continuous at 0. Then there is a neighbourhood $$U_{\mathcal {F},\varepsilon }(0)=:U_{\mathcal {F},\varepsilon }$$ for some ε > 0 with σ(v) ≤ 1 for all $$\mathbf {v}\in U_{\mathcal {F},\varepsilon }$$. Hence, $$\sigma_{\varepsilon-\|\mathbf{w}\|}(\mathbf{w})\leq1$$ holds for all $$\mathbf {w}\in U_{\mathcal {F},\varepsilon }$$. Conclusion 1 implies that $$\overline {U_{\mathcal {F},\varepsilon }}\cap {\mathscr{B}}=\emptyset$$. On the other hand, there is some $$0\neq \mathbf {v}\in {\mathscr{B}}$$. The cone property (4b) implies that $$\lambda \mathbf {v}\in {\mathscr{B}}$$ for all λ ≠ 0. For sufficiently small λ ≠ 0, $$\lambda \mathbf {v}\in \overline {U_{\mathcal {F},\varepsilon }}$$ yields the contradiction. □

### Discussion of $$\mathcal {F}=\mathcal {R}_{r}$$

Let $$\mathbb {K}=\mathbb {C}$$ be the underlying field. An interesting question is whether the quantity σ(v) is continuous. As known from algebraic geometry, general tensors in $$\mathcal {R}_{r}$$ admit only finitely many (essentially different) decompositions (cf. Section 1) and these decompositions depend continuously on the tensor, at least for r not too large. Then $$\sigma _{\varepsilon }(\mathbf {v})<\infty$$ holds for sufficiently small ε > 0 and has the limit $$\sigma_{0}(\mathbf{v})=\sigma(\mathbf{v})$$.

A particular positive result holds if the representation $$\rho :{\mathcal {D}}_{0}\subset \mathcal {D}\rightarrow \mathbf {V}$$ is injective for a certain subset $$\mathcal {D}_{0}$$ and the inverse map (the decomposition) $$\rho ^{-1}:\mathcal {F}_{0}:=\rho (\mathcal{D}_{0})\rightarrow P$$ is continuous. Then $$\sigma_{\varepsilon}(\mathbf{v})$$ is bounded for $$\mathbf {v}\in \mathcal {F}_{0}$$ and $$\sigma(\mathbf{v})=\sigma_{0}(\mathbf{v})$$. This situation occurs under the conditions studied in Sørensen et al. [14,15,16] and Domanov–De Lathauwer [6,7,8].

Above we require that r be not too large. In the case of d = 3 and $$\mathbf {V}=\mathbb {K}^{n}\otimes \mathbb {K}^{m}\otimes \mathbb {K}^{p}$$ the concrete condition is as follows. For r ≤ (n − 1)(m − 1) general tensors have a unique decomposition as stated in Domanov–De Lathauwer [7, Corollary 1.7]. However, if r ≤ (n − 1)(m − 1) + 1 and $$\mathbb {K}=\mathbb {C}$$, general tensors have finitely many decompositions (cf. Chiantini–Ottaviani [2, Proposition 5.4]).

The term ‘general tensor’ admits the existence of exceptional tensors. A particular exceptional situation holds for the tensor in the following example.

### Example 1

Set $$\mathbf{V}=\otimes^{3}V$$ with $$\dim (V)\geq 2$$ and choose any linearly independent vectors $$a,b\in V$$. In the case of the 2-term format $$\mathcal {F}=\mathcal {R}_{2}$$, the tensor

$$\mathbf{v}(t):=(a+tb)\otimes a\otimes a+a\otimes b\otimes a+a\otimes a\otimes b$$

belongs to $${\mathscr{B}}$$ for t > 0, while $$\mathbf {v}(0)\in \mathcal {F}$$.

### Proof

(a) For t = 0 we rewrite $$\mathbf{v}(0)$$ as $$a\otimes a\otimes (a+b) + a\otimes b\otimes a\in \mathcal {R}_{2}=\mathcal {F}$$.

(b) Let t > 0. For c linearly independent of a, the tensor $$\mathbf{w}:=c\otimes a\otimes a+a\otimes c\otimes a+a\otimes a\otimes c$$ is well known to be in $${\mathscr{B}}=\overline {\mathcal {R}_{2}}\backslash \mathcal {R}_{2}$$ (cf. Remark 1a). Let φ be an isomorphism on V with φ(a) = a and $$\varphi (c)=\frac {1}{t}c$$, while $$\psi=t\cdot\text{id}$$. Then $$\Lambda:=\psi\otimes\varphi\otimes\varphi$$ is an isomorphism on $$\mathbf{V}$$ with $$\mathbf{u}:=\Lambda(\mathbf{w})=tc\otimes a\otimes a+a\otimes c\otimes a+a\otimes a\otimes c$$. Hence also $$\mathbf {u}\in {\mathscr{B}}$$. Substitution $$c=b+\frac {1}{t+2}a$$ yields $$\mathbf{u}=\mathbf{v}(t)$$ and proves $$\mathbf {v}(t)\in {\mathscr{B}}$$ for t > 0. □
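The substitution in part (b) can be checked term by term. Writing $$\alpha:=\frac{1}{t+2}$$ (a symbol introduced here only for brevity) and inserting $$c=b+\alpha a$$ into $$\mathbf{u}=tc\otimes a\otimes a+a\otimes c\otimes a+a\otimes a\otimes c$$, multilinearity gives

$$\begin{array}{@{}rcl@{}} \mathbf{u} & =& t(b+\alpha a)\otimes a\otimes a+a\otimes(b+\alpha a)\otimes a+a\otimes a\otimes(b+\alpha a)\\ & =& tb\otimes a\otimes a+a\otimes b\otimes a+a\otimes a\otimes b+(t+2)\alpha~a\otimes a\otimes a\\ & =& (a+tb)\otimes a\otimes a+a\otimes b\otimes a+a\otimes a\otimes b=\mathbf{v}(t), \end{array}$$

where the last step uses $$(t+2)\alpha=1$$.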

The interesting conclusion from Example 1 is that the set $${\mathscr{B}}$$ is not closed. We define

$$\overline{\mathcal{B}}=\mathcal{B}\cup\mathcal{\partial B}\qquad(\text{disjoint union)}.$$
(5)

Note that the tensor v(0) defined in Example 1 belongs to $$\mathcal {\partial B}$$.

### General Case

If $${\mathscr{B}}$$ is nonempty, also $$\mathcal {\partial B}\neq \emptyset$$ holds since

$$0\in\mathcal{\partial B}$$
(6)

is always true (consider $$\lambda\mathbf{v}$$ with $$\mathbf {v} \in {\mathscr{B}}$$ for $$\lambda \rightarrow 0$$ and note that $$0\in \mathcal {F}$$ because of (4d)). Example 1 shows that $$\mathcal {\partial B}$$ may also contain nontrivial tensors of $$\mathcal {F}=\mathcal {R}_{2}$$. We remark that

$$\mathcal{\partial B}\subset\mathcal{F}$$
(7)

since $$\mathcal {\partial B}\subset \overline {{\mathscr{B}}}\subset \overline {\mathcal {F}}=\mathcal {F\cup B}$$ and $$\mathcal {\partial B}\cap {\mathscr{B}}=\emptyset$$ (cf. (5)).

### Conclusion 2

Let $$0\neq \mathbf {v}\in \mathcal {\partial B}$$. Then $$\sigma _{\varepsilon }(\mathbf {v})=\infty$$ for all ε > 0 and $$\sigma _{0}(\mathbf {v})=\infty$$, although $$\sigma (\mathbf {v})<\infty$$.

### Proof

By definition of $$\mathcal {\partial B}$$ there is some $$\mathbf {w}\in {\mathscr{B}}$$ with $$0<\eta:=\|\mathbf{v}-\mathbf{w}\|<\varepsilon/2$$ and $$\sigma _{\eta }(\mathbf {w})=\infty$$ (cf. Conclusion 1). Since $$U_{\mathcal {F},\eta }(\mathbf {w})\subset U_{\mathcal {F},\varepsilon }(\mathbf {v})$$, $$\sigma_{\varepsilon}(\mathbf{v})\geq\sigma_{\eta}(\mathbf{w})$$ yields the assertion. □

A consequence is the discontinuity of σ on $$\mathcal {\partial B}$$. Conclusion 2 ensures the existence of a sequence $$\mathbf {v}_{i}\rightarrow \mathbf {v}$$ with $$\sigma (\mathbf {v}_{i})\rightarrow \infty$$, whereas $$\sigma (\mathbf {v})<\infty$$. This proves:

### Conclusion 3

σ is not continuous at $$\mathbf {v}\in \mathcal {\partial B}\backslash \{0\}$$.

## On the Strength of Divergence

The numerical instability is caused by the fact that $$\sigma (\mathbf {v}_{i})\rightarrow \infty$$ holds for any sequence $$\mathcal {F}\ni \mathbf {v}_{i}\rightarrow \mathbf {v}\in {\mathscr{B}}$$. Whether this is a severe problem or not depends on the order of divergence. The introductory example (2) is the classical one-sided difference quotient. Using the step size h in $${\tilde {\mathbf {v}}}(h)$$, we get $$\sigma ({\tilde {\mathbf {v}}}(h))=\mathcal {O}(1/h)$$ and the accuracy $$\varepsilon =\mathcal {O}(h)$$. Expressing the quantity σ as a function of the accuracy ε, $$\sigma =\mathcal {O}(1/\varepsilon )$$ shows divergence of first order. However, this is not the general behaviour. We may choose the central difference quotient instead. Since still $$\sigma ({\tilde {\mathbf {v}}}(h))=\mathcal {O}(1/h)$$ but $$\varepsilon =\mathcal {O}(h^{2})$$, we now have the weaker divergence $$\sigma =\mathcal {O}(1/\sqrt {\varepsilon })$$.
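The two divergence orders can be compared in a small experiment. The sketch assumes $$V=\mathbb{R}^{2}$$ with unit vectors a, b, the Frobenius norm, and the scale-invariant parametrisation by two elementary tensors; the parameter norm of the particular two-term representation is only an upper bound for σ, and the name `quotient` is a hypothetical helper.

```python
import numpy as np

def elementary(x, y, z):
    """Elementary tensor x ⊗ y ⊗ z as a 3-way array."""
    return np.einsum("i,j,k->ijk", x, y, z)

# Assumption: V = R^2 with a, b the canonical basis vectors.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
v = elementary(b, a, a) + elementary(a, b, a) + elementary(a, a, b)

def w(t):
    """Rank-1 tensor w(t) = (a + t b) ⊗ (a + t b) ⊗ (a + t b)."""
    x = a + t * b
    return elementary(x, x, x)

def quotient(h, central):
    """Return (eps, sigma_bound) for the one-sided or central difference quotient.

    eps is the distance to v; sigma_bound is the norm of the pair of
    elementary terms, an upper bound for sigma of the approximation.
    """
    if central:
        e1, e2 = w(h) / (2 * h), -w(-h) / (2 * h)
    else:
        e1, e2 = w(h) / h, -w(0.0) / h
    eps = np.linalg.norm(e1 + e2 - v)
    sigma_bound = np.sqrt(np.linalg.norm(e1) ** 2 + np.linalg.norm(e2) ** 2)
    return eps, sigma_bound

for h in (1e-1, 1e-2, 1e-3):
    eps_o, sig_o = quotient(h, central=False)
    eps_c, sig_c = quotient(h, central=True)
    # Both products stay roughly constant as h decreases.
    print(f"h={h:.0e}  one-sided: sigma*eps={sig_o * eps_o:.3f}  "
          f"central: sigma*sqrt(eps)={sig_c * np.sqrt(eps_c):.3f}")
```

In this setting the product of the bound with ε (one-sided) and with $$\sqrt{\varepsilon}$$ (central) remains bounded, matching the orders $$\mathcal{O}(1/\varepsilon)$$ and $$\mathcal{O}(1/\sqrt{\varepsilon})$$ described above.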

This example shows that, given an accuracy ε > 0, we have to look for an approximation $$\mathbf {w}\in \mathcal {F}$$ with minimal σ(w). This leads to the following definition.

### Definition 1

Let $$\mathbf {v}\in {\mathscr{B}}$$ and ε > 0. The instability of the approximation problem in $$\mathcal {F}$$ is described by

$$\delta(\mathbf{v},\varepsilon):=\inf\{\sigma(\mathbf{w}):~\mathbf{w}\in U_{\mathcal{F},\varepsilon}(\mathbf{v})\}.$$

Note that δ(v, ε) is the infimum, whereas σε(v) is the supremum over the same set. Again, δ(v, ε) diverges as $$\varepsilon \rightarrow 0$$.

### Proposition 1

Weakly monotone divergence $$\delta (\mathbf {v},\varepsilon )\nearrow \infty$$ holds for all $$\mathbf {v}\in {\mathscr{B}}$$ as ε ↘ 0.

### Proof

For an indirect proof assume that $$\delta (\mathbf {v},\varepsilon )\leq K<\infty$$ for all $$\varepsilon =\frac {1}{n}>0$$. Then, for any $$n\in \mathbb {N}$$, there are $$\mathbf {w}_{n}\in U_{\mathcal {F},1/n}(\mathbf{v})$$ (i.e., $$\mathbf {w}_{n}\in \mathcal {F}$$ and $$\|\mathbf{v}-\mathbf{w}_{n}\|<1/n$$) with $$\sigma(\mathbf{w}_{n})\leq K+1$$. Since $$\mathbf {w}_{n}\rightarrow \mathbf {v}$$, Lemma 1 proves the contradicting statement $$\mathbf {v}\in \mathcal {F}$$. □

### Uniform Strength of Divergence

#### Definitions

The function δ(v,⋅) is the exact description of the kind of divergence at $$\mathbf {v}\in {\mathscr{B}}$$. In the case of the model example (1) and $$\mathcal {F}=\mathcal {R}_{2}$$ we have seen that the divergence is not stronger than $$\mathcal {O}(1/\sqrt {\varepsilon })$$, so that $$\delta (\mathbf {v},\varepsilon )\lesssim 1/\sqrt {\varepsilon }$$, if we use the central difference quotient. There are difference formulae of higher consistency order, but they involve more than two terms, i.e., such approximations are not in $$\mathcal {R}_{2}$$. This leads to the conjecture that $$\delta (\mathbf {v},\varepsilon )\sim 1/\sqrt {\varepsilon }$$ holds for v in (1). The next question is whether this behaviour might hold for all $$\mathbf {v}\in {\mathscr{B}}$$. The answer will be negative: there are particular tensors which behave differently. On the algebraic side, one might expand the difference quotient with step size h into a power series $$\mathbf {v}+{\sum }_{j}\mathbf {v}_{j}h^{j}$$. In general, the central difference leads to $$\mathbf{v}_{1}=0$$ and some $$\mathbf{v}_{2}$$. However, for certain v it might happen that $$\mathbf{v}_{2}=\mathbf{v}_{3}=\cdots=\mathbf{v}_{k-1}=0$$ so that $$\delta (\mathbf {v},\varepsilon )\sim \varepsilon ^{-1/k}$$. The characterisation of the largest possible k seems to be an unsolved problem. In the following we try to treat this problem by analytic tools.

#### Uniform Divergence

The strongest formulation of uniform divergence would be an inequality

$$\delta(\mathbf{v},\varepsilon)\geq\delta_{0}(\varepsilon)\qquad\text{for all }\mathbf{v}\in\mathcal{B}\text{ with }\|\mathbf{v}\|=1,$$
(8a)

where

$$\delta_{0}(\varepsilon)\nearrow\infty\quad\text{as}\quad\varepsilon \searrow0.$$
(8b)

The best possible δ0 satisfying (8a) is

$$\begin{array}{@{}rcl@{}} \delta_{0}(\varepsilon) & :=&\inf\{\delta(\mathbf{v},\varepsilon):~\mathbf{v}\in\mathcal{B},~\|\mathbf{v}\| =1\}\\ & =&\inf\{\sigma(\mathbf{w}):~\mathbf{w}\in\mathcal{F},~\|\mathbf{v}-\mathbf{w}\| \leq\varepsilon,~\mathbf{v}\in \mathcal{B},~\|\mathbf{v}\| =1\}. \end{array}$$
(9)

By definition, $$\delta_{0}(\varepsilon)$$ is weakly increasing as ε ↘ 0. The crucial question is whether $$\lim _{\varepsilon \rightarrow 0}\delta _{0}(\varepsilon )<\infty$$ or $$=\infty$$.

### Proposition 2

Uniform divergence as in (8a), (8b) holds if and only if $${\mathscr{B}}\cup \{0\}$$ is closed.

### Proof

As remarked in (6), zero does not belong to $${\mathscr{B}}$$, but to its closure. Therefore, closedness of $${\mathscr{B}}\cup \{0\}$$ means that $$\partial {\mathscr{B}}=\overline {{\mathscr{B}}}\backslash {\mathscr{B}}$$ contains no nontrivial tensor. In particular $${\mathscr{B}}_{1}:={\mathscr{B}}\cap \{\mathbf {v}:~\|\mathbf {v}\|=1\}$$ would be closed.

(a) Let $${\mathscr{B}}\cup \{0\}$$ be closed. For an indirect proof assume $$\lim _{\varepsilon \rightarrow 0}\delta _{0}(\varepsilon )=:K<\infty$$. Then for any ε = 1/n, $$n\in \mathbb {N}$$, there are $$\mathbf {v}_{n}\in {\mathscr{B}}_{1}$$ and $$\mathbf {w}_{n}\in \mathcal {F}$$ with $$\sigma(\mathbf{w}_{n})\leq K+1$$ and $$\|\mathbf{v}_{n}-\mathbf{w}_{n}\|\leq 1/n$$. By compactness we may take subsequences, again denoted by $$\mathbf{v}_{n}$$, $$\mathbf{w}_{n}$$, so that $$\mathbf {v}_{n}\rightarrow \mathbf {v}$$ and $$\mathbf {w}_{n}\rightarrow \mathbf {w}$$. Since $${\mathscr{B}}_{1}$$ is closed, we obtain $$\mathbf {v}\in {\mathscr{B}}_{1}\subset {\mathscr{B}}$$. As $$\sigma(\mathbf{w}_{n})$$ is uniformly bounded, the limit belongs to $$\mathcal {F}$$ (cf. Lemma 1), i.e., $$\mathbf {w}\in \mathcal {F}$$. Now $$\|\mathbf{v}_{n}-\mathbf{w}_{n}\|\leq 1/n$$ yields the contradiction $$\mathbf{v}=\mathbf{w}$$ ($$\mathcal {F}$$ and $${\mathscr{B}}$$ are disjoint!).

(b) If $${\mathscr{B}}\cup \{0\}$$ is not closed, there is some $$0\neq \mathbf {w}\in \partial {\mathscr{B}}:=\overline {{\mathscr{B}}}\backslash {\mathscr{B}}$$. Thanks to the cone property (4b), we may assume without loss of generality that $$\|\mathbf{w}\|=1$$. Note that $$\partial {\mathscr{B}}\subset \mathcal {F}$$ (cf. (7)). Hence w has a finite value ω := σ(w). For any ε > 0 we find some $$\mathbf {v}\in {\mathscr{B}}_{1}$$ with $$\|\mathbf{v}-\mathbf{w}\|\leq\varepsilon$$. Now (9) implies that $$\delta_{0}(\varepsilon)\leq\omega$$ for all ε > 0, i.e., the property (8b) is not valid for any $$\delta_{0}$$ satisfying (8a). □

In the interesting case of $$\mathcal {F}=\mathcal {R}_{r}$$ we know that $${\mathscr{B}}\cup \{0\}$$ is not closed (cf. Example 1). Hence uniform divergence (8a), (8b) does not hold for $$\mathcal {F}=\mathcal {R}_{r}$$. Nevertheless it is possible to refine the definition of divergence.

#### Weaker Form of Uniform Divergence

In the case of $$\mathcal {F}=\mathcal {R}_{r}$$, the exceptional set $$\partial {\mathscr{B}}=\overline {{\mathscr{B}}}\backslash {\mathscr{B}}$$ is a rather small subset of $$\mathcal {F}$$. In the following we formulate an inequality involving the distance from $$\partial {\mathscr{B}}$$.

### Theorem 4

There is a function $$\delta_{1}$$ with

$$\delta_{1}(\varepsilon)\nearrow\infty\quad\text{as}\quad\varepsilon\searrow0$$
(10a)

such that

$$\delta(\mathbf{v},\varepsilon)\geq\text{dist}(\mathbf{v},\partial\mathcal{B}) \delta_{1}(\varepsilon)\qquad\text{for all }\mathbf{v}\in\mathcal{B}\text{ with } \|\mathbf{v}\| =1.$$
(10b)

### Proof

(a) If $$\text {dist}(\mathbf {v},\partial {\mathscr{B}})=0$$, the estimate $$\delta(\mathbf{v},\varepsilon)\geq0$$ is trivial.

(b) In the following we consider those v with $$\mathbf {v}\in {\mathscr{B}}$$, $$\|\mathbf{v}\|=1$$, and $$\text {dist}(\mathbf {v},\partial {\mathscr{B}})>0$$. In this case the best possible $$\delta_{1}(\varepsilon)$$ is

$$\begin{array}{@{}rcl@{}} \delta_{1}(\varepsilon) & :=&\inf\{\delta(\mathbf{v},\varepsilon)/\text{dist}(\mathbf{v},\partial\mathcal{B}):~\mathbf{v}\in\mathcal{B},~ \|\mathbf{v}\| =1,~\text{dist}(\mathbf{v},\partial\mathcal{B})>0\}\\ &{=}&\inf\{\sigma(\mathbf{w})/\text{dist}(\mathbf{v},\partial\mathcal{B}){:}~\mathbf{w}\in\mathcal{F},~\|\mathbf{v}{-}\mathbf{w}\| \leq\varepsilon,~ \mathbf{v}\in\mathcal{B},~\|\mathbf{v}\| =1,~\text{dist}(\mathbf{v},\partial\mathcal{B})>0\}. \end{array}$$

$$\delta_{1}$$ is weakly increasing as ε ↘ 0. For an indirect proof of (10a) we assume that $$\delta _{1}(\varepsilon )\leq K<\infty$$ for all ε > 0. As in the proof of Proposition 2 there are convergent subsequences $$\mathbf {w}_{n}\in \mathcal {F}$$, $$\mathbf {v}_{n}\in {\mathscr{B}}$$ with

$$\begin{array}{@{}rcl@{}} \mathbf{v} & =&\lim\limits_{n\rightarrow\infty}\mathbf{v}_{n},\qquad\mathbf{w}=\lim\limits_{n\rightarrow\infty}\mathbf{w}_{n},\\ \|\mathbf{v}_{n}-\mathbf{w}_{n}\| & \leq&\frac{1}{n},~ \mathbf{w}_{n}\in\mathcal{F},~ \mathbf{v}_{n}\in\mathcal{B},~\|\mathbf{v}_{n}\| =1,\text{dist}(\mathbf{v}_{n},\partial\mathcal{B})>0,\quad\text{ and }\\ \sigma(\mathbf{w}_{n}) & \leq& (K+1) \text{dist}(\mathbf{v}_{n},\partial\mathcal{B}). \end{array}$$

We conclude from (6) that $$\text {dist}(\mathbf {v}_{n},\partial {\mathscr{B}})\leq \text {dist}(\mathbf {v}_{n},0)=\|\mathbf {v}_{n}\| =1$$. Therefore $$\sigma(\mathbf{w}_{n})\leq K+1$$ implies

$$\mathbf{w}\in\mathcal{F}$$
(11)

(cf. Lemma 1).

Next we check the limit $$\text {dist}(\mathbf {v},\partial {\mathscr{B}})=\lim _{n\rightarrow \infty }\text {dist}(\mathbf {v}_{n},\partial {\mathscr{B}})>0$$. Assume that $$\text {dist}(\mathbf {v}_{n},\partial {\mathscr{B}})\rightarrow 0$$. Then also $$\sigma (\mathbf {w}_{n})\leq (K+1)\text {dist}(\mathbf {v}_{n},\partial {\mathscr{B}})\rightarrow 0$$. By Remark 2 there are parameters $$p_{n}\in \mathcal {D}$$ with $$\mathbf{w}_{n}=\rho(p_{n})$$ and $$\sigma(\mathbf{w}_{n})=\|p_{n}\|$$. Now $$\sigma (\mathbf {w}_{n})=\|p_{n}\| \rightarrow 0$$ proves $$p_{n}\rightarrow 0$$, while (4c) and (4d) show that $$\mathbf {w}=\lim \mathbf {w}_{n}=\lim \rho (p_{n})=\rho (0)=0$$. However, since the norm is continuous, $$\|\mathbf{w}\|=0$$ contradicts $$\|\mathbf{w}\|=1$$. The latter equality follows from $$\|\mathbf {v}_{n}-\mathbf {w}_{n}\| \leq \frac {1}{n}$$ and $$\|\mathbf{v}_{n}\|=1$$. Hence $$\lim _{n\rightarrow \infty }\text {dist}(\mathbf {v}_{n},\partial {\mathscr{B}})=\text {dist}(\mathbf {v},\partial {\mathscr{B}})>0$$ holds and implies that $$\mathbf {v}\notin \partial {\mathscr{B}}$$. Since $$\mathbf {v}_{n}\in {\mathscr{B}}$$, the limit v is in $$\overline {{\mathscr{B}}}={\mathscr{B}}\cup \partial {\mathscr{B}}$$ (cf. (5)) and $$\mathbf {v}\notin \partial {\mathscr{B}}$$ proves

$$\mathbf{v}\in\mathcal{B}.$$
(12)

From $$\|\mathbf {v}_{n}-\mathbf {w}_{n}\| \leq \frac {1}{n}$$ we conclude v = w which is a contradiction since both tensors are in disjoint sets (cf. (11), (12)). □

The interpretation of Theorem 4 depends on the topological structure of $$\partial {\mathscr{B}}$$ as seen next.

### Remark 4

If $$\partial {\mathscr{B}}$$ is closed, the distance $$\text {dist}(\mathbf {v},\partial {\mathscr{B}})$$ is positive for all $$\mathbf {v}\in {\mathscr{B}}$$. This yields a nontrivial estimate (10b) for all $$\mathbf {v}\in {\mathscr{B}}$$.

### Proof

$$\mathbf {v}\in {\mathscr{B}}$$ and $$\partial {\mathscr{B}}\subset \mathcal {F}$$ imply $$\mathbf {v}\notin \partial {\mathscr{B}}$$. Note that $$\text {dist}(\mathbf {v},\partial {\mathscr{B}})=0$$ for a closed set $$\partial {\mathscr{B}}$$ is equivalent to $$\mathbf {v}\in \partial {\mathscr{B}}$$. □

Finally we consider the case of a nonclosed set $$\partial {\mathscr{B}}$$. We split the closure $$\overline {\partial {\mathscr{B}}}$$ into disjoint sets

$$\overline{\partial\mathcal{B}}=\partial\mathcal{B}\cup\mathcal{C}.$$

### Remark 5

(a) $$\mathcal {C}$$ is a subset of $${\mathscr{B}}$$. (b) $$\text {dist}(\mathbf {v},\partial {\mathscr{B}})=0$$ holds for $$\mathbf {v}\in {\mathscr{B}}$$ if and only if $$\mathbf {v}\in \mathcal {C}$$.

### Proof

(a) $$\partial {\mathscr{B}}\subset \overline {{\mathscr{B}}}$$ implies $$\overline {\partial {\mathscr{B}}}\subset \overline {{\mathscr{B}}}={\mathscr{B}}\cup \partial {\mathscr{B}}$$ and $$\mathcal {C}\subset {\mathscr{B}}\cup \partial {\mathscr{B}}$$. Since $$\mathcal {C}\cap \partial {\mathscr{B}}=\emptyset$$, $$\mathcal {C}\subset {\mathscr{B}}$$ is proved.

(b) Note that $$\text {dist}(\mathbf {v},\partial {\mathscr{B}})=0$$ is equivalent to $$\text {dist}(\mathbf {v},\overline {\partial {\mathscr{B}}})=0$$. In the latter case there is some $$\mathbf {w}\in \overline {\partial {\mathscr{B}}}$$ with $$\|\mathbf {v}-\mathbf {w}\|=\text {dist}(\mathbf {v},\overline {\partial {\mathscr{B}}})=0$$, i.e., $$\mathbf{v}=\mathbf{w}$$. Comparing $$\mathbf {v}\in {\mathscr{B}}$$ and $$\mathbf {w}\in \overline {\partial {\mathscr{B}}}=\partial {\mathscr{B}}\cup \mathcal {C}$$ and noting that $$\partial {\mathscr{B}}\subset \mathcal {F}$$, it follows that $$\mathbf {v}=\mathbf {w}\in \mathcal {C}$$. □

In the case of a nonclosed $$\partial {\mathscr{B}}$$, the estimate (10b) degenerates to $$\delta(\mathbf{v},\varepsilon)\geq0$$ if and only if $$\mathbf {v}\in \mathcal {C}$$.

### Remark 6

For $$\mathcal {F}=\mathcal {R}_r$$ it is not hard to prove that the divergence behaviour only depends on the border rank and the order d of the tensor, but not on $$\dim ({V}_{j})$$.

#### Example $${\otimes }^{3}{\mathbb {R}}^{2}$$

Obviously, it is interesting to know more about the topological structure of $${\mathscr{B}}$$ for various nonclosed tensor formats. Finally we consider the tensor space

$$\mathbf{V}=\otimes^{3}\mathbb{R}^{2}$$

which is the smallest nontrivial example. The maximal rank in V is 3 (cf. Kruskal [10]). Hence $$\mathcal {R}_{3}$$ coincides with V and is obviously closed. As seen by the tensor (1), $$\mathcal {R}_{2}$$ is not closed. In fact, (1) describes all border tensors up to tensor space isomorphisms:

$$\mathcal{B}=\left\{\left( {\bigotimes}_{j=1}^{3}\phi^{(j)}\right) (\mathbf{v}):~\phi^{(j)}\in L(\mathbb{R}^{2},\mathbb{R}^{2})~\text{ isomorphism}\right\},$$

where $$\mathbf {v}$$ is defined in (1) with {a, b} being a fixed basis of $$\mathbb {R}^{2}$$. Let $$\phi ^{(2)}=\phi ^{(3)}$$ be the identity and define $$\phi ^{(1)}$$ by $$\phi ^{(1)}(a)=a$$, $$\phi ^{(1)}(b)=a+tb$$. For $$t\neq 0$$, $$\phi ^{(1)}$$ is an isomorphism, whereas for t = 0 it is not invertible. Note that with these mappings $$\left ({\bigotimes }_{j=1}^{3}\phi ^{(j)}\right )(\mathbf {v})$$ coincides with the tensor in Example 1. For t = 0 we obtain the tensor

$$\mathbf{w}_{1}=a\otimes a\otimes a+a\otimes b\otimes a+a\otimes a\otimes b\in\partial\mathcal{B}.$$

The same construction with respect to the directions j = 2 and j = 3 yields

$$\begin{array}{@{}rcl@{}} \mathbf{w}_{2} & =&b\otimes a\otimes a+a\otimes a\otimes a+a\otimes a\otimes b\in\partial\mathcal{B},\\ \mathbf{w}_{3} & =&b\otimes a\otimes a+a\otimes b\otimes a+a\otimes a\otimes a\in\partial\mathcal{B}. \end{array}$$
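The construction of the three limit tensors can be checked numerically. The following NumPy sketch is only an illustration under stated assumptions: a and b are taken to be the standard basis vectors of $$\mathbb {R}^{2}$$, and the tensor in (1) is assumed to be $$\mathbf {v}=b\otimes a\otimes a+a\otimes b\otimes a+a\otimes a\otimes b$$, which is consistent with the formulas for $$\mathbf {w}_{1},\mathbf {w}_{2},\mathbf {w}_{3}$$. The helper names `elem`, `apply_maps`, `multilinear_rank` and `phi` are hypothetical and not part of the paper.

```python
import numpy as np

# Assumption: {a, b} is the standard basis of R^2.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
identity = np.eye(2)


def elem(x, y, z):
    """Elementary tensor x (x) y (x) z as a 2 x 2 x 2 array."""
    return np.einsum("i,j,k->ijk", x, y, z)


def apply_maps(maps, tensor):
    """Apply maps[0] (x) maps[1] (x) maps[2] to a tensor of order 3."""
    m1, m2, m3 = maps
    return np.einsum("ai,bj,ck,ijk->abc", m1, m2, m3, tensor)


def multilinear_rank(tensor):
    """Matrix ranks of the three mode unfoldings of a 2 x 2 x 2 tensor."""
    return tuple(
        int(np.linalg.matrix_rank(np.moveaxis(tensor, mode, 0).reshape(2, -1)))
        for mode in range(3)
    )


def phi(t):
    """Matrix of the map a -> a, b -> a + t*b (columns are the images)."""
    return np.array([[1.0, 1.0], [0.0, t]])


# Assumed form of the tensor (1) and the three limit tensors from the text.
v = elem(b, a, a) + elem(a, b, a) + elem(a, a, b)
w1 = elem(a, a, a) + elem(a, b, a) + elem(a, a, b)
w2 = elem(b, a, a) + elem(a, a, a) + elem(a, a, b)
w3 = elem(b, a, a) + elem(a, b, a) + elem(a, a, a)

# Put the singular map phi(0) into direction j and the identity elsewhere.
limits = [
    apply_maps(tuple(phi(0.0) if j == i else identity for j in range(3)), v)
    for i in range(3)
]

print("multilinear rank of v:", multilinear_rank(v))
for name, w, limit in zip(("w1", "w2", "w3"), (w1, w2, w3), limits):
    print(name, "matches the t = 0 image:", np.allclose(w, limit),
          "| multilinear rank:", multilinear_rank(w))
```

In this sketch the unfolding rank drops from 2 to 1 exactly in the direction j in which the non-invertible map acts, which is one way to see that each $$\mathbf {w}_{i}$$ lies in $$\mathcal {R}_{2}$$.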

We obtain all tensors in $$\partial {\mathscr{B}}$$ by $$\left ({\bigotimes }_{j=1}^{3}\phi ^{(j)}\right )(\mathbf {v})$$ when at least one $$\phi ^{(j)}\in L(\mathbb {R}^{2},\mathbb {R}^{2})$$ is not invertible. Such a tensor can be written as $$\left ({\bigotimes }_{j=1}^{3}\psi ^{(j)}\right ) (\mathbf {w}_{i})$$ with $$\mathbf {w}_{i}$$, $$i\in \{1,2,3\}$$, and general linear maps $$\psi ^{(j)}\in L(\mathbb {R}^{2},\mathbb {R}^{2})$$, i.e.,

$$\partial\mathcal{B}=\bigcup\limits_{1\leq i\leq3}\left\{\left( {\bigotimes}_{j=1}^{3}\psi^{(j)}\right)(\mathbf{w}_{i}):~\psi^{(j)}\in L(\mathbb{R}^{2},\mathbb{R}^{2})\right\}.$$

Since $$L(\mathbb {R}^{2},\mathbb {R}^{2})$$ is closed we obtain the desired result:

### Proposition 3

$$\partial {\mathscr{B}}$$ is closed.
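To relate this example to the divergence discussed throughout the paper, the following NumPy sketch approximates a border tensor by tensors from $$\mathcal {R}_{2}$$ and records the size of the two terms. It rests on assumptions that are not taken from this section: a and b are the standard basis vectors, the tensor in (1) is $$\mathbf {v}=b\otimes a\otimes a+a\otimes b\otimes a+a\otimes a\otimes b$$, and the approximating family is the difference quotient $$\frac {1}{t}\left [(a+tb)\otimes (a+tb)\otimes (a+tb)-a\otimes a\otimes a\right ]$$. The function names are hypothetical.

```python
import numpy as np

# Assumption: {a, b} is the standard basis of R^2.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])


def elem(x, y, z):
    """Elementary tensor x (x) y (x) z as a 2 x 2 x 2 array."""
    return np.einsum("i,j,k->ijk", x, y, z)


# Assumed form of the border tensor (1).
v = elem(b, a, a) + elem(a, b, a) + elem(a, a, b)


def rank2_terms(t):
    """Two elementary terms whose sum is the difference quotient at t != 0."""
    c = a + t * b
    return elem(c, c, c) / t, -elem(a, a, a) / t


def report(t):
    """Return (approximation error, norm of the largest of the two terms)."""
    first, second = rank2_terms(t)
    error = np.linalg.norm(first + second - v)
    largest = max(np.linalg.norm(first), np.linalg.norm(second))
    return error, largest


for t in (1e-1, 1e-2, 1e-3):
    error, largest = report(t)
    print(f"t = {t:.0e}   error = {error:.3e}   largest term norm = {largest:.3e}")
```

Under these assumptions the error decays proportionally to t, while the norms of the two terms grow like 1/t, so the product of both quantities stays bounded away from zero as t tends to 0.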

## Notes

1. Since we only consider finite-dimensional tensor spaces, all tensors are algebraic tensors, i.e., their rank is finite.

2. ‘General’ tensors are all tensors at which a certain polynomial does not vanish. The exceptional tensors form a set of measure zero.

3. The case of $$\mathbb {K}=\mathbb {R}$$ is more involved (cf. Angelini–Bocci–Chiantini [1, Theorem 4.2]).

4. Private communication by M. Michałek.

## References

1. Angelini, E., Bocci, C., Chiantini, L.: Real identifiability vs. complex identifiability. Linear Multilinear Algebra 66, 1257–1267 (2018)

2. Chiantini, L., Ottaviani, G.: On generic identifiability of 3-tensors of small rank. SIAM J. Matrix Anal. Appl. 33, 1018–1037 (2012)

3. Coppi, R., Bolasco, S. (eds.): Multiway Data Analysis. North-Holland, Amsterdam (1989)

4. De Silva, V., Lim, L. H.: Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM J. Matrix Anal. Appl. 30, 1084–1127 (2008)

5. Czapliński, A., Michałek, M., Seynnaeve, T.: Uniform matrix product states from an algebraic geometer’s point of view. arXiv:1904.07563 (2019)

6. Domanov, I., De Lathauwer, L.: Canonical polyadic decomposition of third-order tensors: Reduction to generalized eigenvalue decomposition. SIAM J. Matrix Anal. Appl. 35, 636–660 (2014)

7. Domanov, I., De Lathauwer, L.: Generic uniqueness conditions for the canonical polyadic decomposition and INDSCAL. SIAM J. Matrix Anal. Appl. 36, 1567–1589 (2015)

8. Domanov, I., De Lathauwer, L.: Canonical polyadic decomposition of third-order tensors: Relaxed uniqueness conditions and algebraic algorithm. Linear Algebra Appl. 513, 342–375 (2017)

9. Hackbusch, W.: Tensor Spaces and Numerical Tensor Calculus. Springer, Berlin (2012). 2nd edn. appears in 2020

10. Kruskal, J. B.: Rank, decomposition, and uniqueness for 3-way and N-way arrays. In: Coppi, R., Bolasco, S. (eds.) Multiway Data Analysis, pp. 7–18. North-Holland, Amsterdam (1989)

11. Landsberg, J. M.: Tensors: Geometry and Applications. AMS, Providence (2012)

12. Perez-García, D., Verstraete, F., Wolf, M. M., Cirac, J. I.: Matrix product state representations. Quantum Inf. Comput. 7, 401–430 (2007)

13. Qi, Y., Michałek, M., Lim, L. H.: Complex best r-term approximations almost always exist in finite dimensions. Appl. Comput. Harmon. Anal. Available online (2019)

14. Sørensen, M., De Lathauwer, L.: Coupled canonical polyadic decompositions and (coupled) decompositions in multilinear rank-$$(L_{r,n},L_{r,n},1)$$ terms, part I: Uniqueness. SIAM J. Matrix Anal. Appl. 36, 496–522 (2015)

15. Sørensen, M., De Lathauwer, L., Comon, P., Icart, S., Deneire, L.: Canonical polyadic decomposition with a columnwise orthonormal factor matrix. SIAM J. Matrix Anal. Appl. 33, 1190–1213 (2012)

16. Sørensen, M., Domanov, I., De Lathauwer, L.: Coupled canonical polyadic decompositions and (coupled) decompositions in multilinear rank-$$(L_{r,n},L_{r,n},1)$$ terms, part II: Algorithms. SIAM J. Matrix Anal. Appl. 36, 1015–1045 (2015)

## Acknowledgments

Open access funding provided by Max Planck Society. I thank Mateusz Michałek (Leipzig) for many instructive discussions. From him I got better insight into the nature of the set $$\partial {\mathscr{B}}$$.

## Author information

### Corresponding author

Correspondence to W. Hackbusch.

Dedicated to the 65th birthday of Volker Mehrmann.

Hackbusch, W. A Note on Nonclosed Tensor Formats. Vietnam J. Math. 48, 621–631 (2020). https://doi.org/10.1007/s10013-019-00372-4

### Keywords

• Tensor representation
• Tensor format
• Nonclosed tensor format
• Numerical instability

### Mathematics Subject Classification

• 15A69
• 65F99