1 Introduction

Linear matrix equations have played a crucial role in control theory and differential equations; see, e.g., [1–4]. Much attention has been given to the following matrix equations: the equation \(AXB=C\), the Sylvester equation \(AX+XB = C\), the Kalman–Yakubovich equation \(AXB+X =C\), and, more generally, the equation \(AXB+CXD = F\). Using the notions of the matrix Kronecker product and the vector operator, we can obtain their exact solutions. However, for matrices of high dimension (e.g., A, B of size \(10^{2} \times 10^{2}\)), the dimension of their Kronecker product is very high (\(10^{4} \times 10^{4}\) in that case). This leads to a computational difficulty: computing the inverse of such a large matrix may exceed the available computer memory.
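For instance, the equation \(AXB=C\) can be vectorized as \((B^{T}\otimes A)\operatorname {Vec}(X)=\operatorname {Vec}(C)\) and solved directly; the minimal NumPy sketch below (with small, arbitrarily chosen matrices of our own) makes the dimension issue concrete, since the Kronecker coefficient matrix is already \(mn\times mn\) for an \(m\times m\) matrix A and an \(n\times n\) matrix B.

```python
import numpy as np

# Solve AXB = C exactly via vectorization: (B^T kron A) vec(X) = vec(C).
# The sizes here are illustrative only; for A, B of size 10^2 x 10^2
# the Kronecker matrix would already be 10^4 x 10^4.
rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
X_true = rng.standard_normal((m, n))
C = A @ X_true @ B

K = np.kron(B.T, A)                      # (mn) x (mn) coefficient matrix
x = np.linalg.solve(K, C.flatten('F'))   # vec stacks columns (Fortran order)
X = x.reshape((m, n), order='F')

print(np.allclose(X, X_true))            # True
print(K.shape)                           # dimension of the Kronecker system
```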

In practical applications, linear matrix equations of large dimensions are solved by effective iterative methods. There are several ideas for formulating an iterative procedure: one can use the matrix sign function [5], block recursion [6, 7], Krylov subspaces [8, 9], Hermitian and skew-Hermitian splitting [10, 11], and other related techniques; see, e.g., [12–15]. In the recent decade, the ideas of gradients, hierarchical identification, and minimization of associated norm-error functions have encouraged and brought about many research works; see, e.g., [16–28]. Such iterative schemes turn out to have wide applications in many engineering problems, especially in systems identification for parameter estimation; see, e.g., [29–31].

In 2005, Ding and Chen applied the hierarchical identification principle to develop the gradient-based iterative (GI) algorithms for solving the equation \(\sum_{j=1}^{p}A_{j}XB_{j}=C\), which includes the Sylvester equation, as follows.

Proposition 1.1

([32])

If the matrix equation \(\sum_{j=1}^{p}A_{j}XB_{j}=C\) has a unique solution X, then the iterative solution \(X(k)\) obtained from the gradient-based iterative (GI) algorithm given by

$$\begin{aligned}& X(k) = \bigl( X_{1}(k)+X_{2}(k)+\cdots + X_{p}(k) \bigr)/p, \\& X_{i}(k) = X(k-1) + \mu A_{i}^{T} \Biggl( C-\sum _{j=1}^{p}A_{j}X(k-1)B_{j} \Biggr) B_{i}^{T}, \\& \frac{1}{\mu } = \sum_{j=1}^{p}\lambda _{\max } \bigl( A_{j}A_{j}^{T} \bigr) \lambda _{\max } \bigl( B_{j}^{T}B_{j} \bigr) \quad \textit{or}\quad \frac{1}{\mu } = \sum_{j=1}^{p} \Vert A_{j} \Vert ^{2} \Vert B_{j} \Vert ^{2} \end{aligned}$$

converges to the solution X.
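A minimal NumPy sketch of this GI iteration (the function name, the test data, and the deliberately well-conditioned matrices are ours) may read as follows; it uses the second choice of μ with spectral norms.

```python
import numpy as np

def gi_step(X, A_list, B_list, C, mu):
    """One GI sweep for sum_j A_j X B_j = C as in Proposition 1.1."""
    R = C - sum(A @ X @ B for A, B in zip(A_list, B_list))        # common residual
    X_j = [X + mu * A.T @ R @ B.T for A, B in zip(A_list, B_list)]
    return sum(X_j) / len(X_j)                                     # average the p updates

rng = np.random.default_rng(1)
A_list = [np.eye(3), 0.2 * rng.standard_normal((3, 3))]            # a well-conditioned test case
B_list = [np.eye(3), 0.2 * rng.standard_normal((3, 3))]
X_true = rng.standard_normal((3, 3))
C = sum(A @ X_true @ B for A, B in zip(A_list, B_list))

# 1/mu = sum_j ||A_j||^2 ||B_j||^2 (spectral norms)
mu = 1.0 / sum(np.linalg.norm(A, 2) ** 2 * np.linalg.norm(B, 2) ** 2
               for A, B in zip(A_list, B_list))
X = np.zeros((3, 3))
for _ in range(2000):
    X = gi_step(X, A_list, B_list, C, mu)
print(np.linalg.norm(X - X_true))   # small, since this test problem has a unique solution
```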

In 2008, Ding, Liu, and Ding derived the following three iterative methods for the equation \(AXB=C\) and the equation \(\sum_{j=1}^{p} A_{j}XB_{j}=F\).

Proposition 1.2

([33])

If the equation \(AXB=C\) has a unique solution \(X^{*}\), then the gradient-based iterative (GI) algorithm,

$$\begin{aligned}& X(k+1) = X(k)+\mu A^{T} \bigl( C-AX(k)B \bigr)B^{T}, \\& 0< \mu < \frac{2}{\lambda _{\max }(AA^{T})\lambda _{\max }(B^{T}B)} \quad \textit{or} \quad \mu \leqslant \frac{2}{ \Vert A \Vert ^{2} \Vert B \Vert ^{2}}, \end{aligned}$$

is such that \(X(k)\rightarrow X^{*}\).

Proposition 1.3

([33])

If the equation \(AXB=C\) has a unique solution \(X^{*}\), then the least squares (LS) iterative algorithm,

$$\begin{aligned} X(k+1) = X(k)+\mu \bigl(A^{T}A \bigr)^{-1}A^{T} \bigl( C-AX(k)B \bigr) B^{T} \bigl(BB^{T} \bigr)^{-1}, \quad 0< \mu < 2 \end{aligned}$$

is such that \(X(k)\rightarrow X^{*}\).

Proposition 1.4

([33])

If the matrix equation \(\sum_{j=1}^{p} A_{j}XB_{j} = F\) has a unique solution X, then the iterative solution \(X(k)\) obtained from the least-squares iterative (LSI) algorithm given by

$$\begin{aligned} X(k) = X(k-1)+\mu \sum_{i=1}^{p} \bigl(A_{i}^{T}A_{i} \bigr)^{-1}A_{i}^{T} \Biggl(F- \sum_{j=1}^{p}A_{j}X(k-1)B_{j} \Biggr) B_{i}^{T} \bigl(B_{i}B_{i}^{T} \bigr)^{-1}, \end{aligned}$$

where \(0<\mu <2p\), converges to the solution X.

There are many variations and modifications of the GI algorithm [32], namely the RGI algorithm [34], the MGI algorithm [35], the JGI algorithm [36], and the AJGI algorithm [36].

In this paper, we introduce a gradient-descent iterative algorithm for solving the generalized Sylvester equation that takes the form

$$\begin{aligned} \sum_{t=1}^{p}A_{t}XB_{t}=C. \end{aligned}$$
(1)

Note that this equation includes all the matrix equations mentioned above as special cases. The proposed algorithm is based on the vector representation and on variants of the previous works [32, 33]. It aims to minimize an error at each iteration by the idea of gradient descent. We show that the proposed algorithm can be applied to any such problem, with any initial matrix, as long as the problem has a unique solution. The convergence rate and error estimates are given in terms of the condition number of the associated iteration matrix. Numerical simulations reveal that our proposed algorithm performs well compared to the mentioned iterative methods. Moreover, our algorithm can be applied to discretizations of well-known partial differential equations, namely the one-dimensional heat equation and the two-dimensional Poisson’s equation. Both equations are widely used in many areas of theoretical physics, electrostatics, and mechanical engineering; see, e.g., [37] and [38]. According to our numerical results, the algorithm is applicable to both the heat and Poisson’s equations, as confirmed by comparison with their analytical solutions.

The outline of this paper is as follows. In Sect. 2, we supply auxiliary tools for solving linear matrix equations and for the convergence analysis of iterative methods for such equations. We propose new algorithms for the equations \(AXB=C\) and \(\sum_{t=1}^{p}A_{t}XB_{t}=C\) in Sects. 3 and 4, respectively. In Sect. 5, we present numerical simulations for various kinds of linear matrix equations. In Sects. 6 and 7, we apply our algorithm to the one-dimensional heat equation and the two-dimensional Poisson’s equation, respectively. The numerical simulations for the heat and Poisson’s equations are provided in their own sections. Finally, we present a conclusion in Sect. 8.

2 Preliminaries on matrix analysis

Throughout this paper, all considered matrices are real. Denote the set of \(m \times n\) matrices by \(M_{m,n}\). When \(m=n\), we write \(M_{n}\) instead of \(M_{n,n}\). Let I be an identity matrix of compatible dimension. The \((i,j)\)th entry of a matrix A is denoted by \(A(i,j)\) or \(a_{ij}\).

Recall the Löwner partial order ⪯ for real symmetric matrices:

$$\begin{aligned} A\preceq B \quad \Leftrightarrow \quad B-A \text{ is positive semidefinite} \quad \Leftrightarrow \quad x^{T}A x\leqslant x^{T}Bx\quad \text{for all }x\in \mathbb {R}^{n}. \end{aligned}$$

The Kronecker (tensor) product of \(A=[a_{ij}]\in M_{m,n}\) and \(B\in M_{p,q}\) is defined by

$$\begin{aligned} A\otimes B = [a_{ij}B]_{ij}\in M_{mp,nq}. \end{aligned}$$

The vector operator is defined for each \(A=[a_{ij}]\in M_{m,n}\) by

$$\begin{aligned} \operatorname {Vec}(A) = \begin{bmatrix} a_{11}\ \cdots\ a_{m1} & a_{12}\ \cdots\ a_{m2} & \cdots & a_{1n}\ \cdots\ a_{mn} \end{bmatrix}^{T}. \end{aligned}$$

It is clear that the vector operator is linear and injective.

Lemma 2.1

([39])

The Kronecker product and the vector operator possess the following properties, provided that all matrices are of compatible dimensions:

  1. (i)

    \((A\otimes B)^{T} = A^{T} \otimes B^{T}\),

  2. (ii)

    \((A \otimes B)(C \otimes D) = AC \otimes BD\),

  3. (iii)

    \(\operatorname {Vec}(ABC) = (C^{T}\otimes A)\operatorname {Vec}(B)\).
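Property (iii) is the workhorse behind the vectorizations used later; a quick numerical check (with toy sizes of our own choosing) reads:

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, C = rng.standard_normal((2, 3)), rng.standard_normal((3, 4)), rng.standard_normal((4, 5))

lhs = (A @ B @ C).flatten('F')               # Vec(ABC), columns stacked
rhs = np.kron(C.T, A) @ B.flatten('F')       # (C^T kron A) Vec(B)
print(np.allclose(lhs, rhs))                 # True
```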

To perform convergence analysis, the spectral norm, the Frobenius norm, and the condition number of \(A\in M_{m,n}\) are used and respectively defined by

$$\begin{aligned} \Vert A \Vert _{2} = \sqrt{\lambda _{\max } \bigl(A^{T}A \bigr)}, \quad \quad \Vert A \Vert _{F} = \sqrt{ \operatorname {tr}\bigl(A^{T}A \bigr)}, \quad \quad \kappa (A)= \biggl( \frac{\lambda _{\max }(A^{T}A)}{\lambda _{\min }(A^{T}A)} \biggr)^{1/2}. \end{aligned}$$

We recall the following properties:

Lemma 2.2

([40])

For any compatible matrices A and B, we have

  1. (i)

    \(\Vert A^{T}A \Vert _{2}= \Vert A \Vert ^{2}_{2}\),

  2. (ii)

    \(\Vert A^{T} \Vert _{2} = \Vert A \Vert _{2}\),

  3. (iii)

    \(\Vert AB \Vert _{F}\leqslant \Vert A \Vert _{2} \Vert B \Vert _{F}\).

3 The equation \(AXB=C\)

Consider the matrix equation

$$\begin{aligned} AXB = C, \end{aligned}$$
(2)

where \(A\in M_{p,m}\) has full column-rank, \(B\in M_{n,q}\) has full row-rank, \(C\in M_{p,q}\) is a known constant matrix, and \(X\in M_{m,n}\) is unknown. The hypotheses imply the invertibility of \(A^{T} A\) and \(BB^{T}\), and thus we obtain the unique solution to be

$$\begin{aligned} X^{*} = \bigl(A^{T} A \bigr)^{-1}A^{T}CB^{T} \bigl(B B^{T} \bigr)^{-1}. \end{aligned}$$
(3)

However, computing \((A^{T} A)^{-1}\) and \((B B^{T})^{-1}\) requires a large amount of data storage if the matrices are large. Thus, in this section, we propose a new iterative method for solving (2) based on gradients and steepest descent, which provides an appropriate sequence of step sizes for minimizing the error at each iteration. Moreover, the discussion in this section leads to a treatment of a general matrix equation in Sect. 4.

3.1 Proposed algorithm

We consider the Frobenius norm-error \(\Vert AXB-C \Vert _{F}\), which can be equivalently transformed into \(\Vert ( B^{T}\otimes A )\operatorname {Vec}(X)-\operatorname {Vec}(C) \Vert _{F}\) via Lemma 2.1(iii). So, we define the quadratic norm-error function \(f:\mathbb {R}^{mn}\rightarrow \mathbb {R}\) by

$$\begin{aligned} f(x) := \frac{1}{2} \bigl\Vert \bigl( B^{T}\otimes A \bigr) x- \operatorname {Vec}(C) \bigr\Vert _{F}^{2}. \end{aligned}$$

Since every norm is a convex function, f is convex. We assume that the exact solution \(X^{*}\) of (2) is uniquely determined; hence a minimizer \(X^{*}\) of f exists. We start with an arbitrary initial matrix \(X(0)\), and at every step \(k\geqslant 0\) we move from \(X(k)\) to the next iterate \(X(k+1)\) along an appropriate direction, namely the negative gradient of f, with a suitable step size. At the kth step, the step size \(\tau _{k+1}\) is chosen so as to incur the minimum error. The gradient-descent iterative method can thus be described through the following recursive rule:

$$\begin{aligned} X(k+1) = X(k)-\tau _{k+1}\nabla f \bigl(\operatorname {Vec}\bigl(X(k) \bigr) \bigr). \end{aligned}$$

In order to do that, we recall the following gradient formula:

$$ \frac{d}{dX} \operatorname {tr}(AX) = \frac{d}{dX} \operatorname {tr}\bigl(X^{T}A^{T} \bigr)=A^{T}. $$

Now, we find the gradient of function f and deduce its derivatives in detail. Letting \(S = B^{T}\otimes A\), \(x=\operatorname {Vec}(X)\), and \(\widehat {c}= \operatorname {Vec}(C)\), we have

$$\begin{aligned} \nabla f(x) &= \frac{df(x)}{dx} = \frac{1}{2}\frac{d}{dx}\operatorname {tr}\bigl( ( Sx-\widehat {c})^{T} ( Sx-\widehat {c}) \bigr) \\ &= \frac{1}{2}\frac{d}{dx}\operatorname {tr}\bigl( Sxx^{T}S^{T}- \widehat {c}x^{T}S^{T}-Sx \widehat {c}^{T}+\widehat {c}\widehat {c}^{T} \bigr) \\ &= S^{T}(Sx-\widehat {c}). \end{aligned}$$
(4)

Thus, our new iterative equation is in the form

$$\begin{aligned} \operatorname {Vec}\bigl(X(k+1) \bigr) = \operatorname {Vec}\bigl(X(k) \bigr)+\tau _{k+1} \bigl( B^{T} \otimes A \bigr)^{T} \bigl( \operatorname {Vec}(C)- \bigl( B^{T}\otimes A \bigr) \operatorname {Vec}\bigl(X(k) \bigr) \bigr). \end{aligned}$$

Using Lemma 2.1, we have

$$\begin{aligned} X(k+1) = X(k)+\tau _{k+1} \bigl( A^{T} \bigl(C-AX(k)B \bigr)B^{T} \bigr). \end{aligned}$$

Next, we choose a step size. To generate the best step size at each iteration, we minimize the error that occurs at the next iterate \(X(k+1)\). Then, for each \(k\in \mathbb {N}\cup \{0\}\), we define \(\phi _{k+1}:[0,\infty )\rightarrow \mathbb {R}\) by

$$\begin{aligned} \phi _{k+1}(\tau ) :={}& f \bigl(\operatorname {Vec}\bigl(X(k+1) \bigr) \bigr) \\ ={}& \frac{1}{2} \bigl\Vert \bigl(B^{T}\otimes A \bigr) \operatorname {Vec}\bigl( X(k)+\tau \bigl(A^{T} \bigl(C-AX(k)B \bigr)B^{T} \bigr) \bigr)- \operatorname {Vec}(C) \bigr\Vert _{F}^{2}. \end{aligned}$$

Now, we shall minimize the function \(\phi _{k+1}(\tau )\) by applying the properties of the matrix trace. Before that, we transform \(\phi _{k+1}(\tau )\) into a convenient form by letting \(\overline {c}= \widehat {c}-Sx(k)\) and \(\tilde {b}= SS^{T} \overline {c}\), so that

$$\begin{aligned} \phi _{k+1}(\tau ) &= \frac{1}{2} \bigl\Vert S \bigl(x(k)+\tau S^{T} \bigl( \widehat {c}-Sx(k) \bigr) \bigr)-\widehat {c}\bigr\Vert _{F}^{2} \\ &= \frac{1}{2} \bigl\Vert \tau SS^{T} \bigl( \widehat {c}-Sx(k) \bigr)+Sx(k)- \widehat {c}\bigr\Vert _{F}^{2} \\ &= \frac{1}{2} \Vert \tau \tilde {b}-\overline {c}\Vert _{F}^{2}. \end{aligned}$$

Differentiating both sides, we have

$$\begin{aligned} \frac{d\phi _{k+1}(\tau )}{d\tau } &= \frac{1}{2}\frac{d}{d\tau } \operatorname {tr}\bigl( (\tau \tilde {b}-\overline {c})^{T}(\tau \tilde {b}-\overline {c}) \bigr) \\ &= \frac{1}{2}\frac{d}{d\tau }\operatorname {tr}\bigl( \tau ^{2}\tilde {b}\tilde {b}^{T}-\tau \tilde {b}\overline {c}^{T}-\tau \overline {c}\tilde {b}^{T}+ \overline {c}\overline {c}^{T} \bigr) \\ &=\tau \operatorname {tr}\bigl(\tilde {b}\tilde {b}^{T} \bigr)-\operatorname {tr}\bigl(\tilde {b}\overline {c}^{T} \bigr). \end{aligned}$$

Note that the second derivative of \(\phi _{k+1}(\tau )\) is the constant \(\operatorname {tr}(\tilde {b}\tilde {b}^{T})\), which is positive. Setting \(d\phi _{k+1}(\tau )/d\tau =0\) and using Lemma 2.1(iii), we obtain the minimizer of \(\phi _{k+1}(\tau )\) as follows:

$$\begin{aligned} \tau _{k+1} &= \frac{ \Vert (B^{T}\otimes A)^{T}(\operatorname {Vec}(C)-(B^{T}\otimes A)\operatorname {Vec}(X(k))) \Vert _{F}^{2}}{ \Vert (B^{T}\otimes A)(B^{T}\otimes A)^{T}(\operatorname {Vec}(C)-(B^{T}\otimes A)\operatorname {Vec}(X(k))) \Vert _{F}^{2}} \\ &= \frac{ \Vert A^{T}(C-AX(k)B)B^{T} \Vert _{F}^{2}}{ \Vert AA^{T}(C-AX(k)B)B^{T}B \Vert _{F}^{2}}. \end{aligned}$$
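Before summarizing, the closed-form step size can be sanity-checked against a direct numerical minimization of \(\phi _{k+1}\); the sketch below uses arbitrary test data of our own.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))          # full column rank (generically)
B = rng.standard_normal((4, 6))          # full row rank (generically)
C = A @ rng.standard_normal((3, 4)) @ B
X = rng.standard_normal((3, 4))          # current iterate X(k)

E = C - A @ X @ B                        # residual
G = A.T @ E @ B.T                        # search direction A^T E B^T
tau = np.linalg.norm(G, 'fro') ** 2 / np.linalg.norm(A @ A.T @ E @ B.T @ B, 'fro') ** 2

phi = lambda t: 0.5 * np.linalg.norm(A @ (X + t * G) @ B - C, 'fro') ** 2
taus = np.linspace(0.0, 2.0 * tau, 2001)
tau_grid = taus[np.argmin([phi(t) for t in taus])]
print(tau, tau_grid)                     # the two values should nearly coincide
```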

Summarizing the search direction and the step size, we obtain the following algorithm:

Algorithm 3.1

The gradient-descent iterative algorithm for solving (2).

Initialization step.:

Given any small error \(\epsilon >0\), choose an initial matrix \(X(0)\). Set \(k:=0\). Compute \(\widehat{A}=AA^{T}\) and \(\widehat{B}=B^{T}B\).

Stopping rule.:

Compute \(E(k)=C-AX(k)B\). If \(\Vert E(k) \Vert _{F}<\epsilon \), stop. Otherwise, go to the next step.

Updating step.:

Compute

$$\begin{aligned}& \tau _{k+1} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} (\sum_{\beta =1}^{q} (\sum_{\alpha =1}^{p}A^{T}(i,\alpha )E(\alpha ,\beta ) ) B^{T}(\beta ,j) )^{2}}{\sum_{i=1}^{p}\sum_{j=1}^{q} (\sum_{\beta =1}^{q} (\sum_{\alpha =1}^{p}\widehat{A}(i,\alpha )E(\alpha ,\beta ) )\widehat{B}(\beta ,j) )^{2}}, \\& X(k+1) = X(k)+\tau _{k+1}A^{T}E(k)B^{T}. \end{aligned}$$

Set \(k:=k+1\) and return to the Stopping rule.

Remark 3.2

In Algorithm 3.1, we introduce the matrices \(\widehat{A}\), \(\widehat{B}\), and \(E(k)\) to avoid duplicate computations. The term \(E(k)\), or \(E(\alpha ,\beta )\), in the denominator of the formula for \(\tau _{k+1}\) does not cause a severe propagation of errors when \(X(k)\) is close to the exact solution. This is because the Stopping rule prevents \(E(\alpha ,\beta )\) from becoming extremely small, and the term \(E(\alpha ,\beta )\) also appears in the numerator. A similar comment applies to the other algorithms developed in this paper.
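A compact NumPy sketch of Algorithm 3.1 (the function name and the random test data are ours; the entrywise sums in the step size are written equivalently as Frobenius norms) might look as follows.

```python
import numpy as np

def gd_axb(A, B, C, X0, eps=1e-10, max_iter=10000):
    """Gradient-descent iteration of Algorithm 3.1 for AXB = C."""
    A_hat, B_hat = A @ A.T, B.T @ B          # precomputed as in the Initialization step
    X = X0.copy()
    for _ in range(max_iter):
        E = C - A @ X @ B                    # residual E(k)
        if np.linalg.norm(E, 'fro') < eps:   # Stopping rule
            break
        G = A.T @ E @ B.T                    # search direction A^T E(k) B^T
        tau = (np.linalg.norm(G, 'fro') ** 2
               / np.linalg.norm(A_hat @ E @ B_hat, 'fro') ** 2)
        X = X + tau * G
    return X

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 3))              # full column rank (generically)
B = rng.standard_normal((3, 10))             # full row rank (generically)
X_true = rng.standard_normal((3, 3))
C = A @ X_true @ B
X = gd_axb(A, B, C, X0=1e-6 * np.ones((3, 3)))
print(np.linalg.norm(X - X_true, 'fro'))     # close to zero
```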

3.2 Convergence of the algorithm

Here, we will prove that Algorithm 3.1 converges to the exact solution. The following analysis holds for strongly convex functions. Recall that a twice-differentiable convex function \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is said to be strongly convex if there exist constants \(M\geqslant m>0\) such that \(mI\preceq \nabla ^{2} f(x)\preceq MI\) for all \(x\in \mathbb {R}^{n}\).

Lemma 3.3

([41])

If f is strongly convex on \(\mathbb {R}^{n}\), then for any \(x,y\in \mathbb {R}^{n}\)

$$\begin{aligned}& f(y)\geqslant f(x)+\nabla f(x)^{T}(y-x)+\frac{m}{2} \Vert y-x \Vert ^{2}_{F}, \end{aligned}$$
(5)
$$\begin{aligned}& f(y)\leqslant f(x)+\nabla f(x)^{T}(y-x)+\frac{M}{2} \Vert y-x \Vert ^{2}_{F}. \end{aligned}$$
(6)

Theorem 3.4

If (2) is consistent and has a unique solution \(X^{*}\), then the iterative sequence \(\{X(k)\}\) generated by Algorithm 3.1 converges to \(X^{*}\) for any initial matrix \(X(0)\), i.e., \(X(k)\rightarrow X^{*}\) as \(k\rightarrow \infty \).

Proof

If \(\nabla f(\operatorname {Vec}(X(k)))= 0\) for some k, then \(X(k)=X^{*}\) and the result holds. So assume that \(\nabla f(\operatorname {Vec}(X(k)))\neq 0\) for all k. To investigate its convexity, let us find the second derivative. Indeed, we have from (4) and Lemma 2.1 that

$$\begin{aligned} \nabla ^{2}f \bigl(\operatorname {Vec}(X) \bigr) = \bigl(B^{T} \otimes A \bigr)^{T} \bigl(B^{T} \otimes A \bigr) = BB^{T} \otimes A^{T} A. \end{aligned}$$

For convenience, we write \(\lambda _{\min }\) and \(\lambda _{\max }\) instead of \(\lambda _{\min }(BB^{T}\otimes A^{T}A)\) and \(\lambda _{\max }(BB^{T}\otimes A^{T}A)\), respectively. Since A has full column-rank and B has full row-rank, the matrix \(BB^{T}\otimes A^{T}A\) is symmetric positive definite; in particular, \(\lambda _{\min }>0\) and

$$\begin{aligned} \lambda _{\min }I \preceq \nabla ^{2}f \bigl(\operatorname {Vec}(X) \bigr) \preceq \lambda _{\max }I, \end{aligned}$$

meaning that f is strongly convex. Considering \(\phi _{k+1}(\tau )=f(\operatorname {Vec}(X(k+1)))\) and applying (6) in Lemma 3.3, we obtain

$$\begin{aligned} \phi _{k+1}(\tau )&\leqslant f \bigl(\operatorname {Vec}\bigl(X(k) \bigr) \bigr)-\tau \bigl\Vert \nabla f \bigl( \operatorname {Vec}\bigl(X(k) \bigr) \bigr) \bigr\Vert ^{2}_{F}+\frac{\lambda _{\max }\tau ^{2}}{2} \bigl\Vert \nabla f \bigl(\operatorname {Vec}\bigl(X(k) \bigr) \bigr) \bigr\Vert ^{2}_{F}. \end{aligned}$$

The right-hand side, viewed as a function of τ, is minimized at \(\tau =1/\lambda _{\max }\), where it equals \(f(\operatorname {Vec}(X(k)))-\frac{1}{2\lambda _{\max }} \Vert \nabla f(\operatorname {Vec}(X(k))) \Vert ^{2}_{F}\). Since \(\tau _{k+1}\) is the exact minimizer of \(\phi _{k+1}\), we obtain

$$\begin{aligned} f \bigl(\operatorname {Vec}\bigl(X(k+1) \bigr) \bigr)&=\phi _{k+1}(\tau _{k+1}) \\ &\leqslant f \bigl(\operatorname {Vec}\bigl(X(k) \bigr) \bigr)-\frac{1}{2\lambda _{\max }} \bigl\Vert \nabla f \bigl( \operatorname {Vec}\bigl(X(k) \bigr) \bigr) \bigr\Vert ^{2}_{F}. \end{aligned}$$
(7)

Applying (5) with \(x=\operatorname {Vec}(X(k))\) and an arbitrary \(y\in \mathbb {R}^{mn}\) yields

$$\begin{aligned} f(y) &\geqslant f \bigl(\operatorname {Vec}\bigl(X(k) \bigr) \bigr)+\nabla f \bigl(\operatorname {Vec}\bigl(X(k) \bigr) \bigr)^{T} \bigl(y-\operatorname {Vec}\bigl(X(k) \bigr) \bigr) \\ &\quad {} +\frac{\lambda _{\min }}{2} \bigl\Vert y-\operatorname {Vec}\bigl(X(k) \bigr) \bigr\Vert ^{2}_{F}. \end{aligned}$$
(8)

The right-hand side of (8), as a quadratic function of y, attains its minimum value \(f(\operatorname {Vec}(X(k)))-\frac{1}{2\lambda _{\min }} \Vert \nabla f(\operatorname {Vec}(X(k))) \Vert ^{2}_{F}\) at \(y=\operatorname {Vec}(X(k))-\frac{1}{\lambda _{\min }}\nabla f(\operatorname {Vec}(X(k)))\). Since (8) holds for every y, taking \(y=\operatorname {Vec}(X^{*})\) and using \(f(\operatorname {Vec}(X^{*}))=0\), we get

$$\begin{aligned} 0 \geqslant f \bigl(\operatorname {Vec}\bigl(X(k) \bigr) \bigr)-\frac{1}{2\lambda _{\min }} \bigl\Vert \nabla f \bigl(\operatorname {Vec}\bigl(X(k) \bigr) \bigr) \bigr\Vert ^{2}_{F}. \end{aligned}$$

Hence,

$$ \bigl\Vert \nabla f \bigl(\operatorname {Vec}\bigl(X(k) \bigr) \bigr) \bigr\Vert ^{2}_{F} \geqslant 2\lambda _{ \min }f \bigl( \operatorname {Vec}\bigl(X(k) \bigr) \bigr). $$
(9)

Substituting (9) into (7) and then putting \(c:=1-\lambda _{\min }/\lambda _{\max }\), we have

$$\begin{aligned} f \bigl(\operatorname {Vec}\bigl(X(k+1) \bigr) \bigr) \leqslant c f \bigl(\operatorname {Vec}\bigl(X(k) \bigr) \bigr). \end{aligned}$$

We obtain inductively that

$$ f \bigl(\operatorname {Vec}\bigl(X(k) \bigr) \bigr) \leqslant c^{k}f \bigl( \operatorname {Vec}\bigl(X(0) \bigr) \bigr). $$

Since \(\lambda _{\min }>0\), we have \(0< c<1\). Hence \(f(\operatorname {Vec}(X(k)))\rightarrow 0\) as \(k\rightarrow \infty \); that is, \(\Vert AX(k)B-C \Vert _{F}\rightarrow 0\). Since \(B^{T}\otimes A\) has full column rank, this implies \(\operatorname {Vec}(X(k))\rightarrow \operatorname {Vec}(X^{*})\), i.e., \(X(k)\rightarrow X^{*}\). □

4 The equation \(\sum_{t=1}^{p}A_{t}XB_{t}=C\)

In this section, we consider the generalized Sylvester equation

$$\begin{aligned} \sum_{t=1}^{p}A_{t}XB_{t}=C, \end{aligned}$$
(10)

where, for each \(t=1,\ldots,p\), \(A_{t}\in M_{q,m}\) is a full column-rank matrix, \(B_{t}\in M_{n,r}\) is a full row-rank matrix, \(C\in M_{q,r}\) is a known constant matrix, and \(X\in M_{m,n}\) is an unknown matrix. Provided that (10) is consistent, it has a unique solution if and only if \(P = \sum_{t=1}^{p} B_{t}^{T}\otimes A_{t}\) has full column rank; in that case \(P^{T}P\) is invertible and the unique solution is given by

$$\begin{aligned} \operatorname {Vec}\bigl( X^{*} \bigr) = \bigl( P^{T}P \bigr)^{-1}P^{T} \operatorname {Vec}(C). \end{aligned}$$
(11)

We shall introduce a new iterative method for solving (10) based on gradients and steepest descent, which provides an appropriate sequence of step sizes for minimizing the error at each iteration.

4.1 Proposed algorithm

We define the quadratic norm-error function \(\tilde {f}:\mathbb {R}^{mn}\rightarrow \mathbb {R}\) by

$$\begin{aligned} \tilde {f}(x):=\frac{1}{2} \bigl\Vert Px-\operatorname {Vec}(C) \bigr\Vert _{F}^{2}. \end{aligned}$$

It is obvious that \(\tilde {f}\) is convex. Recall that \(P = \sum_{t=1}^{p} B_{t}^{T}\otimes A_{t}\). We assume that P has full column rank and that (10) is consistent, so that the exact solution exists. The gradient-descent iterative method therefore can be described through the following recursive rule:

$$\begin{aligned} X(k+1)=X(k)-\tilde {\tau }_{k+1}\nabla \tilde {f}\bigl(\operatorname {Vec}\bigl(X(k) \bigr) \bigr). \end{aligned}$$

To search for the direction, we use the same techniques as in the previous section and then obtain

$$\begin{aligned} \nabla \tilde {f}\bigl(\operatorname {Vec}(X) \bigr) = P^{T} \bigl( P\operatorname {Vec}(X)- \operatorname {Vec}(C) \bigr). \end{aligned}$$

Thus, our new iterative equation is in the form

$$\begin{aligned} \operatorname {Vec}\bigl(X(k+1) \bigr) = \operatorname {Vec}\bigl(X(k) \bigr)+\tilde {\tau }_{k+1} P^{T} \bigl( \operatorname {Vec}(C)- P\operatorname {Vec}\bigl(X(k) \bigr) \bigr). \end{aligned}$$

Using Lemma 2.1, we obtain

$$\begin{aligned} X(k+1) = X(k)+\tilde {\tau }_{k+1}\sum_{t=1}^{p} \Biggl( A_{t}^{T} \Biggl( C-\sum_{l=1}^{p}A_{l}X(k)B_{l} \Biggr)B_{t}^{T} \Biggr). \end{aligned}$$

Next, we choose a step size. With the same technique as in the previous section, for each \(k=0,1,\ldots \) we minimize the function \(\tilde {\phi }_{k+1}:[0,\infty )\rightarrow \mathbb {R}\) defined by \(\tilde {\phi }_{k+1}(\tilde {\tau }):=\tilde {f}(\operatorname {Vec}(X(k+1)))\). Similarly, the minimizer of \(\tilde {\phi }_{k+1}(\tilde {\tau })\) is

$$\begin{aligned} \tilde {\tau }_{k+1}= \frac{ \Vert \sum_{t=1}^{p} ( A_{t}^{T} ( C-\sum_{l=1}^{p}A_{l}X(k)B_{l} ) B_{t}^{T} ) \Vert _{F}^{2}}{ \Vert \sum_{t=1}^{p}\sum_{h=1}^{p} ( A_{t}A_{h}^{T} ( C-\sum_{l=1}^{p}A_{l}X(k)B_{l} )B_{h}^{T}B_{t} ) \Vert _{F}^{2}}. \end{aligned}$$

Summarizing the search direction and the step size, we obtain the following algorithm:

Algorithm 4.1

The gradient-descent iterative algorithm for solving (10).

Initialization step.:

Given any small error \(\epsilon >0\), choose an initial matrix \(X(0)\). Set \(k:=0\). Compute \(A_{\alpha ,\beta }=A_{\alpha }A_{\beta }^{T}\) and \(B_{\alpha ,\beta }=B_{\alpha }^{T}B_{\beta }\) for all \(\alpha ,\beta =1,\ldots,p\).

Stopping rule.:

Compute \(E(k)=C-\sum_{t=1}^{p}A_{t}X(k)B_{t}\). If \(\Vert E(k) \Vert _{F}<\epsilon \), stop. Otherwise, go to the next step.

Updating step.:

Compute

$$\begin{aligned}& \tilde {\tau }_{k+1} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} (\sum_{t=1}^{p}\sum_{\beta =1}^{r}\sum_{\alpha =1}^{q}A_{t}^{T}(i,\alpha )E(\alpha ,\beta )B_{t}^{T}(\beta ,j) )^{2}}{\sum_{i=1}^{q}\sum_{j=1}^{r} (\sum_{t=1}^{p}\sum_{h=1}^{p}\sum_{\beta =1}^{r}\sum_{\alpha =1}^{q}A_{t,h}(i,\alpha )E(\alpha ,\beta )B_{h,t}(\beta ,j) )^{2}}, \\& X(k+1) = X(k)+\tilde {\tau }_{k+1}\sum_{t=1}^{p}A_{t}^{T}E(k)B_{t}^{T}. \end{aligned}$$

Set \(k:=k+1\) and return to the Stopping rule.
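A NumPy sketch of Algorithm 4.1 along the same lines (our own function name and test data) is given below; note that the double sum in the denominator of \(\tilde {\tau }_{k+1}\) equals \(\sum_{t=1}^{p}A_{t} (\sum_{h=1}^{p}A_{h}^{T}E(k)B_{h}^{T} )B_{t}\), so it can be computed from the search direction itself.

```python
import numpy as np

def gd_sylvester(A_list, B_list, C, X0, eps=1e-10, max_iter=20000):
    """Gradient-descent iteration of Algorithm 4.1 for sum_t A_t X B_t = C."""
    X = X0.copy()
    for _ in range(max_iter):
        E = C - sum(A @ X @ B for A, B in zip(A_list, B_list))   # residual E(k)
        if np.linalg.norm(E, 'fro') < eps:                       # Stopping rule
            break
        G = sum(A.T @ E @ B.T for A, B in zip(A_list, B_list))   # search direction
        D = sum(A @ G @ B for A, B in zip(A_list, B_list))       # the double sum collapses to this
        tau = np.linalg.norm(G, 'fro') ** 2 / np.linalg.norm(D, 'fro') ** 2
        X = X + tau * G
    return X

def tridiag(a, b, c, n):
    """n x n matrix with b on the diagonal, a below it, and c above it."""
    return b * np.eye(n) + a * np.eye(n, k=-1) + c * np.eye(n, k=1)

# A Sylvester equation AX + XB = C written as a p = 2 instance of (10),
# with small test matrices of our own choosing.
n = 5
A, B = tridiag(3, -9, 1, n), tridiag(-1, -2, 5, n)
X_true = tridiag(1, 2, 3, n)
C = A @ X_true + X_true @ B
X = gd_sylvester([A, np.eye(n)], [np.eye(n), B], C, X0=1e-6 * np.ones((n, n)))
print(np.linalg.norm(X - X_true, 'fro'))   # close to zero
```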

4.2 Convergence analysis of the algorithm

In this subsection, we shall show that Algorithm 4.1 is applicable for any choice of the initial matrix \(X(0)\) as long as equation (10) has a unique solution. After that, we shall discuss error estimates and the asymptotic convergence rate of the algorithm.

Theorem 4.2

If (10) is consistent and has a unique solution \(X^{*}\), or equivalently, P has full column rank, then the iterative sequence \(\{X(k)\}\) generated by Algorithm 4.1 converges to \(X^{*}\) for any initial matrix \(X(0)\), i.e., \(X(k)\rightarrow X^{*}\) as \(k\rightarrow \infty \).

Proof

The convergence of Algorithm 4.1 can be proved in the same manner as Theorem 3.4. In this case, we have

$$\begin{aligned} \lambda _{\min } \bigl(P^{T} P \bigr) I \preceq \nabla ^{2}\tilde {f}\bigl(\operatorname {Vec}\bigl(X(k) \bigr) \bigr) = P^{T} P \preceq \lambda _{\max } \bigl(P^{T} P \bigr) I, \end{aligned}$$

which implies the strong convexity of \(\tilde {f}\). In a similar manner, we get

$$\begin{aligned} \tilde {f}\bigl(\operatorname {Vec}\bigl(X(k+1) \bigr) \bigr) \leqslant \tilde {c}\tilde {f}\bigl( \operatorname {Vec}\bigl(X(k) \bigr) \bigr), \end{aligned}$$
(12)

where \(\tilde {c}:=1-\lambda _{\min }(P^{T}P)/\lambda _{\max }(P^{T}P)\). By induction, we obtain

$$\begin{aligned} \tilde {f}\bigl(\operatorname {Vec}\bigl(X(k) \bigr) \bigr)\leqslant \tilde {c}^{k} \tilde {f}\bigl(\operatorname {Vec}\bigl(X(0) \bigr) \bigr). \end{aligned}$$
(13)

The uniqueness of the solution implies that \(P^{T}P\) is positive definite, and thus \(0<\tilde {c}<1\). Hence \(\tilde {f}(\operatorname {Vec}(X(k)))\rightarrow 0\), and consequently \(X(k)\rightarrow X^{*}\), as \(k\rightarrow \infty \). □

From now on, we denote \(\kappa = \kappa (P)\), the condition number of P. Observe that \(\tilde {c}= 1-\kappa ^{-2}\). According to Lemma 2.1(iii), the bounds (12) and (13) give rise to the following estimates:

$$\begin{aligned}& \Biggl\Vert \sum_{t=1}^{p}A_{t}X(k)B_{t}-C \Biggr\Vert _{F} \leqslant \bigl(1- \kappa ^{-2} \bigr)^{\frac{1}{2}} \Biggl\Vert \sum_{t=1}^{p}A_{t}X(k-1)B_{t}-C \Biggr\Vert _{F}, \end{aligned}$$
(14)
$$\begin{aligned}& \Biggl\Vert \sum_{t=1}^{p}A_{t}X(k)B_{t}-C \Biggr\Vert _{F} \leqslant \bigl(1- \kappa ^{-2} \bigr)^{\frac{k}{2}} \Biggl\Vert \sum_{t=1}^{p}A_{t}X(0)B_{t}-C \Biggr\Vert _{F}. \end{aligned}$$
(15)

Since \(0<\tilde {c}<1\), it follows that if \(\Vert \sum_{t=1}^{p}A_{t}X(k-1)B_{t}-C \Vert _{F}\) is nonzero, then

$$\begin{aligned} \Biggl\Vert \sum_{t=1}^{p}A_{t}X(k)B_{t}-C \Biggr\Vert _{F} &< \Biggl\Vert \sum_{t=1}^{p}A_{t}X(k-1)B_{t}-C \Biggr\Vert _{F}. \end{aligned}$$
(16)

We can summarize the above discussion as follows:

Theorem 4.3

Suppose the hypothesis of Theorem 4.2 holds. The convergence rate of Algorithm 4.1 (with respect to the error \(\Vert \sum_{t=1}^{p}A_{t}X(k)B_{t}-C \Vert _{F}\)) is governed by \(\sqrt{1-\kappa ^{-2}}\). Moreover, the error estimates of \(\Vert \sum_{t=1}^{p}A_{t}X(k)B_{t}-C \Vert _{F}\) relative to the previous iteration and to the first iteration are provided by (14) and (15), respectively. In particular, the error at each iteration is smaller than the previous (nonzero) error, as in (16).

Theorem 4.4

Suppose the hypothesis of Theorem 4.2 holds. Then the error estimates of \(\Vert X(k)-X^{*} \Vert _{F}\) relative to the previous iteration and to the first iteration of Algorithm 4.1 are given as follows:

$$\begin{aligned}& \bigl\Vert X(k)-X^{*} \bigr\Vert _{F} \leqslant \kappa \sqrt{\kappa ^{2}-1} \bigl\Vert X(k-1)-X^{*} \bigr\Vert _{F}, \end{aligned}$$
(17)
$$\begin{aligned}& \bigl\Vert X(k)-X^{*} \bigr\Vert _{F} \leqslant \kappa ^{2} \bigl(1-\kappa ^{-2} \bigr)^{ \frac{k}{2}} \bigl\Vert X(0)-X^{*} \bigr\Vert _{F}. \end{aligned}$$
(18)

In particular, the convergence rate of the algorithm is governed by \(\sqrt{1-\kappa ^{-2}}\).

Proof

Utilizing equation (15) and Lemma 2.2, we have

$$\begin{aligned} \bigl\Vert X(k)-X^{*} \bigr\Vert _{F} &= \bigl\Vert \operatorname {Vec}\bigl(X(k) \bigr)-\operatorname {Vec}\bigl(X^{*} \bigr) \bigr\Vert _{F} \\ &= \bigl\Vert \bigl(P^{T}P \bigr)^{-1} \bigl(P^{T}P \bigr)\operatorname {Vec}\bigl(X(k) \bigr)- \bigl(P^{T}P \bigr)^{-1} \bigl(P^{T}P \bigr) \operatorname {Vec}\bigl(X^{*} \bigr) \bigr\Vert _{F} \\ &\leqslant \bigl\Vert \bigl(P^{T}P \bigr)^{-1} \bigr\Vert _{2} \bigl\Vert P^{T} \bigr\Vert _{2} \bigl\Vert P \operatorname {Vec}\bigl(X(k) \bigr)-P\operatorname {Vec}\bigl(X^{*} \bigr) \bigr\Vert _{F} \\ &\leqslant \bigl(1-\kappa ^{-2} \bigr)^{\frac{k}{2}} \bigl\Vert \bigl(P^{T}P \bigr)^{-1} \bigr\Vert _{2} \bigl\Vert P^{T} \bigr\Vert _{2} \bigl\Vert P\operatorname {Vec}\bigl(X(0) \bigr)-\operatorname {Vec}(C) \bigr\Vert _{F} \\ &\leqslant \bigl(1-\kappa ^{-2} \bigr)^{\frac{k}{2}} \bigl\Vert \bigl(P^{T}P \bigr)^{-1} \bigr\Vert _{2} \bigl\Vert P^{T} \bigr\Vert _{2} \Vert P \Vert _{2} \bigl\Vert X(0)-X^{*} \bigr\Vert _{F} \\ &= \bigl(1-\kappa ^{-2} \bigr)^{\frac{k}{2}} \frac{\lambda _{\max }(P^{T}P)}{\lambda _{\min }(P^{T}P)} \bigl\Vert X(0)-X^{*} \bigr\Vert _{F} \\ &=\kappa ^{2} \bigl(1-\kappa ^{-2} \bigr)^{\frac{k}{2}} \bigl\Vert X(0)-X^{*} \bigr\Vert _{F}. \end{aligned}$$

Since the asymptotic behavior of the above error depends on the term \(( 1-\kappa ^{-2} )^{\frac{k}{2}}\), the asymptotic convergence rate for the algorithm is governed by \(\sqrt{1-\kappa ^{-2}}\). In a similar manner but making use of (14) instead of (15), we obtain

$$\begin{aligned} \bigl\Vert X(k)-X^{*} \bigr\Vert _{F} &\leqslant \bigl(1-\kappa ^{-2} \bigr)^{\frac{1}{2}} \bigl\Vert \bigl(P^{T}P \bigr)^{-1} \bigr\Vert _{2} \bigl\Vert P^{T} \bigr\Vert _{2} \bigl\Vert P\operatorname {Vec}\bigl(X(k-1) \bigr)- \operatorname {Vec}(C) \bigr\Vert _{F} \\ &\leqslant \bigl(1-\kappa ^{-2} \bigr)^{\frac{1}{2}} \bigl\Vert \bigl(P^{T}P \bigr)^{-1} \bigr\Vert _{2} \bigl\Vert P^{T} \bigr\Vert _{2} \Vert P \Vert _{2} \bigl\Vert X(k-1)-X^{*} \bigr\Vert _{F} \\ &=\kappa ^{2} \bigl(1-\kappa ^{-2} \bigr)^{\frac{1}{2}} \bigl\Vert X(k-1)-X^{*} \bigr\Vert _{F}, \end{aligned}$$

and hence (17) is reached. □

Thus, the condition number κ governs the asymptotic convergence rate of the algorithm, as well as the size of the error bounds relative to the distance between the initial matrix and the exact solution. The closer κ gets to 1, the faster the algorithm converges to the required result.
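The role of κ can be illustrated numerically. The sketch below (entirely our own construction) forms \(P=\sum_{t}B_{t}^{T}\otimes A_{t}\), computes κ, and checks one exact-step iteration of the vectorized problem against the per-step residual bound \(\sqrt{1-\kappa ^{-2}}\) from (14).

```python
import numpy as np

rng = np.random.default_rng(6)
A_list = [rng.standard_normal((4, 3)) for _ in range(2)]
B_list = [rng.standard_normal((3, 4)) for _ in range(2)]
P = sum(np.kron(B.T, A) for A, B in zip(A_list, B_list))

s = np.linalg.svd(P, compute_uv=False)
kappa = s[0] / s[-1]                            # condition number of P
bound = np.sqrt(1.0 - 1.0 / kappa ** 2)         # guaranteed per-step residual factor
print(kappa, bound)

# One exact-step gradient-descent step on the (consistent) vectorized system;
# the observed residual ratio should not exceed the bound above.
c = P @ rng.standard_normal(P.shape[1])
x = np.zeros(P.shape[1])
r = c - P @ x
g = P.T @ r
tau = (g @ g) / np.linalg.norm(P @ g) ** 2
r_new = c - P @ (x + tau * g)
print(np.linalg.norm(r_new) / np.linalg.norm(r), "<=", bound)
```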

5 Numerical simulations for a class of the generalized Sylvester matrix equations

In this section, we present applications of the proposed algorithms to certain linear matrix equations. To show the effectiveness and capability of our algorithms, we compare them with the existing algorithms mentioned above, as well as with the direct methods (3) and (11). For convenience, we abbreviate our algorithms as TauOpt. To measure the computational time taken by each program, we apply the tic and toc functions in MATLAB and abbreviate it as CT. Readers should consider all reported results (errors, CTs, and figures) when comparing the performance of the algorithms. To measure the error at the kth step of the iteration, we consider the following error:

$$\begin{aligned} \gamma _{k} := \bigl\Vert X(k)-X^{*} \bigr\Vert _{F}. \end{aligned}$$

All iterations have been carried out by MATLAB R2018a, Intel(R) Core(TM) i7-6700HQ CPU @ 2.60 GHz, 8.00 GB RAM, PC environment.

Example 5.1

We consider the equation \(AXB=C\) with

$$\begin{aligned}& A = \begin{bmatrix} 1 & -1 & 2 & 3 & 1 & -3 & 3 & 2 \\ 2 & 3 & -2 & 2 & 2 & 1 & 3 & 3 \\ 3 & 1 & 1 & -1 & -3 & -2 & -1 & 3 \end{bmatrix}^{T} \quad \text{and} \\& B = \begin{bmatrix} 1 & 2 & -5 & 9 & 7 & 5 & 1 & 0 & -6 & 3 \\ 2 & -7 & 8 & 3 & 0 & 1 & 2 & 3 & 5 & -6 \\ 6 & -5 & 2 & 1 & 0 & 3 & -9 & 8 & 7 & 6 \end{bmatrix}. \end{aligned}$$

We choose the initial matrix \(X(0)=10^{-6}\operatorname {ones}(3,3)\), where \(\operatorname {ones}(m,n)\) denotes the \(m\times n\) matrix whose entries are all equal to 1. After running Algorithm 3.1, the numerical solution converges to the exact solution

$$\begin{aligned} X^{*} = \begin{bmatrix} 1 & 5 & -9 \\ 6 & 5 & 4 \\ 1 & 2 & 3 \end{bmatrix}. \end{aligned}$$

In this example, we compare Algorithm 3.1 with GI (Proposition 1.2) and LS (Proposition 1.3). All reports are presented after running 100 iterations. Table 1 shows the errors at the final iteration as well as the computational time. Figure 1 displays the error plot. Table 1 implies that our algorithm takes significantly less computational time than the direct method. Compared with the other two algorithms, ours takes slightly more time, but both Table 1 and Fig. 1 indicate that it produces a highly satisfactory approximate solution.

Figure 1 Errors for Example 5.1

Table 1 Error and CT for Example 5.1

Example 5.2

In this example, we consider the Sylvester equation \(AX+XB=C\) with

$$\begin{aligned} A = \operatorname {tridiag}(3,-9,1) \in M_{100} \quad \text{and}\quad B = \operatorname {tridiag}(-1,-2,5) \in M_{100}. \end{aligned}$$

After running Algorithm 4.1 with an initial matrix \(X(0) = 10^{-6}\operatorname {ones}(100,100)\), the numerical solution converges to the exact solution \(X^{*}=\operatorname {tridiag}(1,2,3) \in M_{100}\).

We compare Algorithm 4.1 with the following algorithms: GI [32], RGI [34], MGI [35], JGI (Algorithm 4, [36]), and AJGI (Algorithm 5, [36]). The results after running 100 iterations are shown in Fig. 2 and Table 2. According to the errors and CTs in Table 2 and Fig. 2, we find that, despite a slightly longer computational time, our final error is smaller than that of the other algorithms.

Figure 2 Errors for Example 5.2

Table 2 Error and CT for Example 5.2

Example 5.3

In this example we consider equation (10) when \(p=3\) with

$$\begin{aligned}& A_{1} = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 3 & 1 \\ 2 & -2 & 1 \\ 3 & 2 & -1 \\ 1 & 2 & -3 \\ -3 & 1 & -2 \\ 3 & 3 & -1 \\ 2 & 3 & 3 \end{bmatrix},\quad\quad A_{2} = \begin{bmatrix} 3 & 6 & 5 \\ 6 & 9 & -4 \\ 3 & 2 & -1 \\ 1 & 2 & -3 \\ -3 & 1 & -2 \\ 3 & 3 & -1 \\ 6 & -1 & 0 \\ 2 & 3 & 3 \end{bmatrix},\quad\quad A_{3} = \begin{bmatrix} -2 & 0 & 5 \\ 6 & 9 & -4 \\ 9 & 5 & -4 \\ 0 & 1 & 6 \\ 9 & -2 & 0 \\ 3 & 3 & -1 \\ -7 & 2 & 0 \\ -8 & 8 & 1 \end{bmatrix} , \\& B_{1} = \begin{bmatrix} 1 & 2 & 6 \\ 2 & -7 & -5 \\ -5 & 8 & 2 \\ 9 & 3 & 1 \\ 7 & 0 & 0 \\ 5 & 1 & 3 \\ 1 & 2 & -9 \\ 0 & 3 & 8 \\ -6 & 5 & 7 \\ 3 & -6 & 6 \end{bmatrix}^{T}, \quad\quad B_{2} = \begin{bmatrix} 1 & 6 & 6 \\ 2 & -2 & -5 \\ -5 & 0 & 2 \\ 4 & 5 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 3 \\ 3 & 2 & 3 \\ -9 & 3 & -5 \\ -6 & 5 & 9 \\ 3 & -6 & 1 \end{bmatrix}^{T}, \quad\quad B_{3} = \begin{bmatrix} 3 & 6 & 6 \\ 2 & -2 & 6 \\ 1 & 0 & 3 \\ 1 & 5 & 0 \\ 1 & 0 & -7 \\ 0 & 1 & 3 \\ 3 & 0 & 3 \\ -9 & 9 & -5 \\ -6 & -4 & 9 \\ 3 & -6 & 1 \end{bmatrix}^{T}. \end{aligned}$$

We choose an initial matrix \(X(0) = 10^{-6}\operatorname {ones}(3,3)\). After running Algorithm 4.1, the numerical solution converges to the exact solution

$$\begin{aligned} X^{*} = \begin{bmatrix} 6 & 2 & 0 \\ -9 & 4 & -2 \\ 3 & 6 & 0 \end{bmatrix}. \end{aligned}$$

In this example, we compare Algorithm 4.1 with GI (Proposition 1.1) and LSI (Proposition 1.4). The results after 100 iterations are shown in Fig. 3 and Table 3. We find that Algorithm 4.1 gives the fastest convergence.

Figure 3 Errors for Example 5.3

Table 3 Errors and CT for Example 5.3

6 An application to a discretization of one-dimensional heat equation

In this section, we apply our proposed algorithm to a discretization of the one-dimensional heat equation:

$$\begin{aligned} \frac{\partial u}{\partial t} = c^{2} \frac{\partial ^{2}u}{\partial x^{2}} \quad \text{on }[\alpha _{x}, \beta _{x}]\times [0,\beta_{t}] \end{aligned}$$
(19)

subject to the boundary conditions \(u(\alpha _{x},t) = g_{l}\), \(u(\beta _{x},t) = g_{r}\) and the initial condition \(u(x,0) = g_{d}\), where \(g_{l}\), \(g_{r}\), \(g_{d}\) are given functions.

6.1 Discretization of the heat equation

We discretize at the grid points \((x_{i},t_{j})\) in the rectangle, where \(x_{i} = \alpha _{x}+ih_{x}\) and \(t_{j} = jh_{t}\) with

$$\begin{aligned} h_{x} = \frac{\beta _{x}-\alpha _{x}}{N_{x}+1} \quad \text{and}\quad h_{t} = \frac{\beta _{t}}{N_{t}}. \end{aligned}$$
(20)

We denote \(u_{ij}=u(x_{i},t_{j})\). By the Forward Time Central Space (FTCS) method, we obtain

$$\begin{aligned} \frac{\partial u}{\partial t} = \frac{u_{i,j+1}-u_{ij}}{h_{t}} = c^{2} \frac{u_{i-1,j}-2u_{ij}+u_{i+1,j}}{h_{x}^{2}} = c^{2} \frac{\partial ^{2} u}{\partial x^{2}}, \end{aligned}$$

or equivalently,

$$\begin{aligned} u_{i,j+1} = F \bigl( u_{i-1,j}+u_{i+1,j} \bigr) + (1-2F)u_{ij}, \end{aligned}$$

where \(F = h_{t}c^{2}/h_{x}^{2}\) for \(1\leqslant i\leqslant N_{x}\), \(1\leqslant j\leqslant N_{t}\). We transform equation (19) into a linear system of \(N_{x}N_{t}\) equations in \(N_{x}N_{t}\) unknowns \(u_{11},\ldots,u_{N_{x}N_{t}}\):

$$\begin{aligned} T_{H}\operatorname {Vec}(U) = V, \end{aligned}$$
(21)

where \(U = [u_{ij}]\) and \(T_{H}\in M_{N_{x}N_{t}}\) is an \(N_{t}\times N_{t}\) block matrix with identity blocks \(I_{N_{x}}\) on its diagonal and blocks \(\operatorname{tridiag}(-F,-(1-2F),-F)\in M_{N_{x}}\) on its first block subdiagonal. Here is an example of \(T_{H}\) for \(N_{t}=3\) and \(N_{x}=2\):

$$\begin{aligned} T_{H} = \left [ \textstyle\begin{array}{@{}c@{\quad}c|c@{\quad}c|c@{\quad}c@{}} 1 & & & & & \\ & 1 & & & & \\ \hline -(1-2F) & -F & 1 & & & \\ -F & -(1-2F) & & 1 & & \\ \hline & & -(1-2F) & -F & 1 & \\ & & -F & -(1-2F) & & 1 \end{array}\displaystyle \right ]. \end{aligned}$$

The vector V is partitioned into \(N_{t}\) blocks as \(V = [ V_{1}^{T}\ V_{2}^{T}\ \cdots \ V_{N_{t}}^{T} ]^{T}\), where

$$\begin{aligned} V_{1} = \begin{bmatrix} Fg_{d}(\alpha _{x},0)+(1-2F)g_{d}(x_{1},0)+Fg_{d}(x_{2},0) \\ Fg_{d}(x_{1},0)+(1-2F)g_{d}(x_{2},0)+Fg_{d}(x_{3},0) \\ \vdots \\ Fg_{d}(x_{N_{x}-1},0)+(1-2F)g_{d}(x_{N_{x}},0)+Fg_{d}(\beta _{x},0) \end{bmatrix} \quad \text{and}\quad V_{i} = \begin{bmatrix} Fg_{l}(\alpha _{x},t_{i-1}) \\ 0 \\ \vdots \\ 0 \\ Fg_{r}(\beta _{x},t_{i-1}) \end{bmatrix} \end{aligned}$$

for \(i = 2,\ldots,N_{t}\).

Equation (21) is of the form \(AXB=C\) with \(A = T_{H}\), \(X = \operatorname {Vec}(U)\), \(B = I\), and \(C = V\). According to Algorithm 3.1, we obtain an algorithm for (21) as follows:

Algorithm 6.1

The gradient-descent iterative algorithm for solving the one-dimensional heat equation.

Input step.:

Input \(N_{x}\), \(N_{t}\in \mathbb {N}\) as the numbers of partitions in space and time, respectively.

Initialization step.:

Let \(h_{x}\) and \(h_{t}\) be as in (20) and, for each \(i=1,\ldots,N_{x}\) and \(j=1,\ldots,N_{t}\), \(x_{i}=\alpha _{x}+ih_{x}\) and \(t_{j}=jh_{t}\). Compute \(s=T_{H}^{T}V\), \(S=T_{H}^{T}T_{H}\), \(\widehat{s}=T_{H}s\), and \(\widehat{S}=T_{H}S\). Choose \(u(0)\in \mathbb {R}^{N_{x}N_{t}}\) and set \(k:=0\).

Updating step.:

Compute

$$\begin{aligned}& \tau _{k+1} = \frac{\sum_{p=1}^{N_{x}N_{t}} ( s_{p}-\sum_{q=1}^{N_{x}N_{t}}S_{pq}u_{q}(k) )^{2}}{\sum_{p=1}^{N_{x}N_{t}} ( \widehat{s}_{p}-\sum_{q=1}^{N_{x}N_{t}}\widehat{S}_{pq}u_{q}(k) )^{2}}, \\& u(k+1) = u(k)+\tau _{k+1} \bigl( s-Su(k) \bigr). \end{aligned}$$

Set \(k:=k+1\) and repeat the Updating step.

Here, we denote by \(s_{p}\) the pth entry of a vector s and by \(S_{pq}\) the \((p,q)\)-entry of S. To stop the algorithm, an appropriate stopping rule is \(\Vert V-T_{H}u(k) \Vert _{F}^{2}<\epsilon \), where ϵ is a small positive number.
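To make this concrete, the sketch below (our own code, restricted to zero side boundary data as in the example of the next subsection) assembles \(T_{H}\) and V of (21) and runs the iteration of Algorithm 6.1, i.e. \(u(k+1)=u(k)+\tau _{k+1}T_{H}^{T}(V-T_{H}u(k))\) with the exact step size.

```python
import numpy as np

def heat_system(Nx, Nt, ht, c=1.0, alpha_x=0.0, beta_x=1.0,
                g_d=lambda x: np.sin(np.pi * x)):
    """Assemble T_H and V of (21), assuming zero boundary data at both ends."""
    hx = (beta_x - alpha_x) / (Nx + 1)
    F = ht * c ** 2 / hx ** 2
    x = alpha_x + hx * np.arange(Nx + 2)              # grid x_0, ..., x_{Nx+1}
    L = np.zeros((Nx, Nx))                            # tridiag(-F, -(1-2F), -F)
    np.fill_diagonal(L, -(1.0 - 2.0 * F))
    np.fill_diagonal(L[1:], -F)
    np.fill_diagonal(L[:, 1:], -F)
    T_H = np.eye(Nx * Nt)
    for j in range(1, Nt):                            # blocks under the diagonal
        T_H[j * Nx:(j + 1) * Nx, (j - 1) * Nx:j * Nx] = L
    u0 = g_d(x)                                       # initial data on the grid
    V = np.zeros(Nx * Nt)
    V[:Nx] = F * (u0[:Nx] + u0[2:]) + (1.0 - 2.0 * F) * u0[1:Nx + 1]
    return T_H, V

T_H, V = heat_system(Nx=4, Nt=10, ht=0.01)
u = 1e-6 * np.ones(T_H.shape[0])
for _ in range(2000):
    E = V - T_H @ u                                   # residual
    if np.linalg.norm(E) < 1e-12:                     # stopping rule
        break
    g = T_H.T @ E                                     # search direction
    u = u + (np.linalg.norm(g) ** 2 / np.linalg.norm(T_H @ g) ** 2) * g
print(np.linalg.norm(V - T_H @ u))                    # final residual
```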

6.2 Numerical simulation for the heat equation

To obtain the numerical solutions, we need to partition the rectangular domain. The accuracy of the solution depends on the size of the partition grid: better accuracy requires a finer grid, which in turn increases the size of the associated matrix \(T_{H}\).

Example 6.2

Consider the heat equation (19) on \(\{ (x,t) : 0< x<1, t>0\}\) with the boundary and initial conditions given as:

$$\begin{aligned} u(0,t) = u(1,t) = 0 \quad \text{and}\quad u(x,0) = \sin \pi x. \end{aligned}$$

Let \(c = 1\), \(N_{x} = 4\), \(h_{t} = 0.01\). We have \(h_{x} = 0.2\) and \(F = 0.25\). In this case, we consider \(N_{t} = 10\), so the size of the matrix \(T_{H}\) is \(40\times 40\). We run Algorithm 6.1 with the initial vector \(u(0) = 10^{-6}[ 1 \ \cdots \ 1]^{T}\), and the numerical solution converges to the exact solution

$$\begin{aligned} u^{*}(x,t) = e^{-\pi ^{2}t}\sin (\pi x). \end{aligned}$$

In this example we compare our algorithm to the following algorithms: GI (Proposition 1.2), RGI [34], MGI [35], LS (Proposition 1.3), JGI (Algorithm 4, [36]) and AJGI (Algorithm 5, [36]). The results after running 500 iterations are shown in Figs. 4 and 5, as well as Tables 4 and 5.

Figure 4 Errors for Example 6.2

Figure 5 The 3D-plot of the analytical solution (left) and the numerical solution (right) for Example 6.2

Table 4 Comparison of numerical and analytical results for Example 6.2
Table 5 Errors and computational time for Example 6.2

7 An application to a discretization of two-dimensional Poisson’s equation

In this section, we give an application of the proposed algorithm to a discretization of the two-dimensional Poisson’s equation:

$$\begin{aligned} \frac{\partial ^{2}u(x,y)}{\partial x^{2}}+ \frac{\partial ^{2}u(x,y)}{\partial y^{2}} = f(x,y) \quad \text{on }[ \alpha _{x},\beta _{x}]\times [\alpha _{y},\beta _{y}] \end{aligned}$$
(22)

with the boundary conditions \(u(x,\beta _{y}) = g_{u}\), \(u(x,\alpha _{y}) = g_{d}\), \(u(\alpha _{x},y) = g_{l}\), and \(u(\beta _{x},y) = g_{r}\) where \(g_{u}\), \(g_{d}\), \(g_{l}\), \(g_{r}\) are given functions. Notice that the two-dimensional Laplace’s equation is a homogeneous case of the Poisson’s equation when the RHS function is zero, i.e., \(f(x,y)=0\).

7.1 Discretization with rectangular grid

We discretize at the grid points \((x_{i},y_{j})\) in the rectangle, where \(x_{i} = \alpha _{x}+ih_{x}\) and \(y_{j} = \alpha _{y}+jh_{y}\) with

$$\begin{aligned} h_{x} = \frac{\beta _{x}-\alpha _{x}}{N_{x}+1} \quad \text{and}\quad h_{y} = \frac{\beta _{y}-\alpha _{y}}{N_{y}+1}. \end{aligned}$$
(23)

We denote \(u_{ij}=u(x_{i},y_{j})\) and \(f_{ij}=f(x_{i},y_{j})\), and evaluate the boundary functions \(g_{u}\), \(g_{d}\), \(g_{l}\), \(g_{r}\) at the grid points in the same manner. By the standard finite difference approximation, we obtain

$$\begin{aligned} \frac{\partial ^{2}u(x,y)}{\partial x^{2}}+ \frac{\partial ^{2}u(x,y)}{\partial y^{2}} = \frac{u_{i-1,j}-2u_{ij}+u_{i+1,j}}{h_{x}^{2}}+ \frac{u_{i,j-1}-2u_{ij}+u_{i,j+1}}{h_{y}^{2}}, \end{aligned}$$
(24)

or equivalently,

$$\begin{aligned}& h_{y}^{2} ( 2u_{ij}-u_{i-1,j}-u_{i+1,j} )+h_{x}^{2} ( 2u_{ij}-u_{i,j-1}-u_{i,j+1} ) =-h_{x}^{2}h_{y}^{2}f_{ij}, \end{aligned}$$

for \(1\leqslant i\leqslant N_{x}\), \(1\leqslant j\leqslant N_{y}\). Now, we can convert the differential equation (22) to a linear system of \(N_{x}N_{y}\) equations in \(N_{x}N_{y}\) unknowns \(u_{11},\ldots,u_{N_{x}N_{y}}\):

$$\begin{aligned} \bigl(h_{y}^{2}T_{1}+h_{x}^{2}T_{2} \bigr) \operatorname {Vec}(U) = -h_{x}^{2}h_{y}^{2} \operatorname {Vec}[f_{ij}]+h_{x}^{2} (\overline{g_{u}}+ \overline{g_{d}} )+h_{y}^{2} ( \overline{g_{l}}+\overline{g_{r}} ), \end{aligned}$$
(25)

where \(U=[u_{ij}]\), \(T_{1}\) is an \(N_{y}\times N_{y}\) block-diagonal matrix whose diagonal blocks are \(\operatorname{tridiag}(-1,2,-1)\in M_{N_{x}}\), and \(T_{2}\) is an \(N_{y}\times N_{y}\) block matrix with \(2I_{N_{x}}\) blocks on its diagonal and \(-I_{N_{x}}\) blocks on its first off-diagonals. The boundary conditions produce constant vectors \(\overline{g_{u}}\), \(\overline{g_{d}}\), \(\overline{g_{l}}\), \(\overline{g_{r}}\) on the RHS of (25) as follows:

$$\begin{aligned}& \overline{g_{u}} = \bigl[ g_{u}(x_{1},\beta _{y})\ \ g_{u}(x_{2},\beta _{y})\ \cdots \ g_{u}(x_{N_{x}},\beta _{y})\ \ 0\ \cdots \ 0 \bigr]^{T}, \\& \overline{g_{d}} = \bigl[ 0\ \cdots \ 0\ \ g_{d}(x_{1},\alpha _{y})\ \ g_{d}(x_{2},\alpha _{y})\ \cdots \ g_{d}(x_{N_{x}},\alpha _{y}) \bigr]^{T}, \\& \overline{g_{l}} = \bigl[ g_{l}(\alpha _{x},y_{N_{y}})\ \ 0\ \cdots \ 0\ \ g_{l}(\alpha _{x},y_{N_{y}-1})\ \ 0\ \cdots \ \ g_{l}(\alpha _{x},y_{1})\ \ 0\ \cdots \ 0 \bigr]^{T}, \\& \overline{g_{r}} = \bigl[ 0\ \cdots \ 0\ \ g_{r}(\beta _{x},y_{N_{y}})\ \ 0\ \cdots \ 0\ \ g_{r}(\beta _{x},y_{N_{y}-1})\ \cdots \ 0\ \cdots \ 0\ \ g_{r}(\beta _{x},y_{1}) \bigr]^{T}. \end{aligned}$$

Note that, for the Laplace’s equation, equation (25) will be reduced to

$$\begin{aligned} \bigl(h_{y}^{2}T_{1}+h_{x}^{2}T_{2} \bigr) \operatorname {Vec}(U) = h_{x}^{2} (\overline{g_{u}}+ \overline{g_{d}} )+h_{y}^{2} ( \overline{g_{l}}+\overline{g_{r}} ). \end{aligned}$$

Equation (25) is of the form \(AXB=C\) with \(A = h_{y}^{2}T_{1}+h_{x}^{2}T_{2}\), \(X = \operatorname {Vec}(U)\), \(B = I\), and \(C = -h_{x}^{2}h_{y}^{2}\operatorname {Vec}([f_{ij}])+h_{x}^{2} ( \overline{g_{u}}+\overline{g_{d}} )+h_{y}^{2} ( \overline{g_{l}}+\overline{g_{r}} )\). According to Algorithm 3.1, we obtain an algorithm for the rectangular-grid case as follows:

Algorithm 7.1

The gradient-descent iterative algorithm for solving the two-dimensional Poisson’s equation.

Input step.:

Input \(N_{x}\), \(N_{y}\in \mathbb {N}\) as the numbers of partitions in the x and y directions, respectively.

Initialization step.:

Let \(h_{x}\) and \(h_{y}\) be as in (23), and for each \(i=1,\ldots,N_{x}\) and \(j=1,\ldots,N_{y}\), let \(f_{ij}=f(x_{i},y_{j})\), where \(x_{i}=\alpha _{x}+ih_{x}\) and \(y_{j}=\alpha _{y}+jh_{y}\). Compute \(c=-h_{x}^{2}h_{y}^{2}\operatorname {Vec}[f_{ij}]+h_{x}^{2} (\overline{g_{u}}+\overline{g_{d}} )+h_{y}^{2} (\overline{g_{l}}+\overline{g_{r}} )\), \(s=T_{N}c\), \(S=T_{N}^{2}\), \(t=T_{N}s\), and \(T=T_{N}S\), where \(T_{N}=h_{y}^{2}T_{1}+h_{x}^{2}T_{2}\). Choose \(u(0)\in \mathbb {R}^{N_{x}N_{y}}\) and set \(k:=0\).

Updating step.:

Compute

$$\begin{aligned}& \tau _{k+1} = \frac{\sum_{p=1}^{N_{x}N_{y}} ( s_{p}-\sum_{q=1}^{N_{x}N_{y}}S_{pq}u_{q}(k) )^{2}}{\sum_{p=1}^{N_{x}N_{y}} ( t_{p}-\sum_{q=1}^{N_{x}N_{y}}T_{pq}u_{q}(k) )^{2}}, \\& u(k+1) = u(k)+\tau _{k+1} \bigl( s-Su(k) \bigr). \end{aligned}$$

Set \(k:=k+1\) and repeat the Updating step.

Here, we denote by \(s_{p}\) the pth entry of a vector s and by \(S_{pq}\) the \((p,q)\)-entry of S. In the case of the two-dimensional Laplace’s equation, initially compute \(c=h_{x}^{2} (\overline{g_{u}}+\overline{g_{d}} )+h_{y}^{2} (\overline{g_{l}}+\overline{g_{r}} )\). To stop the algorithm, a reasonable stopping rule is \(\Vert c-T_{N}u(k) \Vert _{F}^{2}<\epsilon \), where ϵ is a small positive number. Since the coefficient matrix \(T_{N}\) is sparse, the error norm can be evaluated without forming \(T_{N}\) explicitly:

$$\begin{aligned} \bigl\Vert c-T_{N}u(k) \bigr\Vert _{F}^{2} = \Vert c \Vert _{F}^{2}-2c^{T}T_{N}u(k)+ \bigl\Vert T_{N}u(k) \bigr\Vert _{F}^{2}, \end{aligned}$$

where the entries of \(T_{N}u(k)\) are obtained directly from the five-point stencil, namely \((T_{N}u(k))_{(i,j)} = h_{y}^{2}(2u_{ij}-u_{i-1,j}-u_{i+1,j})+h_{x}^{2}(2u_{ij}-u_{i,j-1}-u_{i,j+1})\), with the convention that out-of-range entries of \(u(k)\) are omitted.
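For the rectangular grid, \(T_{1}\) and \(T_{2}\) can be generated directly from Kronecker products, matching the block description after (25). The sketch below (our own code, written for homogeneous boundary data so that the \(\overline{g}\) terms vanish) assembles the system and applies the same exact-step gradient iteration; since \(T_{N}\) is symmetric, \(T_{N}^{T}=T_{N}\).

```python
import numpy as np

def second_diff(N):
    """The N x N matrix tridiag(-1, 2, -1)."""
    return 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)

def poisson_system(Nx, Ny, ax=0.0, bx=1.0, ay=0.0, by=1.0,
                   f=lambda x, y: -2 * np.pi ** 2 * np.sin(np.pi * x) * np.sin(np.pi * y)):
    """Assemble T_N = hy^2*T1 + hx^2*T2 and the RHS of (25) for u = 0 on the boundary."""
    hx, hy = (bx - ax) / (Nx + 1), (by - ay) / (Ny + 1)
    x = ax + hx * np.arange(1, Nx + 1)
    y = ay + hy * np.arange(1, Ny + 1)
    T1 = np.kron(np.eye(Ny), second_diff(Nx))   # x-direction differences within each block
    T2 = np.kron(second_diff(Ny), np.eye(Nx))   # y-direction differences across blocks
    T_N = hy ** 2 * T1 + hx ** 2 * T2
    F = f(x[:, None], y[None, :])               # F[i, j] = f(x_i, y_j)
    c = -hx ** 2 * hy ** 2 * F.flatten('F')     # Vec stacks columns (j varies slowest)
    return T_N, c

T_N, c = poisson_system(Nx=6, Ny=8)
u = 1e-6 * np.ones(T_N.shape[0])
for _ in range(5000):
    r = c - T_N @ u                              # residual
    if np.linalg.norm(r) < 1e-12:                # stopping rule
        break
    g = T_N @ r                                  # T_N is symmetric, so T_N^T r = T_N r
    u = u + (np.linalg.norm(g) ** 2 / np.linalg.norm(T_N @ g) ** 2) * g
print(np.linalg.norm(c - T_N @ u))               # final residual
```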

7.2 Discretization with square grid

Now, we consider the Poisson’s equation (22) on the square \([\alpha , \beta ] \times [\alpha , \beta ]\) with the boundary condition \(u=0\) on the boundary of the square. In this case, \(h:=h_{x}=h_{y}\) and \(N:= N_{x}=N_{y}\); dividing (25) by \(h^{2}\), the coefficient matrix becomes

$$\begin{aligned} T_{N} = I_{N} \otimes T_{r}+T_{r} \otimes I_{N}, \end{aligned}$$

where \(T_{r} = \operatorname{tridiag}(-1,2,-1)\in M_{N}\). Thereby, (25) can be transformed into

$$\begin{aligned} T_{N}\operatorname {Vec}(U) = -h^{2}\operatorname {Vec}\bigl([f_{ij}] \bigr)+\overline{g_{u}}+ \overline{g_{d}}+ \overline{g_{l}}+\overline{g_{r}}, \end{aligned}$$
(26)

or equivalently, \(T_{r}U+UT_{r} = G\), where G is the matrix satisfying \(\operatorname {Vec}(G)=-h^{2}\operatorname {Vec}([f_{ij}])+\overline{g_{u}}+\overline{g_{d}}+ \overline{g_{l}}+\overline{g_{r}}\). Thus (26) can be solved by Algorithm 4.1 with \(P=T_{N}\).

To obtain the condition number of \(T_{N}\), we consider the smallest and largest eigenvalues of \(T_{r}\), which are given respectively by (see, e.g., [42])

$$\begin{aligned} \lambda _{1} = 2 \biggl( 1- \cos \frac{\pi }{N+1} \biggr) \approx \biggl(\frac{\pi }{N+1} \biggr)^{2}, \quad \quad \lambda _{N} = 2 \biggl( 1- \cos \frac{N\pi }{N+1} \biggr) \approx 4. \end{aligned}$$

Since \(T_{N}=I_{N} \otimes T_{r}+T_{r} \otimes I_{N}\), the eigenvalues of \(T_{N}\) are of the form \(\lambda _{i} + \lambda _{j}\), where \(\lambda _{i},\lambda _{j}\in \sigma (T_{r})\). Thus, the condition number of \(T_{N}\) for large N is

$$\begin{aligned} \kappa _{T_{N}} = \frac{\lambda _{N}+\lambda _{N}}{\lambda _{1}+\lambda _{1}} \approx \frac{4}{\pi ^{2}}(N+1)^{2}. \end{aligned}$$
(27)
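The estimate (27) is easy to confirm numerically for a moderate N (a throwaway check, with our own code):

```python
import numpy as np

N = 30
Tr = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
T_N = np.kron(np.eye(N), Tr) + np.kron(Tr, np.eye(N))

eigs = np.linalg.eigvalsh(T_N)                      # T_N is symmetric
kappa_exact = eigs[-1] / eigs[0]
kappa_estimate = 4.0 * (N + 1) ** 2 / np.pi ** 2    # estimate (27)
print(kappa_exact, kappa_estimate)                  # the two values are close
```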

Corollary 7.2

The discretization (26) of the Poisson’s equation (22) can be solved by using Algorithm 7.1, in which \(c=-h^{2}\operatorname {Vec}[f_{ij}]+\overline{g_{u}}+\overline{g_{d}}+ \overline{g_{l}}+\overline{g_{r}}\), so that the approximate solution \(u(k)\) converges to the exact solution \(u^{*}\) for any initial vector \(u(0)\). The convergence rate of the algorithm is governed by \(\sqrt{1-\kappa _{T_{N}}^{-2}}\), where \(\kappa _{T_{N}}\) is given by (27). Moreover, the error estimates are given as follows:

$$\begin{aligned}& \bigl\Vert u(k)-u^{*} \bigr\Vert _{F} \leqslant \kappa _{T_{N}} \bigl(1-\kappa _{T_{N}}^{-2} \bigr)^{ \frac{1}{2}} \bigl\Vert u(k-1)-u^{*} \bigr\Vert _{F}, \\& \bigl\Vert u(k)-u^{*} \bigr\Vert _{F} \leqslant \kappa _{T_{N}} \bigl(1-\kappa _{T_{N}}^{-2} \bigr)^{ \frac{k}{2}} \bigl\Vert u(0)-u^{*} \bigr\Vert _{F}. \end{aligned}$$

7.3 Numerical simulations for the Poisson’s equation

Example 7.3

We consider an application of our algorithm to the two-dimensional Poisson’s equation (22) with

$$\begin{aligned} f(x,y) = -2\pi ^{2}\sin (\pi x)\sin (\pi y), \quad 0< x< 1, 0< y< 1, \end{aligned}$$

and the boundary condition \(u=0\) on the boundary of the rectangle; this is a Dirichlet problem. We choose the initial vector \(u(0) = 10^{-6}[ 1 \ \cdots \ 1 ]^{T}\) and run Algorithm 7.1 on a rectangular grid of \(10\times 20\), so that the size of the matrix \(T_{N}\) is \(200\times 200\). The analytical solution is

$$\begin{aligned} u^{*}(x,y) =\sin (\pi x)\sin (\pi y). \end{aligned}$$

In this example, we provide only a comparison of numerical and analytical solutions in Table 6, and a 3D-plot of both solutions in Fig. 6.

Figure 6 The 3D-plot of the analytical solution (left) and the numerical solution (right) for Example 7.3

Table 6 Comparison of numerical and analytical results for Example 7.3

Example 7.4

Consider the two-dimensional Laplace’s equation on \([0,1]\times [0,\pi ]\) with the boundary conditions:

$$\begin{aligned} u(0,y) = \sin y, \quad \quad u(1,y) = e\sin y, \quad \quad u(x,0) = 0, \quad \quad u(x,\pi ) = 0. \end{aligned}$$

We run Algorithm 7.1 with the initial vector \(u(0) = [ 1 \ \cdots \ 1 ]^{T}\). We choose two grid partitions: one with \(h_{x} = 0.25\), \(h_{y} = \pi /4\), and the other with \(h_{x} = 0.0625\), \(h_{y} = \pi /32\). The sizes of the matrix \(T_{N}\) are then \(9\times 9\) and \(465\times 465\), respectively. A comparison of numerical and analytical results is shown in Table 7. Figure 7 displays a 3D-plot of the numerical and analytical results for the latter grid partition. Note that the analytical solution is

$$\begin{aligned} u^{*}(x,y) = e^{x}\sin y. \end{aligned}$$
Figure 7 The 3D-plot of the analytical solution (left) and the numerical solution (right) for Example 7.4

Table 7 Comparison of numerical and analytical results for Example 7.4

8 Conclusion

The proposed gradient-descent based iterative algorithm is well suited for solving the generalized Sylvester matrix equation \(\sum_{t=1}^{p}A_{t}XB_{t}=C\). This matrix equation includes, as special cases, a class of well-known linear matrix equations such as the Sylvester equation, the Kalman–Yakubovich equation, and so on. The proposed algorithm is applicable to any such problem as long as it has a unique solution; in our setting, \(A_{t}\) and \(B_{t}\) have full column-rank and full row-rank, respectively, for all t. The convergence rate of the algorithm is governed by \(\sqrt{1-\kappa ^{-2}}\), where κ is the condition number of \(\sum_{t=1}^{p}B_{t}^{T}\otimes A_{t}\). As applications, our algorithm can be adapted to discretizations of the one-dimensional heat equation and the two-dimensional Poisson’s equation. According to the numerical simulations, our algorithms converge quickly to the exact solution, in spite of slightly more computational time compared to the other methods. The numerical examples for the heat and Poisson’s equations in Sects. 6 and 7 demonstrate the capability and adaptability of the proposed algorithms.