
1 Introduction

1.1 Nonlinear Multi-output Regression

We formulate the nonlinear multi-output regression task [1,2,3]: let \(\mathbf {f}\) be an unknown smooth mapping from an input space \(\mathbf {X}\subset \mathbb {R}^q\) to the m-dimensional output space \(\mathbb {R}^m\). Given a training data set

$$\begin{aligned} \mathbf {Z}_{(n)} = \left\{ \mathbf {Z}_i = \left( \mathbf {x}_{i}, \mathbf {y}_i = \mathbf {f}(\mathbf {x}_i) \right) , i = 1, 2, \ldots , n\right\} , \end{aligned}$$
(1)

consisting of input-output pairs, the task is to construct a function \(\mathbf {y}^* = \mathbf {f}^*(\mathbf {x}) = \mathbf {f}^*(\mathbf {x}|\mathbf {Z}_{(n)})\) that predicts the true output \(\mathbf {y}= \mathbf {f}(\mathbf {x})\) for an arbitrary Out-of-Sample (OoS) input \(\mathbf {x}\in \mathbf {X}\) with small predictive error \(|\mathbf {y}^* - \mathbf {y}|\). In engineering applications, \(\mathbf {f}^*(\mathbf {x})\) is usually used as a surrogate of some target function [4]. Most optimization algorithms use the gradient of the optimized function; in this case, the regression method should also allow estimating the \(m\times q\) Jacobian matrix \(\mathbf {J}_f(\mathbf {x}) = \nabla _x\mathbf {f}(\mathbf {x})\) of the mapping \(\mathbf {f}(\mathbf {x})\) at an arbitrary input point \(\mathbf {x}\in \mathbf {X}\).

There exist various regression methods such as least squares (LS) techniques (linear and nonlinear), artificial neural networks, kernel nonparametric regression, Gaussian process regression, kriging regression, etc. [1,2,3, 5,6,7,8,9,10,11,12,13,14,15,16]. A classical approach is based on Kernel Nonparametric Regression (KNR) [7]: we select the kernel function \(K(\mathbf {x}, \mathbf {x}')\) (see [17]) and construct the KNR-estimator

$$\begin{aligned} \mathbf {f}_{KNR}(\mathbf {x}) = \frac{1}{K(\mathbf {x})}\sum _{j=1}^nK(\mathbf {x},\mathbf {x}_j)\cdot \mathbf {y}_j,\,\,K(\mathbf {x}) = \sum _{j=1}^nK(\mathbf {x},\mathbf {x}_j), \end{aligned}$$
(2)

which minimizes (over \(\hat{\mathbf {y}}\)) the residual \(\sum _{j=1}^nK(\mathbf {x},\mathbf {x}_j)\left| \hat{\mathbf {y}}-\mathbf {y}_j\right| ^2\).
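As an illustration, a minimal sketch of the KNR-estimator (2) for vector-valued outputs is given below; it assumes a Gaussian kernel with a hand-picked width, and all function names, parameters, and data are illustrative rather than part of the cited methods.

```python
import numpy as np

def gaussian_kernel(x, X, rho=20.0):
    """Stationary Gaussian kernel K(x, x') = exp(-rho * |x - x'|^2), evaluated against the rows of X."""
    return np.exp(-rho * np.sum((x - X) ** 2, axis=-1))

def knr_estimate(x, X_train, Y_train, rho=20.0):
    """KNR-estimator (2): kernel-weighted average (1 / K(x)) * sum_j K(x, x_j) y_j."""
    w = gaussian_kernel(x, X_train, rho)
    return w @ Y_train / np.sum(w)

# toy usage: q = 1 input, m = 2 outputs
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(100, 1))
Y_train = np.column_stack([np.sin(6 * X_train[:, 0]), np.cos(6 * X_train[:, 0])])
print(knr_estimate(np.array([0.5]), X_train, Y_train))
```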

The symmetric non-negative definite function \(K(\mathbf {x}, \mathbf {x}')\) can be interpreted as the covariance function of some random field \(\mathbf {y}(\mathbf {x})\); thus, the unknown function \(\mathbf {f}(\mathbf {x})\) can be interpreted as a realization of the random field \(\mathbf {y}(\mathbf {x})\) with \(K(\mathbf {x}, \mathbf {x}') = \mathrm {cov}(\mathbf {f}(\mathbf {x}), \mathbf {f}(\mathbf {x}'))\). If we consider only the first and second moments of this random field, then without loss of generality we can assume that the field is Gaussian and obtain the so-called Gaussian Process Regression [5, 6, 18, 19].

One of the most popular kernel estimators is kriging, first developed by Krige [20] and popularized by Sacks [21]. Kriging provides both global predictions and their uncertainty. Kriging-based surrogate models are widely used in engineering modeling and optimization [4, 22,23,24].

Kriging regression combines both the linear LS and the KNR approaches: the deviation of the unknown function \(\mathbf {f}(\mathbf {x})\) from its LS estimator, constructed on the basis of some functional dictionary, is modeled by a zero-mean Gaussian random field with the covariance function \(K(\mathbf {x}, \mathbf {x}')\). Thus the deviation at a point \(\mathbf {x}\) can be estimated by a filtering procedure using the known deviations at the sample points \(\{\mathbf {x}_i\}\). Usually, stationary covariance functions \(K(\mathbf {x}, \mathbf {x}')\) are used, which depend on their arguments \(\mathbf {x}\) and \(\mathbf {x}'\) only through the difference \((\mathbf {x}- \mathbf {x}')\).

1.2 Learning with Non-stationary Kernels

Many methods use stationary kernels. However, as indicated, e.g., in [2, 3, 5, 6], such methods have serious drawbacks in the case of functions with strongly varying gradients. Traditional kriging “is stationary in nature” and has low accuracy in the case of functions with “non-stationary responses” (significant changes in “smoothness”) [25, 26]. Figure 1 illustrates this phenomenon with the Xiong function \(\mathbf {f}(\mathbf {x}) = \sin (30(\mathbf {x}- 0.9)^4)\cdot \cos (2(\mathbf {x}- 0.9)) + (\mathbf {x}- 0.9)/2,\) \(\mathbf {x}\in [0, 1]\), and its kriging estimator with a stationary kernel [25]. Therefore, non-stationary kernels with adaptive kernel width are used to estimate non-regular functions. There are various strategies for constructing such non-stationary kernels [26].

Fig. 1. Example of Kriging prediction with a stationary covariance [25].
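For reference, the Xiong function can be evaluated as follows; the sketch below only computes the function and its numerical derivative to show how strongly the gradient varies over the domain (it does not reproduce the kriging fit of Fig. 1).

```python
import numpy as np

def xiong(x):
    """Xiong test function: rapidly oscillating near x = 0, nearly flat near x = 0.9."""
    return np.sin(30 * (x - 0.9) ** 4) * np.cos(2 * (x - 0.9)) + (x - 0.9) / 2

x = np.linspace(0.0, 1.0, 1001)
grad = np.gradient(xiong(x), x)                                  # numerical derivative f'(x)
print("max |f'| on [0.0, 0.3]:", np.abs(grad[x <= 0.3]).max())   # steep, oscillating region
print("max |f'| on [0.7, 1.0]:", np.abs(grad[x >= 0.7]).max())   # smooth region
```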

The interpretable nonlinear map approach from [27] uses a one-to-one reparameterization function \(\mathbf {u}= \varphi (\mathbf {x})\) with the inverse \(\mathbf {x}= \psi (\mathbf {u})\) to map the input space \(\mathbf {X}\) to \(\mathbf {U}= \varphi (\mathbf {X})\), such that the covariance function \(k(\mathbf {u}, \mathbf {u}') = K(\psi (\mathbf {u}), \psi (\mathbf {u}')) = \mathrm {cov}(\mathbf {f}(\psi (\mathbf {u})), \mathbf {f}(\psi (\mathbf {u}')))\) becomes approximately stationary. This approach has been studied for years in geostatistics for relatively low dimensions (\(q = 2, 3\)); the general case was considered in [25] with the reparameterization function \(\varphi (\mathbf {x}) = \mathbf {x}_0 + \int _{x_0^{(1)}}^{x^{(1)}}\int _{x_0^{(2)}}^{x^{(2)}}\cdots \int _{x_0^{(q)}}^{x^{(q)}}s(\mathbf {x})d\mathbf {x}\), where \(\mathbf {x}= (x^{(1)}, x^{(2)}, \ldots , x^{(q)})\) and \(s(\mathbf {x})\) is a density function modeled by a linear combination of some “dictionary” functions with optimized coefficients. A simple one-dimensional illustration of such a map is provided in Fig. 2.

Fig. 2. A conceptual illustration of the nonlinear reparameterization function [25].

After such a reparameterization, the KNR-estimator (2) \(\mathbf {g}_{KNR}(\mathbf {u})\) of the function \(\mathbf {g}(\mathbf {u}) = \mathbf {f}(\psi (\mathbf {u}))\) is constructed with the stationary kernel \(k(\mathbf {u}, \mathbf {u}')\), and the function \(\mathbf {f}^*(\mathbf {x}) = \mathbf {g}_{KNR}(\varphi (\mathbf {x}))\) is used as an estimator of \(\mathbf {f}(\mathbf {x})\).
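The warping idea of the two previous paragraphs can be sketched in one dimension as follows: a density \(s(x)\) concentrated where the function varies rapidly yields \(\varphi(x)\) as its cumulative integral, and a stationary-kernel KNR is then applied in the coordinate \(u = \varphi(x)\). The density and the kernel width below are hand-picked for illustration; they are not the dictionary-based fit of [25].

```python
import numpy as np

def xiong(x):
    return np.sin(30 * (x - 0.9) ** 4) * np.cos(2 * (x - 0.9)) + (x - 0.9) / 2

# hand-picked density s(x): large where the Xiong function oscillates (near x = 0)
grid = np.linspace(0.0, 1.0, 2001)
s = 1.0 + 8.0 * np.exp(-12.0 * grid)
phi_grid = np.concatenate([[0.0], np.cumsum(0.5 * (s[1:] + s[:-1]) * np.diff(grid))])

def phi(x):
    """Reparameterization u = phi(x): cumulative integral of s, monotone and hence one-to-one."""
    return np.interp(x, grid, phi_grid)

def knr(u, U_train, y_train, rho=100.0):
    """Stationary-kernel KNR (2) applied in the warped coordinate u."""
    w = np.exp(-rho * (u - U_train) ** 2)
    return np.sum(w * y_train) / np.sum(w)

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, 100)
y_train = xiong(x_train)
U_train = phi(x_train)

x_test = np.linspace(0.0, 1.0, 1001)
pred = np.array([knr(phi(x), U_train, y_train) for x in x_test])   # f*(x) = g_KNR(phi(x))
print("test MSE:", np.mean((pred - xiong(x_test)) ** 2))
```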

1.3 Manifold Learning Regression

A fundamentally different, geometrical approach to kernel nonparametric regression, called Manifold Learning Regression (MLR), was proposed in [10, 11]; MLR also constructs a reparameterization function \(\mathbf {u}= \varphi (\mathbf {x})\) and estimates the Jacobian matrix \(\mathbf {J}_f(\mathbf {x})\).

MLR compares favourably with many conventional regression methods. In Fig. 3 (see [10]) we depict the KNR-estimator \(\mathbf {f}_{KNR}\) (2) with a stationary kernel and the MLR-estimator \(\mathbf {f}_{MLR}\) for the Xiong function \(\mathbf {f}(\mathbf {x})\). The input values in the training set \(\mathbf {Z}_{(n)}\), \(n = 100\), were drawn uniformly at random from the interval [0, 1].

We see that the MLR method provides an essentially smoother estimate. The mean squared errors \(\mathrm {MSE}_{KNR} = 0.0024\) and \(\mathrm {MSE}_{MLR} = 0.0014\) were calculated on a test sample of 1001 uniform grid points in the interval [0, 1].

Fig. 3. Reconstruction of the Xiong function (a) by KNR with a stationary kernel (b) and by MLR (c).

MLR is based on a Manifold Learning approach. Let us represent in the input-output space \(\mathbb {R}^p\), \(p = q + m\), the graph of the function \(\mathbf {f}\) by the smooth q-dimensional manifold (Regression Manifold, RM)

$$\begin{aligned} \mathbf {M}(\mathbf {f}) = \left\{ \mathbf {Z}= \mathbf {F}(\mathbf {x})\in \mathbb {R}^p:\,\mathbf {x}\in \mathbf {X}\subset \mathbb {R}^q\right\} \subset \mathbb {R}^p, \end{aligned}$$
(3)

embedded in the ambient space \(\mathbb {R}^p\) and parameterized by the single chart

$$\begin{aligned} \mathbf {F}:\, \mathbf {x}\in \mathbf {X}\subset \mathbb {R}^q\rightarrow \mathbf {Z}= \mathbf {F}(\mathbf {x}) = \left( \mathbf {x}, \mathbf {f}(\mathbf {x}) \right) \in \mathbb {R}^p. \end{aligned}$$
(4)

An arbitrary function \(\mathbf {f}^*:\,\mathbf {X}\rightarrow \mathbb {R}^m\) also determines a manifold \(\mathbf {M}(\mathbf {f}^*)\) (substitute \(\mathbf {f}^*(\mathbf {x})\) and \(\mathbf {F}^*(\mathbf {x})\) for \(\mathbf {f}(\mathbf {x})\) and \(\mathbf {F}(\mathbf {x})\) in (3) and (4)).

In order to apply MLR, we estimate RM \(\mathbf {M}(\mathbf {f})\) from the training data \(\mathbf {Z}_{(n)}\) (1) by the Grassmann & Stiefel Eigenmaps (GSE) algorithm [28]. The constructed estimator \(\mathbf {M}_{GSE} = \mathbf {M}_{GSE}(\mathbf {Z}_{(n)})\), which is also a q-dimensional manifold embedded in \(\mathbb {R}^p\), provides a small Hausdorff distance \(d_H(\mathbf {M}_{GSE}, \mathbf {M}(\mathbf {f}))\) between these manifolds. In addition, the tangent spaces \(\mathrm {L}(\mathbf {Z})\) to RM \(\mathbf {M}(\mathbf {f})\) at the manifold points \(\mathbf {Z}\in \mathbf {M}(\mathbf {f})\) are estimated by linear spaces \(\mathrm {L}_{GSE}(\mathbf {Z})\) with “aligned” bases smoothly depending on \(\mathbf {Z}\). GSE also constructs a low-dimensional parameterization \(h(\mathbf {Z})\) of the manifold points \(\mathbf {Z}\) and a recovery mapping \(\mathbf {g}(h)\), which accurately reconstructs \(\mathbf {Z}\) from \(h(\mathbf {Z})\).

To get the estimator \(\mathbf {f}_{MLR}(\mathbf {x})\) of the unknown function \(\mathbf {f}\), we solve the equation \(\mathbf {M}(\mathbf {f}_{MLR}) = \mathbf {M}_{GSE}\). Using the estimator \(\mathrm {L}_{GSE}(\mathbf {F}(\mathbf {x}))\), we also construct the \(m\times q\) matrix \(\mathbf {G}_{MLR}(\mathbf {x})\), which estimates the \(m\times q\) Jacobian matrix \(\mathbf {J}_f(\mathbf {x}) = \nabla _x\mathbf {f}(\mathbf {x})\) of \(\mathbf {f}(\mathbf {x})\) at an arbitrary point \(\mathbf {x}\in \mathbf {X}\). Here, as the reparameterization function \(\mathbf {u}= \varphi (\mathbf {x})\), we use an approximation of the function \(h(\mathbf {F}(\mathbf {x}))\), which cannot be evaluated directly because it depends on \(\mathbf {f}(\mathbf {x})\), unknown at the OoS points \(\mathbf {x}\in \mathbf {X}\).

1.4 Paper Contribution

The GSE algorithm contains several very computationally expensive steps, such as the construction of the aligned bases in the estimated tangent spaces, of the embedding and recovery mappings, and of the reparameterization mapping. Although the incremental version of the GSE algorithm [29] reduces its complexity, it still remains computationally expensive.

This paper proposes a modified version of the MLR algorithm (mMLR) with significantly lower computational complexity. First, we develop a simplified version of the MLR algorithm that avoids the computationally expensive steps listed above and still constructs the estimators \((\mathbf {f}_{MLR}(\mathbf {x}), \mathbf {G}_{MLR}(\mathbf {x}))\) with the same accuracy. Then, instead of the KNR procedure with a stationary kernel, we develop a version with a non-stationary kernel, defined on the basis of the constructed MLR estimators.

Note that in this paper we consider the case when the input domain \(\mathbf {X}\subset \mathbb {R}^q\) is a “full-dimensional” subset of \(\mathbb {R}^q\) (i.e., the intrinsic dimension of \(\mathbf {X}\) is equal to q), in contrast to [6, 16], where \(\mathbf {X}\) is a low-dimensional manifold in \(\mathbb {R}^q\). Approaches to regression with manifold-valued inputs are reviewed in [30].

The paper is organized as follows. Section 2 describes some details of the GSE/MLR algorithms; the proposed mMLR algorithm is described in Sect. 3.

2 Manifold Learning Regression

2.1 Tangent Bundle Manifold Estimation Problem

The MLR algorithm is based on the solution of the Tangent bundle manifold estimation problem [31, 32]: estimate RM \(\mathbf {M}(\mathbf {f})\) (3) from the dataset \(\mathbf {Z}_{(n)}\) (1), sampled from \(\mathbf {M}(\mathbf {f})\). The manifold estimation problem is to construct:

  • the embedding mapping h from RM \(\mathbf {M}(\mathbf {f})\) to the q-dimensional Feature Space (FS) \(\mathbf {T}_h = h(\mathbf {M}(\mathbf {f}))\), which provides low-dimensional parameterization (coordinates) \(h(\mathbf {Z})\) of the manifold points \(\mathbf {Z}\in \mathbf {M}(\mathbf {f})\),

  • the recovery mapping \(\mathbf {g}(t)\) from FS \(\mathbf {T}_h\) to \(\mathbb {R}^p\), which recovers the manifold points \(\mathbf {Z}= \mathbf {g}(t)\) from their low-dimensional coordinates \(t = h(\mathbf {Z})\),

such that the recovered value \(r_{h,g}(\mathbf {Z}) = \mathbf {g}(h(\mathbf {Z}))\) is close to the initial vector \(\mathbf {Z}\):

$$\begin{aligned} \mathbf {g}(h(\mathbf {Z}))\approx \mathbf {Z}, \end{aligned}$$
(5)

i.e. the recovery error \(\delta _{h,g}(\mathbf {Z}) = |r_{h,g}(\mathbf {Z}) - \mathbf {Z}|\) is small. These mappings determine the q-dimensional Recovered Regression manifold (RRM)

$$\begin{aligned} \mathbf {M}_{h,g}&= r_{h,g}(\mathbf {M}(\mathbf {f})) = \{r_{h,g}(\mathbf {Z})\in \mathbb {R}^p:\,\mathbf {Z}\in \mathbf {M}(\mathbf {f})\} \nonumber \\&=\{\mathbf {Z}= \mathbf {g}(t)\in \mathbb {R}^p:\,t\in \mathbf {T}_h = h(\mathbf {M}(\mathbf {f}))\subset \mathbb {R}^q\}, \end{aligned}$$
(6)

which is embedded in the ambient space \(\mathbb {R}^p\), covered by the single chart \(\mathbf {g}\), and consists of all recovered values \(r_{h,g}(\mathbf {Z})\) of the manifold points \(\mathbf {Z}\). Thanks to (5), we get the proximity of the manifolds \(\mathbf {M}_{h,g}\approx \mathbf {M}(\mathbf {f})\), i.e., the Hausdorff distance \(d_H(\mathbf {M}_{h,g}, \mathbf {M}(\mathbf {f}))\) between RM \(\mathbf {M}(\mathbf {f})\) and RRM \(\mathbf {M}_{h,g}\) (6) is small due to the inequality \(d_H(\mathbf {M}_{h,g}, \mathbf {M}(\mathbf {f}))\le \sup _{\mathbf {Z}\in \mathbf {M}(\mathbf {f})}\delta _{h,g}(\mathbf {Z})\).

The manifold proximity (5) at the OoS point \(\mathbf {Z}\in \mathbf {M}(\mathbf {f})\setminus \mathbf {Z}_{(n)}\) characterizes the generalization ability of the solution \((h, \mathbf {g})\) at the specific point \(\mathbf {Z}\). Good generalization ability requires [32] that the pair \((h, \mathbf {g})\) should provide the tangent proximities \(\mathrm {L}_{h,g}(\mathbf {Z})\approx \mathrm {L}(\mathbf {Z})\) between the tangent spaces \(\mathrm {L}(\mathbf {Z})\) to RM \(\mathbf {M}(\mathbf {f})\) at points \(\mathbf {Z}\in \mathbf {M}(\mathbf {f})\) and the tangent spaces \(\mathrm {L}_{h,g}(\mathbf {Z}) = \mathrm {Span}(\mathbf {J}_g(h(\mathbf {Z})))\) (spanned by columns of the Jacobian matrix \(\mathbf {J}_g(t)\) of the mapping \(\mathbf {g}\) at the point \(t = h(\mathbf {Z})\)) to RRM \(\mathbf {M}_{h,g}\) at the recovered points \(r_{h,g}(\mathbf {Z})\in \mathbf {M}_{h,g}\). Note that the tangent proximity is defined in terms of a chosen distance between these tangent spaces considered as elements of the Grassmann manifold \(\mathrm {Grass}(p, q)\), consisting of all q-dimensional linear subspaces in \(\mathbb {R}^p\).
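For intuition, one common way to measure the distance between two q-dimensional subspaces of \(\mathbb{R}^p\), viewed as elements of \(\mathrm{Grass}(p, q)\), is through their principal angles; the sketch below uses the resulting projection (chordal) metric as an illustrative choice (the Binet-Cauchy metric employed later in Sect. 3.3 is built from the same matrix \(Q^{\mathrm{T}}Q'\)).

```python
import numpy as np

def principal_angles(Q1, Q2):
    """Principal angles between Span(Q1) and Span(Q2); Q1, Q2 are p x q with orthonormal columns."""
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def projection_distance(Q1, Q2):
    """Projection (chordal) metric on Grass(p, q): the 2-norm of sin(principal angles)."""
    return np.linalg.norm(np.sin(principal_angles(Q1, Q2)))

# two nearby one-dimensional tangent spaces in R^2 (p = 2, q = 1)
Q1 = np.array([[1.0], [0.0]])
Q2 = np.array([[np.cos(0.1)], [np.sin(0.1)]])
print(projection_distance(Q1, Q2))   # approximately sin(0.1)
```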

The set of manifold points equipped with the tangent spaces at these points is called the Tangent bundle of the manifold [33], and therefore we refer to the manifold estimation problem with the tangent proximity requirement as the Tangent bundle manifold learning problem [31]. The GSE algorithm, briefly described in the next section, provides the solution to this problem.

2.2 Grassmann and Stiefel Eigenmaps Algorithm

The GSE algorithm consists of three successively performed steps: tangent manifold learning, manifold embedding, and manifold recovery.

Tangent Manifold Learning. We construct the sample-based \(p\times q\) matrices \(\mathbf {H}(\mathbf {Z})\) with columns \(\{\mathbf {H}^{(k)}(\mathbf {Z})\in \mathbb {R}^p,\, 1\le k \le q\}\), smoothly depending on \(\mathbf {Z}\), to meet the relations \(\mathrm {Span}(\mathbf {H}(\mathbf {Z}))\approx \mathrm {L}(\mathbf {Z})\) and \(\nabla _{\mathbf {H}^{(i)}(\mathbf {Z})}\mathbf {H}^{(j)}(\mathbf {Z}) = \nabla _{\mathbf {H}^{(j)}(\mathbf {Z})}\mathbf {H}^{(i)}(\mathbf {Z})\) (covariant differentiation is used here), \(1 \le i < j \le q\), for all points \(\mathbf {Z}\in \mathbf {M}(\mathbf {f})\).

The latter condition ensures that these columns are coordinate tangent fields on RM \(\mathbf {M}(\mathbf {f})\) and, thus, that \(\mathbf {H}(\mathbf {Z})\) is the Jacobian matrix of some mapping [33]. Therefore, the mappings h and \(\mathbf {g}\) are constructed in such a way that

$$\begin{aligned} \mathbf {J}_g(h(\mathbf {Z})) = \mathbf {H}(\mathbf {Z}). \end{aligned}$$
(7)

Using Principal Component Analysis (PCA), we estimate the tangent space \(\mathrm {L}(\mathbf {Z})\) at the sample point \(\mathbf {Z}\in \mathbf {Z}_{(n)}\) [34] by the q-dimensional linear space \(\mathrm {L}_{PCA}(\mathbf {Z})\), spanned by the eigenvectors of the local sample covariance matrix

$$\begin{aligned} \varSigma (\mathbf {Z}|K_p)=\frac{1}{K_p(\mathbf {Z})}\sum _{j=1}^nK_p(\mathbf {Z},\mathbf {Z}_j)\cdot [(\mathbf {Z}_j-\mathbf {Z})\cdot (\mathbf {Z}_j-\mathbf {Z})^{\mathrm {T}}], \end{aligned}$$
(8)

corresponding to the q largest eigenvalues; here \(K_p(\mathbf {Z}) = \sum _{j=1}^nK_p(\mathbf {Z},\mathbf {Z}_j)\) and \(K_p(\mathbf {Z}, \mathbf {Z}')\) is a stationary kernel in \(\mathbb {R}^p\) (e.g., the indicator kernel \(\mathrm {I}\{|\mathbf {Z}- \mathbf {Z}'| < \varepsilon \}\) or the heat kernel [35] \(K_{p,\varepsilon ,\rho }(\mathbf {Z}, \mathbf {Z}') = \mathrm {I}\{|\mathbf {Z}- \mathbf {Z}'| \le \varepsilon \}\cdot \exp \{-\rho \cdot |\mathbf {Z}- \mathbf {Z}'|^2\}\) with the parameters \(\varepsilon \) and \(\rho \)).
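A minimal sketch of this local PCA step, assuming the heat kernel with hand-picked parameters: the weighted covariance (8) is formed around a point \(\mathbf{Z}\), and its top-q eigenvectors span \(\mathrm{L}_{PCA}(\mathbf{Z})\).

```python
import numpy as np

def heat_kernel(Z, Z_train, eps=0.3, rho=10.0):
    """K_{p,eps,rho}(Z, Z') = I{|Z - Z'| <= eps} * exp(-rho |Z - Z'|^2), against the rows of Z_train."""
    d2 = np.sum((Z - Z_train) ** 2, axis=-1)
    return (d2 <= eps ** 2) * np.exp(-rho * d2)

def local_pca_basis(Z, Z_train, q, eps=0.3, rho=10.0):
    """Top-q eigenvectors of the weighted local covariance (8): the columns of Q_PCA(Z)."""
    w = heat_kernel(Z, Z_train, eps, rho)
    D = Z_train - Z
    Sigma = (w[:, None] * D).T @ D / np.sum(w)   # sum_j K(Z, Z_j) (Z_j - Z)(Z_j - Z)^T / K(Z)
    _, vecs = np.linalg.eigh(Sigma)              # eigenvalues in ascending order
    return vecs[:, ::-1][:, :q]                  # p x q matrix Q_PCA(Z)

# toy usage: points Z_i = (x_i, f(x_i)) on a curve in R^2 (p = 2, q = 1)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
Z_train = np.column_stack([x, np.sin(3 * x)])
Q = local_pca_basis(Z_train[0], Z_train, q=1)
print(Q.ravel())   # roughly proportional to the normalized tangent direction (1, f'(x))
```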

We construct the matrices \(\mathbf {H}(\mathbf {Z})\) to meet the relations

$$\begin{aligned} \mathrm {Span}(\mathbf {H}(\mathbf {Z})) = \mathrm {L}_{PCA}(\mathbf {Z}), \end{aligned}$$
(9)

therefore, the required proximity \(\mathrm {Span}(\mathbf {H}(\mathbf {Z}))\approx \mathrm {L}(\mathbf {Z})\) follows automatically from the approximate equalities \(\mathrm {L}_{PCA}(\mathbf {Z})\approx \mathrm {L}(\mathbf {Z})\), which are satisfied when RM \(\mathbf {M}(\mathbf {f})\) is “well sampled” and the parameter \(\varepsilon \) is small enough [36].

The principal components form an orthonormal basis of the linear space \(\mathrm {L}_{PCA}(\mathbf {Z})\). Let us denote the \(p\times q\) matrix with the principal components as columns by \(Q_{PCA}(\mathbf {Z})\). However, for different \(\mathbf {Z}\) these bases are not aligned with each other and can differ substantially even at neighboring points. While preserving the requirements (9), the GSE algorithm constructs other bases in these linear spaces, determined by the \(p\times q\) matrices

$$\begin{aligned} \mathbf {H}_{GSE}(\mathbf {Z}) = Q_{PCA}(\mathbf {Z})\cdot v(\mathbf {Z}). \end{aligned}$$
(10)

Here the \(q\times q\) nonsingular matrices \(v(\mathbf {Z})\) should provide a smooth dependence of \(\mathbf {H}_{GSE}(\mathbf {Z})\) on \(\mathbf {Z}\) and the coordinateness of the tangent fields \(\{\mathbf {H}^{(k)}(\mathbf {Z})\in \mathbb {R}^p,\,1 \le k \le q\}\).

At the sample points, the matrices \({\mathbf {H}_i = \mathbf {H}_{GSE}(\mathbf {Z}_i)}\) (10) are constructed to minimize the quadratic form \(\sum _{i,j=1}^nK_p(\mathbf {Z}_i,\mathbf {Z}_j)\cdot \Vert \mathbf {H}_i-\mathbf {H}_j\Vert _F^2\) under the coordinateness constraint and a certain normalizing condition, required to avoid a degenerate solution; here \(\Vert \cdot \Vert _F\) is the Frobenius matrix norm. The exact solution of this problem is obtained in explicit form; at the OoS points \(\mathbf {Z}\), the matrices \(\mathbf {H}_{GSE}(\mathbf {Z})\) are constructed using a certain interpolation procedure.

Manifold Embedding. Once the matrices \(\mathbf {H}_{GSE}(\mathbf {Z})\) are constructed and assuming that the conditions (5) and (9) are satisfied, the Taylor expansion of the mapping \(\mathbf {g}(t)\) at \(t = h(\mathbf {Z})\) gives the relation \(\mathbf {Z}' - \mathbf {Z}\approx \mathbf {H}_{GSE}(\mathbf {Z})\cdot (h(\mathbf {Z}') - h(\mathbf {Z}))\) for neighboring points \(\mathbf {Z}, \mathbf {Z}'\in \mathbf {M}(\mathbf {f})\). These relations, considered further as regression equations, allow constructing the embedding mapping \(h_{GSE}(\mathbf {Z})\) and the FS \(\mathbf {T}_h = h(\mathbf {M}(\mathbf {f}))\).

Manifold Recovery. After the matrices \(\mathbf {H}_{GSE}(\mathbf {Z})\) and the mapping \(h_{GSE}\) are constructed, using the known values \(\{\mathbf {g}(t_i)\approx \mathbf {Z}_i\}\) (5) and \(\{\mathbf {J}_g(t_i) = \mathbf {H}_i\}\) (9), \(t_i = h_{GSE}(\mathbf {Z}_i)\), we construct the mapping \(\mathbf {g}_{GSE}(t)\) and the estimator \(\mathbf {G}_{GSE}(t)\) of its Jacobian matrix \(\mathbf {J}_g(t)\).

2.3 Manifold Learning Regression Algorithm

We split the p-dimensional vector \(\mathbf {Z}=\left( \begin{array}{c} \mathbf {Z}_{in} \\ \mathbf {Z}_{out} \end{array} \right) \), \(p = q + m\), into the q-dimensional vector \(\mathbf {Z}_{in}\) and the m-dimensional vector \(\mathbf {Z}_{out}\) and obtain the corresponding partitions

$$\begin{aligned} \mathbf {H}_{GSE}(\mathbf {Z})&= \left( \begin{array}{c} \mathbf {H}_{GSE,in}(\mathbf {Z}) \\ \mathbf {H}_{GSE,out}(\mathbf {Z}) \end{array}\right) , \, Q_{PCA}(\mathbf {Z}) = \left( \begin{array}{c} Q_{PCA,in}(\mathbf {Z}) \\ Q_{PCA,out}(\mathbf {Z}) \end{array}\right) ,\nonumber \\ \mathbf {g}_{GSE}(t)&= \left( \begin{array}{c} \mathbf {g}_{GSE,in}(t) \\ \mathbf {g}_{GSE,out}(t) \end{array} \right) ,\, \mathbf {G}_{GSE}(t) = \left( \begin{array}{c} \mathbf {G}_{GSE,in}(t) \\ \mathbf {G}_{GSE,out}(t) \end{array} \right) \end{aligned}$$
(11)

of the \(p\times q\) matrices \(\mathbf {H}_{GSE}(\mathbf {Z})\) and \(Q_{PCA}(\mathbf {Z})\), the p-dimensional vector \(\mathbf {g}_{GSE}(t)\), and the \(p\times q\) matrix \(\mathbf {G}_{GSE}(t)\); note that the \(q\times q\) matrix \(\mathbf {G}_{GSE,in}(t)\) and the \(m\times q\) matrix \(\mathbf {G}_{GSE,out}(t)\) are the Jacobian matrices of the mappings \(\mathbf {g}_{GSE,in}(t)\) and \(\mathbf {g}_{GSE,out}(t)\), respectively.

It follows from the proximities (5), (9) and the partition (11) with \(\mathbf {Z}= \mathbf {F}(\mathbf {x})\) (4) that

$$\begin{aligned} \mathbf {g}_{GSE,in}(h_{GSE}(\mathbf {F}(\mathbf {x})))\approx \mathbf {x},\,\, \mathbf {g}_{GSE,out}(h_{GSE}(\mathbf {F}(\mathbf {x})))\approx \mathbf {f}(\mathbf {x}), \end{aligned}$$
(12)

but the left-hand side of the latter relation cannot be used for estimating the unknown function \(\mathbf {f}(\mathbf {x})\) since it depends on the function \(h_{GSE}(\mathbf {F}(\mathbf {x}))\), which in turn depends on the function \(\mathbf {f}(\mathbf {x})\).

According to the MLR approach, we construct an estimator of the function \(\varphi (\mathbf {x}) = h_{GSE}(\mathbf {F}(\mathbf {x}))\) as follows. We have two parameterizations of the manifold points \(\mathbf {Z}= \mathbf {F}(\mathbf {x})\in \mathbf {M}(\mathbf {f})\): the “natural” parameterization by the input \(\mathbf {x}\in \mathbf {X}\) and the GSE-parameterization \(t = h_{GSE}(\mathbf {Z})\); they are linked by the unknown one-to-one mapping \(t = \varphi (\mathbf {x})\), whose values \(\{\varphi (\mathbf {x}_i) = t_i = h_{GSE}(\mathbf {Z}_i)\}\) are known at the sample inputs \(\{\mathbf {x}_i\}\). The relations (5) and (12) imply that \(\mathbf {g}_{GSE,in}(\varphi (\mathbf {x}))\approx \mathbf {x}\) and \(\mathbf {G}_{GSE,in}(\varphi (\mathbf {x})) \cdot \mathbf {J}_{\varphi }(\mathbf {x})\approx \mathrm {I}_q\). Thus we get that \(\mathbf {J}_{\varphi }(\mathbf {x})\approx \mathbf {G}_{GSE,in}^{-1}(\varphi (\mathbf {x}))\); here \(\mathbf {J}_{\varphi }(\mathbf {x})\) is the Jacobian matrix of the mapping \(\varphi (\mathbf {x})\). Therefore, the known matrices \(\{\mathbf {G}_{GSE,in}^{-1}(\varphi (\mathbf {x}_i)) = \mathbf {G}_{GSE,in}^{-1}(t_i)\}\) estimate the Jacobian matrices \(\{\mathbf {J}_{\varphi }(\mathbf {x}_i)\}\) at the sample inputs \(\{\mathbf {x}_i\}\).

Based on the known values \(\{(\varphi (\mathbf {x}_i), \mathbf {J}_{\varphi }(\mathbf {x}_i))\}\), \(\varphi (\mathbf {x})\) is estimated at an arbitrary point \(\mathbf {x}\) by \(\varphi _{MLR}(\mathbf {x}) = \frac{1}{K_q(\mathbf {x})}\sum _{j=1}^nK_q(\mathbf {x},\mathbf {x}_j)\cdot \{t_j+\mathbf {G}_{GSE,in}^{-1}(t_j)\cdot (\mathbf {x}-\mathbf {x}_j)\}\); here \(K_q(\mathbf {x}, \mathbf {x}')\) is a stationary kernel in \(\mathbb {R}^q\) (like \(K_{p,\varepsilon ,\rho }\), but defined in \(\mathbb {R}^q\)).

The relations (12) imply that \(\mathbf {G}_{GSE,out}(\varphi (\mathbf {x})) \cdot \mathbf {J}_{\varphi }(\mathbf {x})\approx \mathbf {J}_f(\mathbf {x})\) and we get

$$\begin{aligned} \mathbf {f}_{MLR}(\mathbf {x})&= \mathbf {g}_{GSE,out}(\varphi _{MLR}(\mathbf {x})), \end{aligned}$$
(13)
$$\begin{aligned} \mathbf {G}_{MLR}(\mathbf {x})&= \mathbf {G}_{GSE,out}(\varphi _{MLR}(\mathbf {x})) \cdot \mathbf {G}_{GSE,in}^{-1}(\varphi _{MLR}(\mathbf {x})) \end{aligned}$$
(14)

as the estimators for the unknown function \(\mathbf {f}(\mathbf {x})\) and its Jacobian matrix \(\mathbf {J}_f(\mathbf {x})\).

Note that the estimators (13), (14) require constructing the aligned bases (matrices \(\mathbf {H}_{GSE}(\mathbf {Z})\)), the embedding mapping \(h_{GSE}(\mathbf {Z})\), the recovery mapping \(\mathbf {g}_{GSE}(t)\) and the estimator \(\mathbf {G}_{GSE}(t)\) for its Jacobian matrix, and the reparameterization mapping \(\varphi _{MLR}(\mathbf {x})\). These GSE steps are computationally expensive, even if the incremental version of GSE is used [29].

3 Modified Manifold Learning Regression

The proposed modified version of the MLR method consists of the following parts: constructing the PCA-approximations of the tangent spaces at the sample points (as in the GSE algorithm) and a preliminary estimator of \(\mathbf {f}(\mathbf {x})\) at arbitrary inputs (Sect. 3.1); constructing the PCA-approximations \(\mathrm {L}_{PCA}(\mathbf {Z})\) at the OoS points \(\mathbf {Z}= \mathbf {F}(\mathbf {x})\) and the estimators \(\mathbf {G}_{MLR}(\mathbf {x})\) of the Jacobian matrix \(\mathbf {J}_f(\mathbf {x})\) at arbitrary inputs (Sect. 3.2); constructing non-stationary kernels based on the preliminary MLR estimators and using them to build new adaptive PCA-approximations and the final estimators \((\mathbf {f}_{mMLR}(\mathbf {x}), \mathbf {G}_{mMLR}(\mathbf {x}))\) (Sect. 3.3).

3.1 Preliminary Estimation of Unknown Functions

We start from the standard PCA-approximations for the tangent spaces \(\mathrm {L}(\mathbf {Z})\) at the sample points.

Step 1. Given the training dataset \(\mathbf {Z}_{(n)}\) (1), the \(p\times q\) matrices \(Q_{PCA}(\mathbf {Z}_i)\) and the linear spaces \(\mathrm {L}_{PCA}(\mathbf {Z}_i) = \mathrm {Span}(Q_{PCA}(\mathbf {Z}_i))\), \(i = 1, 2, \ldots , n\), are constructed as in Sect. 2.2.

Let \(\{\mathbf {H}_{GSE}(\mathbf {Z}_i) = Q_{PCA}(\mathbf {Z}_i)\cdot v(\mathbf {Z}_i)\}\) (10) be the GSE-matrices, computed after the estimation of the aligning matrices \(\{v(\mathbf {Z}_i)\}\). It follows from (7) and (9)–(11) that

$$\begin{aligned} \mathbf {G}_{GSE,in}(h_{GSE}(\mathbf {Z}))&= \mathbf {H}_{GSE,in}(\mathbf {Z}) = Q_{PCA,in}(\mathbf {Z})\cdot v(\mathbf {Z}), \\ \mathbf {G}_{GSE,out}(h_{GSE}(\mathbf {Z}))&= \mathbf {H}_{GSE,out}(\mathbf {Z}) = Q_{PCA,out}(\mathbf {Z})\cdot v(\mathbf {Z}). \end{aligned}$$

Thus the estimator \(\mathbf {G}_{MLR}(\mathbf {x})\) (14) at the sample inputs \(\{\mathbf {x}_i\}\) is equal to

$$\begin{aligned}&\mathbf {G}_{MLR}(\mathbf {x}_i) = \mathbf {H}_{GSE,out}(\mathbf {Z}_i)\cdot \mathbf {H}_{GSE,in}^{-1}(\mathbf {Z}_i) \nonumber \\&= Q_{PCA,out}(\mathbf {Z}_i)\, v(\mathbf {Z}_i)\, v^{-1}(\mathbf {Z}_i)\, Q_{PCA,in}^{-1}(\mathbf {Z}_i) = Q_{PCA,out}(\mathbf {Z}_i)\,Q_{PCA,in}^{-1}(\mathbf {Z}_i) \end{aligned}$$
(15)

and depends only on the PCA-matrices \(\{Q_{PCA}(\mathbf {Z}_i)\}\), not on the matrices \(v(\mathbf {Z}_i)\).

Step 2. Compute the estimators \(\{\mathbf {G}_{MLR}(\mathbf {x}_i)\}\) (15) for \(i = 1, 2, \ldots , n\).

After Step 2 we obtain the estimates \(\{\mathbf {G}_{MLR}(\mathbf {x}_i)\}\) of the Jacobian matrix of \(\mathbf {f}(\mathbf {x})\) at the sample inputs. Using the Taylor expansion, \(\mathbf {f}(\mathbf {x})\approx \mathbf {f}(\mathbf {x}') + \mathbf {J}_f(\mathbf {x}')\cdot (\mathbf {x}- \mathbf {x}')\) for neighboring input points \(\mathbf {x}, \mathbf {x}'\in \mathbf {X}\). We construct the estimator \(\mathbf {f}^*(\mathbf {x})\) of \(\mathbf {f}(\mathbf {x})\) at an arbitrary point \(\mathbf {x}\) as a solution of the regression problem with known Jacobian values at the sample points [30], minimizing the residual \(\sum _{j=1}^nK_q(\mathbf {x},\mathbf {x}_j)\cdot |\mathbf {y}-\mathbf {y}_j-\mathbf {G}_{MLR}(\mathbf {x}_j)\cdot (\mathbf {x}-\mathbf {x}_j)|^2\) over \(\mathbf {y}\).

Step 3. Compute the estimator \(\mathbf {f}^*(\mathbf {x})\) at the arbitrary input \(\mathbf {x}\in \mathbf {X}\)

$$\begin{aligned} \mathbf {f}^*(\mathbf {x})&= \frac{1}{K_q(\mathbf {x})}\sum _{j=1}^nK_q(\mathbf {x},\mathbf {x}_j)\cdot \{\mathbf {y}_j+\mathbf {G}_{MLR}(\mathbf {x}_j)\cdot (\mathbf {x}-\mathbf {x}_j)\} \nonumber \\&= \mathbf {f}_{sKNR}(\mathbf {x}) + \frac{1}{K_q(\mathbf {x})}\sum _{j=1}^nK_q(\mathbf {x},\mathbf {x}_j)\cdot \mathbf {G}_{MLR}(\mathbf {x}_j)\cdot (\mathbf {x}-\mathbf {x}_j). \end{aligned}$$
(16)

Here \(\mathbf {f}_{sKNR}(\mathbf {x}) = \frac{1}{K_q(\mathbf {x})}\sum _{j=1}^nK_q(\mathbf {x},\mathbf {x}_j)\cdot \mathbf {y}_j\) is the KNR-estimator (2) with a stationary kernel.
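A compact sketch of Steps 1–3 under simplifying assumptions: heat kernels with hand-picked widths and a smooth toy function with \(q = 1\), \(m = 2\), so that plain local PCA is adequate. Local PCA at the sample points gives \(Q_{PCA}(\mathbf{Z}_i)\), formula (15) gives \(\mathbf{G}_{MLR}(\mathbf{x}_i)\), and (16) combines them into the estimate \(\mathbf{f}^*(\mathbf{x})\).

```python
import numpy as np

def heat_kernel(a, B, eps, rho):
    """I{|a - b| <= eps} * exp(-rho |a - b|^2), evaluated against the rows of B."""
    d2 = np.sum((a - B) ** 2, axis=-1)
    return (d2 <= eps ** 2) * np.exp(-rho * d2)

def local_pca_basis(Z, Z_train, q, eps, rho):
    """Columns of Q_PCA(Z): top-q eigenvectors of the weighted local covariance (8)."""
    w = heat_kernel(Z, Z_train, eps, rho)
    D = Z_train - Z
    Sigma = (w[:, None] * D).T @ D / np.sum(w)
    _, vecs = np.linalg.eigh(Sigma)                        # ascending eigenvalues
    return vecs[:, ::-1][:, :q]

def f_star(x, X_train, Y_train, G_list, eps=0.2, rho=50.0):
    """First-order KNR estimate (16): weighted average of y_j + G_MLR(x_j) (x - x_j)."""
    w = heat_kernel(x, X_train, eps, rho)
    terms = Y_train + np.einsum('jab,jb->ja', G_list, x - X_train)
    return w @ terms / np.sum(w)

# toy data: q = 1 input, m = 2 smooth outputs
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(100, 1))
Y_train = np.column_stack([np.sin(2 * np.pi * X_train[:, 0]), np.cos(2 * np.pi * X_train[:, 0])])
Z_train = np.hstack([X_train, Y_train])                    # Z_i = (x_i, y_i) in R^p, p = 3

q = 1
G_list = []
for Z in Z_train:                                          # Steps 1-2: Q_PCA(Z_i), then (15)
    Q = local_pca_basis(Z, Z_train, q, eps=0.5, rho=50.0)
    G_list.append(Q[q:, :] @ np.linalg.inv(Q[:q, :]))      # G_MLR(x_i) = Q_out Q_in^{-1}
G_list = np.array(G_list)                                  # shape (n, m, q)

x0 = np.array([0.37])                                      # Step 3: estimate f at an OoS input
print("estimate:", f_star(x0, X_train, Y_train, G_list))
print("true:    ", np.array([np.sin(2 * np.pi * 0.37), np.cos(2 * np.pi * 0.37)]))
```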

Note that the estimators \(\mathbf {f}^*(\mathbf {x})\) (16) and \(\{\mathbf {G}_{MLR}(\mathbf {x}_i)\}\) (15) coincide with the MLR-estimators (13) and (14) but they have significantly lower computational complexity.

3.2 Estimation of Jacobian Matrix at Arbitrary Point

At an OoS point \(\mathbf {Z}= \mathbf {F}(\mathbf {x})\), the \(p\times q\) matrix \(Q_{PCA}(\mathbf {Z})\) and the tangent space \(\mathrm {L}_{PCA}(\mathbf {Z})\) are computed using the estimator \(\mathbf {f}^*(\mathbf {x})\) (16): following (4), we define the surrogate \(\mathbf {F}_{MLR}(\mathbf {x}) = \left( \mathbf {x}, \mathbf {f}^*(\mathbf {x}) \right) \) of the unknown \(\mathbf {F}(\mathbf {x})\).

Step 4. Compute the \(p\times q\) matrix \(Q_{PCA}(\mathbf {Z}^*)\) at the point \(\mathbf {Z}^* = \mathbf {F}_{MLR}(\mathbf {x})\), such that its columns are the eigenvectors of the matrix \(\varSigma (\mathbf {Z}^*|K_p)\) (8) corresponding to the q largest eigenvalues.

The matrix \(Q_{PCA}(\mathbf {F}_{MLR}(\mathbf {x}))\) estimates the matrix \(Q_{PCA}(\mathbf {F}(\mathbf {x}))\) at an arbitrary input \(\mathbf {x}\in \mathbf {X}\). Thus, relation (14) yields the next step.

Step 5. Compute the preliminary estimator \(\mathbf {G}_{MLR}(\mathbf {x})\) for \(\mathbf {J}_f(\mathbf {x})\) at the arbitrary input \(\mathbf {x}\in \mathbf {X}\)

$$\begin{aligned} \mathbf {G}_{MLR}(\mathbf {x}) = Q_{PCA,out}(\mathbf {F}_{MLR}(\mathbf {x}))\cdot Q_{PCA,in}^{-1}(\mathbf {F}_{MLR}(\mathbf {x})). \end{aligned}$$
(17)

Then based on (17) we compute the preliminary estimators

$$\begin{aligned} \mathbf {f}_{MLR}(\mathbf {x})&=\frac{1}{K_q(\mathbf {x})}\sum _{j=1}^nK_q(\mathbf {x},\mathbf {x}_j)\cdot \{\mathbf {y}_j+\mathbf {G}_{MLR}(\mathbf {x})\cdot (\mathbf {x}-\mathbf {x}_j)\} \nonumber \\&= \mathbf {f}_{sKNR}(\mathbf {x}) + \mathbf {G}_{MLR}(\mathbf {x})\cdot (\mathbf {x}-\overline{\mathbf {x}}_{sKNR}) \end{aligned}$$
(18)

for \(\mathbf {f}(\mathbf {x})\) at the arbitrary input \(\mathbf {x}\in \mathbf {X}\); here \( \overline{\mathbf {x}}_{sKNR} = \frac{1}{K_q(\mathbf {x})}\sum _{j=1}^nK_q(\mathbf {x},\mathbf {x}_j)\cdot \mathbf {x}_j\).
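A sketch of Steps 4–5 under the same simplifying assumptions; here a plain stationary-kernel KNR stands in for the preliminary estimator \(\mathbf{f}^*(\mathbf{x})\) of Sect. 3.1, and all kernel widths are hand-picked.

```python
import numpy as np

def heat_kernel(a, B, eps, rho):
    d2 = np.sum((a - B) ** 2, axis=-1)
    return (d2 <= eps ** 2) * np.exp(-rho * d2)

def local_pca_basis(Z, Z_train, q, eps, rho):
    w = heat_kernel(Z, Z_train, eps, rho)
    D = Z_train - Z
    Sigma = (w[:, None] * D).T @ D / np.sum(w)             # weighted local covariance (8)
    return np.linalg.eigh(Sigma)[1][:, ::-1][:, :q]        # top-q eigenvectors: Q_PCA

def jacobian_and_value(x, X_train, Y_train, f_prelim, q,
                       eps_p=0.5, rho_p=50.0, eps_q=0.2, rho_q=50.0):
    """Steps 4-5: local PCA at F_MLR(x) = (x, f*(x)), then (17) for G_MLR(x) and (18) for f_MLR(x)."""
    Z_train = np.hstack([X_train, Y_train])
    Z_star = np.concatenate([x, f_prelim(x)])              # Step 4: the point F_MLR(x)
    Q = local_pca_basis(Z_star, Z_train, q, eps_p, rho_p)
    G = Q[q:, :] @ np.linalg.inv(Q[:q, :])                 # (17): G_MLR(x) = Q_out Q_in^{-1}
    w = heat_kernel(x, X_train, eps_q, rho_q)
    f_sknr = w @ Y_train / np.sum(w)                       # stationary-kernel KNR part of (18)
    x_bar = w @ X_train / np.sum(w)
    return G, f_sknr + G @ (x - x_bar)                     # (18)

# toy usage: q = 1, m = 2; a plain stationary KNR plays the role of the preliminary estimator f*
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(100, 1))
Y_train = np.column_stack([np.sin(2 * np.pi * X_train[:, 0]), np.cos(2 * np.pi * X_train[:, 0])])

def f_prelim(x, eps=0.2, rho=50.0):
    w = heat_kernel(x, X_train, eps, rho)
    return w @ Y_train / np.sum(w)

G, f_hat = jacobian_and_value(np.array([0.37]), X_train, Y_train, f_prelim, q=1)
print("G_MLR(x):", G.ravel(), " f_MLR(x):", f_hat)
```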

3.3 Estimation of Unknown Function at Arbitrary Point

The estimators \(\mathbf {f}_{MLR}(\mathbf {x})\) (18) and \(\mathbf {G}_{MLR}(\mathbf {x})\) (17) use the stationary kernels \(K_q(\mathbf {x}, \mathbf {x}')\) in (18) and \(K_p(\mathbf {Z}, \mathbf {Z}')\) in \(\varSigma (\mathbf {Z}^*|K_p)\) (8), respectively; here we introduce their non-stationary analogues.

Let \(\mathrm {L}= \mathrm {Span}(Q)\) and \(\mathrm {L}' = \mathrm {Span}(Q')\) be q-dimensional linear spaces in \(\mathbb {R}^p\) whose orthonormal bases are the columns of the \(p\times q\) orthogonal matrices Q and \(Q'\), respectively. Considering them as elements of the Grassmann manifold \(\mathrm {Grass}(p, q)\), let us denote by

$$\begin{aligned} d_{BC}(\mathrm {L}, \mathrm {L}') = \{1 - \mathrm {Det}^2[Q^{\mathrm {T}}\cdot Q']\}^{1/2}\,\, \text{ and } \,\, K_{BC}(\mathrm {L}, \mathrm {L}') = \mathrm {Det}^2[Q^{\mathrm {T}}\cdot Q'] \end{aligned}$$

the Binet-Cauchy metric and the Binet-Cauchy kernel on the Grassmann manifold, respectively [37, 38]. Note that these quantities do not depend on the choice of the orthonormal bases Q and \(Q'\). Let us also introduce a thresholded Grassmann kernel depending on a threshold \(\tau \):

$$\begin{aligned} K_{G,\tau }(\mathrm {L}, \mathrm {L}') = \mathrm {I}\{d_{BC}(\mathrm {L},\mathrm {L}') \le \tau \}\cdot K_{BC}(\mathrm {L},\mathrm {L}'). \end{aligned}$$

The final mMLR estimators are constructed by modifying Steps 1–5 above with the introduced non-stationary kernels. For \(\mathbf {Z}, \mathbf {Z}'\in \mathbf {Z}_{(n)}\), we introduce the non-stationary kernel

$$\begin{aligned} K_{p,MLR}(\mathbf {Z}, \mathbf {Z}') = K_{p,\varepsilon ,\rho }(\mathbf {Z}, \mathbf {Z}')\cdot K_{G,\tau }(\mathrm {L}_{PCA}(\mathbf {Z}), \mathrm {L}_{PCA}(\mathbf {Z}')). \end{aligned}$$
(19)
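The Grassmann quantities introduced above and the non-stationary kernel (19) admit a direct implementation; the sketch below assumes hand-picked values of \(\varepsilon\), \(\rho\), and \(\tau\), and takes the orthonormal tangent bases as given.

```python
import numpy as np

def binet_cauchy(Q1, Q2):
    """Binet-Cauchy kernel K_BC and metric d_BC on Grass(p, q); Q1, Q2 have orthonormal columns."""
    k = np.linalg.det(Q1.T @ Q2) ** 2
    return k, np.sqrt(max(1.0 - k, 0.0))

def grassmann_kernel(Q1, Q2, tau=0.5):
    """Thresholded Grassmann kernel K_{G,tau}(L, L') = I{d_BC <= tau} * K_BC(L, L')."""
    k, d = binet_cauchy(Q1, Q2)
    return float(d <= tau) * k

def k_p_mlr(Z1, Z2, Q1, Q2, eps=0.5, rho=10.0, tau=0.5):
    """Non-stationary kernel (19): stationary heat kernel times the Grassmann kernel of the tangent spaces."""
    d2 = np.sum((Z1 - Z2) ** 2)
    heat = (d2 <= eps ** 2) * np.exp(-rho * d2)
    return heat * grassmann_kernel(Q1, Q2, tau)

# toy usage: two points in R^2 with one-dimensional tangent spaces (p = 2, q = 1)
Z1, Z2 = np.array([0.0, 0.0]), np.array([0.1, 0.1])
Q1 = np.array([[1.0], [0.0]])
Q2 = np.array([[np.cos(0.2)], [np.sin(0.2)]])
print(k_p_mlr(Z1, Z2, Q1, Q2))
```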

Step 6 (modified Step 1). The columns of the orthogonal \(p\times q\) matrices \(Q_{mPCA}(\mathbf {Z}_i)\) at the sample points are the eigenvectors of the matrices \(\varSigma (\mathbf {Z}_i|K_{p,MLR})\) (8) corresponding to their q largest eigenvalues, \(i = 1, 2, \ldots , n\). When calculating the covariance matrices \(\varSigma (\mathbf {Z}_i|K_{p,MLR})\), we use the non-stationary kernel \(K_{p,MLR}\) (19) at the sample points.

Step 7 (modified Step 2). Using (15) with the matrices \(\{Q_{PCA}(\mathbf {Z}_i)\}\) replaced by the matrices \(\{Q_{mPCA}(\mathbf {Z}_i)\}\), we compute the modified \(m\times q\) matrices \(\{\mathbf {G}_{mMLR}(\mathbf {x}_i)\}\).

Step 8 (modified Step 3). The value \(\mathbf {f}^{**}(\mathbf {x})\) at the arbitrary input \(\mathbf {x}\in \mathbf {X}\) is computed by (16) with the matrices \(\{\mathbf {G}_{MLR}(\mathbf {x}_i)\}\) replaced by the matrices \(\{\mathbf {G}_{mMLR}(\mathbf {x}_i)\}\).

Step 9 (modified Step 4). We compute the \(p\times q\) matrix \(Q_{mPCA}(\mathbf {Z})\) at the point \(\mathbf {Z}= \mathbf {F}_{mMLR}(\mathbf {x}) = \left( \mathbf {x}, \mathbf {f}^{**}(\mathbf {x}) \right) \) for an arbitrary input \(\mathbf {x}\in \mathbf {X}\). The columns of this matrix are the eigenvectors of the matrix \(\varSigma (\mathbf {F}_{mMLR}(\mathbf {x})|K_{p,MLR})\) (8) corresponding to its q largest eigenvalues, with the non-stationary kernel \(K_{p,MLR}(\mathbf {Z}, \mathbf {Z}')\) (19), \(\mathbf {Z},\mathbf {Z}'\in \mathbf {Z}_{(n)}\).

Let us denote \(\mathrm {L}_{mPCA}(\mathbf {F}_{mMLR}(\mathbf {x})) = \mathrm {Span}(Q_{mPCA}(\mathbf {F}_{mMLR}(\mathbf {x})))\). For the arbitrary inputs \(\mathbf {x},\mathbf {x}' \in \mathbf {X}\) we introduce the non-stationary kernel

$$\begin{aligned} K_{q,MLR}(\mathbf {x}, \mathbf {x}') = K_{q,\varepsilon ,\rho }(\mathbf {x}, \mathbf {x}')\cdot K_{G,\tau }(\mathrm {L}_{mPCA}(\mathbf {F}_{mMLR}(\mathbf {x})), \mathrm {L}_{mPCA}(\mathbf {F}_{mMLR}(\mathbf {x}'))). \end{aligned}$$
(20)

Step 10 (modified Step 5). We compute the final estimators \(\mathbf {G}_{mMLR}(\mathbf {x})\) for \(\mathbf {J}_f(\mathbf {x})\) at the arbitrary input \(\mathbf {x}\in \mathbf {X}\) by the formula (17), where \(Q_{PCA}(\mathbf {F}_{MLR}(\mathbf {x}))\) is replaced by \(Q_{mPCA}(\mathbf {F}_{mMLR}(\mathbf {x}))\).

After that, we compute the final estimator \(\mathbf {f}_{mMLR}(\mathbf {x})\) of \(\mathbf {f}(\mathbf {x})\) at an arbitrary input \(\mathbf {x}\in \mathbf {X}\) by formula (18), in which \(\mathbf {G}_{MLR}(\mathbf {x})\) is replaced by \(\mathbf {G}_{mMLR}(\mathbf {x})\), and the KNR-estimators \(\mathbf {f}_{sKNR}(\mathbf {x})\) and \(\overline{\mathbf {x}}_{sKNR}\) with the stationary kernel \(K_q\) are replaced by the KNR-estimators \(\mathbf {f}_{nsKNR}(\mathbf {x})\) and \(\overline{\mathbf {x}}_{nsKNR}\) with the non-stationary kernel \(K_{q,MLR}\) (20).

4 Conclusion

The initially proposed Manifold Learning Regression (MLR) method was based on the GSE solution of the Tangent Bundle Manifold Learning problem, which is very computationally expensive. This paper proposes a modified version of the MLR method that does not require most of the GSE/MLR steps (such as constructing the aligned bases in the estimated tangent spaces, the embedding and recovery mappings, and the reparameterization mapping). As a result, the modified estimator has significantly lower computational complexity while preserving accuracy.