Abstract
Dimensionality reduction algorithms transform high-dimensional data into lower-dimensional representations that preserve chosen properties of the initial data. Such algorithms typically require solving large-dimensional optimization problems, and incremental versions of many popular algorithms have been designed to reduce their computational complexity. Under the manifold assumption about high-dimensional data, advanced manifold learning algorithms should preserve the Data manifold and its differential properties, such as tangent spaces and the Riemannian tensor. This paper proposes an incremental version of the Grassmann&Stiefel Eigenmaps manifold learning algorithm, which has asymptotically minimal reconstruction error; the incremental version has significantly smaller computational complexity than the initial algorithm.
Keywords
- Machine learning
- Dimensionality reduction
- Manifold learning
- Tangent bundle manifold learning
- Incremental learning
1 Introduction
The general goal of data analysis is to extract previously unknown information from a given dataset. Many data analysis tasks, such as pattern recognition, classification, clustering, prognosis, and others, deal with real-world data that are presented in high-dimensional spaces, and the ‘curse of dimensionality’ phenomenon is often an obstacle to the use of many methods for solving these tasks.
Fortunately, in many applications, especially in pattern recognition, the real high-dimensional data occupy only a very small part of the high-dimensional ‘observation space’ R^p; that is, the intrinsic dimension q of the data is small compared to the dimension p (usually, q << p) [1, 2]. Various dimensionality reduction (feature extraction) algorithms, whose goal is to find a low-dimensional parameterization of such high-dimensional data, transform the data into low-dimensional representations (features) that preserve certain chosen subject-driven data properties [3, 4].
The most popular model of high-dimensional data that occupy a small part of the observation space R^p is the Manifold model, according to which the data lie on or near an unknown Data manifold (DM) of known lower dimensionality q < p embedded in an ambient high-dimensional space R^p (the Manifold assumption [5] about high-dimensional data). Typically, this assumption is satisfied for ‘real-world’ high-dimensional data obtained from ‘natural’ sources.
Dimensionality reduction under the manifold assumption about the processed data is usually referred to as Manifold learning [6, 7], whose goal is to construct a low-dimensional parameterization of the DM (global low-dimensional coordinates on the DM) from a finite dataset sampled from the DM. This parameterization produces an Embedding mapping from the DM to a low-dimensional Feature space. The mapping should preserve specific properties of the DM determined by a chosen cost function, which defines an ‘evaluation measure’ for the dimensionality reduction and reflects the properties of the initial data that should be preserved in their features.
Most manifold learning algorithms include the solution of large-dimensional global optimization problems and, thus, are computationally expensive. Incremental versions of many popular algorithms (Locally Linear Embedding, Isomap, Laplacian Eigenmaps, Local Tangent Space Alignment, Hessian Eigenmaps, etc. [6, 7]), which reduce their computational complexity, have been developed [8–17].
Manifold learning algorithms are usually used as a first key step in the solution of machine learning tasks: the low-dimensional features are used in reduced learning procedures instead of the initial high-dimensional data, avoiding the curse of dimensionality [18]: ‘dimensionality reduction may be necessary in order to discard redundancy and reduce the computational cost of further operations’ [19]. If the low-dimensional features preserve only specific properties of the data, then substantial data losses are possible when the features are used instead of the initial data. To prevent these losses, the features should preserve as much of the available information contained in the high-dimensional data as possible [20]; this means that the initial data can be recovered from their features with small reconstruction error. Such Manifold reconstruction algorithms result in both the parameterization and the recovery of the unknown DM [21].
Mathematically [22], ‘preserving the important information of the DM’ means that manifold learning algorithms should ‘recover the geometry’ of the DM, and ‘the information necessary for reconstructing the geometry of the manifold is embodied in its Riemannian metric (tensor)’ [23]. Thus, the learning algorithms should accurately recover the Riemannian data manifold, that is, the DM equipped with a Riemannian tensor.
A further requirement on the recovery follows from the need to provide good generalization capability of the manifold reconstruction algorithms and to preserve the local structure of the DM: the algorithms should preserve the differential structure of the DM, providing proximity between the tangent spaces to the DM and to the Recovered data manifold (RDM) [24]. In Manifold theory [23, 25], the set composed of the manifold points equipped with the tangent spaces at these points is called the Tangent bundle of the manifold; thus, a reconstruction of the DM that also ensures accurate reconstruction of its tangent spaces is referred to as Tangent bundle manifold learning.
The previously proposed, geometrically motivated Grassmann&Stiefel Eigenmaps algorithm (GSE) [24, 26] solves the Tangent bundle manifold learning problem and recovers the Riemannian tensor of the DM; thus, it solves the Riemannian manifold recovery problem.
The GSE, like most manifold learning algorithms, includes the solution of large-dimensional global optimization problems and, thus, is computationally expensive.
In this paper, we propose an incremental version of the GSE that reduces the solution of the computationally expensive global optimization problems to the solution of a sequence of local optimization problems solved in explicit form.
The rest of the paper is organized as follows. Section 2 contains a strict definition of the Tangent bundle manifold learning problem and describes the main ideas realized in its GSE solution. The proposed incremental version of the GSE is presented in Sect. 3.
2 Tangent Bundle Manifold Learning
2.1 Definitions and Assumptions
Consider an unknown q-dimensional Data manifold M with known intrinsic dimension q, covered by a single chart g and embedded in an ambient p-dimensional space R^p, q < p. The chart g is a one-to-one mapping from an open bounded Coordinate space \( {\mathbf{Y}} \subset {\text{R}}^{\text{q}} \) to the manifold M = g(Y) with differentiable inverse mapping hg(X) = g^−1(X), whose values y = hg(X) ∈ Y give low-dimensional coordinates (representations, features) of high-dimensional manifold-valued data X.
If the mappings hg(X) and g(y) are differentiable and Jg(y) is the p × q Jacobian matrix of the mapping g(y), then the q-dimensional linear space L(X) = Span(Jg(hg(X))) in R^p is the tangent space to the DM M at the point X ∈ M; hereinafter, Span(H) is the linear space spanned by the columns of an arbitrary matrix H.
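As an illustration of the tangent space L(X) = Span(Jg(hg(X))), the following minimal sketch estimates the Jacobian of a chart by central differences. The helix chart, the step size, and all function names are assumptions made for this example only; they do not come from the paper.

```python
import math

def chart(y):
    """Hypothetical chart g: R^1 -> R^3 (a helix), so q = 1 and p = 3."""
    return [math.cos(y), math.sin(y), y]

def jacobian(g, y, step=1e-6):
    """Central-difference estimate of the p x 1 Jacobian of g at y."""
    forward, backward = g(y + step), g(y - step)
    return [(f - b) / (2.0 * step) for f, b in zip(forward, backward)]

y0 = 0.7
tangent = jacobian(chart, y0)                 # its span is L(X) at X = g(y0)
exact = [-math.sin(y0), math.cos(y0), 1.0]    # analytic derivative of the helix
```

For q > 1 the same finite-difference step would be repeated per coordinate of y, giving one Jacobian column per coordinate.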
The tangent spaces can be considered as elements of the Grassmann manifold Grass(p, q) consisting of all q-dimensional linear subspaces in Rp.
The standard inner product in R^p induces an inner product on the tangent space L(X) that defines a Riemannian metric (tensor) Δ(X) at each manifold point X ∈ M, smoothly varying from point to point; thus, the DM M is a Riemannian manifold (M, Δ).
Let \( {\mathbf{X}}_{\text{n}} = \left\{ {{\text{X}}_{ 1} ,{\text{X}}_{ 2} , \ldots ,{\text{X}}_{\text{n}} } \right\} \) be a dataset randomly sampled from the DM M according to certain (unknown) probability measure whose support coincides with M.
2.2 Tangent Bundle Manifold Learning Definition
The conventional manifold learning problem, usually called the Manifold embedding problem [6, 7], is to construct a low-dimensional parameterization of the DM from a given sample Xn, which produces an Embedding mapping \( {\text{h}}: {\mathbf{M}} \subset {\text{R}}^{\text{p}} \to {\mathbf{Y}}_{\text{h}} = {\text{h}}\left( {\mathbf{M}} \right) \subset {\text{R}}^{\text{q}} \) from the DM M to the Feature space (FS) Yh ⊂ R^q, q < p, which preserves specific chosen properties of the DM.
A Manifold reconstruction algorithm, which additionally provides the possibility of accurate recovery of the original vectors X from their low-dimensional features y = h(X), includes the construction of a Recovering mapping g(y) from the FS Yh to the Euclidean space R^p in such a way that the pair (h, g) ensures the approximate equalities
\( r_{h,g}(X) \equiv g(h(X)) \approx X \) for all X ∈ M. (1)
The mappings (h, g) determine the q-dimensional Recovered data manifold
\( \mathbf{M}_{h,g} = \{ r_{h,g}(X) \in \mathrm{R}^{p} : X \in \mathbf{M} \} \), (2)
which is embedded in the ambient space R^p, covered by a single chart g, and consists of all recovered values rh,g(X) of manifold points X ∈ M. Proximities (1) imply the manifold proximity Mh,g ≈ M, meaning a small Hausdorff distance dH(Mh,g, M) between the DM M and the RDM Mh,g, owing to the inequality \( {\text{d}}_{\text{H}} ({\mathbf{M}}_{{{\text{h}},{\text{g}}}} ,{\mathbf{M}}) \le { \sup }_{{{\text{X}} \in {\mathbf{M}}}} |{\text{r}}_{{{\text{h}},{\text{g}}}} \left( {\text{X}} \right){-}{\text{X}}| \).
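The Hausdorff bound above can be checked numerically on finite point sets. The sketch below uses hypothetical data (samples of a unit circle and slightly perturbed "recovered" points standing in for rh,g(X)); it only illustrates the inequality and is not part of the GSE.

```python
import math

def hausdorff(set_a, set_b):
    """Hausdorff distance between two finite point sets."""
    def directed(src, dst):
        return max(min(math.dist(s, d) for d in dst) for s in src)
    return max(directed(set_a, set_b), directed(set_b, set_a))

# Hypothetical data: circle samples X_i and perturbed recovered points r(X_i).
manifold = [(math.cos(0.1 * i), math.sin(0.1 * i)) for i in range(63)]
recovered = [(x + 0.01 * math.sin(5 * i), y - 0.01 * math.cos(3 * i))
             for i, (x, y) in enumerate(manifold)]

max_error = max(math.dist(x, r) for x, r in zip(manifold, recovered))
d_h = hausdorff(recovered, manifold)   # never exceeds max_error
```

The bound holds because each point can always be matched with its own image, so the nearest-point distance in either direction is at most the largest pointwise reconstruction error.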
Let G(y) = Jg(y) be the p × q Jacobian matrix of the mapping g(y), which determines the q-dimensional tangent space Lh,g(X) to the RDM Mh,g at the point rh,g(X) ∈ Mh,g:
\( L_{h,g}(X) = \mathrm{Span}(G(h(X))) \). (3)
The Tangent bundle manifold learning problem is to construct the pair (h, g) of mappings h and g from a given sample Xn ensuring both the proximities (1) and the proximities
\( L_{h,g}(X) \approx L(X) \) for all X ∈ M; (4)
the proximities (4) are defined using a certain chosen metric on Grass(p, q).
The matrix G(y) also determines the metric tensor \( \Delta_{{{\text{h}},{\text{g}}}} \left( {\text{X}} \right) = {\text{G}}^{\text{T}} \left( {{\text{h}}\left( {\text{X}} \right)} \right) \times {\text{G}}\left( {{\text{h}}\left( {\text{X}} \right)} \right) \) on the RDM Mh,g, which is a q × q matrix consisting of the inner products {(Gi(h(X)), Gj(h(X)))} between the ith and jth columns Gi(h(X)) and Gj(h(X)) of the matrix G(h(X)). Thus, the pair (h, g) determines the Recovered Riemannian manifold (Mh,g, Δh,g) that accurately approximates the initial Riemannian data manifold (M, Δ).
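The metric tensor is simply the Gram matrix of the Jacobian columns. A minimal sketch, with a made-up 3 × 2 matrix G stored as a list of rows (the function name is an assumption for this example):

```python
def metric_tensor(G):
    """Return the q x q Gram matrix G^T G for a p x q matrix G (list of rows)."""
    p, q = len(G), len(G[0])
    return [[sum(G[r][i] * G[r][j] for r in range(p)) for j in range(q)]
            for i in range(q)]

# Hypothetical Jacobian with p = 3 and q = 2.
G = [[1.0, 0.0],
     [0.0, 2.0],
     [1.0, 1.0]]
delta = metric_tensor(G)   # entry (i, j) is the inner product of columns i and j
```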
2.3 Grassmann&Stiefel Eigenmaps: An Approach
Grassmann&Stiefel Eigenmaps algorithm gives the solution to the Tangent bundle manifold learning problem and consists of three successively performed parts: Tangent manifold learning, Manifold embedding, and Manifold recovery.
Tangent Manifold Learning Part.
A sample-based family H consisting of p × q matrices H(X) smoothly depending on X ∈ M is constructed to meet the relations
\( L_{H}(X) \equiv \mathrm{Span}(H(X)) \approx L(X) \) for all X ∈ M (5)
in a certain chosen metric on the Grassmann manifold. In the next steps, the mappings h and g will be built in such a way that both the equalities (1) and
\( G(h(X)) \approx H(X) \) for all X ∈ M (6)
are fulfilled. Hence, the linear space LH(X) (5) approximates the tangent space Lh,g(X) (3) to the RDM Mh,g at the point rh,g(X).
Manifold Embedding Part.
Given the family H already constructed, the embedding mapping y = h(X) is constructed as follows. The Taylor series expansions
\( g(h(X')) - g(h(X)) \approx G(h(X)) \times (h(X') - h(X)) \) (7)
of the mapping g at near points h(X′), h(X) ∈ Yh, under the desired approximate equalities (1), (6) for the mappings h and g to be specified further, imply the equalities
\( X' - X \approx H(X) \times (h(X') - h(X)) \) (8)
for near points X, X′ ∈ M. These equations, considered further as regression equations, allow constructing the embedding mapping h and the FS Yh = h(M).
Manifold Recovery Part.
Given the family H and the mapping h(X) already constructed, the expansion (7), under the desired proximities (1) and (6), implies the relation
\( g(y) \approx X + H(X) \times (y - h(X)) \) (9)
for near points y, h(X) ∈ Yh, which is used for constructing the mapping g.
2.4 Grassmann&Stiefel Eigenmaps: Some Details
Details of the GSE are presented below. The numbers {εi > 0} denote the algorithm's parameters, whose values are chosen depending on the sample size n (εi = εi,n) and tend to zero as n → ∞ with rate \( O(n^{-1/(q+2)}) \).
Step S1: Neighborhoods (Construction and Description).
The necessary preliminary calculations are performed at first step S1.
Euclidean Kernel.
Introduce the Euclidean kernel KE(X, X′) = I{|X′ – X| < ε1} on the DM at points X, X′ ∈ M, where I{·} is the indicator function.
Grassmann Kernel.
Applying Principal Component Analysis (PCA) [27] to the points from the set Un(X, ε1) = {X′ ∈ Xn: |X′ – X| < ε1} ∪ {X} results in a p × q matrix QPCA(X) with orthonormal columns, which are the PCA principal eigenvectors corresponding to the q largest PCA eigenvalues. These matrices determine q-dimensional linear spaces LPCA(X) = Span(QPCA(X)) in R^p, which, under certain conditions, approximate the tangent spaces L(X):
\( L_{PCA}(X) \approx L(X) \). (10)
In what follows, we assume that the sample size n is large enough to ensure a positive value of the qth PCA eigenvalue at the sample points and to provide the proximities (10). To provide a trade-off between the ‘statistical error’ (depending on the number n(X) of sample points in the set Un(X, ε1)) and the ‘curvature error’ (caused by deviation of the manifold-valued points from the linear space assumed in the PCA) in (10), the ball radius ε1 should tend to 0 as n → ∞ with rate \( O(n^{-1/(q+2)}) \), providing, with high probability, the order \( O(n^{-1/(q+2)}) \) for the error in (10) [28, 29]; here ‘an event occurs with high probability’ means that its probability exceeds the value \( 1 - C_{\alpha}/n^{\alpha} \) for any n and α > 0, where the constant Cα depends only on α.
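The local PCA step can be sketched for the simplest case q = 1, p = 2. The example below is an illustration under stated assumptions: the circle data and the radius are made up, and power iteration is swapped in for a full eigen-solver so that only the standard library is needed.

```python
import math

def local_pca_direction(points, center, radius, iterations=100):
    """Leading principal direction of the sample points within `radius` of `center`."""
    nbrs = [pt for pt in points if math.dist(pt, center) < radius]
    if center not in nbrs:
        nbrs.append(center)
    mean = [sum(c) / len(nbrs) for c in zip(*nbrs)]
    # Entries of the 2 x 2 scatter matrix (scaling does not change eigenvectors).
    sxx = sum((x - mean[0]) ** 2 for x, _ in nbrs)
    syy = sum((y - mean[1]) ** 2 for _, y in nbrs)
    sxy = sum((x - mean[0]) * (y - mean[1]) for x, y in nbrs)
    v = [1.0, 1.0]
    for _ in range(iterations):   # power iteration, a stand-in for a full eigen-solver
        w = [sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1]]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    return v

# Hypothetical sample: 200 points on the unit circle; the tangent at (1, 0) is vertical.
circle = [(math.cos(2 * math.pi * i / 200), math.sin(2 * math.pi * i / 200))
          for i in range(200)]
direction = local_pca_direction(circle, circle[0], radius=0.3)
```

The small horizontal spread of the neighbors is the ‘curvature error’ mentioned above; shrinking the radius reduces it but leaves fewer points for the estimate.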
The Grassmann kernel KG(X, X′) on the DM at points X, X′ ∈ M is defined using the Binet-Cauchy kernel KBC(LPCA(X), LPCA(X′)) = Det²[S(X, X′)] and the Binet-Cauchy metric dBC(LPCA(X), LPCA(X′)) = {1 − Det²[S(X, X′)]}^{1/2} on the Grassmann manifold Grass(p, q) [30, 31], where S(X, X′) = \( {\text{Q}}_{\text{PCA}}^{\text{T}} ({\text{X}}) \times {\text{Q}}_{\text{PCA}} ({\text{X}}^{{\prime }} ) \).
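The Binet-Cauchy kernel and metric can be computed directly from two orthonormal bases. The following sketch is restricted to q = 2 for brevity (so the determinant is a 2 × 2 one); the matrices and function names are hypothetical.

```python
import math

def gram(Q1, Q2):
    """S = Q1^T Q2 for two p x q matrices stored as lists of rows."""
    p, q = len(Q1), len(Q1[0])
    return [[sum(Q1[r][i] * Q2[r][j] for r in range(p)) for j in range(q)]
            for i in range(q)]

def binet_cauchy_kernel(Q1, Q2):
    """K_BC = Det^2(Q1^T Q2); only q = 2 is handled in this sketch."""
    S = gram(Q1, Q2)
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return det ** 2

def binet_cauchy_distance(Q1, Q2):
    """d_BC = sqrt(1 - Det^2(Q1^T Q2)), clipped at zero against rounding."""
    return math.sqrt(max(0.0, 1.0 - binet_cauchy_kernel(Q1, Q2)))

# Two coordinate planes in R^3, each given by an orthonormal basis (columns).
xy_plane = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
xz_plane = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
same = binet_cauchy_kernel(xy_plane, xy_plane)
apart = binet_cauchy_distance(xy_plane, xz_plane)
```

Identical subspaces give kernel value 1 and distance 0; subspaces containing mutually orthogonal directions give kernel value 0.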
The p × p matrix \( \uppi_{\text{PCA}} ({\text{X}}) = {\text{Q}}_{\text{PCA}} \left( {\text{X}} \right) \times {\text{Q}}_{\text{PCA}}^{\text{T}} ({\text{X}}) \) is the orthogonal projector onto the linear space LPCA(X); it approximates the projection matrix π(X) onto the tangent space L(X).
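A short sketch of the projector built from an orthonormal basis, checking the defining property that applying it twice changes nothing. The basis Q and the helper names are assumptions for this example.

```python
def projector(Q):
    """pi = Q Q^T for a p x q matrix Q with orthonormal columns (list of rows)."""
    p, q = len(Q), len(Q[0])
    return [[sum(Q[i][k] * Q[j][k] for k in range(q)) for j in range(p)]
            for i in range(p)]

def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

s = 2 ** -0.5
Q = [[s, 0.0], [s, 0.0], [0.0, 1.0]]   # orthonormal columns spanning a plane in R^3
pi = projector(Q)
pi_squared = matmul(pi, pi)            # equals pi up to rounding (idempotence)
```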
Aggregate Kernel.
Introduce the aggregate kernel K(X, X′) = KE(X, X′) × KG(X, X′), the product of the Euclidean and Grassmann kernels, which reflects not only geometrical nearness between the points X and X′ but also nearness between the linear spaces LPCA(X) and LPCA(X′) (and thus, by (10), nearness between the tangent spaces L(X) and L(X′)).
Step S2: Tangent Manifold Learning.
The matrices H(X) will be constructed to meet the equalities LH(X) = LPCA(X) for all points X ∈ M, which implies the representation
\( H(X) = Q_{PCA}(X) \times v(X) \), (11)
in which the q × q matrices v(X) should provide a smooth dependence of H(X) on the point X.
At first, the p × q matrices {Hi = QPCA(Xi) × vi} are constructed to minimize a form
over q × q matrices v1, v2, …, vn, under normalizing constraint
used to avoid a degenerate solution; here \( {\text{K}}\left( {\text{X}} \right) = \sum\nolimits_{{{\text{j}} = 1}}^{\text{n}} {{\text{K}}\left( {{\text{X}},{\text{X}}_{\text{j}} } \right)} \) and \( {\text{K}} = \sum\nolimits_{{{\text{i}} = 1}}^{\text{n}} {{\text{K}}\left( {{\text{X}}_{\text{i}} } \right)} \).
The quadratic form (12) and the constraint (13) take the forms (K – Tr(V^T × Φ × V)) and V^T × F × V = K × Iq, respectively, where V is the (nq) × q matrix whose transpose consists of the consecutively written transposed q × q matrices v1, v2, …, vn, and Φ = ||Φij|| and F = ||Fij|| are nq × nq matrices consisting, respectively, of the q × q matrices
Thus, a minimization (12), (13) is reduced to the generalized eigenvector problem
and (nq) × q matrix V, whose columns V1, V2, …, Vq ∈ Rnq are orthonormal eigenvectors corresponding to the q largest eigenvalues in the problem (14), determines the required q × q matrices v1, v2, …, vn.
The value H(X) (11) at arbitrary point X ∈ M is chosen to minimize a form
over v(X) under condition Span(H) = LPCA(X), whose solution is
It follows from above formulas that the q × p matrix
estimates the Jacobian matrix Jh(X) of the Embedding mapping h(X) constructed afterward, where \( {\text{H}}^{ - } ({\text{X}}) \) is the q × p Moore-Penrose pseudoinverse of the p × q matrix H(X) [32].
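For a p × q matrix of full column rank, the Moore-Penrose pseudoinverse has the closed form (H^T H)^−1 H^T. The sketch below implements this for q = 2 with a hand-written 2 × 2 inverse; the example matrix and helper names are assumptions for illustration.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def pseudoinverse(H):
    """(H^T H)^{-1} H^T for a p x 2 matrix H of full column rank."""
    Ht = transpose(H)
    return matmul(inverse_2x2(matmul(Ht, H)), Ht)

H = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]   # hypothetical 3 x 2 matrix
H_minus = pseudoinverse(H)                  # 2 x 3
identity = matmul(H_minus, H)               # close to the 2 x 2 identity
```

In practice a numerically stabler route (for example via QR or SVD) would usually be preferred over forming H^T H explicitly.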
Step S3: Manifold Embedding.
Embedding mapping h(X) with already known (estimated) Jacobian Gh(X) is constructed to meet equalities (8) written for all pairs of near points X, X′ ∈ M which can be considered as regression equations.
At first, the vector set {h1, h2, …, hn} ⊂ Rq is computed as a standard least squares solution in this regression problem by minimizing the residual
over the vectors h1, h2, …, hn under normalizing condition h1 + h2 + … + hn = 0.
Then, considering the obtained vectors {hj} as preliminary values of the mapping h(X) at sample points, choose the value
for arbitrary point X ∈ M as a result of minimizing over h the residual
The mapping (18) determines Feature sample Y h,n = {yh,i = h(Xi), i = 1, 2, …, n}.
Step S4: Manifold Recovery.
A kernel on the FS Y h and, then, the recovering mapping g(y) and its Jacobian matrix G(y) are constructed in this step.
Kernel on the Feature Space.
It follows from (8) that proximities
hold true for near points y = h(X) and yh,i ∈ Y h,n. Let uE(y, ε1) = {yh,i: d(y, yh,i) < ε1} be a neighborhood of the feature y = h(X) consisting of sample features which are images of the sample points from Un(X, ε1).
Applying the PCA to the set h^−1(uE(y, ε1)) = {Xi: yh,i ∈ uE(y, ε1)} results in the linear space LPCA*(y) ∈ Grass(p, q), which meets the proximity LPCA*(h(X)) ≈ LPCA(X).
Introduce the feature kernel k(y, yh,i) = I{yh,i ∈ uE(y, ε1)} × KG(LPCA*(y), LPCA*(yh,i)), which meets the equalities k(h(X), h(X′)) ≈ K(X, X′) for near points X ∈ M and X′ ∈ Xn.
Constructing the Recovering Mapping and its Jacobian.
The matrix G(y), which should meet both the conditions (6) and the constraint Span(G(y)) = LPCA*(y), is chosen by minimizing the quadratic form \( \sum\nolimits_{{{\text{j}} = 1}}^{\text{n}} {{\text{k}}\left( {{\text{y}},{\text{y}}_{{{\text{h}},{\text{j}}}} } \right) \times ||{\text{G}}({\text{y}}) - {\text{H}}_{\text{j}} ||_{\text{F}}^{2} } \) over G, which results in
\( G(y) = \pi^{*}(y) \times \frac{1}{k(y)} \sum_{j=1}^{n} k(y, y_{h,j}) \times H_{j} \), (20)
where π*(y) is the projector onto the linear space LPCA*(y) and \( {\text{k}}\left( {\text{y}} \right) = \sum\nolimits_{{{\text{j}} = 1}}^{\text{n}} {{\text{k}}\left( {{\text{y}},{\text{y}}_{{{\text{h}},{\text{j}}}} } \right)} \).
Based on the expansions (9) written for the features yh,j ∈ uE(y, ε1), g(y) is chosen by minimizing the quadratic form \( \sum\nolimits_{{{\text{j}} = 1}}^{\text{n}} {{\text{k}}\left( {{\text{y}},{\text{y}}_{{{\text{h}},{\text{j}}}} } \right) \times \left| {{\text{X}}_{\text{j}} - {\text{g}}\left( {\text{y}} \right) - {\text{G}}({\text{y}}) \times ({\text{y}}_{{{\text{h}},{\text{j}}}} - {\text{y}})} \right|^{2} } \) over g; thus,
\( g(y) = \frac{1}{k(y)} \sum_{j=1}^{n} k(y, y_{h,j}) \times \left[ X_{j} - G(y) \times (y_{h,j} - y) \right] \). (21)
The constructed mappings (18), (21) allow recovering the DM M and its tangent spaces L(X) by the formulas (2) and (4).
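Minimizing the kernel-weighted quadratic form over g has a closed-form solution: setting the gradient to zero gives the weighted mean of the terms Xj − G(y) × (yh,j − y). The sketch below implements this for q = 1; the data, weights, and function name are hypothetical, and the formula is derived from the quadratic form stated above rather than copied from the paper's display.

```python
def recover_point(y, features, points, weights, G):
    """Kernel-weighted least-squares value g(y) for q = 1.

    features: scalars y_j; points: p-vectors X_j; weights: k(y, y_j) >= 0;
    G: p-vector playing the role of the p x 1 Jacobian G(y).
    """
    total = sum(weights)
    p = len(G)
    return [sum(w * (x[d] - G[d] * (yj - y))
                for w, yj, x in zip(weights, features, points)) / total
            for d in range(p)]

# Hypothetical exactly linear data X_j = a + G * y_j, for which g(y) = a + G * y.
a, G = [1.0, -2.0], [3.0, 0.5]
features = [0.0, 0.1, 0.2, 0.3]
points = [[a[d] + G[d] * yj for d in range(2)] for yj in features]
weights = [1.0, 1.0, 1.0, 0.0]      # the last feature lies outside the neighborhood
g_value = recover_point(0.15, features, points, weights, G)
```

On exactly linear data the recovered value coincides with a + G × y, which is a convenient sanity check for an implementation.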
2.5 Grassmann&Stiefel Eigenmaps: Some Properties
Asymptotically, as n → ∞ with \( \varepsilon_{1} = O(n^{-1/(q+2)}) \), the relation \( d_{H}(\mathbf{M}_{h,g}, \mathbf{M}) = O(n^{-2/(q+2)}) \) holds true uniformly in points X ∈ M with high probability [33]. This rate coincides with the asymptotically minimax lower bound for the Hausdorff distance dH(Mh,g, M) [34]; thus, the RDM Mh,g estimates the DM M with the optimal rate of convergence.
The main computational cost of the GSE algorithm lies in the second and third steps, in which global high-dimensional optimization problems are solved.
The first problem is the generalized eigenvector problem (14) with nq × nq matrices F and Φ. This problem is usually solved using the Singular value decomposition (SVD) [32], whose computational complexity is \( O(n^{3}) \) [35].
The second problem is the regression problem (17) for an nq-dimensional estimated vector. This problem is reduced to the solution of the linear least-squares normal equations with an nq × nq matrix, whose computational complexity is also \( O(n^{3}) \) [32].
Thus, the GSE has total computational complexity \( O(n^{3}) \) and is computationally expensive for large sample size n.
3 Incremental Grassmann&Stiefel Eigenmaps
The incremental version of the GSE divides the most computationally expensive generalized eigenvector and regression problems into n local optimization procedures; the kth procedure is solved in explicit form for one new variable only (the matrix Hk and the feature hk), k = 1, 2, …, n.
The proposed incremental version includes an additional preliminary step S1+ performed after Step S1, in which a weighted undirected sample graph Г(Xn) with the sample points {Xi} as nodes is constructed, and the shortest paths between an arbitrary node chosen as the origin of the graph and all the other nodes are calculated.
The second and third steps S2 and S3 are replaced by common incremental step S2–3 in which the matrices {Hk} and features {hk} are computed sequentially at the graph nodes, moving along the shortest paths starting from the chosen origin of the graph. Step S4 in the GSE remains unchanged in the incremental version.
3.1 Step S1+: Sample Graph
Introduce a weighted undirected sample graph Г(Xn) with the sample points {Xi} as nodes. An edge in Г(Xn) connects the nodes Xi and Xj if and only if K(Xi, Xj) > 0; the length of such an edge (Xi, Xj) equals |Xi – Xj|/K(Xi, Xj).
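The graph construction can be sketched as follows. The kernel is passed in as a function; in this example only the Euclidean indicator factor is used as a stand-in for the aggregate kernel K, and the points, radius, and names are assumptions.

```python
import math

def build_sample_graph(points, kernel):
    """Adjacency lists {i: [(j, length), ...]} with length |Xi - Xj| / K(Xi, Xj)."""
    graph = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            k = kernel(points[i], points[j])
            if k > 0:                      # an edge exists only for a positive kernel
                length = math.dist(points[i], points[j]) / k
                graph[i].append((j, length))
                graph[j].append((i, length))
    return graph

def euclidean_kernel(x, x_prime, eps=1.5):
    """Stand-in for K: only the indicator factor I{|X' - X| < eps} is used here."""
    return 1.0 if math.dist(x, x_prime) < eps else 0.0

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
graph = build_sample_graph(points, euclidean_kernel)
```

The double loop over all pairs costs O(n²) and is used here only for clarity; a neighbor search structure would be needed to stay within the running time discussed in Sect. 3.3.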
Choose an arbitrary node X(1) ∈ Г(Xn) as the origin of the graph. Using the Dijkstra algorithm [36], compute the shortest paths between the chosen node and all the other nodes X(2), X(3), …, X(n), written in ascending order of the lengths of the shortest paths from the origin X(1). Denote by Гk the subgraph consisting of the nodes {X(1), X(2), …, X(k)} and the edges connecting them.
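A minimal sketch of this ordering step, using a binary heap from the standard library. The toy graph and the function name are hypothetical; the graph format is an adjacency list of (neighbor, edge length) pairs.

```python
import heapq

def dijkstra_order(graph, origin):
    """Return (order, dist): nodes in ascending shortest-path length from origin."""
    dist = {origin: 0.0}
    order, done = [], set()
    heap = [(0.0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue                       # stale heap entry
        done.add(node)
        order.append(node)                 # nodes are finalized in ascending distance
        for nbr, length in graph[node]:
            new_d = d + length
            if new_d < dist.get(nbr, float("inf")):
                dist[nbr] = new_d
                heapq.heappush(heap, (new_d, nbr))
    return order, dist

graph = {0: [(1, 1.0), (2, 4.0)], 1: [(0, 1.0), (2, 1.5), (3, 5.0)],
         2: [(0, 4.0), (1, 1.5), (3, 1.0)], 3: [(1, 5.0), (2, 1.0)]}
order, dist = dijkstra_order(graph, 0)
```

Because Dijkstra finalizes nodes in ascending order of distance, the list `order` directly gives the sequence X(1), X(2), …, and the first k entries are the nodes of the subgraph Гk.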
Note.
The origin X(1) can be chosen as a node with minimal eccentricity; the eccentricity of a node equals the maximum of the lengths of the shortest paths between this node and all the other nodes. However, this choice requires the shortest paths between all pairs of nodes in the graph Г(Xn), whose calculation requires an n-fold application of the Dijkstra algorithm.
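The eccentricity-based choice of origin can be sketched as below, assuming a connected graph. The path graph and function names are made up for the example; the n-fold Dijkstra run is exactly the extra cost mentioned in the note.

```python
import heapq

def shortest_lengths(graph, origin):
    """Dijkstra shortest-path lengths from origin to every reachable node."""
    dist = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue                       # stale heap entry
        for nbr, length in graph[node]:
            if d + length < dist.get(nbr, float("inf")):
                dist[nbr] = d + length
                heapq.heappush(heap, (d + length, nbr))
    return dist

def min_eccentricity_node(graph):
    """Run Dijkstra once per node (n runs) and pick the node of least eccentricity."""
    ecc = {node: max(shortest_lengths(graph, node).values()) for node in graph}
    return min(ecc, key=ecc.get), ecc

# Hypothetical path graph 0 - 1 - 2 - 3 - 4 with unit edge lengths.
graph = {i: [(j, 1.0) for j in (i - 1, i + 1) if 0 <= j <= 4] for i in range(5)}
center, ecc = min_eccentricity_node(graph)
```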
3.2 Step S2–3: Incremental Tangent Manifold Learning and Manifold Embedding
The incremental version sequentially computes the matrices H(X) and the features h(X) at the points X(1), X(2), …, X(n), starting from the matrix H(1) and the feature h(1) (initialization). Thus, step S2–3 consists of n substeps {S2–3k, k = 1, 2, …, n}, the first of which is the initialization substep.
Initialization substep S2–31.
Put v(1) = Iq and h(1) = 0; thus, H(X(1)) = QPCA(X(1)).
At the k-th substep S2–3k, k > 1, when the matrices H(j), j < k, have already been computed, the quadratic form ΔH,k, similar to the form (12) but written only for the points Xi, Xj ∈ Гk, is minimized over the single unknown matrix H(k) = QPCA(X(k)) × v(k). This problem, in turn, is reduced to a minimization over v(k) of the form dH,k(H(k)), similar to the form dH,n(H(k)) (15) but written only for the points Xj ∈ Гk−1. Its solution v(k), which is similar to the solution (16), is written in explicit form.
Let Δh,k be a quadratic form similar to the form Δh,n (17) but written only for the points Xi, Xj ∈ Гk. The value h(k), given the already computed values h(j), j < k, is calculated by minimizing the quadratic form Δh,k over the single vector h(k). This problem, in turn, is reduced to a minimization over h(k) of the form dh,k(h(k)), similar to the form dh,n(h(k)) (19) but written only for the points Xj ∈ Гk−1; its solution, similar to the solution (18), is also written in explicit form.
Thus, the substeps S2–3k, k = 1, 2, …, n, are:
Typical substep S2–3k, 1 < k ≤ n.
Given {(H(j), h(j)), j < k} already obtained, put
Given {(H(k), h(k)), k = 1, 2, …, n}, the values H(X) = QPCA(X) × v(X) and h(X) at an arbitrary point X ∈ M are calculated using formulas (16) and (18), respectively.
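The flavor of the incremental feature computation can be illustrated for q = 1. The sketch below is not the paper's formulas (22) and (23): it assumes unit tangent vectors are already known, uses only the Euclidean indicator as the kernel, and sets each new feature to the average of h(j) + T(k)·(X(k) − X(j)) over already processed neighbors, which is one explicit least-squares solution suggested by the regression equations of Sect. 2.3. All names and data are hypothetical.

```python
import math

def incremental_features(points, tangents, eps):
    """Assign scalar features h(k) one node at a time (q = 1 sketch).

    points are assumed to be sorted in processing order, with every node after
    the first having at least one earlier node within distance eps;
    tangents[k] is a unit vector spanning the tangent line at points[k].
    """
    h = [0.0]                                  # initialization: h(1) = 0
    for k in range(1, len(points)):
        estimates = []
        for j in range(k):                     # only already processed nodes
            if math.dist(points[k], points[j]) < eps:
                diff = [a - b for a, b in zip(points[k], points[j])]
                step = sum(t * d for t, d in zip(tangents[k], diff))
                estimates.append(h[j] + step)
        h.append(sum(estimates) / len(estimates))
    return h

# Hypothetical arc of the unit circle; the features should approximate arc length.
angles = [0.05 * i for i in range(41)]
points = [(math.cos(t), math.sin(t)) for t in angles]
tangents = [(-math.sin(t), math.cos(t)) for t in angles]
h = incremental_features(points, tangents, eps=0.12)
```

Each new value depends only on a few previously computed neighbors, so no global system has to be solved; this is the property that the incremental step S2–3 exploits.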
3.3 Incremental GSE: Properties
Computational Complexity.
The incremental GSE works mainly with sample data lying in the ε1-ball Un(X, ε1) centered at some point X. Under \( \varepsilon_{1} = \varepsilon_{1,n} = O(n^{-1/(q+2)}) \), the number n(X) of sample points falling into this ball equals, with high probability, \( n \times O(n^{-q/(q+2)}) = O(n^{2/(q+2)}) \) uniformly in X ∈ M [37].
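The scaling of the neighborhood size can be made concrete with a small calculation. The constant c and the function names below are assumptions; density and volume constants are ignored, so only the order of magnitude is meaningful.

```python
def ball_radius(n, q, c=1.0):
    """eps_1 = c * n^(-1/(q+2)); the constant c is an assumed tuning parameter."""
    return c * n ** (-1.0 / (q + 2))

def expected_neighbors(n, q, c=1.0):
    """Order-of-magnitude count n * eps_1^q, ignoring density and volume constants."""
    return n * ball_radius(n, q, c) ** q

n, q = 10_000, 2
eps = ball_radius(n, q)             # 10000^(-1/4), about 0.1
count = expected_neighbors(n, q)    # about 100, i.e. 10000^(2/4)
```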
The sample graph Г(Xn) consists of V = n nodes and E edges connecting the graph nodes {Xk}. Each node Xk is connected with no more than n(Xk) other nodes; thus \( E < 0.5 \times n \times \max_{k} n(X_{k}) = O(n^{(q+4)/(q+2)}) \) and, hence, Г(Xn) is a sparse graph.
The running time of the Dijkstra algorithm (Step S1+), which computes the shortest paths in the sparse connected graph Г(Xn), is \( O(E \times \ln V) = O(n^{(q+4)/(q+2)} \times \ln n) \) in the worst case; the Fibonacci heap improves this rate to \( O(E + V \times \ln V) = O(n^{(q+4)/(q+2)}) \) [38].
The running time of the k-th substep S2–3k (formulas (22) and (23)) is proportional to n(Xk); thus, the total running time of Step S2–3 is \( n \times O(n^{2/(q+2)}) = O(n^{(q+4)/(q+2)}) \).
Therefore, the running time of the incremental version of the GSE is \( O(n^{(q+4)/(q+2)}) \), in contrast to the running time \( O(n^{3}) \) of the initial GSE.
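To get a feel for the gap between the two rates, the exponent (q + 4)/(q + 2) can be tabulated for a few intrinsic dimensions. The helper names are assumptions, and constants hidden in the O-notation are ignored, so the ratio below is only indicative.

```python
def incremental_exponent(q):
    """Exponent (q + 4) / (q + 2) of n in the incremental running time."""
    return (q + 4) / (q + 2)

def speedup_order(n, q):
    """Ratio n^3 / n^((q+4)/(q+2)), ignoring constants hidden in the O-notation."""
    return n ** (3 - incremental_exponent(q))

# The exponent always lies strictly between 1 and 2 and decreases with q.
exponents = {q: incremental_exponent(q) for q in (1, 2, 5, 10)}
ratio = speedup_order(100, 2)   # 100^(3 - 1.5) = 100^1.5
```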
Accuracy.
It follows from (18), (21) that X − rh,g(X) ≈ \( \left(\pi {_{\text{PCA}}^{\text{T}} \left( {\text{X}} \right) \times {\text{e}}({\text{X}})} \right) \times \left| \delta {({\text{X}})} \right| \), in which \( \delta \left( {\text{X}} \right) \, = {\text{X}} - \frac{1}{{{\text{K}}({\text{X}})}}\sum\nolimits_{{{\text{i}} = 1}}^{\text{n}} {{\text{K}}\left( {{\text{X}},{\text{X}}_{\text{i}} } \right) \times {\text{X}}_{\text{i}} } \) and e(X) = δ(X)/|δ(X)|. The first and second factors are majorized by the PCA error in (10) and by ε1,n, respectively, each of which has rate \( O(n^{-1/(q+2)}) \). Thus, the reconstruction error (X − rh,g(X)) in the incremental GSE has the same asymptotically optimal rate \( O(n^{-2/(q+2)}) \) as in the original GSE.
4 Conclusion
An incremental version of the Grassmann&Stiefel Eigenmaps algorithm, which constructs low-dimensional representations of high-dimensional data with asymptotically minimal reconstruction error, is proposed. This version has the same optimal convergence rate \( O(n^{-2/(q+2)}) \) of the reconstruction error and a significantly smaller computational complexity with respect to the sample size n: the running time of the incremental version is \( O(n^{(q+4)/(q+2)}) \), in contrast to \( O(n^{3}) \) for the original algorithm.
References
Donoho, D.L.: High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality. Lecture at the “Mathematical Challenges of the 21st Century” Conference of the AMS, Los Angeles (2000). http://www-stat.stanford.edu/donoho/Lectures/AMS2000/AMS2000.html
Verleysen, M.: Learning high-dimensional data. In: Ablameyko, S., Goras, L., Gori, M., Piuri, V. (eds.) Limitations and Future Trends in Neural Computation. NATO Science Series, III: Computer and Systems Sciences, vol. 186, pp. 141–162. IOS Press, Netherlands (2003)
Bengio, Y., Courville, A., Vincent, P.: Representation Learning: A Review and New Perspectives, pp. 1–64 (2014). arXiv:1206.5538v3[cs.LG]. Accessed 23 Apr 2014
Bernstein, A., Kuleshov, A.: Low-dimensional data representation in data analysis. In: El Gayar, N., Schwenker, F., Suen, C. (eds.) ANNPR 2014. LNCS, vol. 8774, pp. 47–58. Springer, Heidelberg (2014)
Seung, H.S., Lee, D.D.: The manifold ways of perception. Science 290(5500), 2268–2269 (2000)
Huo, X., Ni, X., Smith, A.K.: Survey of manifold-based learning methods. In: Liao, T.W., Triantaphyllou, E. (eds.) Recent Advances in Data Mining of Enterprise Data, pp. 691–745. World Scientific, Singapore (2007)
Ma, Y., Fu, Y. (eds.): Manifold Learning Theory and Applications. CRC Press, London (2011)
Law, M.H.C., Jain, A.K.: Nonlinear manifold learning for data stream. In: Berry, M., Dayal, U., Kamath, C., Skillicorn, D. (eds.) Proceedings of the 4th SIAM International Conference on Data Mining, Lake Buena Vista, Florida, USA, pp. 33–44 (2004)
Law, M.H.C., Jain, A.K.: Incremental nonlinear dimensionality reduction by manifold learning. IEEE Trans. Pattern Anal. Mach. Intell. 28(3), 377–391 (2006)
Gao, X., Liang, J.: An improved incremental nonlinear dimensionality reduction for isometric data embedding. Inf. Process. Lett. 115(4), 492–501 (2015)
Saul, L.K., Roweis, S.T.: Think globally, fit locally: unsupervised learning of low dimensional manifolds. J. Mach. Learn. Res. 4, 119–155 (2003)
Kouropteva, O., Okun, O., Pietikäinen, M.: Incremental locally linear embedding algorithm. In: Kalviainen, H., Parkkinen, J., Kaarna, A. (eds.) SCIA 2005. LNCS, vol. 3540, pp. 521–530. Springer, Heidelberg (2005)
Kouropteva, O., Okun, O., Pietikäinen, M.: Incremental locally linear embedding. Pattern Recogn. 38(10), 1764–1767 (2005)
Schuon, S., Ðurković, M., Diepold, K., Scheuerle, J., Markward, S.: Truly incremental locally linear embedding. In: Proceedings of the CoTeSys 1st International Workshop on Cognition for Technical Systems, 6–8 October 2008, Munich, Germany, p. 5 (2008)
Jia, P., Yin, J., Huang, X., Hu, D.: Incremental Laplacian eigenmaps by preserving adjacent information between data points. Pattern Recogn. Lett. 30(16), 1457–1463 (2009)
Liu, X., Yin, J.-w., Feng, Z., Dong, J.: Incremental manifold learning via tangent space alignment. In: Schwenker, F., Marinai, S. (eds.) ANNPR 2006. LNCS (LNAI), vol. 4087, pp. 107–121. Springer, Heidelberg (2006)
Abdel-Mannan, O., Ben Hamza, A., Youssef, A.: Incremental line tangent space alignment algorithm. In: Proceedings of 2007 Canadian Conference on Electrical and Computer Engineering (CCECE 2007), 22–26 April 2007, Vancouver, pp. 1329–1332. IEEE (2007)
Kuleshov, A., Bernstein, A.: Manifold learning in data mining tasks. In: Perner, P. (ed.) MLDM 2014. LNCS, vol. 8556, pp. 119–133. Springer, Heidelberg (2014)
Lee, J.A., Verleysen, M.: Nonlinear Dimensionality Reduction. Information Science and Statistics. Springer, New York (2007)
Lee, J.A., Verleysen, M.: Quality assessment of dimensionality reduction: rank-based criteria. Neurocomputing 72(7–9), 1431–1443 (2009)
Bernstein, A.V., Kuleshov, A.P.: Data-based manifold reconstruction via tangent bundle manifold learning. In: ICML-2014, Topological Methods for Machine Learning Workshop, Beijing, 25 June 2014. http://topology.cs.wisc.edu/KuleshovBernstein.pdf
Perrault-Joncas, D., Meilă, M.: Non-linear Dimensionality Reduction: Riemannian Metric Estimation and the Problem of Geometric Recovery, pp. 1–25 (2013). arXiv:1305.7255v1[stat.ML]. Accessed 30 May 2013
Jost, J.: Riemannian Geometry and Geometric Analysis, 6th edn. Springer, Berlin (2011)
Bernstein, A.V., Kuleshov, A.P.: Manifold learning: generalizing ability and tangent proximity. Int. J. Softw. Inf. 7(3), 359–390 (2013)
Lee, J.M.: Manifolds and Differential Geometry. Graduate Studies in Mathematics, vol. 107. American Mathematical Society, Providence (2009)
Bernstein, A.V., Kuleshov, A.P.: Tangent bundle manifold learning via Grassmann&Stiefel eigenmaps, pp. 1–25, December 2012. arXiv:1212.6031v1[cs.LG]
Jolliffe, I.T.: Principal Component Analysis. Springer, New York (2002)
Singer, A., Wu, H.-T.: Vector diffusion maps and the connection Laplacian. Commun. Pure Appl. Math. 65(8), 1067–1144 (2012)
Tyagi, H., Vural, E., Frossard, P.: Tangent space estimation for smooth embeddings of Riemannian manifold, pp. 1–35 (2013). arXiv:1208.1065v2[stat.CO]. Accessed 17 May 2013
Hamm, J., Lee, D.D.: Grassmann discriminant analysis: a unifying view on subspace-based learning. In: Proceedings of the 25th International Conference on Machine Learning (ICML 2008), pp. 376–383 (2008)
Wolf, L., Shashua, A.: Learning over sets using kernel principal angles. J. Mach. Learn. Res. 4, 913–931 (2003)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
Kuleshov, A., Bernstein, A., Yanovich, Y.: Asymptotically optimal method in manifold estimation. In: Abstracts of the XXIX-th European Meeting of Statisticians, 20–25 July 2013, Budapest, Hungary, p. 325 (2013). http://ems2013.eu/conf/upload/BEK086_006.pdf
Genovese, C.R., Perone-Pacifico, M., Verdinelli, I., Wasserman, L.: Minimax manifold estimation. J. Mach. Learn. Res. 13, 1263–1291 (2012)
Trefethen, L.N., Bau III, D.: Numerical Linear Algebra. SIAM, Philadelphia (1997)
Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2001)
Yanovich, Y.: Asymptotic properties of local sampling on manifolds. J. Math. Stat. (2016)
Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network optimization algorithms. J. Assoc. Comput. Mach. 34(3), 596–615 (1987)
Acknowledgments
This work is partially supported by the Russian Foundation for Basic Research, research project 16-29-09649 ofi-m.
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Kuleshov, A., Bernstein, A. (2016). Incremental Construction of Low-Dimensional Data Representations. In: Schwenker, F., Abbas, H., El Gayar, N., Trentin, E. (eds) Artificial Neural Networks in Pattern Recognition. ANNPR 2016. Lecture Notes in Computer Science(), vol 9896. Springer, Cham. https://doi.org/10.1007/978-3-319-46182-3_5
DOI: https://doi.org/10.1007/978-3-319-46182-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46181-6
Online ISBN: 978-3-319-46182-3