1. Introduction

The goal of learning theory is to approximate a function (or some of its features) from data samples.

Let be a compact subset of -dimensional Euclidean space , . Then, the task of learning theory is to find a function relating the input to the output (see [13]). The function is determined by a probability distribution on where is the marginal distribution on and is the conditional probability of for a given

Generally, the distribution is known only through a set of samples independently drawn according to . Given a sample , the regression problem based on Support Vector Machine (SVM) learning is to find a function such that is a good estimate of when a new input is provided. The binary classification problem based on SVM learning is to find a function which divides into two parts. Here is often induced by a real-valued function with the form of where if , otherwise, . The functions are often generated from the following Tikhonov regularization scheme (see, e.g., [49]) associated with a reproducing kernel Hilbert space (RKHS) (defined below) and a sample :

(1.1)

where is a positive constant called the regularization parameter and () is called the -norm SVM loss.
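As a rough illustration of a scheme of the type (1.1), the sketch below evaluates a regularized empirical risk for a function in the span of kernel sections. It assumes the common hinge form max(0, 1 - y f(x)) raised to the power q for the loss, a 1/m normalization of the empirical risk, and a Gaussian kernel; none of these choices is taken from the source, and the names `gaussian_kernel` and `tikhonov_objective` are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gram matrix K[i, j] = exp(-|A[i] - B[j]|^2 / (2 sigma^2)) (assumed kernel)."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def tikhonov_objective(c, K, y, lam, q=1.0):
    """Empirical q-norm hinge risk plus lam * ||f||_K^2 for f = sum_i c[i] K(x_i, .)."""
    f_vals = K @ c                                   # values f(x_j) at the sample points
    loss = np.maximum(0.0, 1.0 - y * f_vals) ** q    # assumed form of the q-norm SVM loss
    rkhs_norm_sq = float(c @ K @ c)                  # ||f||_K^2 via the reproducing property
    return float(loss.mean()) + lam * rkhs_norm_sq

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 2))
y = np.where(X[:, 0] + X[:, 1] >= 0, 1.0, -1.0)      # synthetic labels in {-1, +1}
K = gaussian_kernel(X, X)
zero_value = tikhonov_objective(np.zeros(20), K, y, lam=0.1, q=2.0)
fit_value = tikhonov_objective(0.1 * y, K, y, lam=0.1, q=2.0)
```

For the zero function every sample incurs loss 1 and the penalty vanishes, so `zero_value` equals 1; any minimizer of the scheme must therefore achieve an objective value no larger than 1.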

In addition, the Tikhonov regularization scheme involving an offset (see, e.g., [4, 10, 11]) can be stated in a way similar to (1.1):

(1.2)

We now define the reproducing kernel Hilbert space. A function is called a Mercer kernel if it is continuous, symmetric, and positive semidefinite, that is, for any finite set of distinct points , the matrix is positive semidefinite.
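The positive semidefiniteness condition can be checked numerically on any finite point set. The following is a minimal sketch, assuming a Gaussian kernel on points of the unit cube as the test case; the helper name `is_positive_semidefinite` and the tolerance are illustrative choices, not part of the source.

```python
import numpy as np

def is_positive_semidefinite(G, tol=1e-10):
    """Return True if G is symmetric and its smallest eigenvalue is >= -tol."""
    G = np.asarray(G, dtype=float)
    if not np.allclose(G, G.T):
        return False
    return bool(np.linalg.eigvalsh(G).min() >= -tol)

rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 1.0, size=(15, 3))            # distinct points in a compact set
sq_dist = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
gauss_gram = np.exp(-sq_dist)                        # Gram matrix of a Gaussian kernel
not_psd = np.array([[0.0, 1.0], [1.0, 0.0]])         # symmetric, eigenvalues -1 and +1
```

A check of this kind can only refute the Mercer property on a given point set; it does not prove it for all finite sets.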

The reproducing kernel Hilbert space (RKHS) (see [12]) associated with the Mercer kernel is defined to be the closure of the linear span of the set of functions with the inner product satisfying and the reproducing property

(1.3)

If , then . Denote as the space of continuous functions on with the norm . Let Then the reproducing property shows that

(1.4)

It is easy to see that is a subset of We say that is a universal kernel if for any compact subset is dense in (see [13, Page 2652]).

Let be a given discrete set of finite points. Then, we may define an RKHS by the linear span of the set of functions . Then, it is easy to see that and for any there holds

Define and where the minimum is taken over all measurable functions. Then, to estimate the explicit learning rate, one needs to estimate the regularization errors (see, e.g., [4, 7, 9, 14])

(1.5)
(1.6)

The convergence rate of (1.5) is controlled by the -functional (see, e.g., [9])

(1.7)

and (1.6) is controlled by another -functional (see, e.g., [4])

(1.8)

where with

(1.9)

We notice that, on the one hand, the -functionals (1.7) and (1.8) are modifications of the -functional of interpolation theory (see [15]) because of the interpolation relation (1.4). On the other hand, they differ from the usual -functionals (see e.g., [16-30]) because of the term However, they share some similarities. For example, if is a universal kernel, is dense in (see e.g., [31]). Moreover, some classical function spaces such as the polynomial spaces (see [2, 32]) and even some Sobolev spaces may be regarded as RKHS (see e.g., [33]).

In learning theory we often require and for some (see e.g., [1, 7, 14]). Many results on this topic have been achieved. Using weighted Durrmeyer operators, [8, 9] showed the decay by taking to be the algebraic polynomial kernels on or on the simplex in .

However, in the general case, the convergence of the -functional (1.8) should also be considered since the offset often influences the solution of the learning algorithms (see e.g., [6, 11]). Hence, the purpose of this paper is twofold. One is to provide the convergence rates of (1.7) and (1.8) when is a general Mercer kernel on the unit sphere and The other is to show how to construct functions of the type of

(1.10)

to obtain the convergence rate of (1.8). The translation networks constructed in [34-37] have the form of (1.10) and the zonal networks constructed in [38, 39] have the form of (1.10) with . So the methods used by these references may be used here to estimate the convergence rates of (1.7) and (1.8) if one can bound the term

In the present paper, we shall give the convergence rate of (1.7) and (1.8) for a general kernel defined on the unit sphere and with being the usual Lebesgue measure on . If there is a distortion between and the convergence rate of (1.7)-(1.8) in the general case may be obtained according to the way used by [1, 8].

The rest of this paper is organized as follows. In Section 2, we recall some notation on spherical harmonics and present the main results. Section 3 collects some useful lemmas: the approximation order of the de la Vallée Poussin means of spherical harmonics, the Gauss integral formula, the Marcinkiewicz-Zygmund inequality with respect to scattered data obtained by G. Brown and F. Dai, and a result on zonal network approximation provided by H. N. Mhaskar. A weighted norm estimate for the Mercer kernel matrices on the unit sphere is given in Lemma 3.8. Our main results are proved in the last section.

Throughout the paper, we shall write if there exists a constant such that . We write if and .

2. Notations and Results

To state the results of this paper, we need some notation and results on spherical harmonics.

2.1. Notations

For integers , , the class of all one variable algebraic polynomials of degree defined on is denoted by , the class of all spherical harmonics of degree will be denoted by , and the class of all spherical harmonics of degree will be denoted by . The dimension of is given by (see [40, Page 65])

(2.1)

and that of is One has the following well-known addition formula (see [41, Page 10, Theorem ]):

(2.2)

where is the degree- generalized Legendre polynomial. The Legendre polynomials are normalized so that and satisfy the orthogonality relations

(2.3)
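The dimension count for spherical harmonics can be tabulated directly. The sketch below uses the classical formula for harmonic homogeneous polynomials of degree k in d variables, with the ambient dimension d passed explicitly because the indexing convention for the sphere in this paper is not recoverable from the text; the function names are hypothetical.

```python
from math import comb

def harmonic_dim(k, d):
    """Dimension of the degree-k spherical harmonics on the unit sphere of R^d, d >= 2."""
    if k < 0 or d < 2:
        raise ValueError("need k >= 0 and d >= 2")
    first = comb(k + d - 1, d - 1)                       # homogeneous polynomials of degree k
    second = comb(k + d - 3, d - 1) if k >= 2 else 0     # those divisible by |x|^2
    return first - second

def polynomial_dim(n, d):
    """Dimension of the spherical polynomials of degree <= n (sum over degrees 0..n)."""
    return sum(harmonic_dim(k, d) for k in range(n + 1))

dims_on_s2 = [harmonic_dim(k, 3) for k in range(5)]      # degrees 0..4 on the sphere in R^3
```

On the sphere in R^3 this reproduces the familiar count 2k + 1 per degree and (n + 1)^2 in total.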

Define and by taking to be the usual volume element of and the Jacobi weight functions , , , respectively. For any we have the following relation (see [42, Page 312]):

(2.4)

The orthogonal projections of a function on are defined by (see e.g., [43])

(2.5)

where denotes the inner product of and .

2.2. Main Results

Let satisfy and . Define

(2.6)

Then, by [44, Chapter 17] we know that is positive semidefinite on and the right-hand side of (2.6) converges absolutely and uniformly since . Therefore, is a Mercer kernel on By [13, Theorem ] we know that is a universal kernel on . We suppose that there is a constant depending only on such that for any

(2.7)
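A zonal kernel built from a Legendre series with nonnegative, summable coefficients gives a concrete instance of a positive semidefinite kernel on the sphere. The sketch below is restricted to the unit sphere in R^3, where the ordinary Legendre polynomials apply; the coefficient sequence, the truncation degree, and the name `zonal_gram` are assumptions for illustration and are not the kernel (2.6) itself.

```python
import numpy as np
from numpy.polynomial import legendre

def zonal_gram(points, coeffs):
    """Gram matrix G[i, j] = sum_k coeffs[k] * P_k(<x_i, x_j>) for unit vectors in R^3."""
    t = np.clip(points @ points.T, -1.0, 1.0)    # inner products, guarded against round-off
    return legendre.legval(t, coeffs)

rng = np.random.default_rng(2)
pts = rng.normal(size=(25, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # scattered points on the sphere
a = 1.0 / (1.0 + np.arange(12)) ** 3                # assumed nonnegative, summable coefficients
G = zonal_gram(pts, a)
min_eig = np.linalg.eigvalsh((G + G.T) / 2.0).min()
```

Since each Legendre polynomial equals 1 at the point 1, the diagonal of `G` is the sum of the coefficients, and `min_eig` should be nonnegative up to round-off.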

Given a finite set , we denote by the cardinality of . For and we say that a finite subset is an -covering of if

(2.8)

where with being the geodesic distance between and .
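The covering condition can be estimated numerically by measuring, over a set of test points, the largest geodesic distance to the nearest center. The sketch below assumes the usual reading of (2.8), namely that every point of the sphere lies within the prescribed geodesic distance of some center; `covering_radius` is a hypothetical helper, and a finite test set only gives a lower estimate of the true covering radius.

```python
import numpy as np

def covering_radius(centers, test_points):
    """Largest geodesic distance from a test point to its nearest center (unit vectors)."""
    cosines = np.clip(test_points @ centers.T, -1.0, 1.0)
    geodesic = np.arccos(cosines)                 # geodesic distance on the unit sphere
    return float(geodesic.min(axis=1).max())

# Centers: the six points +-e_1, +-e_2, +-e_3 on the unit sphere in R^3.
axes = np.vstack([np.eye(3), -np.eye(3)])
# Test set: the eight "corner" directions (the worst case) plus random points.
corners = np.array(
    [[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)], dtype=float
) / np.sqrt(3.0)
rng = np.random.default_rng(3)
random_pts = rng.normal(size=(500, 3))
random_pts /= np.linalg.norm(random_pts, axis=1, keepdims=True)
radius = covering_radius(axes, np.vstack([corners, random_pts]))
```

For these six centers the worst-case points are the corner directions, so `radius` equals arccos(1/sqrt(3)), roughly 0.955.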

Let be an integer, a sequence of real numbers. Define forward difference operators by , ,

(2.9)
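Iterated forward differences of a sequence are easy to compute. The sketch below assumes the standard convention that the first difference of a sequence a at index k is a[k+1] - a[k], with higher orders obtained by repetition; the source's own sign convention in (2.9) is not recoverable, and `forward_difference` is a hypothetical name.

```python
import numpy as np

def forward_difference(a, r=1):
    """r-th forward difference of a finite sequence (assumed convention a[k+1] - a[k])."""
    if r < 0:
        raise ValueError("order r must be nonnegative")
    return np.diff(np.asarray(a, dtype=float), n=r)

k = np.arange(8)
squares = k.astype(float) ** 2
first = forward_difference(squares, 1)     # 2k + 1
second = forward_difference(squares, 2)    # constant 2
third = forward_difference(squares, 3)     # identically 0
```

Each difference shortens a finite sequence by one entry, and the r-th difference annihilates polynomial sequences of degree less than r.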

We say a finite subset is a subset of interpolatory type if for any real numbers there is a such that , This kind of subsets may be found from [45, 46].

Let be the set of all sequences for which and the set of all sequences for which

Let be a real number, Then, we say if there is a function such that

(2.10)

We now give the results of this paper.

Theorem 2.1.

If there is a constant depending only on such that is a subset of interpolatory type and a -covering of satisfying with and being a given positive integer. is an integer. is a real number such that there is and , satisfies and . is the reproducing kernel space reproduced by and the kernel (2.6). . Then there is a constant depending only on and and a function with and a constant such that

(2.11)
(2.12)

The functions satisfying the conditions of Theorem 2.1 may be found in [39, Page 357].

Corollary 2.2.

Under the conditions of Theorem 2.1, if , then

(2.13)

Corollary 2.2 shows that the convergence rate of the -functional (1.8) is controlled by the smoothness of both the reproducing kernels and the approximated function .

Theorem 2.3.

If there is a constant depending only on such that is a subset of interpolatory type and a -covering of satisfying with and being a given positive integer. is the reproducing kernel space reproduced by and the kernel (2.6) with satisfying and Then, for and there holds

(2.14)

where

3. Some Lemmas

To prove Theorems 2.1 and 2.3, we need some lemmas. The first one is about the Gauss integral formula and the Marcinkiewicz-Zygmund inequalities.

Lemma 3.1 (see [47-50]).

There exist constants depending only on such that for any positive integer and any -covering of satisfying , there exists a set of real numbers , such that

(3.1)

for any and for

(3.2)

where the constants of equivalence depend only on , , , and when is small. Here one employs the slight abuse of notation that
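A simple structured example of a quadrature rule with positive weights that is exact for spherical polynomials is the product of a Gauss-Legendre rule in the height variable and an equispaced rule in the azimuth. The sketch below builds such a rule on the unit sphere in R^3 and is only an illustration of the exactness property in (3.1); it is not the scattered-data construction of Lemma 3.1, and `product_rule` and `exact_monomial_integral` are hypothetical names.

```python
import numpy as np
from math import gamma, pi

def product_rule(n):
    """Nodes and positive weights on S^2 exact for polynomials of degree <= n."""
    m = n // 2 + 1                                  # Gauss-Legendre is exact up to degree 2m - 1 >= n
    num_phi = n + 1                                 # equispaced azimuths kill frequencies 1..n
    z, wz = np.polynomial.legendre.leggauss(m)
    phi = 2.0 * pi * np.arange(num_phi) / num_phi
    Z, PHI = np.meshgrid(z, phi, indexing="ij")
    s = np.sqrt(1.0 - Z ** 2)
    nodes = np.stack([s * np.cos(PHI), s * np.sin(PHI), Z], axis=-1).reshape(-1, 3)
    weights = np.repeat(wz * (2.0 * pi / num_phi), num_phi)
    return nodes, weights

def exact_monomial_integral(a, b, c):
    """Integral of x^a y^b z^c over S^2 with respect to surface measure."""
    if a % 2 or b % 2 or c % 2:
        return 0.0
    return (2.0 * gamma((a + 1) / 2) * gamma((b + 1) / 2) * gamma((c + 1) / 2)
            / gamma((a + b + c + 3) / 2))

nodes, weights = product_rule(6)
x, y, z = nodes.T
approx_x2y2 = float(np.sum(weights * x ** 2 * y ** 2))   # degree 4 monomial
mz_sum = float(np.sum(weights * (x * z) ** 2))           # discrete L2 norm of P = xz, squared
```

Because the square of a polynomial of degree at most n/2 has degree at most n, the weighted discrete sum `mz_sum` equals the integral of (xz)^2 exactly; this is the case p = 2 of an equivalence like (3.2), with both constants equal to 1.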

The second lemma we shall use is the Nikolskii inequality for the spherical harmonics.

Lemma 3.2 (see [38, 45, 49, 51, 52]).

If , , then one has the following Nikolskii inequality:

(3.3)

where the constant depends only on .

We now restate the general approximation frame of the Cesàro means and de la Vallée Poussin means provided by Dai and Ditzian (see [53]).

Lemma 3.3.

Let be a positive measure on . is a sequence of finite-dimensional spaces satisfying the following:

(I).

(II) is orthogonal to (in ) when

(III) is dense in for all .

(IV) is the collection of the constants.

The Cesàro means of is given by

(3.4)

for , where

(3.5)

and is an orthogonal base of in One sets, for a given , and if there exists such that

Let be defined as for and for and is a nonnegative and nonincreasing function. are the de la Vallée Poussin means defined as

(3.6)

Then, if for some , , then, and

(3.7)
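The effect of a de la Vallée Poussin type filter can be seen on expansion coefficients. The sketch below assumes a filter equal to 1 on [0, 1] and 0 on [2, infinity), chosen piecewise linear for simplicity (the lemma may require more of the filter), and applies it to the coefficients of a Legendre series, which corresponds to the zonal case on the sphere in R^3; `eta` and `filtered_coeffs` are hypothetical names.

```python
import numpy as np

def eta(t):
    """Piecewise-linear filter: 1 on [0, 1], 0 on [2, inf), linear in between."""
    return np.clip(2.0 - np.asarray(t, dtype=float), 0.0, 1.0)

def filtered_coeffs(coeffs, n):
    """Multiply the degree-k coefficient by eta(k / n), as in a filtered mean of order n."""
    coeffs = np.asarray(coeffs, dtype=float)
    k = np.arange(len(coeffs))
    return eta(k / n) * coeffs

n = 4
coeffs = np.arange(1.0, 11.0)          # coefficients of degrees 0..9
out = filtered_coeffs(coeffs, n)
```

Coefficients of degree at most n pass through unchanged, so polynomials of degree at most n are reproduced, while all degrees from 2n on are removed.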

Lemma 3.3 yields the following Lemma 3.4.

Lemma 3.4.

Let be the function defined as in Lemma 3.3. Define two kinds of operators, respectively, by

(3.8)

Then, for any and for any . Moreover,

(3.9)
(3.10)

where for one defines

(3.11)

Proof.

By [54, Lemma ] we know for some . Hence, (3.9) holds by (3.7). By [19, Theorem ] we know for Hence, (3.10) holds by (3.7).

Let be a finite set. Then we call an M-Z quadrature measure of order if (3.1) and (3.2) hold for By this definition one knows the finite set in Lemma 3.1 is an M-Z quadrature measure of order .

Define an operator as

(3.12)

Then, we have the following results.

Lemma 3.5 (see [39]).

For a given integer let be an M-Z quadrature measure of order , , an integer, , , where satisfies which satisfies if and if . defined in Lemma 3.3 is a nonnegative and non-increasing function. Let satisfy . Then, for , , where consists of for which the derivative of order ; that is, , belongs to . Then, there is an operator such that

(i)(see [39, Proposition , (b)]). for

(3.13)
(3.14)

where

(ii)(see [39, Theorem ]). Moreover, if one adds an assumption that then, there are constants and such that

(3.15)

and for

(3.16)

Lemma 3.6 (see e.g., [29, Page 230]).

Let . Then,

(3.17)

The following Lemma 3.7 deals with the orthogonality of the Legendre polynomials.

Lemma 3.7.

For the generalized Legendre polynomials one has

(3.18)

Proof.

It may be obtained by (2.2).

Lemma 3.8.

Let satisfy (2.7) for and . is a finite set satisfying the conditions of Theorem 2.1. Then, there is a constant depending only on such that

(3.19)

Proof.

Define a matrix by , where with and Then,

(3.20)

By the Parseval equality we have

(3.21)

Let satisfy , . Then, by (3.1)

(3.22)

Hence, . On the other hand, since , , we have for any that

(3.23)

It follows for that

(3.24)

Define . Then, (3.24), (3.10), the Cauchy inequality, and the fact make

(3.25)

It follows that

(3.26)

Equation (3.19) thus holds.

4. Proof of the Main Results

We now show Theorems 2.1 and 2.3, respectively.

Proof of Theorem 2.1.

Lemma in [39] gives the following result.

Let , , be an integer, and a sequence of real numbers such that . Then, there exists such that ,

Since and we have a such that Hence, and

(4.1)

and for there holds for that

(4.2)

It follows for that

(4.3)

On the other hand, since

(4.4)

where for , we have by (4.3)

(4.5)

Hence, the above equation and (3.1)-(3.2) give

(4.6)

where , Define

(4.7)

Then, we know and by (3.9)

(4.8)

where

(4.9)

It follows by (3.9) that

(4.10)

On the other hand, by the definition of and (3.14) we have for that

(4.11)

where denotes the operator of Lemma 3.5 for Hence,

(4.12)

Equation (3.2) and the definition of make

(4.13)

The Hölder inequality, the of Lemma 3.5, and the fact that make . Therefore,

(4.14)

Take then

(4.15)

Equations (3.2), (3.17), (3.16), and the Cauchy inequality make

(4.16)

Let be the Gamma function. Then, it is well known that Therefore,

(4.17)

Hence,

(4.18)

Equations (4.14) and (4.4) make

(4.19)

and hence

(4.20)

Since , we have (2.11) by (4.20). Equation (2.12) follows by (4.3), (4.4), and (3.19).

Proof of Corollary 2.2.

By (2.11)-(2.12) one has

(4.21)

Proof of Theorem 2.3.

Replace in Lemma 3.5 by , still denote by the operator in Lemma 3.5 with and

(4.22)

then, and by (3.15) In this case,

(4.23)

Since is a spherical harmonic of order , we know by of Lemma 3.5 that are also spherical harmonics of order Then, (3.2), of Lemma 3.5, (3.3), and (3.16) give

(4.24)

Hence, (3.19) and the above equation give . Equation (2.14) follows by (3.15).