1 Introduction

Object recognition is one of the most active areas of research, with crucial applications in diverse fields. Object representation is a fundamental step: it systematically assigns mathematical structures to shapes in order to ease the implementation and analysis of the relevant classification algorithms. Objects may be described through several attributes: color, texture, shape, motion and location. However, shape remains a critical feature for object recognition, as is commonly accepted by the computer vision community. The need for consistent structures able to handle the rich repertoire of shapes has attracted the attention of many researchers. Among the proposed representations are continuous contour parametrization [1], medial representations [2], active models [3], and algebraic representations [4]. Once a particular representation is defined, a classifier is either trained during a learning phase through probability models and preselected shape sets, or designed through a clustering technique. Other invariant shape representations have also been proposed, notably Fourier descriptors and moment invariants; these are among the main techniques, yet they only apply to continuous regions and contours [5]. In 1977, D. G. Kendall proposed a new invariant representation in which a shape is identified with the geometrical information that remains after filtering out the location, scale and rotation effects from initial configuration matrices that capture objects' shapes through landmarks. The elimination of these irrelevant effects yields a non-linear space, so most classical clustering and classification methods cannot be applied directly. The Riemannian metric of this space only tells whether two shapes are identical (differing only in translation, rotation and scale) or not, yet in many cases we need to measure shape similarity. An appropriate metric for shape classification should not only satisfy certain invariance properties but also suit the different input properties. This study aims to devise a robust classifier by combining an efficient shape representation emerging from Kendall's theory with the Bayesian approach, renowned for its rigorous theoretical foundation and optimal classification results. We base our choice of the Kendall representation on the belief that an adequate implementation yields computational effectiveness: the representation matrices are easy to compute and much less time consuming than more complex structures. Our main contribution is the adaptation of the Gaussian Bayes classifier to Kendall space and its local approximation through projections onto tangent spaces. Section 2 of this paper gives an overview of works related to Kendall space. The proposed method is presented in Sect. 3. Experiments are presented and discussed in Sect. 4. Finally, we conclude the work and suggest future directions in Sect. 5.

2 Previous Works on Kendall Space

The study of shapes dates back to D'Arcy Thompson, but the first systematic algorithmic treatment of shape representations and metrics is due to Bookstein and Kendall. They represented a shape by a collection of ordered landmark points, invariant to Euclidean similarity transformations: two objects have the same shape if they can be translated, scaled and rotated onto each other so that they match exactly. This standpoint led to the foundation of Kendall's shape theory [6, 7], which is among the most popular and widely used discrete shape representations. A number of works by shape theory experts such as Bookstein [8] and Dryden [9] have adopted Kendall's definition to obtain finite dimensional spaces from landmark coordinates. The reader is referred to [10, 11] for a thorough view of the recent developments of Kendall's theory. To resolve the mismatch between Kendall space and classical linear classification algorithms, Jayasumana et al. [12] proposed to map the manifold into a Hilbert space using a kernel function, which produces a richer representation of the data and makes tasks such as classification easier. However, only positive definite kernels yield a mapping to a Hilbert space, and a poor choice of kernel often results in reduced classification performance. Another idea, which adapts an unsupervised learning algorithm, k-means, to Kendall space, is to integrate the Procrustes mean [13]. Here we focus on adapting the Bayesian approach to Kendall space through a linear tangent space.

3 Our Approach to Supervised Learning in Kendall Space

3.1 Landmark Selection

Since we work within a supervised context, we first fixed the number \(N_{c}\) of targeted shape classes \(\omega _{i}\), for \(1\le i\le N_{c}\). Then, for each of the \(N_{c}\) classes \(\omega _{i}\), we generated a set \(\left\{ X_{j}^{i\star }\right\} _{1\le j\le N_{ps}}\) comprising \(N_{ps}\) initial configurations of learning samples. Each matrix \(X_{j}^{i\star }\) gathers, as column vectors, the coordinates of a fixed number of labelled landmarks selected to capture the shape of the jth sample of class \(\omega _{i}\). More precisely, the selection consists of picking k landmarks from the contours. Some of the benchmarks also provide a set of landmarks selected by experts; when such landmarks were not available or not valid, we performed the selection manually, based on templates assigned to each class.
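
When a benchmark provides only raw contours, a simple way to automate the selection is to sample the k landmarks evenly, by arc length, along the boundary. The sketch below (Python/NumPy; the function name and the arc-length criterion are ours, not part of the protocol described above, where selection was manual or expert-provided) illustrates the idea.

```python
import numpy as np

def sample_landmarks(contour, k):
    """Pick k landmarks spread evenly by arc length along a closed contour.

    contour : (n, 2) array of ordered boundary coordinates.
    Returns a 2 x k configuration matrix (one landmark per column).
    """
    closed = np.vstack([contour, contour[:1]])             # close the curve
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])          # cumulative arc length
    targets = np.linspace(0.0, arc[-1], k, endpoint=False)
    idx = np.clip(np.searchsorted(arc, targets, side="right") - 1,
                  0, len(contour) - 1)
    return contour[idx].T                                   # shape (2, k)
```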

3.2 Learning Shape Processing

Let \(X_{j}^{i\star }= \left( \begin{array}{cccc} x_{0}^{\star } & x_{1}^{\star } & \ldots & x_{k-1}^{\star } \end{array} \right) \) be one of the learning samples in \(\{X_{j}^{i\star }\}_{1\le j\le N_{ps}}\) for the class \(\omega _{i}\) (cf. Sect. 3.1). We first removed the translation effect by computing the center \(x_{c}^{\star }\) of \(X_{j}^{i\star }\) and moving it to the origin of the coordinate system, i.e., each \(x_{\ell }^{\star }\) is replaced with \(x_{\ell }^{\star }-x_{c}^{\star }\), for all \(0\le \ell \le k-1\), where

$$\begin{aligned} x_{c}^{\star } = \frac{1}{k}\sum _{\ell =0}^{k-1}x_{\ell }^{\star } \end{aligned}$$
(1)

Next, we performed a dimension reduction via a right multiplication of the centered \(X_{j}^{i\star }\) by the orthogonal matrix Q,

$$\begin{aligned} Q=\left\{ \begin{array}{ll} Q_{\ell 1}=\frac{1}{\sqrt{k}}, & 1\le \ell \le k ; \\ Q_{\ell \ell }=\frac{\ell -1}{\sqrt{\ell (\ell -1)}} , & 2\le \ell \le k; \\ Q_{\ell h}=-\frac{1}{\sqrt{h(h-1)}}, & 1\le \ell \le h-1,\,2\le h\le k; \\ Q_{\ell h}=0, & \hbox {otherwise,} \end{array} \right. \end{aligned}$$
(2)

to get an intermediate representation matrix \({\tilde{X}}_{j}^{i}\) of general form \(\left( \begin{array}{ccccc} 0 & {\tilde{x}}_{1} & {\tilde{x}}_{2} & \ldots & {\tilde{x}}_{k-1} \end{array} \right) \), whose zero first column we dropped to obtain

$$\begin{aligned} {\tilde{X}}_{j}^{i}=\left( \begin{array}{cccc} {\tilde{x}}_{1} & {\tilde{x}}_{2} & \ldots & {\tilde{x}}_{k-1} \end{array} \right) \end{aligned}$$
(3)

After that, we eliminated the scaling effect through normalization to get

$$\begin{aligned} X_{j}^{i}=\frac{{\tilde{X}}_{j}^{i}}{\sqrt{\mathrm{tr}\left( {\tilde{X}}_{j}^{i}({\tilde{X}}_{j}^{i})^{t}\right) }} \end{aligned}$$
(4)
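
To make the preprocessing pipeline (1)-(4) concrete, here is a minimal NumPy sketch, under the assumption that each raw configuration \(X_{j}^{i\star }\) is stored as a \(2\times k\) array with one landmark per column; the helper builds the orthogonal matrix Q of (2) column by column, and all function names are ours.

```python
import numpy as np

def build_q(k):
    """Orthogonal k x k matrix Q of Eq. (2); its first column is constant,
    so right-multiplication by Q sends the centroid to the first column."""
    Q = np.zeros((k, k))
    Q[:, 0] = 1.0 / np.sqrt(k)
    for h in range(2, k + 1):                      # columns 2..k (1-based)
        r = np.sqrt(h * (h - 1))
        Q[h - 1, h - 1] = (h - 1) / r              # diagonal entry
        Q[: h - 1, h - 1] = -1.0 / r               # entries above the diagonal
    return Q

def to_preshape(X_star):
    """Map a raw 2 x k configuration to its pre-shape: centering (1),
    reduction (2)-(3), then unit Frobenius norm (4)."""
    X = X_star - X_star.mean(axis=1, keepdims=True)    # remove translation
    X_tilde = (X @ build_q(X_star.shape[1]))[:, 1:]    # drop the zero first column
    return X_tilde / np.linalg.norm(X_tilde)           # remove scale
```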

The set \(\left\{ X_{j}^{i}\right\} _{1\le j\le N_{ps}}\) thus lies on the \(2\left( k-1\right) -1\) dimensional unit sphere, denoted by \({\mathcal {S}}_{2}^{k}\). Notice that any matrix \(TX_{j}^{i}\), for \(T\in \mathbf {SO}\left( 2\right) \), has the same shape as \(X_{j}^{i}\). So, from now on, \(X_{j}^{i}\) is treated as a pre-shape, and the sought shape, denoted by \(\pi \left( X_{j}^{i}\right) \), is the equivalence class of the pre-shape \(X_{j}^{i}\) modulo the left action of rotations \(T\in \mathbf {SO}\left( 2\right) \); the space of all possible shapes is the \(2\left( k-1\right) -2\) dimensional Kendall space, which is the quotient space

$$\begin{aligned} \varSigma _{2}^{k} = {\mathcal {S}}_{2}^{k}\big /\mathbf {SO}\left( 2\right) \end{aligned}$$
(5)

The pseudo-singular value decomposition allows us to write any pre-shape \(X_{j}^{i}\) as a three-factor product,

$$\begin{aligned} X_{j}^{i} = U(\begin{array}{cc}\varLambda&0 \end{array})V\end{aligned}$$
(6)

where \(U\in \mathbf {SO}\left( 2\right) \), \(V\in \mathbf {SO}\left( k-1\right) \), 0 is the null matrix of dimensions \(2\times \left( k-3\right) \), and \(\varLambda \) is the \(2\times 2\) diagonal matrix \( diag\{\lambda _{1},\lambda _{2}\}\) such that \(\lambda _{1}\ge |\lambda _{2}|\), \(\lambda _{1}^{2}+\lambda _{2}^{2}=1\), and \(\lambda _{2}\ge 0\) unless \(k=3\). This decomposition provides a systematic way to decide whether or not a pair of learning pre-shapes belongs to the same equivalence class. In order to quotient out the left acting rotations and obtain the learning shapes in \(\varSigma _{2}^{k}\) (5), we computed the decompositions \(U_{X_{j_{1}}^{i}}(\varLambda _{X_{j_{1}}^{i}}\;\; 0)V_{X_{j_{1}}^{i}}\) and \(U_{X_{j_{2}}^{i}}(\varLambda _{X_{j_{2}}^{i}}\;\; 0)V_{X_{j_{2}}^{i}}\) of each pair of pre-shapes \(X_{j_{1}}^{i}\) and \(X_{j_{2}}^{i}\) in \(\{ X_{j}^{i}\} _{1\le j\le N_{ps}}\) (6). We then declared \(\pi (X_{j_{1}}^{i})\) and \(\pi (X_{j_{2}}^{i})\) identical if and only if \(\varLambda _{X_{j_{1}}^{i}}=\varLambda _{X_{j_{2}}^{i}}\) and the first two rows of \(V_{X_{j_{1}}^{i}}\) and \(V_{X_{j_{2}}^{i}}\) coincide; the remaining \(k-3\) rows of \(V_{X_{j_{1}}^{i}}\) and \(V_{X_{j_{2}}^{i}}\) do not matter, since they are multiplied by the \(2\times \left( k-3\right) \) null matrices appearing in \((\varLambda _{X_{j_{1}}^{i}}\;\;0)\) and \((\varLambda _{X_{j_{2}}^{i}}\;\;0)\), respectively. Naturally, the left acting rotation matrices \(U_{X_{j_{1}}^{i}}\) and \(U_{X_{j_{2}}^{i}}\) are irrelevant because they do not affect shapes. In this way, we regrouped the \(N_{ps}\) learning pre-shapes of \(\{ X_{j}^{i}\}_{1\le j\le N_{ps}}\) into \(N_{s}^{i}\) learning equivalence classes. Concretely, for all \(1\le p\le N_{s}^{i}\), the pth equivalence class is represented by an arbitrary candidate, denoted by \(\pi (X_{p}^{i})\) and called the learning shape.
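
As an illustration, the same equivalence test can be coded without the explicit pseudo-singular decomposition (6): two pre-shapes lie in the same class of (5) precisely when one can be rotated onto the other, which an ordinary SVD-based Procrustes alignment detects. The sketch below follows that equivalent formulation rather than the row-by-row comparison described above; the function names and the tolerance are ours.

```python
import numpy as np

def optimal_rotation(X1, X2):
    """Rotation T in SO(2) that best aligns pre-shape X2 to X1 (least squares)."""
    U, _, Vt = np.linalg.svd(X1 @ X2.T)                       # 2 x 2 problem
    D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])        # force det(T) = +1
    return U @ D @ Vt

def same_shape(X1, X2, tol=1e-8):
    """True when the two pre-shapes belong to the same equivalence class of (5)."""
    return np.linalg.norm(X1 - optimal_rotation(X1, X2) @ X2) < tol
```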

3.3 Likelihood Parameter Inference

We describe here our learning procedure for estimating the expectation vector \(\mu _{i}\) and the covariance matrix \(\varSigma _{i}\) of the Gaussian likelihood of each class \(\omega _{i}\). We assume that the learning samples in each set \(\left\{ \pi \left( X_{p}^{i}\right) \right\} _{1\le p\le N_{s}^{i}}\) are independent and identically distributed according to the likelihood of \(\omega _{i}\). In a first step, we computed the mean shape \(\pi \left( {\hat{\nu }}_{i}\right) \) of the set of learning shapes \(\left\{ \pi \left( X_{p}^{i}\right) \right\} _{1\le p\le N_{s}^{i}}\), which we later used as the reference shape at which \(\varSigma _{2}^{k}\) is approximated locally by its tangent space [14]. Specifically, we looked for \(\pi \left( {\hat{\nu }}_{i}\right) \) as a solution of the minimization problem

$$\begin{aligned} \mathop {\arg \inf }_{\pi (\nu )\in \varSigma _{2}^{k}}\sum _{p=1}^{N_{s}^{i}}d_{F}^{2}\left( \pi \left( X_{p}^{i}\right) ,\pi \left( \nu \right) \right) \end{aligned}$$
(7)

which involves the full Procrustes distance between \(\pi \left( X_{p}^{i}\right) \) and \(\pi \left( \nu \right) \)

$$\begin{aligned} d_{F}^{2}\left( \pi \left( X_{p}^{i}\right) ,\pi \left( \nu \right) \right) = \sin ^{2}\left( \rho \left( \pi \left( X_{p}^{i}\right) ,\pi \left( \nu \right) \right) \right) \end{aligned}$$
(8)

Here, \(\rho \) is the geodesic distance on \(\varSigma _{2}^{k}\), defined as

$$\begin{aligned} \rho \left( \pi \left( X_{p}^{i}\right) ,\pi \left( \nu \right) \right) = \arccos \left( \lambda _{1}+\lambda _{2}\right) \end{aligned}$$
(9)

with \(\lambda _{1}\) and \(\lambda _{2}\) the pseudo-singular values of \(X_{p}^{i}\nu ^{t}\), for arbitrary pre-shapes \(X_{p}^{i}\) and \(\nu \) representing \(\pi \left( X_{p}^{i}\right) \) and \(\pi \left( \nu \right) \), respectively. In a second step, we mapped each learning shape \(\pi \left( X_{p}^{i}\right) \) in \(\left\{ \pi \left( X_{p}^{i}\right) \right\} _{1\le p\le N_{s}^{i}}\) to its projection \({\bar{\pi }}_{i}\left( X_{p}^{i}\right) \) onto the tangent space at \(\pi \left( {\hat{\nu }}_{i}\right) \), computed as

$$\begin{aligned} {\bar{\pi }}_{i}\left( X_{p}^{i}\right) = \left( I_{m}-\pi \left( {\hat{\nu }}_{i}\right) \pi \left( {\hat{\nu }}_{i}\right) ^{t}\right) \pi \left( X_{p}^{i}\right) ,\end{aligned}$$
(10)
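
Below is a hedged sketch of (8)-(10), reusing optimal_rotation from the previous listing. For the projection we follow our reading of (10): the pre-shape is first Procrustes-aligned to the mean \({\hat{\nu }}_{i}\) (which is what makes the result insensitive to left rotations, as emphasized next) and then projected, in vectorised form, orthogonally to the vectorised mean.

```python
import numpy as np

def rho(X, nu):
    """Geodesic distance (9) between the shapes of pre-shapes X and nu."""
    M = X @ nu.T                                              # 2 x 2 matrix
    s = np.linalg.svd(M, compute_uv=False)
    lam1, lam2 = s[0], np.sign(np.linalg.det(M)) * s[1]       # pseudo-singular values
    return np.arccos(np.clip(lam1 + lam2, -1.0, 1.0))

def full_procrustes_distance(X, nu):
    """Full Procrustes distance (8): d_F = sin(rho)."""
    return np.sin(rho(X, nu))

def tangent_project(X, nu_hat):
    """Projection (10), in vectorised form, onto the tangent space at nu_hat;
    assumes optimal_rotation from the Sect. 3.2 sketch is in scope."""
    X_aligned = optimal_rotation(nu_hat, X) @ X               # remove the left rotation
    x, v = X_aligned.ravel(), nu_hat.ravel()
    return (np.eye(x.size) - np.outer(v, v)) @ x              # length 2(k-1) vector
```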

In practice, we computed \(\pi \left( {\hat{\nu }}_{i}\right) \) and \({\bar{\pi }}_{i}\left( X_{p}^{i}\right) \), for all \(1\le i\le N_{c}\) and \(1\le p\le N_{s}^{i}\), using generalised Procrustes analysis [15]. We emphasize two important features of this construction: first, we always obtain the same mean shape \(\pi \left( {\hat{\nu }}_{i}\right) \) even if we use left rotated versions of the actual shapes \(\pi \left( X_{p}^{i}\right) \), for all \(1\le i\le N_{c}\); second, all left rotated versions of a shape \(\pi \left( X_{p}^{i}\right) \) always project to the same point of the tangent space to \(\varSigma _{2}^{k}\) at \(\pi \left( {\hat{\nu }}_{i}\right) \). In a last step, we reshaped each matrix \({\bar{\pi }}_{i}\left( X_{p}^{i}\right) \) into a row vector \({\bar{\pi }}_{i}^{v}\left( X_{p}^{i}\right) \). Then, we used the maximum likelihood method to obtain the likelihood parameters

$$\begin{aligned} \mu _{i} = \frac{1}{N_{s}^{i}}\sum _{p=1}^{N_{s}^{i}}{\bar{\pi }}_{i}^{v}\left( X_{p}^{i}\right) ,\end{aligned}$$
(11)
$$\begin{aligned} \varSigma _{i} = \frac{1}{N_{s}^{i}}\sum _{p=1}^{N_{s}^{i}}\left( {\bar{\pi }}_{i}^{v}\left( X_{p}^{i}\right) -\mu _{i}\right) ^{t}\left( {\bar{\pi }}_{i}^{v}\left( X_{p}^{i}\right) -\mu _{i}\right) ,\end{aligned}$$
(12)

The dimensions of these two parameters are \(1\times 2\left( k-1\right) \) and \(2\left( k-1\right) \times 2\left( k-1\right) \), respectively.
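
Given the projected learning shapes of one class stacked as rows of a matrix, the estimates (11)-(12) take one line each; a minimal sketch (variable names are ours):

```python
import numpy as np

def gaussian_ml_estimates(tangent_vectors):
    """tangent_vectors : (N_s, 2(k-1)) array whose rows are the projected
    learning shapes of one class; returns (mu_i, Sigma_i) as in (11)-(12)."""
    V = np.asarray(tangent_vectors, dtype=float)
    mu = V.mean(axis=0)                                # Eq. (11)
    centred = V - mu
    sigma = centred.T @ centred / V.shape[0]           # Eq. (12), ML estimate
    return mu, sigma
```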

3.4 Generalization

In this subsection we detail how we classify new shapes. After the likelihood inference phase (cf. Sect. 3.3), we used the estimated Gaussian likelihood parameters \(\mu _{i}\) and \(\varSigma _{i}\) of each class \(\omega _{i}\) to compute the posterior probabilities

$$\begin{aligned} p\left( \omega _{i}|\pi \left( X\right) \right) \!=\! \frac{p\left( {\bar{\pi }}_{i}^{v}\left( X\right) |\omega _{i}\right) P\left( \omega _{i}\right) }{p\left( {\bar{\pi }}_{i}^{v}\left( X\right) \right) }\end{aligned}$$
(13)

where

$$\begin{aligned} p\left( {\bar{\pi }}_{i}^{v}\left( X\right) |\omega _{i}\right) \!=\!\frac{1}{\left( 2\pi \right) {}^{\left( k-1\right) }\sqrt{\det \left( \varSigma _{i}\right) }}\exp \left( -\frac{1}{2}\left( {\bar{\pi }}_{i}^{v}\left( X\right) -\mu _{i}\right) \varSigma _{i}^{-1}\left( {\bar{\pi }}_{i}^{v}\left( X\right) -\mu _{i}\right) ^{t}\right) ,\end{aligned}$$
(14)

\(P\left( \omega _{i}\right) \) is the a priori probability of \(\omega _{i}\), and \(p\left( {\bar{\pi }}_{i}^{v}\left( X\right) \right) \) is the evidence term. Finally, the maximum among the values of \(p\left( \omega _{i}|\pi \left( X\right) \right) \) indicates the class of \(\pi \left( X\right) \). Here again, we draw the reader's attention to the fact that if we represent the shape \(\pi \left( X\right) \) by any \(TX\) with \(T\in \mathbf {SO}\left( 2\right) \), we still obtain exactly the same likelihood value in (14), because all such matrices \(TX\) project through (10) to the same point \({\bar{\pi }}_{i}\left( X\right) \) of the tangent space to \(\varSigma _{2}^{k}\) at \(\pi \left( {\hat{\nu }}_{i}\right) \); in other words, our a posteriori probability distribution depends on shapes rather than pre-shapes.
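
Putting (13)-(14) together, classifying a new configuration amounts to projecting it onto each class tangent space and keeping the class with the largest posterior; since the evidence term is common to all classes, it can be dropped. The sketch below reuses to_preshape and tangent_project from the earlier listings and works with log-densities to avoid numerical underflow; regularising \(\varSigma _{i}\) when \(2(k-1)\) is close to \(N_{s}^{i}\) would be a precaution of ours, not a step of the method.

```python
import numpy as np

def log_gaussian(x, mu, sigma):
    """Logarithm of the likelihood (14)."""
    d = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    quad = d @ np.linalg.solve(sigma, d)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + quad)

def classify(X_star, class_params, priors):
    """class_params[i] = (nu_hat_i, mu_i, sigma_i); priors[i] = P(omega_i).
    Returns the index of the class maximising the posterior (13)."""
    X = to_preshape(X_star)                                   # Sect. 3.2 sketch
    scores = []
    for (nu_hat, mu, sigma), prior in zip(class_params, priors):
        x = tangent_project(X, nu_hat)                        # Eq. (10)
        scores.append(log_gaussian(x, mu, sigma) + np.log(prior))
    return int(np.argmax(scores))
```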

4 Experiments and Results Analysis

We ran several experiments to evaluate the behavior of our classifier on \(\varSigma _{2}^{k}\) (see Sect. 3 above). We mainly used four 2D shape benchmarks, namely MPEG-7, Swedish leaves [14], the great apes data, and the T2 mouse vertebrae data [15]. Our classifier faced an important challenge since the handled shapes correspond to domestic objects, leaves, skulls and even mouse vertebrae. The MPEG-7 dataset offers 70 classes with 20 images per class, yet in our experiments we only used 60 classes. Figure 1 contains some annotated samples from MPEG-7. We assigned 10 images per class to the learning phase, and the rest to generalization. We then evaluated the benefits of our classifier on the problem of leaf identification using the Swedish leaves dataset, which contains 15 classes with 75 images per class; we used 35 images per class for training and the rest for testing. Some annotated samples are shown in Fig. 2. The apes database contains the skulls of 167 specimens of great apes, of different species and both sexes: chimpanzee (26 females and 28 males), gorilla (30 females and 29 males) and orangutan (30 females and 30 males). We used 15 samples of each class for the learning phase, and the rest for the test. Finally, the T2 mouse vertebrae database contains the second thoracic vertebra of three groups of mice: large (23 samples), small (23 samples) and a control group (30 samples). We used 13 samples of each group for the learning stage and the rest for generalization.

Table 1 summarizes the outcomes of the experiments on the aforementioned benchmarks. The numbers of landmarks are 12 and 27 for MPEG-7 and Swedish leaves respectively; as indicated in the table, these benchmarks do not provide landmarks, so we selected them ourselves. For the apes and T2 mouse vertebrae data, the landmarks were provided by an expert. We end this section by comparing the results of our classifier in \(\varSigma _{2}^{k}\) with those of the Gaussian Bayes classifier in Euclidean space and of the kernel SVM classifier in Hilbert space; the results of the latter are taken from [12]. Table 2 gathers the generalization results of all these classifiers and shows that our classifier outperforms the classical one, which supports its robustness.

Fig. 1. Samples of hammer, bell, bone, heart, bottle, and apple shapes from MPEG-7. The red dots indicate the selected landmarks for each shape sample. (Color figure online)

Fig. 2. Swedish leaves dataset samples.

We expect that the improvement in classification results comes from the absence of redundancy within the learning sets \(\left\{ \pi \left( X_{p}^{i}\right) \right\} _{1\le p\le N_{s}^{i}}\) used by our classifier in Kendall space, compared to the sets \(\left\{ X_{j}^{i\star }\right\} _{1\le j\le N_{ps}}\) used by the classical classifier. This absence of redundancy follows directly from the systematic elimination of translation, scale and rotation effects during the construction of the learning shapes. Besides, the complexity of our classifier in Kendall space is of an order determined mainly by the \(2\left( k-1\right) {\displaystyle \sum _{i=1}^{N_{c}}}N_{s}^{i}\) coordinate values involved in the learning phase, which is much smaller than the \(2k{\displaystyle N_{c}}N_{ps}\) coordinate values involved in the learning phase of the classical classifier, since \(N_{s}^{i}\ll N_{ps}\), for all \(1\le i\le N_{c}\), when the initial learning data sets get larger.

Table 1. Computer simulations summary
Table 2. Comparison between our classifier and two other supervised classifiers

5 Conclusion and Future Works

In this paper, we proposed a supervised learning approach for shape classification in \(\varSigma _{2}^{k}\). The supervised learning concerned the inference of the expectation vectors and covariance matrices of the Gaussian distributions; the inference was carried out in the maximum likelihood framework, using the available learning shapes. We detailed the procedure that allowed us to construct a learning shape set from initial configurations. Then, we used the Riemannian structure to specify the mean shape at which \(\varSigma _{2}^{k}\) is approximated by its tangent space. Consequently, we succeeded in establishing a likelihood density function on \(\varSigma _{2}^{k}\). The results of the experiments confirmed the robustness of our model, with outstanding success rates. The major drawback of our approach comes from the landmark selection, which is at best a semi-automatic process and remains subjective because human supervision is necessary. In future work, we propose to use deformable templates for the automatic detection of landmarks. We will also further analyze the abilities of our classifier by considering noisy or corrupted samples.