
9.1 Motivations

To date, only a few advanced techniques for shoeprint image noise reduction, robust thresholding (segmentation) and pattern description have been proposed for shoeprint image matching and retrieval. Moreover, a number of existing and elegant pattern (shape) descriptors have been ruled out by the difficulty of segmentation, i.e. of separating shoe mark profiles from their backgrounds and of further separating the patterns within a profile from each other.

Local image features are computed from distinctive local regions and do not require a priori segmentation. They have proved to be very successful in applications such as image retrieval and matching [6, 9, 15, 22], object recognition and classification [5, 12, 14, 18, 21] and wide baseline matching [24]. Consequently, many different scale and affine invariant local feature detectors, robust local feature descriptors and their evaluations have been widely investigated in the literature [1, 2, 4, 8, 12, 13, 16, 17, 19, 20, 22].

This chapter is concerned with the retrieval of scene-of-crime (or scene) shoeprint images from a reference database of shoeprint images by using a new local feature detector and an improved local feature descriptor. Like most other local feature representations, the proposed approach comprises two stages: (i) a set of distinctive local features is selected by first detecting scale-adaptive Harris corners, each associated with a scale factor, and then retaining those corners whose scale matches the scale of the blob-like structures around them; and (ii) for each feature, an improved Scale Invariant Feature Transform (SIFT) descriptor is computed to represent it. Our investigation has led to the development of two novel methods, referred to as the Modified Harris–Laplace (MHL) detector and the Modified SIFT descriptor, respectively.

9.2 Local Image Features

Mikolajczyk et al. [20] have given a detailed comparison of six state-of-the-art local feature detectors and have concluded that all of the detectors involved are complementary and that, if combined, performance can be improved. They have also stated that the Harris–Affine [17] and Hessian–Affine [20] detectors provide more features than the other detectors, which can be particularly useful when matching scenes with occlusion and clutter, although the Maximally Stable Extremal Region (MSER) detector [13] achieves the highest repeatability score in many cases. In our case, an affine invariant local feature refers only to translation-, rotation- and scale-invariant (covariant) local regions; we shall not consider more general affine or perspective invariant cases, which rarely arise with shoeprint images. Furthermore, Mikolajczyk and Schmid [16, 19] have evaluated the performance of ten state-of-the-art local feature descriptors in the presence of real geometric and photometric transformations. They have claimed that an extension of the SIFT descriptor [11, 12], called the Gradient Location and Orientation Histogram (GLOH) [20], performs slightly better than SIFT itself, while both outperform the other descriptors. The authors have also suggested that local feature detectors such as Hessian–Affine and Hessian–Laplace, which mainly detect blob-like structures, only perform well with larger neighbourhoods. However, this conflicts with one of the basic properties of local image features – locality.

Typically, a local image feature should have four properties: locality, repeatability, distinctiveness and robustness against different degradations. The above studies suggest that none of the current local feature representations outperforms the others in terms of all four properties; an efficient local feature representation must therefore trade these properties off against each other. The work described in this chapter first detects a set of distinctive local features in an image by combining a scale-adaptive Harris corner detector with an automatic Laplace-based scale selection: the location of each feature is determined by the scale-adaptive Harris corner detector, while its characteristic size depends on the scale of the blob-like structure around the corner, which is determined by the automatic Laplace-based scale selection. Then, for each local feature, an improved SIFT descriptor is computed to represent the feature. This descriptor further enhances the GLOH method by using a circular binary template for rotation invariance, and by binning the SIFT histogram into a range of 180° rather than the original 360° for robustness to complemented images. Finally, the matching of the descriptors combines a nearest neighbour measure with threshold-based screening: two descriptors match only if one is the nearest neighbour of the other and their distance is smaller than a threshold. The distance between two shoeprint images is then computed from the matched pairs.

9.2.1 New Local Feature Detector: Modified Harris–Laplace Detector

A local feature here refers to any point and its neighbourhood in an image where the signal changes significantly in two dimensions. Conventional “corners”, such as L-corners, T-junctions and Y-junctions, satisfy this, but so do isolated points, the endings of branches and any location with significant 2D texture. All of these local structures also have a characteristic size. Mikolajczyk and Schmid [15] have extended the Harris corner detector to a multiscale form to detect corners at different scales [7]. In earlier work [10], Lindeberg has presented in detail a feature detector with automatic scale selection, in which the Laplacian of Gaussian (LoG) transform was demonstrated to be successful in selecting the scale of blob-like structures. Likewise, in [15], the authors have proposed a new Harris–Laplace detector by exploiting (i) the high locational accuracy of the Harris corner detector and (ii) the robust scale selection of the LoG detector. However, the way in which the two are combined does not necessarily yield an accurately located and stably scaled local feature detector, since the detector requires the Harris measure to reach a maximum in the spatial domain and, at the same location, the LoG response to reach a maximum in the scale direction. In most cases, the unstable component of such a detector is the scale selection, since the stability of LoG-based scale selection is conditional upon the measure being computed at the centre of a blob structure rather than at the locations of Harris maxima. In this section, we propose a solution to this problem; following the naming of the Harris–Laplace detector, we call it the Modified Harris–Laplace detector.

9.2.1.1 Modified Harris–Laplace (MHL) Detector

A scale-adaptive Harris detector is based on the extended second moment matrix of Eq. (9.1), where \(\sigma_{i}\), \(\sigma_{d}\) and \(\boldsymbol{f}_{\alpha}\) are the integration scale, the differentiation scale and the derivative computed in the direction of \(\alpha\), respectively [15]. The strength of a scale-adaptive corner can be measured by \(\det(\boldsymbol{A}(x,\sigma_i,\sigma_d)) - \kappa \cdot \mathrm{trace}^2(\boldsymbol{A}(x,\sigma_i,\sigma_d))\).

$$ \boldsymbol{A}(x,\sigma _i ,\sigma _d ) = \sigma _d^2 \cdot \boldsymbol{g}(\sigma _i ) * \begin{bmatrix} \boldsymbol{f}_x^2 (x,\sigma _d ) & \boldsymbol{f}_x \boldsymbol{f}_y (x,\sigma _d ) \\ \boldsymbol{f}_y \boldsymbol{f}_x (x,\sigma _d ) & \boldsymbol{f}_y^2 (x,\sigma _d ) \end{bmatrix} $$
(9.1)
$$ \boldsymbol{LoG}(x,\sigma ) = \sigma ^2 \left| \boldsymbol{L}_{xx} (x,\sigma ) + \boldsymbol{L}_{yy} (x,\sigma ) \right| $$
(9.2)
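For concreteness, the following is a minimal NumPy/SciPy sketch of the two measures in Eqs. (9.1) and (9.2); the Harris constant \(\kappa = 0.04\) is a conventional choice rather than a value taken from this chapter, and the helper names are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma_i, sigma_d, kappa=0.04):
    """Scale-adaptive Harris strength det(A) - kappa*trace(A)^2, cf. Eq. (9.1)."""
    # Gaussian derivatives at the differentiation scale sigma_d
    fx = gaussian_filter(img, sigma_d, order=(0, 1))
    fy = gaussian_filter(img, sigma_d, order=(1, 0))
    # entries of A: sigma_d^2 * g(sigma_i) * [fx^2, fxfy; fyfx, fy^2]
    s = sigma_d ** 2
    a = s * gaussian_filter(fx * fx, sigma_i)
    b = s * gaussian_filter(fx * fy, sigma_i)
    c = s * gaussian_filter(fy * fy, sigma_i)
    return (a * c - b * b) - kappa * (a + c) ** 2

def log_response(img, sigma):
    """Scale-normalised Laplacian of Gaussian, cf. Eq. (9.2)."""
    lxx = gaussian_filter(img, sigma, order=(0, 2))
    lyy = gaussian_filter(img, sigma, order=(2, 0))
    return sigma ** 2 * np.abs(lxx + lyy)
```

Candidate corners are then the spatial local maxima of `harris_response` evaluated over a sequence of scales (with, say, \(\sigma_d\) proportional to \(\sigma_i\)), while `log_response` supplies the blob scales used for the selection described below.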

An example of scale-adaptive Harris corner detection on a synthetic image is shown in Fig. 9.1. The figure suggests that, for a single corner, a series of points may be detected by this approach, depending on the diffusion and the spatial extent of the corner; only a few of them (normally one or two) represent the characteristic size of the corner. Here, a LoG-based automatic scale selection can be applied to remove the redundant points, which are actually drifts of the same corner caused by scale-space smoothing. The normalised LoG response of an image is defined by Eq. (9.2), where \(\sigma\) is the scale and \(\boldsymbol{L}_{\alpha \alpha }\) is the second derivative computed in the direction of \(\alpha\). The principle of this technique stems from the fact that the characteristic scale of a local structure very often corresponds to a local extremum of the LoG along the scale direction at its centre (see the right-hand curves of Fig. 9.2). However, in most cases this principle does not work well for structures like corners. Figure 9.2 shows the LoG responses over scale at the centre of a blob structure (red cross) and at a corner (blue cross) for two synthetic \(128 \times 128\) images, where the white squares in (a) and (b) have sizes \(11 \times 11\) and \(21 \times 21\), respectively. The maxima of the red curves clearly reflect the scales of the white squares in (a) and (b). The figure also illustrates why the scale selection of the Harris–Laplace detector is unstable: the middle (blue) curves have too many extrema, leading to redundant and unstable scales.

Fig. 9.1

Example of adaptive Harris corner detection and LoG-based scale selection

Fig. 9.2

The responses of the LoG measures at different locations along the scale direction, where the x-axis denotes the scale direction. The middle curves correspond to the top-left corner of the white square, and the right curves correspond to the centre of the white square

It is also noted from Fig. 9.1 that the scale of a blob structure (red circle) selected by LoG can be related to the scale of the corners around that blob. In most cases, it is in fact reasonable to assume that a corner can be associated with a blob structure in its neighbourhood. Based on this assumption, a candidate corner is selected at a characteristic scale only if its scale has a predefined relationship with the scale of a blob structure around it. Two factors must be considered in this strategy: (i) the search region and (ii) the relationship between the scale of the blob structure and that of the corner. Figure 9.3 illustrates the strategy, where the red solid circle (\(\boldsymbol{radius} = \boldsymbol{r}\)) denotes a blob structure, while the red dashed circles denote the search region with radius \(\boldsymbol{r}_1\) and the reference circle with radius \(\boldsymbol{r}_0\). The green circles are candidates of the same corner located at the top left of the square, and the blue circle represents the selected characteristic scale of this corner: only the candidate whose scale is nearest to the reference radius \(\boldsymbol{r}_0\) is selected as the characteristic scale of the corner. In all of our experiments, Eq. (9.3) is applied to relate the reference scale \(\boldsymbol{r}_0\) and the search region \(\boldsymbol{r}_1\) to the blob scale \(\boldsymbol{r}\).

$$ \sqrt 2 \cdot \boldsymbol{r}_0 = \boldsymbol{r} = \frac{\sqrt 2}{2} \cdot \boldsymbol{r}_1 $$
(9.3)
Fig. 9.3

Illustration of automatic scale selection based on the scale of a blob-like structure
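The selection rule can be made concrete with a short sketch. The following is a minimal illustration, assuming the scale-adaptive Harris candidates and the LoG blobs have already been detected and are given as arrays of (x, y, scale) and (x, y, r) rows; the function name and data layout are ours.

```python
import numpy as np

def select_characteristic_scales(corners, blobs):
    """Keep, for each blob, the Harris candidate inside the search region
    r1 = sqrt(2)*r whose scale is nearest the reference radius r0 = r/sqrt(2),
    following Eq. (9.3)."""
    selected = []
    for bx, by, r in blobs:
        r0, r1 = r / np.sqrt(2.0), np.sqrt(2.0) * r
        # candidates (possibly drifted copies of one corner) near this blob
        dist = np.hypot(corners[:, 0] - bx, corners[:, 1] - by)
        cand = corners[dist < r1]
        if len(cand):
            # the candidate whose scale is closest to the reference radius
            selected.append(cand[np.argmin(np.abs(cand[:, 2] - r0))])
    return np.array(selected)
```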

9.2.1.2 Repeatability Evaluation

The repeatability score of a local feature detector is computed as the ratio between the number of corresponding features and the smaller of the numbers of features detected in the two images. A typical definition of repeatability considers the overlap error, defined as the error of the corresponding regions in terms of area [20]; two features are said to correspond if the overlap error is less than a predefined threshold. Here, to put our detector into the context of previous studies in this area, we apply the code and the benchmark images from [5] to measure the repeatability of detectors, and compare the performance of our detector with three similar detectors (Fig. 9.4), namely the Harris–Laplace (HarLap), Hessian–Laplace (HesLap) and Harris–Hessian–Laplace (HarHesLap) detectors. Two sets of images (Boats and Bikes) have been tested in this experiment, each containing six images with either decreasing scale (zooming out) or increasing blur. For each image sequence, five repeatability scores have been computed between the first image and each of the remaining five, and the results are shown in Fig. 9.4. It is worth noting that the Harris and Hessian measures detect two different structures (corner-like and blob-like); by simply combining them, one can therefore build a further detector, referred to here as the Harris–Hessian–Laplace detector, which uses the most significant responses of the Harris and Hessian measures to fix the spatial location of a feature, while the scale is still determined by the LoG measure.
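To make the evaluation protocol explicit, the following is a simplified repeatability sketch under two assumptions of ours: regions are treated as circles (rather than the ellipses of [20]), and the features of the second image have already been projected into the frame of the first; the 40% overlap-error threshold follows common practice in [20].

```python
import numpy as np

def overlap_error(f1, f2):
    """1 - (intersection / union) of two circular regions (x, y, radius)."""
    (x1, y1, r1), (x2, y2, r2) = f1, f2
    d = np.hypot(x1 - x2, y1 - y2)
    if d >= r1 + r2:                      # disjoint circles
        inter = 0.0
    elif d <= abs(r1 - r2):               # one circle inside the other
        inter = np.pi * min(r1, r2) ** 2
    else:                                 # standard circle-circle lens area
        a1 = r1**2 * np.arccos((d**2 + r1**2 - r2**2) / (2 * d * r1))
        a2 = r2**2 * np.arccos((d**2 + r2**2 - r1**2) / (2 * d * r2))
        lens = 0.5 * np.sqrt((-d + r1 + r2) * (d + r1 - r2)
                             * (d - r1 + r2) * (d + r1 + r2))
        inter = a1 + a2 - lens
    union = np.pi * (r1**2 + r2**2) - inter
    return 1.0 - inter / union

def repeatability(feats1, feats2, max_error=0.4):
    """Fraction of corresponding regions over the smaller feature count."""
    matched = sum(any(overlap_error(f1, f2) < max_error for f2 in feats2)
                  for f1 in feats1)
    return matched / min(len(feats1), len(feats2))
```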

Fig. 9.4

The repeatability comparison of four detectors. (a) is on the images with scale and rotation changing. (b) is on the images with increasing blur. (Referring to the Boat and the Bike images from [5])

Figure 9.4 suggests that, in most cases, our proposed detector outperforms the other three in terms of repeatability. It should be noted that we have limited the number of raw features to under 400 using a universal significance measure, defined as the product of the LoG response and the area of the local region.

Figure 9.5 shows an example of image matching based on the proposed MHL detector; the local feature descriptor and the matching strategy used in this process are detailed in the following sections. The main transformations between the two images are a scale change (scale ratio = 2.8) and an in-plane rotation [20]. In this example, 23 out of 32 matches are correctly computed, outperforming the Harris–Laplace detector (for which only 6 out of 26 matches are correct) with all other conditions the same.

Fig. 9.5

Matching result of two images, 23 out of 32 matches are correct. (Refer to the Boat images from [5])

9.2.2 Local Feature Descriptors

The local photometric descriptors computed in this work are, as mentioned in Section 9.1, a further extension of GLOH, which is itself derived from the SIFT descriptor. Our method differs from GLOH in two respects:

  (i) First, we apply a circular binary template to each normalised local region to increase the rotation invariance of the descriptor. Both SIFT and GLOH obtain rotation robustness by weighting the local region with a Gaussian window. However, with a larger sigma for the Gaussian kernel the descriptors computed for the region are distinctive but rotation sensitive, while with a smaller sigma they are rotation robust but less distinctive, and in most cases it is hard to choose a proper sigma. In this work, we apply a binary template to limit the region to a circular one and, at the same time, use a larger sigma for the Gaussian window to preserve the distinctiveness of the descriptors.

  (ii) Second, for robustness to complemented images, we bin the histogram over an orientation range of 180° rather than the original 360°. In our application of shoeprint image matching, complement robustness is very important, since the query shoeprint image from a scene of crime is often the contrast complement of the corresponding image in the reference database. This robustness is easily obtained by binning the histogram over 180°, i.e. by ignoring the polarity of the gradients.

The construction of our local descriptors is similar to GLOH: we bin the gradients in a log-polar location grid with three bins in the radial direction and four bins in the angular direction (the central cell is not divided angularly), giving nine location cells. Given the 180° orientation range, four bins are used for the gradient orientations. The descriptors of an image therefore form an \(N \times 36\) (\(9 \times 4\)) matrix, where N is the number of local features detected in the working image.
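The following is a minimal sketch of such a descriptor for a single normalised patch. It assumes the patch has already been resampled to the feature's characteristic scale and rotated to its dominant orientation; the ring radii and the Gaussian width are illustrative choices of ours, not values from the chapter.

```python
import numpy as np

def modified_sift_descriptor(patch):
    """36-D descriptor: circular binary template, wide Gaussian weighting,
    9 log-polar cells (1 centre + 2 rings x 4 sectors) and 4 orientation
    bins over 180 degrees (gradient polarity ignored)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx) % np.pi              # fold into [0, pi)

    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = np.hypot(yy - cy, xx - cx)
    theta = np.arctan2(yy - cy, xx - cx) % (2 * np.pi)
    rmax = min(cy, cx)

    mask = r <= rmax                              # circular binary template
    weight = np.exp(-r**2 / (2 * (0.8 * rmax) ** 2))  # deliberately wide Gaussian

    ring = np.digitize(r, [0.25 * rmax, 0.6 * rmax])  # 0 (centre), 1, 2; assumed radii
    sector = (theta // (np.pi / 2)).astype(int) % 4
    cell = np.where(ring == 0, 0, 1 + (ring - 1) * 4 + sector)
    obin = np.minimum((ori // (np.pi / 4)).astype(int), 3)

    desc = np.zeros((9, 4))
    np.add.at(desc, (cell[mask], obin[mask]), (mag * weight)[mask])
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)  # unit-normalised 36-D vector
```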

9.2.3 Similarity Measure

The similarity between two images depends on the matching strategy for the local features. For the sake of retrieval speed, we apply the nearest neighbour and thresholding jointly: for each descriptor in one image, its nearest neighbour in the other image is found as a potential match, and only those potential matches whose distance is below a threshold are retained as final matches. The similarity of the two images is then computed as the sum of \(\exp(-\boldsymbol{d})\) over the final matches, where \(\boldsymbol{d}\) denotes the distance of a match. There are, of course, many other strategies for computing the similarity or matching score between two images. The image matching example in Fig. 9.5 applies the nearest neighbour rule to obtain initial matches, and the RANSAC (Random Sample Consensus) algorithm is then used to reject mismatches. RANSAC is a general algorithm for robustly fitting models in the presence of many data outliers; here, the model is a \(3 \times 3\) fundamental matrix. The final matches/correspondences are shown in Fig. 9.5.
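A minimal sketch of this similarity computation follows; the distance threshold value is an assumption of ours and would need tuning on real data.

```python
import numpy as np
from scipy.spatial.distance import cdist

def image_similarity(desc_a, desc_b, threshold=0.5):
    """Nearest-neighbour matching with threshold screening; similarity is
    the sum of exp(-d) over the retained matches."""
    d = cdist(desc_a, desc_b)                   # all pairwise distances
    nn = d.argmin(axis=1)                       # nearest neighbour in image B
    nn_dist = d[np.arange(len(desc_a)), nn]
    keep = nn_dist < threshold                  # screen out weak matches
    matches = list(zip(np.where(keep)[0], nn[keep]))
    return np.exp(-nn_dist[keep]).sum(), matches
```

For geometric verification as in Fig. 9.5, the retained matches could then be passed to a RANSAC model fitter (e.g. OpenCV's `cv2.findFundamentalMat` with the `cv2.FM_RANSAC` flag) to reject remaining mismatches.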

9.3 Experimental Results

9.3.1 Shoeprint Image Databases

The test databases of shoeprint images used in this work are based on images provided by Foster and Freeman Ltd (UK) [http://www.fosterfreeman.com] and Forensic Science Services (UK) [http://www.forensic.gov.uk]. A small selection of scene-of-crime shoeprint images was also supplied. These are of very poor quality, and it is not known whether the shoe soles which made the scene impressions are contained in the databases.

The starting point for the databases is a set of 500 high-quality shoeprint images collected from shoemakers, called dClean. dClean contains prints of shoes from most major brands, such as Nike, Adidas and Fila, and its images differ from each other in terms of patterns (or structures).

With dClean as the base data set, we have generated a number of degraded data sets using different types of degradations which attempt to emulate some of the deformations which result in real scene images:

  • Gaussian noise shoeprint data set: dGaussian

    This data set consists of 2500 noisy prints: five different levels of Gaussian noise are added to each of the 500 shoeprints in the base data set. The noise level (\(\sigma\)) varies from 5 to 25 in steps of 5, with grey levels in the range 0–255. (A minimal generation sketch covering this and the rescaling and rotation degradations below is given after this list.)

  • Partial shoeprint data set: dPartial

    This data set is produced to emulate the fact that scene images often contain only part of a shoemark. It consists of 2500 partial prints – five partial shoemarks generated from each shoemark in dClean. To create a partial shoeprint image, the silhouette of a complete shoeprint is extracted first, and two points on the silhouette perimeter are then chosen to define a cross-sectional line joining them. To avoid the end points being too close to each other (which would result in a meaningless partial print), a minimum ratio of the complete boundary length can be imposed on their separation. Several random points around the line are then selected as samples of the partial boundary (Fig. 9.6 illustrates complete and partial boundaries), and a spline interpolation through these samples produces the partial boundary. The pixels on one side of the curve are set to 1 or 0, producing a partial mask, which is then used to generate a partial shoeprint. Five partial shoeprint images are generated for each shoeprint in the dClean data set, with the retained proportion of each print varying from 40% to 95%. An illustration of partial shoeprint creation and one example are shown in Fig. 9.6.

  • Rescaled shoeprint data set: dRescale

    This data set consists of 2500 rescaled prints, where each shoeprint in dClean has been rescaled with five random scale ratios in the range 0.35–0.65. We did not use scale ratios larger than 1.0 because (i) up-sampling has a similar influence to down-sampling on the scale robustness of an approach, and (ii) the original shoeprint images in dClean are already large, so any expansion would considerably increase the cost of feature extraction.

  • Rotated shoeprint data set: dRotate

    This data set is used to test the algorithms for rotation invariance and consists of 2500 rotated prints. Each shoeprint in the base data set has been rotated by five random orientations in the range 0°–90°. This range (rather than 0°–360°) was selected because the algorithms developed in this chapter are flip invariant in both the horizontal and vertical directions.

  • Scene shoeprint data set: dScene

    Unfortunately, the scene images from Foster and Freeman Ltd do not have corresponding reference images in the dClean data set. To test the algorithm on shoemarks with some of the background and artefacts which occur in real scene images, we have therefore simulated scene images by combining the actual scene images with the base data set dClean. For each image in dClean, we randomly select one of our small set of actual scene images as the background and superimpose the “clean” shoeprint on it, resulting in 500 scene images, called dScene (Fig. 9.7 shows some examples).

  • Complex scene shoeprint image data set: dComplexDegrade

    To simulate other kinds of real scene images with several simultaneous degradations, a small set of 50 degraded images has been generated as follows: (i) scene background addition (complex) – the weights for the scene and the clean image are 0.7 and 0.3, respectively; (ii) partial prints + scene background addition; (iii) rotation + scene background addition; (iv) rescaling + scene background addition; and (v) patterns + scene background addition – the patterns are manually extracted from clean images (normally the size of a pattern is less than 30% of the total print). For combinations (ii)–(v), the weights for the scene and the clean image (pattern) are set to 0.6 and 0.4, respectively. Figure 9.8 shows some examples of the complex degraded shoeprint images.
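As promised above, the following is a minimal sketch of the three simpler degradations (dGaussian, dRescale and dRotate); the parameter ranges follow the text, while the interpolation order and the clipping convention are assumptions of ours.

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def degrade(print_img, rng):
    g = print_img.astype(float)
    noisy = [np.clip(g + rng.normal(0.0, s, g.shape), 0, 255)
             for s in (5, 10, 15, 20, 25)]                 # dGaussian levels
    rescaled = [zoom(g, rng.uniform(0.35, 0.65), order=1)  # dRescale ratios
                for _ in range(5)]
    rotated = [rotate(g, rng.uniform(0.0, 90.0), reshape=True, order=1)
               for _ in range(5)]                          # dRotate angles
    return noisy, rescaled, rotated

# e.g. noisy, rescaled, rotated = degrade(img, np.random.default_rng(0))
```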

Fig. 9.6

‘S’ – sample, ‘L’ – line, ‘BC’ – complete boundary, ‘BP’ – partial boundary. The left image in the second row is a “complete” print, while the right one is a “partial” print

Fig. 9.7

Examples of shoeprint images from dScene

Fig. 9.8

Examples of synthetic scene shoeprint images; the images of (a), (b) and (c) correspond to (scale + scene), (scene + complex) and (pattern + scene), respectively

The following experiments were carried out to compare the performance, in terms of the cumulative matching curve (CMC), of four signatures: the Edge Directional Histogram (EDH), the Power Spectral Distribution (PSD), the Pattern and Topological Spectra (PTS) and our Local Image Feature (LIF) method. The experiments are conducted as follows. A shoeprint image from one of the six degraded data sets is taken as a query image; because of the computational requirements, we randomly choose just 50 images from each degraded data set, except for dComplexDegrade, where we take all 50 (thus 50 trials per degraded data set). We then search against the reference data set dClean (containing 500 shoeprints). For each data set we compute the CMC curve, and the results are depicted in Fig. 9.9. Besides these quantitative comparisons, a few retrieval examples are shown in Fig. 9.10, and Table 9.1 lists the signature sizes of the compared methods.
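For clarity, a CMC curve can be computed from the rank of the correct reference image for each query, as in the following sketch (the function and variable names are ours):

```python
import numpy as np

def cmc_curve(correct_ranks, gallery_size=500):
    """Cumulative matching curve: fraction of queries whose correct dClean
    image appears within the top k retrievals, for k = 1..gallery_size."""
    ranks = np.asarray(correct_ranks)           # 1-based rank per query
    return np.array([(ranks <= k).mean() for k in range(1, gallery_size + 1)])

# e.g. cmc = cmc_curve(ranks); cmc[0] is the rank-1 matching score
```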

Fig. 9.9

Performance evaluation of four signatures (EDH, PTS, PSD and LIF) in terms of cumulative matching score on six degraded image data sets. RAND here is the worst case, i.e. the rank of the images in the reference data set is randomly assigned

Fig. 9.10

Examples of shoeprint image retrieval. In each row, the leftmost image is a noisy query shoeprint from dComplexDegrade, and the rest of the row shows the top ranked shoeprint images in dClean. The distance is shown under each retrieved image, and the red squares denote the corresponding patterns contained in the query images

Table 9.1 The signature size for the four techniques

Edge Directional Histogram (EDH; Zhang and Allinson [25]) – this technique assumes that the interior shape information of a shoeprint image is captured by its significant edges, so the authors use a histogram of edge directions as the signature of a shoeprint image. The method first extracts edges using a Canny edge detector; the edge directions are then quantised at 5° intervals into a total of 72 bins. To achieve rotation invariance, a 1D FFT of the normalised edge direction histogram is computed and used as the final signature of the shoeprint image.
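A hedged sketch of an EDH-style signature follows; taking the magnitude of the FFT is our reading of the rotation-invariance step (a rotation circularly shifts the histogram, and the FFT magnitude is shift invariant), and the gradient operators are assumptions.

```python
import numpy as np
from skimage.feature import canny
from skimage.filters import sobel_h, sobel_v

def edh_signature(img):
    edges = canny(img)                           # significant edge pixels
    gy, gx = sobel_h(img), sobel_v(img)
    theta = np.degrees(np.arctan2(gy, gx))[edges] % 360.0
    hist, _ = np.histogram(theta, bins=72, range=(0.0, 360.0))  # 5-degree bins
    hist = hist / (hist.sum() + 1e-12)           # normalised histogram
    return np.abs(np.fft.fft(hist))              # circular-shift invariant
```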

Power Spectral Distribution (PSD; de Chazal et al. [3]) – this method uses the power spectral distribution as the signature of a shoeprint image. To compute the PSD, the input image is first down-sampled, a 2D DFT is taken of the down-sampled image and the power spectral distribution is computed; a masking step then yields the signature. In the original similarity computation, the PSD of a query image is rotated 30 times, in 1° steps, and the largest similarity value over the rotated versions is taken. In our experiments, this rotation step is removed because it is computationally intensive and because a rotation range of 30° is not adequate in most practical situations.
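The pipeline can be sketched as follows; the down-sampling factor and the circular low-frequency mask are illustrative assumptions, not the exact choices of [3].

```python
import numpy as np

def psd_signature(img, factor=4, keep_radius=32):
    small = img[::factor, ::factor].astype(float)    # crude down-sampling
    spec = np.fft.fftshift(np.fft.fft2(small))
    power = np.abs(spec) ** 2                        # power spectral distribution
    h, w = power.shape
    yy, xx = np.mgrid[0:h, 0:w]
    mask = np.hypot(yy - h / 2, xx - w / 2) <= keep_radius  # masking step
    sig = power[mask]
    return sig / (np.linalg.norm(sig) + 1e-12)
```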

Pattern and Topological Spectra (PTS; Su et al. [23]) – this method addresses the automatic classification of noisy and incomplete shoeprint images using the principle of pattern and topological spectra. For a shoeprint image, a distribution of Euler numbers is computed from repeated opening operations with structuring elements of increasing size; the normalised differential of this distribution produces the topological spectrum. A hybrid algorithm, which uses a distance measure based on a combination of both spectra as the feature of a shoeprint image, is proposed and applied successfully.
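The topological-spectrum half of PTS might be sketched as below; the binarisation of the print, the square structuring elements and the maximum element size are assumptions of ours.

```python
import numpy as np
from skimage.measure import euler_number
from skimage.morphology import opening, square

def topological_spectrum(binary_print, max_size=15):
    """Euler numbers of repeated openings with growing structuring elements,
    followed by the normalised differential."""
    eulers = [euler_number(opening(binary_print, square(k)))
              for k in range(1, max_size + 1)]
    diff = np.diff(np.asarray(eulers, dtype=float))
    return diff / (np.abs(diff).sum() + 1e-12)
```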

The above results suggest that

  (i) For the data sets degraded with Gaussian noise, cutting-out (partial prints) and rescaling, the PSD and LIF signatures achieve almost perfect results. Furthermore, LIF achieves similar performance on the data sets degraded by rotation and scene background addition.

  (ii) The performance of EDH and PTS is marginally worse than that of PSD and LIF for the Gaussian noise, cutting-out, rescaling and rotation degradations. However, both methods are efficient, since the cost (signature size) of these two signatures is significantly smaller than that of the other two. It can also be observed that PTS outperforms EDH in most cases (the exception being the rescaled data set).

  (iii) The LIF signature works very well for all kinds of degradations and clearly outperforms the other signatures on the data set with the most complex degradations. However, LIF is more computationally intensive than both EDH and PTS: for instance, it takes about 40 s on average to compute the LIF of a shoeprint image of size \(768 \times 280\) on our machine (Pentium 4 CPU, 2.40 GHz, 760 MB of RAM), whereas computing the EDH and the PTS of an image of the same size takes less than 1 s and around 2 s, respectively.

Two further shoeprint matching examples based on local features are given in Fig. 9.11. The synthetic scene images contain degradations of rotation, rescaling, pattern segmentation and scene addition. It can be seen from Fig. 9.11(a, b) that more than 80% of the feature matches are correct.

Fig. 9.11

Examples of shoeprint matching based on local image features

9.4 Summary

This chapter has discussed a local feature detector (Modified Harris–Laplace detector) which employs a scale-adaptive Harris corner detector to determine the local feature candidates and a Laplace-based automatic scale selection strategy in order to select the final local features. We have further improved the widely used local feature descriptors to be more robust to rotation and complement operations. To assess the performance of the system, a set of synthetic scene shoeprint images (modelling real world degradations) were used. A number of experiments on shoeprint image matching and retrieval were also conducted.

The experimental results have indicated that (i) compared with the Harris–Laplace detector, the Modified Harris–Laplace detector provides more stable local regions, (ii) the local image descriptors perform significantly better than the global descriptors on shoeprint image matching and retrieval.

Further issues to be investigated include

  • to further reduce the dimensionality of a local feature descriptor, even though some measures have already been taken to this end, such as using 36 dimensions instead of the original 128 and limiting the number of detected local features to under 400;

  • to develop a fast and accurate matching strategy to deal with a large shoeprint image database. The current matching based on nearest neighbour and thresholding is fast but not accurate enough, while a more accurate matching strategy based on RANSAC is computationally intensive;

  • to develop more advanced local feature detectors and descriptors;

  • to extend the evaluation of shoeprint image retrieval and matching using local image features with real scene images.