From 2D Silhouettes to 3D Object Retrieval: Contributions and Benchmarking
Abstract
3D retrieval has recently emerged as an important complement to 2D search techniques. This is mainly due to its several complementary aspects, for instance, enriching views in 2D image datasets, overcoming occlusion, and serving many real-world applications such as photography, art, archeology, and geolocalization. In this paper, we introduce a complete "2D photography to 3D object" retrieval framework. Given a (collection of) picture(s) or sketch(es) of the same scene or object, the method retrieves the underlying similar objects in a database of 3D models. The contributions of our method include (i) a generative approach for alignment able to find canonical views consistently across scenes/objects and (ii) an efficient yet effective matching method used for ranking. Results are reported on the Princeton Shape Benchmark and through the Shrec benchmarking consortium, where evaluation and comparison were carried out by a third party. On the two gallery sets, our framework achieves very encouraging performance and outperforms the other runs.
Keywords
Dynamic Programming · Visual Hull · Discounted Cumulative Gain · Canonical View

1. Introduction
3D object recognition and retrieval have recently gained considerable interest [27] because of the limitations of "2D-to-2D" approaches. The latter suffer from several drawbacks, such as the lack of information (due, for instance, to occlusion), pose sensitivity, illumination changes, and so forth. This interest is also driven by the exponential growth of storage and bandwidth on the Internet, the increasing demand for services from 3D content providers (museum institutions, car manufacturers, etc.), and the ease of collecting gallery sets^{1}. Furthermore, computers are now equipped with high-performance, easy-to-use 3D scanners and graphic facilities for real-time modeling, rendering, and manipulation. Nevertheless, at the current time, functionalities including retrieval of 3D models are not yet precise enough for large-scale use.
Almost all 3D retrieval techniques are resource (time and memory) demanding before recognition and ranking can be achieved. They usually operate on massive amounts of data and require many upstream steps, including object alignment, 3D-to-2D projection, and normalization. However, when no hard runtime constraints are imposed, 3D search engines offer real alternatives and substantial gains in performance with respect to (only) image-based retrieval approaches, especially when the relevant information is appropriately extracted and processed (see, e.g., [8]).
Existing 3D object retrieval approaches can be categorized into those operating directly on the 3D content and those which extract "2.5D" or 2D contents (stereo pairs or multiple image views, artificially rendered 3D objects, silhouettes, etc.). Comprehensive surveys on 3D retrieval can be found in [6, 8, 9, 34, 35, 41]. Existing state-of-the-art techniques may also be categorized according to whether they require a preliminary alignment step or operate directly by extracting global invariant 3D signatures such as Zernike's 3D moments [28]. The latter are extracted using salient characteristics on 3D, "2.5D," or 2D shapes and ranked according to similarity measures. Structure-based approaches, presented in [19, 36, 37, 43], encode topological shape structures and make it possible to efficiently compute, without pose alignment, similarity between two global or partial 3D models. The authors in [7, 18] introduced two methods for partial shape matching able to recognize similar subparts of objects represented as 3D polygonal meshes. The methods in [17, 23, 33] use spherical harmonics in order to describe shapes, where rotation invariance is achieved by taking only the power spectrum of the harmonic representations and discarding all "rotation-dependent" information. Other approaches include those which analyze 3D objects using analytical functions/transforms [24, 42] and those based on learning [29].
Another family of 3D object retrieval approaches lies at the frontier between the 2D and 3D querying paradigms. For instance, the method in [32] is based on extracting and combining spherical 3D harmonics with "2.5D" depth information, and the one in [15, 26] is based on selecting characteristic views and encoding them using the curvature scale space descriptor. Other "2.5D" approaches [11] are based on extracting rendered depth lines (as in [10, 30, 39]) resulting from the vertices of regular dodecahedrons and matching them using dynamic programming. The authors in [12, 13, 14] proposed a 2D method based on Zernike's moments that provides the best results on the Princeton Shape Benchmark [34]. In this method, rotation invariance is obtained using the light-field technique, where all the possible permutations of several dodecahedrons are used in order to cover the space of viewpoints around an object.
1.1. Motivations
Due to the compactness of global 3D object descriptors, their performance in capturing inter/intraclass variability is known to be poor in practice [34]. In contrast, local geometric descriptors, even though computationally expensive, achieve relatively good performance and capture inter/intraclass variability (including deformations) better than global ones (see Section 5). The framework presented in this paper is based on local features and also addresses computational issues while keeping advantages in terms of precision and robustness.
Our target is searching 3D databases of objects using one or multiple 2D views; this scheme will be referred to as "2D-to-3D". We define our probe set as a collection of single or multiple views of the same scene or object (see Figure 2), while our gallery set corresponds to a large set of 3D models. A query in the probe set will either be (i) multiple pictures of the same object, for instance a stereo pair or user's sketches, or (ii) a 3D object model processed in order to extract several views; we thus end up with the "2D-to-3D" querying paradigm in both cases (i) and (ii). Gallery data are also processed in order to extract several views for each 3D object (see Section 2). This choice is motivated by the following observations.
- (i)
The difficulty of getting "3D query models" when only multiple views of an object of interest are available (see Figure 2). This might happen when 3D reconstruction techniques [21] fail or when 3D acquisition systems are not available. "2D-to-3D" approaches should then be applied instead.
- (ii)
3D gallery models can be manipulated via different similarity and affine transformations, in order to generate multiple views which fit the 2D probe data, so "2D-to-3D" matching and retrieval can be achieved.
1.2. Contributions
- (i)
A new generative approach is proposed in order to align and normalize the pose of 3D objects and extract their 2D canonical views. The method is based on combining three alignments (identity and two variants of principal component analysis (PCA)) with the minimal visual hull (see Figure 1 and Section 2). Given a 3D object, this normalization is achieved by minimizing its visual hull with respect to different pose parameters (translation, scale, etc.). We found in practice that this clearly outperforms the usual PCA alignment (see Figure 10 and Table 2) and makes the retrieval process invariant to several transformations, including rotation, reflection, translation, and scaling.
- (ii)
Afterwards, robust and compact contour signatures are extracted using the set of 2D canonical views. Our signature is an implementation of the multiscale curve representation first introduced in [2]. It is based on computing convexity/concavity coefficients on the contours of the (2D) object views. We also introduce a global descriptor which captures the distributions of these coefficients in order to perform pruning and speed up the whole search process (see Figures 3 and 12).
- (iii)
Finally, ranking is performed using our variant of dynamic programming which considers only a subset of possible matches thereby providing a considerable gain in performance for the same amount of errors (see Figure 12).
Figures 1, 2, and 3 show our whole proposed matching, querying, and retrieval framework which was benchmarked through the Princeton Shape Benchmark [34] and the international Shrec'09 contest on structural shape retrieval [1]. This framework achieves very encouraging performance and outperforms almost all the participating runs.
In the remainder of this paper, we consider the following terminology and notation. A probe (query) datum is again defined either as (i) a 3D object model processed in order to extract multiple 2D silhouettes, (ii) multiple sketched contours of the same mental query (target), or (iii) simply 2D silhouettes extracted from multiple photos of the same category (see Figure 2). Even though these acquisition scenarios are different, they all end up providing multiple silhouettes describing the user's intention.
Let X be a random variable standing for the 3D coordinates of vertices in any 3D model. For a given object, we assume that X is drawn from an existing but unknown probability distribution P(X). Let us consider {x_1, ..., x_n} as n realizations of X, forming a 3D object model. S (or S') will be used in order to denote a 3D model belonging to the gallery set, while O is a generic 3D object belonging either to the gallery or the probe set. Without any loss of generality, 3D models are characterized by a set of vertices which may be meshed in order to form a closed surface, that is, a compact manifold of intrinsic dimension two. Other notations and terminologies will be introduced as we go through the different sections of this paper, which is organized as follows. Section 2 introduces the alignment and pose normalization process. Section 3 presents the global and local multiscale contour convexity/concavity signatures. The matching process, together with pruning strategies, is introduced in Section 4, ending with experiments and comparisons on the Princeton Shape Benchmark and the very recent Shrec'09 international benchmark in Section 5.
2. Pose Estimation
The goal of this step is to make retrieval invariant to 3D transformations (including scaling, translation, rotation, and reflection) and also to generate multiple views of the 3D models in the gallery (and possibly the probe^{2}) sets. Pose estimation consists in finding the parameters of the above transformations (denoted, resp., s, t, R, and F) by normalizing 3D models so that they fit into canonical poses. The underlying orthogonal 2D views will be referred to as the canonical views (see Figure 1).
Our alignment process is partly motivated by advances in the cognitive psychology of human perception (see, e.g., [25]). These studies have shown that humans recognize shapes by memorizing specific views of the underlying 3D real-world objects. Following these findings, we introduce a new alignment process which mimics this behavior and finds such specific views (also referred to as canonical views). Our approach is based on the minimization of a visual-hull criterion, defined as the area surrounded by the silhouettes extracted from different object views.
The criterion is

A(T) = area(P_1(T(O))) + area(P_2(T(O))) + area(P_3(T(O))),   (1)

where T denotes the global normalization transformation resulting from the combination of translation, rotation, scaling, and reflection, and P_1, P_2, P_3 denote, respectively, the "3D-to-2D" parallel projections on the xy, xz, and yz canonical 2D planes. These canonical planes are characterized, respectively, by their normals z, y, and x. The visual hull in (1) is thus defined as the sum of the projection areas of T(O) under P_1, P_2, and P_3, where area(·) provides this area on each 2D canonical plane.
It is clear that the objective function (1) is difficult to minimize, as one needs to recompute the underlying visual hull for each possible transformation T. Exhaustively parsing the domain of variation of T therefore makes the search process intractable. Furthermore, no gradient descent can be applied, as there is no guarantee that A is continuous w.r.t. T. Instead, we restrict the search to a few possibilities; in order to define the optimal pose of a given object O, the alignment which locally minimizes the visual-hull criterion (1) is taken as one of the three possible alignments obtained according to the following procedure.
Translation and Scaling
The translation t and the scale s are recovered simply by centering and rescaling the 3D points of O so that they fit inside an enclosing ball of unit radius. The latter is iteratively found by deflating an initial ball until it cannot shrink anymore without losing points of O (see [16] for more details).
Rotation
The rotation R is taken as one of three possible candidate matrices: (i) the identity^{4} (i.e., no transformation, denoted None), or one of the transformation matrices resulting from PCA either on (ii) the gravity centers or (iii) the face normals of O. The two cases (ii) and (iii) will be referred to as PCA and normal PCA (NPCA), respectively [39, 40].
Axis Reordering and Reflection
This step processes only 3D probe objects and consists in reordering and reflecting the three projection planes (xy, xz, yz) in order to generate 48 possible triples of 2D canonical views (i.e., 3! = 6 reorderings times 2^3 = 8 reflections). Reflection makes it possible to consider mirrored views of objects, while reordering allows us to permute the principal orthogonal axes of an object and therefore the underlying 2D canonical views.
For each combination taken from "scaling x translation x 3 possible rotations" (see the explanation earlier), the objective function (1) is evaluated. The combination T* that minimizes this function is kept as the best transformation. Finally, three canonical views are generated for each object in the gallery set.
3. Multiview Object Description
We use simple convexity/concavity coefficients as local descriptors for each 2D point p_i on the contour of a view. Each coefficient is defined as the amount of shift of p_i between two consecutive scales sigma_{k-1} and sigma_k. Put differently, a convexity/concavity coefficient, denoted c_{i,k}, is taken as c_{i,k} = ||p_i^{sigma_k} - p_i^{sigma_{k-1}}||_2, where ||.||_2 denotes the L2 norm.
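These coefficients can be sketched as follows, assuming circular Gaussian smoothing of the closed contour with sigma expressed in contour samples; the exact smoothing of the multiscale representation in [2] may differ in its details.

```python
import math

def smooth_contour(contour, sigma):
    """Circular Gaussian smoothing of a closed 2D contour (sigma in samples)."""
    n = len(contour)
    half = max(1, int(3 * sigma))
    kernel = [math.exp(-k * k / (2.0 * sigma * sigma)) for k in range(-half, half + 1)]
    norm = sum(kernel)
    kernel = [w / norm for w in kernel]
    out = []
    for i in range(n):
        x = sum(kernel[j + half] * contour[(i + j) % n][0] for j in range(-half, half + 1))
        y = sum(kernel[j + half] * contour[(i + j) % n][1] for j in range(-half, half + 1))
        out.append((x, y))
    return out

def convexity_coeffs(contour, sigmas):
    """One coefficient per point and per pair of consecutive scales: the
    Euclidean shift of each point between the two smoothed contours."""
    levels = [smooth_contour(contour, s) for s in sigmas]
    return [[math.dist(a, b) for a, b in zip(levels[k - 1], levels[k])]
            for k in range(1, len(levels))]
```

On a square contour, corner points shift much more between scales than edge midpoints, which is what lets the coefficients localize convexities and concavities.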
Runtime
This table reports the average alignment and feature-extraction runtime needed to process one object (with 3 and 9 silhouettes).

| | Alignment | Extraction | Total |
|---|---|---|---|
| 3 silhouettes | 1.7 s | 0.3 s | 2.0 s |
| 9 silhouettes | 1.7 s | 0.9 s | 2.6 s |
Results for different settings of alignment and pruning on the two datasets (W for Watertight, P for Princeton). The two rows shown in bold illustrate the performance of the best precision/runtime trade-offs.

| Setting | Set | NN (%) | FT (%) | ST (%) | DCG (%) |
|---|---|---|---|---|---|
| Align (None), 3 Views, Prun (…) | W | 92.5 | 51.6 | 65.6 | 82.1 |
| | P | 60.4 | 30.5 | 41.8 | 60.1 |
| Align (NPCA), 3 Views, Prun (…) | W | 93.5 | 60.7 | 71.9 | 86.0 |
| | P | 62.7 | 37.1 | 49.2 | 64.1 |
| Align (PCA), 3 Views, Prun (…) | W | 94.7 | 61.5 | 72.8 | 86.5 |
| | P | 65.4 | 38.2 | 49.7 | 64.7 |
| **Align (Our), 3 Views, Prun (…)** | W | 95.2 | 62.7 | 73.7 | 86.9 |
| | P | 67.1 | 39.8 | 51.0 | 66.1 |
| **Align (Our), 9 Views, Prun (…)** | W | 95.2 | 65.3 | 75.6 | 88.0 |
| | P | 71.9 | 45.1 | 55.6 | 70.1 |
| Align (Our), 3 Views, Prun (…) | W | 89.5 | 57.8 | 72.3 | 83.9 |
| | P | 60.5 | 34.5 | 47.2 | 61.8 |
| Align (Our), 3 Views, Prun (…) | W | 95.5 | 62.8 | 73.7 | 86.9 |
| | P | 66.1 | 40.1 | 51.0 | 66.0 |
4. Coarse-to-Fine Matching
4.1. Coarse Pruning
A simple coarse shape descriptor is extracted on both the gallery and probe sets. This descriptor quantifies the distribution of convexity and concavity coefficients over the 2D points belonging to the different silhouettes of a given object. It is a multiscale histogram containing 100 bins, the product of the number of Gaussian kernel scales (see Section 3) and the number of quantization values for the convexity/concavity coefficients. Each bin of this histogram counts, through all the viewpoint silhouettes of an object, the frequency of the underlying convexity/concavity coefficients. This descriptor is poor in terms of discrimination power, but effective at rejecting almost all false matches while keeping candidate ones when ranking the gallery objects w.r.t. the probe ones (see also the processing time in Figure 9).
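The coarse descriptor and the pruning step can be sketched as follows. Assumptions in this sketch: a fixed coefficient range for quantization (`cmax`), an L1 distance between histograms, and a generic number of scales (the paper uses 100 bins in total).

```python
def coarse_histogram(levels, nbins=10, cmax=1.0):
    """Multiscale histogram: one list of convexity/concavity coefficients per
    scale (pooled over all silhouettes of the object), quantized into nbins
    per scale and normalized to frequencies. cmax is an assumed range bound."""
    hist = [0.0] * (len(levels) * nbins)
    total = 0
    for s, coeffs in enumerate(levels):
        for c in coeffs:
            b = min(nbins - 1, int(nbins * c / cmax))
            hist[s * nbins + b] += 1.0
            total += 1
    return [h / total for h in hist] if total else hist

def prune(query_hist, gallery, alpha):
    """Keep the alpha-fraction of gallery objects nearest to the query under
    L1 distance on the coarse histograms (distance choice is an assumption)."""
    def l1(h1, h2):
        return sum(abs(a - b) for a, b in zip(h1, h2))
    ranked = sorted(gallery, key=lambda item: l1(query_hist, item[1]))
    keep = max(1, int(alpha * len(ranked)))
    return [name for name, _ in ranked[:keep]]
```

Only the objects surviving `prune` reach the expensive dynamic-programming stage of Section 4.2.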
4.2. Fine Matching by Dynamic Programming
here N denotes the number of silhouettes per probe (in practice, N = 3 or N = 9; see Section 5).
The dynamic programming pseudodistance provides good discrimination power and may capture intraclass variations better than the global distance (discussed in Section 4.1). Nevertheless, it is still computationally expensive, but when combined with coarse pruning the whole process is significantly faster while remaining precise (see Figure 9 and Table 2). Finally, this elastic similarity measure allows us to achieve retrieval while being robust to intraclass object articulations/deformations (observed in the Shrec Watertight set) and also to other effects (including noise) induced by hand-drawn sketches (see Figures 14, 15, 16, and 17).
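An elastic matching of this kind can be sketched as a banded dynamic program over two contour signatures; the band is one way to realize the "subset of possible matches" restriction credited for the speedup. The band width and the Euclidean local cost are assumptions of this sketch, not the paper's exact recurrence.

```python
import math

def banded_dp_distance(a, b, band=10):
    """Elastic pseudodistance between two sequences of per-point descriptor
    vectors, computed only inside a near-diagonal band (Sakoe-Chiba style).
    Sketch: local cost and band width are assumptions."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        center = i * m // n                     # stay near the diagonal
        for j in range(max(1, center - band), min(m, center + band) + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]
```

Restricting to the band cuts the cost from O(nm) to O(n x band) cells, which is where the runtime gain for a comparable error rate comes from.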
Runtime
Using the coarse-to-fine querying scheme described earlier, we adjust the speedup/precision trade-off via a pruning parameter. Given a query, this parameter corresponds to the fraction of nearest neighbors (according to our global descriptor) on which dynamic programming is run. Lower values make the retrieval process very fast at the cost of a slight decrease in precision, and vice versa. Figure 9 shows runtime performance with respect to this parameter on the same hardware platform (with 9 views).
5. Experiments
5.1. Databases
In order to evaluate the robustness of the proposed framework, we used two datasets. The first one is the Watertight dataset of the Shrec benchmark while the second one is the Princeton Shape Benchmark, widely used in the 3D content-based retrieval community.
Shrec Watertight Dataset
This dataset contains 400 "3D" objects represented by seamless surfaces (without defective holes or gaps), divided into 20 classes of 20 objects each. The 3D models were taken from two sources: the first one is a deformation of an initial subset of objects (octopus, glasses, etc.), while the second one is a collection of original 3D models (chair, vase, four legs, etc.).
Princeton Shape Benchmark
This dataset contains 907 "3D" objects organized in 92 classes. This dataset offers a large variety of objects for evaluation.
For the two datasets, each 3D object belongs to a unique class among many semantic concepts with strong variations, including human, airplane, chair, and so forth. For instance, the human class contains persons with different poses and appearances (running, sitting, walking, etc.). Globally, the two databases are very challenging.
5.2. Evaluation Criteria
- (i)
The nearest neighbor (NN). It represents the fraction of the first nearest neighbors which belong to the same class as the query.
- (ii)
The first-tier (FT) and the second-tier (ST). These measures give the percentage of objects in the same class as the query that appear in the K best matches. For a given class C containing |C| objects, K is set to |C| - 1 for the first-tier measure, while K is set to 2(|C| - 1) for the second-tier (ST).
- (iii)
Finally, we use the discounted cumulative gain (DCG) measure, which gives more importance to well-ranked models. Given a query and a list of ranked objects, we define for each ranked object i a variable G_i equal to 1 if its class is equal to the class of the query and 0 otherwise. The DCG measure is then defined as DCG_k = G_1 + sum_{i=2}^{k} G_i / log_2(i), normalized by the DCG of the ideal ranking (all relevant objects first).
We take the expectation of these measures on the entire database, that is, by taking all the possible object queries.
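The four measures can be computed per query as follows. This sketch assumes the standard Princeton Shape Benchmark conventions: tier cut-offs of |C| - 1 and 2(|C| - 1) with the query excluded from the ranked list, and a log2 DCG discount.

```python
import math

def retrieval_scores(ranked_classes, query_class, class_size):
    """NN, FT, ST, and normalized DCG for one query, given the class labels
    of the ranked gallery (query excluded). PSB conventions assumed."""
    rel = [1 if c == query_class else 0 for c in ranked_classes]
    k = class_size - 1                      # relevant items besides the query
    nn = rel[0]
    ft = sum(rel[:k]) / k
    st = sum(rel[:2 * k]) / k               # ST uses the same normalizer
    def dcg_of(g):
        return g[0] + sum(gi / math.log2(i) for i, gi in enumerate(g[1:], start=2))
    ideal = [1] * k + [0] * (len(rel) - k)  # best possible ranking
    return nn, ft, st, dcg_of(rel) / dcg_of(ideal)
```

The database-level scores then average these per-query values over all possible object queries, as stated above.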
5.3. Performance and Discussion
Alignment
Figure 10 shows the performance of our alignment method presented in Section 2 on the Watertight dataset. For that purpose, we define a ground truth by manually aligning a subset of the "3D" models^{5} so that their canonical views are parallel to the canonical planes (see Figure 11 and also Figure 4). The error is then defined as the deviation (angle, in degrees or radians) of the automatically aligned objects w.r.t. the underlying ground truth (see Figure 11).
Different alignment methods were compared, including classic PCA, normal PCA (NPCA), and our method. We also show the alignment error of the initial unaligned database (None). The plot in Figure 10 compares the percentage of 3D objects in the database which are automatically and correctly aligned, up to a given angle, w.r.t. the underlying 3D models in the ground truth.
Coarse-to-Fine Retrieval
In order to control and reduce the runtime needed to process and match local signatures, we used our pruning approach based on the global signature discussed in Section 4.1. The pruning parameter allows us to control the trade-off between robustness and speed of the retrieval process. A small value gives real-time (online) responses with an acceptable precision, while a high value requires more processing time but gives better retrieval performance. Figure 12 shows the NN, FT, ST, and DCG measures for different pruning thresholds. Table 2 shows different statistics for three settings of this threshold.
This table shows the comparison of dynamic programming w.r.t. naive matching on the two datasets (W for Watertight, P for Princeton). We use our pose estimation and alignment technique and generate 3 views per 3D object. DP stands for dynamic programming, while NM stands for naive matching.

| Setting | Set | NN (%) | FT (%) | ST (%) | DCG (%) |
|---|---|---|---|---|---|
| DP + pruning (…) | W | 95.2 | 62.7 | 73.7 | 86.9 |
| | P | 67.1 | 39.8 | 51.0 | 66.1 |
| NM + pruning (…) | W | 92.0 | 57.7 | 71.9 | 84.5 |
| | P | 65.8 | 37.7 | 48.7 | 64.6 |
| DP + pruning (…) | W | 95.5 | 62.8 | 73.7 | 86.9 |
| | P | 66.1 | 40.1 | 51.0 | 66.0 |
| NM + pruning (…) | W | 91.5 | 52.6 | 63.8 | 81.1 |
| | P | 62.9 | 35.4 | 45.2 | 62.6 |
5.4. Benchmarking and Comparison
Shrec Watertight Dataset
First, comparisons of our approach with respect to different methods/participants are available and were generated by a third party in the Shrec'09 Structural Shape Retrieval contest (see Table 4). This dataset contains 200 objects, and results were evaluated on 10 queries. The performance in this shape retrieval contest was measured using first-tier (10 objects) and second-tier (20 objects) precision and recall, summarized by the F-measure, a global measure of the overall retrieval performance. We submitted four runs:
- (i)
Run 1 (MCC 1): 9 silhouettes and pruning threshold …; the average runtime for each query is 0.03 s.
- (ii)
Run 2 (MCC 2): 9 silhouettes and pruning threshold …; the average runtime for each query is 9.4 s.
- (iii)
Run 3 (MCC 3): 9 silhouettes and pruning threshold …; the average runtime for each query is 36.2 s.
- (iv)
Run 4 (MCC 4): 3 silhouettes and pruning threshold …; the average runtime for each query is 3.1 s.
| Methods | FT Precision (%) | FT Recall (%) | ST Precision (%) | ST Recall (%) |
|---|---|---|---|---|
| **MCC 3** | 81 | 54 | 51 | 68 |
| CSID-CMVD 3 | 77 | 52 | 52 | 70 |
| CSID-CMVD 2 | 76 | 51 | 51 | 68 |
| **MCC 2** | 74 | 49 | 48 | 64 |
| CSID-CMVD 1 | 74 | 49 | 48 | 64 |
| MRSPRH-UDR 1 | 74 | 49 | 48 | 64 |
| BFSIFT 1 | 72 | 48 | 48 | 64 |
| **MCC 4** | 71 | 48 | 45 | 60 |
| CMVD 1 | 69 | 46 | 47 | 62 |
| **MCC 1** | 68 | 46 | 45 | 61 |
| ERG 2 | 61 | 41 | 40 | 53 |
| ERG 1 | 56 | 37 | 36 | 49 |
| BOW 1 | 29 | 19 | 17 | 23 |
| CBOW 2 | 25 | 17 | 16 | 21 |
We can see in Table 4 that the third run of our method (shown in bold) outperforms the others for the first tier and is equivalent to CSID-CMVD 3 for the second tier (see Table 7 for the meaning of the method acronyms). The results of the second run are similar to those of BFSIFT 1 and CSID-CMVD 1.
Princeton Shape Benchmark Dataset
Methods | NN (%) | FT (%) | ST (%) | DCG (%) |
---|---|---|---|---|
MCC 3 | 71.9 | 47.2 | 58.6 | 71.5 |
MCC 2 | 71.9 | 45.1 | 55.6 | 70.1 |
MCC 4 | 67.1 | 39.8 | 51 | 66.1 |
MCC 1 | 65.9 | 39.4 | 50.7 | 65.8 |
LFD | 65.7 | 38 | 48.7 | 64.3 |
EDBA | 65.4 | 38.3 | 49.8 | 64.1 |
AVC | 62 | 35.5 | 45.5 | 63 |
ESA | 57.8 | 32.6 | 44.4 | 60.2 |
REXT | 60.2 | 32.7 | 43.2 | 60.1 |
DBD | 59.2 | 32.9 | 41.8 | 58.9 |
SHD | 55.6 | 30.9 | 41.1 | 58.4 |
GEDT | 60.3 | 31.3 | 40.7 | 58.4 |
SIL | 52.8 | 28.5 | 38.8 | 56.3 |
EXT | 54.9 | 28.6 | 37.9 | 56.2 |
SECSHEL | 54.6 | 26.7 | 35 | 54.5 |
VOXEL | 54 | 26.7 | 35.5 | 54.3 |
SECTORS | 50.4 | 24.9 | 33.4 | 52.9 |
CEGI | 42 | 21.1 | 28.7 | 47.9 |
EGI | 37.7 | 19.7 | 27.7 | 47.2 |
D2 | 31.1 | 15.8 | 23.5 | 43.4 |
SHELLS | 22.7 | 11.1 | 17.3 | 38.6 |
Hand-Drawn Sketches and Photos
Query | Number of views | NN (%) | FT (%) | ST (%) | DCG (%) |
---|---|---|---|---|---|
Watertight-Photos-Fish | 1 | 0 | 35 | 60 | 55 |
2 | 100 | 65 | 85 | 83.2 | |
3 | 100 | 80 | 85 | 95.8 | |
Watertight-Photos-Teddy | 1 | 0 | 15 | 55 | 50.8 |
2 | 100 | 55 | 80 | 76.6 | |
3 | 100 | 65 | 75 | 91.5 | |
Watertight-Sketches-Chair | 1 | 100 | 30 | 35 | 74.3 |
2 | 100 | 80 | 90 | 96.5 | |
3 | 100 | 90 | 90 | 97.8 | |
Watertight-Sketches-Human | 1 | 100 | 40 | 50 | 79.7 |
2 | 100 | 50 | 60 | 86.1 | |
3 | 100 | 55 | 55 | 89.1 | |
Princeton-Photos-Commercial | 1 | 100 | 45.4 | 45.4 | 75.6 |
2 | 100 | 45.4 | 63.6 | 84 | |
3 | 100 | 54.5 | 63.6 | 87.8 | |
Princeton-Photos-Hand | 1 | 100 | 17.6 | 23.5 | 61.3 |
2 | 100 | 35.3 | 35.3 | 75.1 | |
3 | 100 | 41.2 | 41.2 | 75.4 | |
Princeton-Sketches-Glass with stem | 1 | 100 | 44.4 | 44.4 | 70 |
2 | 0 | 66.7 | 66.7 | 76.7 | |
3 | 0 | 77.8 | 88.9 | 83.5 | |
Princeton-Sketches-Eyeglasses | 1 | 0 | 28.6 | 28.6 | 38.6 |
2 | 0 | 28.6 | 28.6 | 58.8 | |
3 | 100 | 57.1 | 57.1 | 82.7 |
This table describes the significance of the different acronyms/methods participating in the Watertight and Princeton benchmarks.
CSID-CMVD 1, 2, 3 | Compact shape impact descriptor and multi view descriptor [1] |
---|---|
MRSPRH-UDR | Unsupervised dimension reduction approach [1] |
BFSIFT | Bag-of-local visual feature [1] |
ERG 1, 2 | Enhanced Reeb graph [1] |
BOW, CBOW | (Concentric) bag of words [1] |
LFD | Lightfield descriptor [12] |
EDBA | Enhanced depth buffer approach [10] |
AVC | Adaptive view clustering [15] |
ESA | Enhanced silhouettes approach [10] |
REXT | Radialized spherical extent function [38] |
DBD | Depth buffer descriptor [39] |
SHD | Spherical harmonic descriptor [23] |
GEDT | Gaussian Euclidean distance transform [23] |
SIL | Silhouettes approach [39] |
EXT | Spherical extent function [33] |
SECSHEL | Shape histogram [3] |
VOXEL | 3D shape voxelization [39] |
SECTORS | Shape histogram [3] |
CEGI | Complex extended Gaussian image [22] |
EGI | Extended Gaussian image [20] |
D2 | D2 shape distribution [31] |
SHELLS | Shape histogram [3] |
6. Conclusion
We introduced in this paper a novel and complete framework for "2D-to-3D" object retrieval. The method makes it possible to extract canonical views using a generative approach combined with principal component analysis. The underlying silhouettes/contours are matched using dynamic programming in a coarse-to-fine way that makes the search process efficient and also effective as shown through extensive evaluations.
One of the major drawbacks of dynamic programming resides in the fact that it is not a metric, so one cannot benefit from lossless acceleration techniques which provide precise results and efficient computation. Our future work will tackle this issue by introducing new matching approaches that allow us to speed up the search process while keeping high precision.
Endnotes
1. Even though in a chaotic way because of the absence of consistent alignments of 3D models.
2. Obviously, normalization is achieved on the probe set only when queries are 3D models. As for the 2D photo or the sketch scenarios, one assumes that at least three silhouettes are available corresponding to three canonical views.
3. Again, this is in accordance with cognitive psychology of human perception (defined, e.g., in [25]).
4. The initial object pose is assumed to be the canonical one.
5. http://perso.enst.fr/~sahbi/file/Watertight_AlignmentGroundTruth.zip.
6. The user will imagine a category existing in the Watertight gallery set and will draw it.
Notes
Acknowledgment
This work was supported by the European Network of Excellence KSpace and the French National Research Agency (ANR) under the AVEIR Project, ANR-06-MDCA-002.
References
- 1. Hartveldt J, Spagnuolo M, Axenopoulos A, et al.: SHREC'09 track: structural shape retrieval on watertight models. Proceedings of the Eurographics Workshop on 3D Object Retrieval, March 2009, Munich, Germany, 77-83.
- 2. Adamek T, O'Connor NE: A multiscale representation method for nonrigid shapes with a single closed contour. IEEE Transactions on Circuits and Systems for Video Technology 2004, 14(5):742-753. doi:10.1109/TCSVT.2004.826776
- 3. Ankerst M, Kastenmüller G, Kriegel HP, Seidl T: Nearest neighbor classification in 3D protein databases. Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB '99), August 1999, Heidelberg, Germany, 34-43.
- 4. Ansary TF: Model retrieval using 2D characteristic views. Ph.D. thesis, 2006.
- 5. Bellman R: Dynamic programming. Science 1966, 153(3731):34-37. doi:10.1126/science.153.3731.34
- 6. Biasotti S, Giorgi D, Marini S, Spagnuolo M, Falcidieno B: A comparison framework for 3D object classification methods. Proceedings of the International Workshop on Multimedia Content Representation, Classification and Security (MRCS '06), 2006, Lecture Notes in Computer Science 4105:314-321.
- 7. Biasotti S, Marini S, Spagnuolo M, Falcidieno B: Sub-part correspondence by structural descriptors of 3D shapes. Computer-Aided Design 2006, 38(9):1002-1019. doi:10.1016/j.cad.2006.07.003
- 8. Del Bimbo A, Pala P: Content-based retrieval of 3D models. ACM Transactions on Multimedia Computing, Communications and Applications 2006, 2(1):20-43. doi:10.1145/1126004.1126006
- 9. Bustos B, Keim D, Saupe D, Schreck T, Vranić D: An experimental comparison of feature-based 3D retrieval methods. Proceedings of the 2nd International Symposium on 3D Data Processing, Visualization, and Transmission, September 2004, Thessaloniki, Greece, 215-222.
- 10. Chaouch M, Verroust-Blondet A: Enhanced 2D/3D approaches based on relevance index for 3D-shape retrieval. Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), June 2006, Matsushima, Japan, 36.
- 11. Chaouch M, Verroust-Blondet A: A new descriptor for 2D depth image indexing and 3D model retrieval. Proceedings of the 14th IEEE International Conference on Image Processing (ICIP '07), September 2007, 6:373-376.
- 12. Chen D: Three-dimensional model shape description and retrieval based on lightfield descriptors. Ph.D. thesis, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, June 2003.
- 13. Chen D, Ouhyoung M: A 3D model alignment and retrieval system. Proceedings of the International Workshop on Multimedia Technologies, December 2002, 1436-1443.
- 14. Chen D-Y, Tian X-P, Shen Y-T, Ouhyoung M: On visual similarity based 3D model retrieval. Computer Graphics Forum 2003, 22(3):223-232. doi:10.1111/1467-8659.00669
- 15. Ansary TF, Daoudi M, Vandeborre J-P: A Bayesian 3-D search engine using adaptive views clustering. IEEE Transactions on Multimedia 2007, 9(1):78-88.
- 16. Fischer K, Gärtner B: The smallest enclosing ball of balls: combinatorial structure and algorithms. International Journal of Computational Geometry and Applications 2004, 14(4-5):341-378.
- 17. Funkhouser T, Min P, Kazhdan M, et al.: A search engine for 3D models. ACM Transactions on Graphics 2003, 22(1):83-105. doi:10.1145/588272.588279
- 18. Funkhouser T, Shilane P: Partial matching of 3D shapes with priority-driven search. Proceedings of the 4th Eurographics Symposium on Geometry Processing, June 2006, Cagliari, Italy, 131-142.
- 19. Hilaga M, Shinagawa Y, Kohmura T, Kunii TL: Topology matching for fully automatic similarity estimation of 3D shapes. Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '01), August 2001, Los Angeles, Calif, USA, 203-212.
- 20. Horn BKP: Extended Gaussian images. Proceedings of the IEEE 1984, 72(12):1671-1686.
- 21. Jin H, Soatto S, Yezzi AJ: Multi-view stereo reconstruction of dense shape and complex appearance. International Journal of Computer Vision 2005, 63(3):175-189. doi:10.1007/s11263-005-6876-7
- 22. Kang SB, Ikeuchi K: Determining 3-D object pose using the complex extended Gaussian image. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 1991, Maui, Hawaii, USA, 580-585.
- 23. Kazhdan MM, Funkhouser TA, Rusinkiewicz S: Rotation invariant spherical harmonic representation of 3D shape descriptors. Proceedings of the Eurographics/ACM SIGGRAPH Symposium on Geometry Processing (SGP '03), June 2003, Aachen, Germany, 156-165.
- 24. Laga H, Takahashi H, Nakajima M: Spherical wavelet descriptors for content-based 3D model retrieval. Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), June 2006, Matsushima, Japan, 15.
- 25. Leek EC: Effects of stimulus orientation on the identification of common polyoriented objects. Psychonomic Bulletin and Review 1998, 5(4):650-658. doi:10.3758/BF03208841
- 26. Mahmoudi S, Daoudi M: 3D models retrieval by using characteristic views. Proceedings of the 16th International Conference on Pattern Recognition (ICPR '02), August 2002, Quebec, Canada, 2:457-460.
- 27. NIST: Shape retrieval contest on a new generic shape benchmark. http://www.itl.nist.gov/iad/vug/sharp/benchmark/shrecGeneric/Evaluation.html
- 28. Novotni M, Klein R: 3D Zernike descriptors for content based shape retrieval. Proceedings of the 8th ACM Symposium on Solid Modeling and Applications, 2003, Seattle, Wash, USA, 216-225.
- 29. Ohbuchi R, Kobayashi J: Unsupervised learning from a corpus for shape-based 3D model retrieval. Proceedings of the ACM International Multimedia Conference and Exhibition, October 2006, Santa Barbara, Calif, USA, 163-172.
- 30. Ohbuchi R, Nakazawa M, Takei T: Retrieving 3D shapes based on their appearance. Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval (MIR '03), 2003, 39-45.
- 31. Osada R, Funkhouser T, Chazelle B, Dobkin D: Matching 3D models with shape distributions. Proceedings of the International Conference on Shape Modeling and Applications (SMI '01), May 2001, Los Alamitos, Calif, USA, 154-166.
- 32. Papadakis P, Pratikakis I, Perantonis S, Theoharis T, Passalis G: SHREC'08 entry: 2D/3D hybrid. Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '08), June 2008, 247-248.
- 33. Saupe D, Vranic DV: 3D model retrieval with spherical harmonics and moments. Proceedings of the 23rd DAGM Symposium on Pattern Recognition, 2001, Lecture Notes in Computer Science 2191:392-397.
- 34. Shilane P, Min P, Kazhdan M, Funkhouser T: The Princeton shape benchmark. Proceedings of Shape Modeling International (SMI '04), 2004, Washington, DC, USA, 167-178.
- 35. Tangelder JWH, Veltkamp RC: A survey of content based 3D shape retrieval methods. Proceedings of Shape Modeling International (SMI '04), June 2004, 145-156.
- 36. Tierny J, Vandeborre J-P, Daoudi M: 3D mesh skeleton extraction using topological and geometrical analyses. Proceedings of the 14th Pacific Conference on Computer Graphics and Applications, October 2006, Taipei, Taiwan, 85-94.
- 37. Tung T, Schmitt F: The augmented multiresolution Reeb graph approach for content-based retrieval of 3D shapes. International Journal of Shape Modeling 2005, 11(1):91-120. doi:10.1142/S0218654305000748
- 38. Vranić DV: An improvement of rotation invariant 3D-shape descriptor based on functions on concentric spheres. Proceedings of the IEEE International Conference on Image Processing, 2003, 3:757-760.
- 39. Vranic DV: 3D model retrieval. Ph.D. thesis, University of Leipzig, 2004.
- 40. Vranić DV, Saupe D, Richter J: Tools for 3D-object retrieval: Karhunen-Loeve transform and spherical harmonics. Proceedings of the 4th IEEE Workshop on Multimedia Signal Processing, September 2001, Budapest, Hungary, 293-298.
- 41. Zaharia T, Prêteux F: 3D versus 2D/3D shape descriptors: a comparative study. Image Processing: Algorithms and Systems III, January 2004, San Jose, Calif, USA, Proceedings of SPIE 5298:47-58.
- 42. Zarpalas D, Daras P, Axenopoulos A, Tzovaras D, Strintzis MG: 3D model search and retrieval using the spherical trace transform. EURASIP Journal on Advances in Signal Processing 2007, 2007: 14 pages.
- 43. Tierny J, Vandeborre J-P, Daoudi M: Invariant high level Reeb graphs of 3D polygonal meshes. Proceedings of the 3rd International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT '06), June 2006, Chapel Hill, NC, USA, 105-112.
Copyright information
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.