
1 Introduction

With the increasing availability of high-resolution 3D scans, topological surface description is gaining importance. In recent years, methods for sparse and dense 3D scene reconstruction have progressed strongly due to the availability of inexpensive, off-the-shelf hardware (e.g. Microsoft Kinect) and the development of robust reconstruction algorithms (e.g. structure from motion techniques, SfM) [5, 28]. Since 3D scanning has become an affordable process, the amount of available 3D data has increased significantly. At the same time, reconstruction accuracy has improved considerably, which enables 3D reconstructions with sub-millimeter resolution [27]. This high resolution enables the accurate description of a 3D surface’s geometric micro-structure, which opens up new opportunities for search and retrieval in 3D scenes, such as the recognition of objects by their specific surface properties as well as the distinction of different types of materials for improved scene understanding.

In this paper, we investigate the problem of describing and classifying 3D surfaces according to their geometric micro-structure. Two different types of approaches exist for this problem: first, the dense processing of the surface in 3D space, and second, the processing of the surface geometry in image space based on depth maps derived from the surface.

For the representation of surface geometry in 3D, descriptors are required that capture the local geometry around a given point or mesh vertex. Different types of local 3D descriptors have been developed recently that are suitable for the description of the local geometry around a 3D point, such as spin images [12], 3D shape context [4], and persistent point feature histograms [24].

The dense extraction of surface geometry by local 3D descriptors, however, becomes a computationally demanding task when several million points need to be processed. A computationally more efficient approach is the analysis of 3D surfaces in image space. In such approaches, a 3D surface is first mapped to a depth map which represents a height field of the surface. This processing step maps the 3D surface analysis problem to a 2D texture analysis task, which can be approached by analyzing the surface with texture descriptors such as HOG, GLCM, and Wavelet-based features [19, 29, 30].

The presented approach falls into the category of image-space approaches. We first map the surface to image-space by a depth projection. Next, we divide the resulting depth map into patches and describe them with traditional non-topological as well as with topological surface descriptors. For the classification of surface patches we use random undersampling boosting (RUSBoost) [25] due to its high accuracy for imbalanced class distributions [16].

2 Topological Approach

By mathematical standards topology, with its 120 years of history, is a relatively young discipline. It grew out of H. Poincaré’s seminal work on the stability of the solar system as a qualitative tool to study the dynamics of differential equations without explicit formulas for solutions [20–22]. Due to the lack of useful analytic methods, topology soon became a purely theoretical discipline. In the last few years, however, we have observed a rapid development of topological data analysis tools, which open up new applications for topology.

Topological spaces appearing in data analysis are typically constructed from small pieces or cells. Natural tools in the study of multidimensional images with topological methods are hypercubes (points, edges, squares, cubes, etc.); e.g., a pixel in a 2-dimensional image corresponds to a square and a voxel in a 3-dimensional volume to a cube. Hypercubes are the building blocks of structures called cubical complexes. Such representations give topology a combinatorial flavour and make it a natural tool in the study of multi-dimensional data sets.

Intuitively, the rank of the nth homology group, the so-called nth Betti number denoted \(\beta _n\), counts the number of n-dimensional holes in the topological space. In particular, \(\beta _0\) counts the number of connected components. As an example, consider the image of the digit 8. In this image there is one connected component and two holes, hence \(\beta _0=1\) and \(\beta _1 =2\). For a hollow sphere we have \(\beta _0 =1\), \(\beta _1 =0\), \(\beta _2 =1\). For the inner tube of a tire (a torus) we have \(\beta _0 =1\), \(\beta _1 =2\), \(\beta _2 =1\).
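To make the digit-8 example concrete, the following minimal sketch counts \(\beta _0\) and \(\beta _1\) of a small binary image with SciPy; the tiny hard-coded image and the connectivity conventions (8-connectivity for the foreground, 4-connectivity for the background) are our own illustrative assumptions and not part of the method described in this paper.

```python
# Illustrative sketch: Betti numbers beta_0 and beta_1 of a binary image.
# beta_0 = number of foreground connected components;
# beta_1 = number of enclosed holes = background components minus the outer region.
import numpy as np
from scipy import ndimage

# A coarse binary "8": two stacked loops of foreground pixels.
eight = np.array([
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
], dtype=bool)

# beta_0: connected components of the foreground (8-connectivity).
_, beta0 = ndimage.label(eight, structure=np.ones((3, 3)))

# beta_1: pad with background so the outer region is a single component,
# then count background components (4-connectivity) minus the outer one.
padded = np.pad(eight, 1, constant_values=False)
_, n_bg = ndimage.label(~padded)
beta1 = n_bg - 1

print(beta0, beta1)  # expected: 1 2
```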

Betti numbers do not differentiate between small and large holes. In consequence, holes resulting from noise in the data cannot be distinguished from holes indicative of the nature of the data. For instance, in a noisy image of the digit 8 one can easily get \(\beta _0 > 1\). A remedy for this drawback is persistent homology, a tool invented at the beginning of the 21st century [7]. Persistent homology studies how the Betti numbers change when the topological space is gradually built by adding cubes in some prescribed order.

If X is a cubical complex, one can add cubes step by step. Typically, the construction goes through different scales, starting from the smallest pieces. However, in general an arbitrary function \(f:X\rightarrow \mathbb {R}\), called the Morse function or measurement function, may be used to control the order in which the complex is built, starting from low values of f and increasing subsequently. This way we obtain a sequence of topological spaces, called a filtration,

$$ \emptyset =X_{r_0}\subset X_{r_1}\subset X_{r_2}\subset \cdots \subset X_{r_n}=X, $$

where \(X_r:=f^{-1}((-\infty ,r])\) and \(r_i\) is an increasing sequence of values of f at which the complex changes. As the space is gradually constructed, holes are born, persist for some time and eventually may die. The length of the associated birth-death intervals (persistence intervals) indicates whether the holes are relevant or merely noise. The lifetime of holes is usually visualized by the so-called persistence diagram (PD). Persistence diagrams constitute the main tool of topological data analysis. They visualize geometrical properties of a multidimensional object X in a simple two-dimensional diagram.

Figure 1(a) shows a 3D surface as a 2D depth map, where color corresponds to depth (blue refers to low depth, yellow to high depth). In this case pixels are represented as 2-dimensional cells of a cubical complex. For this complex we obtain a filtration \(X_r\) using a measurement function whose value on a 2-dimensional cube equals the depth (pixel value). For a lower-dimensional cell (a vertex or an edge) we set the function value to the maximum over the higher-dimensional cells containing it. Figure 1(b) shows the persistence diagram for \(X_r\).
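As an illustration of this construction, the following sketch builds a cubical complex from a depth-map patch and computes its persistence intervals. Our own computations use the CAPD::RedHom library with PHAT (see Sect. 3); the sketch below uses the GUDHI Python library merely as a stand-in, and GUDHI’s convention for assigning filtration values to lower-dimensional cells may differ from the maximum rule described above.

```python
# Illustrative sketch (GUDHI as a stand-in for the CAPD::RedHom/PHAT pipeline
# of Sect. 3): sublevel-set filtration of a depth-map patch.
import numpy as np
import gudhi

depth_patch = np.random.rand(128, 128)  # stand-in for a real 128x128 depth patch

# Each pixel becomes a top-dimensional (2D) cube whose filtration value is its depth.
cc = gudhi.CubicalComplex(top_dimensional_cells=depth_patch)

# List of (dimension, (birth, death)) pairs; death = inf for essential classes.
diagram = cc.persistence()

# Persistence intervals per homology dimension, e.g. connected components (H0)
# and holes (H1) of the sublevel sets.
intervals_h0 = cc.persistence_intervals_in_dimension(0)
intervals_h1 = cc.persistence_intervals_in_dimension(1)
print(len(intervals_h0), len(intervals_h1))
```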

Fig. 1.

Example patch: (a) the original 3D surface as a 2D depth map; (b) the corresponding persistence diagram; (c) the persistence image with \(\sigma =0.001\) and resolution \(16\times 16\). (Color figure online)

There is still no definitive answer as to how and when the tools of computational topology and machine learning should be used together. A first attempt is to provide a descriptor of a topological space filtration based on elementary statistics of the persistence intervals (or, equivalently, of the persistence diagram). Let

$$ I := \{[b_1,e_1], [b_2,e_2], \ldots , [b_n,e_n] \} $$

be a set of persistence intervals. Let \(D := \{ d_i := (e_i - b_i)\}_{i=1}^n\) be the set of interval lengths. We build an aggregated descriptor of D, denoted by PD_AGG, using the following measures: number of elements, minimum, maximum, mean, standard deviation, variance, 1st quartile, median, 3rd quartile, and the norms \(\sum \sqrt{d_i}\), \(\sum d_i\), and \(\sum (d_i)^2\).
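A minimal NumPy sketch of this 12-dimensional descriptor is given below; it assumes that all intervals are finite (essential classes with infinite death would have to be removed or truncated beforehand).

```python
# Minimal sketch of the 12-dimensional PD_AGG descriptor (assumes finite intervals).
import numpy as np

def pd_agg(intervals):
    """intervals: array-like of shape (n, 2) with rows (birth, death)."""
    intervals = np.asarray(intervals, dtype=float)
    d = intervals[:, 1] - intervals[:, 0]               # interval lengths d_i
    q1, med, q3 = np.percentile(d, [25, 50, 75])
    return np.array([
        d.size,                                          # number of elements
        d.min(), d.max(), d.mean(),                      # minimum, maximum, mean
        d.std(), d.var(),                                # standard deviation, variance
        q1, med, q3,                                     # quartiles
        np.sum(np.sqrt(d)), np.sum(d), np.sum(d ** 2),   # norms
    ])

print(pd_agg([[0.0, 0.4], [0.1, 0.2], [0.3, 0.9]]).shape)  # (12,)
```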

Apart from the PD_AGG descriptor described above, which can be used with standard classification methods, there are also attempts to use the PD directly with appropriately modified classifiers. Reininghaus et al. [23] proposed a multiscale kernel for PDs, which can be used with a support vector machine (SVM). While this kernel is well-defined in theory, in practice it becomes highly inefficient for a large number of training vectors (as the entire kernel matrix must be computed explicitly). As an alternative, Chepushtanova et al. [1] introduced a novel representation of a PD, called a persistence image (PI), which is faster to compute and can be used with a broader range of machine learning (ML) techniques.

A PI is derived by mapping a PD to an integrable function \(G_p: \mathbb {R}^2 \rightarrow \mathbb {R}\), which is a sum of Gaussian functions centered at the points of the PD. Taking a discretization of a subdomain of \(G_p\) defines a grid. An image can be created by computing the integral of \(G_p\) on each grid box, thus defining a matrix of pixel values. Formally, the value of each pixel p within a PI is defined by the following equation:

$$ PI(p) = \iint \limits _{p} \sum _{[b_i, e_i] \in I}{g(b_i, e_i) \, \dfrac{1}{2 \pi \sigma _x \sigma _y} \, e^{-\frac{1}{2} \left( \frac{(x - b_i)^2}{\sigma _x^2} + \frac{(y - e_i)^2}{\sigma _y^2} \right) }} \,dy\,dx, $$

where \(g(b_i, e_i)\) is a weighting function that depends on the distance from the diagonal (points close to the diagonal are usually considered noise and should therefore have low weights), and \(\sigma _x\) and \(\sigma _y\) are the standard deviations of the Gaussians in the x and y directions. The resulting image (see Fig. 1c) is vectorized to obtain a standardized vectorial representation which is compatible with a broad range of ML techniques.
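The sketch below illustrates this computation. It approximates the per-pixel integral by evaluating the summed, weighted Gaussians at the pixel centres and multiplying by the pixel area, assumes \(\sigma _x = \sigma _y = \sigma \) as in our experiments, and uses the persistence \(e_i - b_i\) as one possible choice of the weighting function g; none of these simplifications should be read as the exact implementation used in the paper.

```python
# Sketch of a persistence image on a birth-death grid (integral approximated
# at pixel centres; weighting g chosen here as the persistence e - b).
import numpy as np

def persistence_image(intervals, res=16, sigma=0.001, lims=(0.0, 1.0), weighted=True):
    intervals = np.asarray(intervals, dtype=float)       # rows (b_i, e_i)
    edges = np.linspace(lims[0], lims[1], res + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    px_area = (edges[1] - edges[0]) ** 2
    xs, ys = np.meshgrid(centres, centres)               # birth (x) and death (y) axes

    pi = np.zeros((res, res))
    for b, e in intervals:
        w = (e - b) if weighted else 1.0                 # weight grows away from the diagonal
        g = np.exp(-0.5 * ((xs - b) ** 2 + (ys - e) ** 2) / sigma ** 2)
        pi += w * g / (2.0 * np.pi * sigma ** 2)
    return (pi * px_area).ravel()                        # vectorized PI

vec = persistence_image([[0.1, 0.3], [0.2, 0.9], [0.5, 0.55]])
print(vec.shape)  # (256,)
```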

The advantage of PIs compared to the PD descriptor is their high classification accuracy and stability [1]. However, they require setting numerous parameters, such as the PI resolution, the weighting function g, and the standard deviations \(\sigma _x\) and \(\sigma _y\).

3 Experimental Setup

In our experiments we investigate the robustness and expressiveness of the topological descriptors presented in Sect. 2 for 3D surface analysis and compare and combine them with traditional non-topological descriptors. For our experiments, we employ a dataset of high-resolution 3D reconstructions from the archaeological domain with a resolution below 0.1 mm [30]. The dimensions of the scanned surfaces range from approx. \(20 \times 30\) cm to \(30 \times 50\) cm. The reconstructions represent natural rock surfaces that exhibit human-made engravings (so-called rock-art). The engravings represent symbols and figures (e.g. animals and humans) engraved by humans in ancient times. See Fig. 2 for an example surface. The engraved regions of the surface exhibit a different surface geometry than the surrounding natural rock surface. In our experiments we aim at automatically separating the engraved areas from the natural rock surface. The corresponding ground truth is depicted in Fig. 2c.

The employed dataset contains 4 surface reconstructions with a total of 12.3 million points. For each surface, a precise ground truth that labels all engravings on the surface has been generated by domain experts. The dataset contains two classes of surface topographies: class 1 represents engraved areas and class 2 represents the natural rock surface. The class priors are imbalanced: class 1 accounts for only 16.6 % of the data and is thus underrepresented.

Fig. 2.

Example data: (a) the 3D point cloud of the surface; (b) the depth projection of the surface with compensated global curvature; (c) ground truth labeling that specifies areas with different topography, such as the human-shaped figure in the center whose head is marked with an arrow.

For each scan we perform depth projection and preprocessing as described in [30]. The result is a depth map that reflects the geometric micro-structure of the surface, see Fig. 2b. This representation is the input to feature extraction.

From the depth map we extract a number of non-topological image descriptors in a block-based manner that serve as a baseline in our experiments. The block size is \(128 \times 128\) pixels (i.e. \(10.8 \times 10.8\) mm) and the step size between two blocks is 16 pixels (1.35 mm). The baseline features include: MPEG-7 Edge Histogram (EH) [11], Dense SIFT (DSIFT) [17], Local Binary Patterns (LBP) [18], Histogram of Oriented Gradients (HOG) [6], Gray-Level Co-occurrence Matrix (GLCM) [10], Global Histogram Shape (GHS), Spatial Depth Distribution (SDD), as well as enhanced versions of GHS and SDD (EGHS and ESDD for short) that apply additional manual enhancements to the depth map, as described in [30].

In addition to the baseline descriptors, we extract persistent homology descriptors in the same block-wise manner. For each patch, we compute a persistence diagram and derive the 12-dimensional aggregated descriptor (PD_AGG) as described in Sect. 2. Additionally, we extract persistence images (PIs) for different resolutions (8, 16, 32, 64) and standard deviations (0.00025, 0.0005, 0.001, 0.002), with and without weighting (see Sect. 2).
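The block-wise extraction grid is illustrated by the following sketch; the callable compute_descriptor is a hypothetical placeholder for any of the descriptors above, and the block and step sizes are the values given in the text.

```python
# Sketch of block-wise descriptor extraction from the depth map:
# 128x128-pixel patches sampled every 16 pixels.
import numpy as np

def extract_patch_descriptors(depth_map, compute_descriptor, block=128, step=16):
    """compute_descriptor: hypothetical callable mapping a patch to a feature vector."""
    h, w = depth_map.shape
    features, positions = [], []
    for y in range(0, h - block + 1, step):
        for x in range(0, w - block + 1, step):
            patch = depth_map[y:y + block, x:x + block]
            features.append(compute_descriptor(patch))
            positions.append((y, x))
    return np.asarray(features), positions
```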

Alternatively, we first extract Completed LBP (CLBP) features [9] from the depth map as proposed in [15, 23] and then extract PD_AGG and PIs from the CLBP_S and CLBP_M maps.

For the discrimination of different surface topographies we employ supervised machine learning. All employed descriptors are represented by numerical vectors of fixed dimension and are thus suitable for statistical classification. As mentioned above, the class priors in our dataset are imbalanced. Skewed datasets pose problems for most classification techniques and often yield suboptimal models, as one class dominates the others. A classifier especially designed for imbalanced datasets is Random Undersampling Boosting (RUSBoost) [25]. RUSBoost builds upon AdaBoost [8], an ensemble method that combines the weighted decisions of weak classifiers to obtain a final decision for a given input sample. RUSBoost extends this concept by a data sampling strategy that enforces similar class priors. During each training iteration, the majority class in the training set is undersampled in a random fashion to balance the resulting class priors. In this manner, the weak classifiers can be learned from balanced datasets without being biased by the skewed class distribution.
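Our experiments themselves were implemented in Matlab (see below); purely as an illustration, the following sketch shows an equivalent setup in Python using the RUSBoostClassifier from the imbalanced-learn library, with synthetic data mimicking the class imbalance of our dataset.

```python
# Illustrative sketch (Python/imbalanced-learn instead of our Matlab implementation):
# RUSBoost trained on an imbalanced, synthetic feature matrix.
import numpy as np
from imblearn.ensemble import RUSBoostClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))               # e.g. PD_AGG feature vectors
y = (rng.random(1000) < 0.17).astype(int)     # ~17% minority class, as in our dataset

clf = RUSBoostClassifier(n_estimators=100, random_state=0)
# 5-fold cross-validation, as used to select classifier parameters
# (primarily the number of weak learners).
print(cross_val_score(clf, X, y, cv=5).mean())

clf.fit(X, y)                                 # final model trained on all training data
print(clf.predict(X[:5]))
```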

For training the RUSBoost classifier we split the entire dataset into independent training and evaluation sets. The training set contains image patches from scans 1 and 2 from the dataset. Scans 3 and 4 make up the evaluation set. From the training set we randomly select 50 % of the blocks from class 1 (2962 blocks) and 30 % from class 2 (7592 blocks). From this subset of 9654 samples we train the RUSBoost classifier. For training we apply 5-fold cross-validation to estimate suitable classifier parameters (primarily the number of weak classifiers of the ensemble). The best parameters are used to train the classifier on the entire training set. The trained classifier is finally applied to the independent evaluation set of 27192 patches.

As a performance measure we employ the Dice Similarity Coefficient (DSC). DSC measures the mutual overlap between an automatic labeling X of an image and a manual (ground truth) labeling Y:

$$ \text {DSC}(X,Y)=\frac{2 |X \cap Y|}{|X|+|Y|}. $$

The DSC ranges between 0 and 1, where 1 corresponds to a perfect segmentation.
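A minimal sketch of this measure for two binary label masks:

```python
# Dice Similarity Coefficient for two binary label masks.
import numpy as np

def dice(pred, gt):
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

print(dice([1, 1, 0, 0], [1, 0, 0, 0]))  # 2*1 / (2+1) = 0.67
```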

Each classification experiment is repeated 10 times with 10 different randomly selected subsets of the training set to reduce the dependency on the training data. From the 10 resulting DSC values we report the median and standard deviation as the final performance measures.

Aside from quantitative evaluations we investigate the following questions:

  • Can persistent homology descriptors outperform descriptors like HOG, SIFT, and GLCM for surface classification?

  • How does aggregation of the PD (PD_AGG) influence performance compared to non-aggregated representations like PI?

  • Is CLBP a suitable input representation for persistent homology descriptors?

  • How sensitive is PI to its parameters (resolution, sigma, weighting)?

  • Do persistent homology descriptors add beneficial or even necessary information to the baseline descriptors in our classification task?

The experiments were implemented in Matlab. Most of the descriptors were extracted with the VLFeat library [26], except for PD_AGG and PI. We compute the persistence intervals of the images using the CAPD::RedHom library [13, 14] with the PHAT algorithm [2, 3] for persistent homology.

4 Results

We start our evaluation with the aggregated descriptor PD_AGG. Applied to our surfaces, the descriptor yields a DSC of \(0.6528\pm 0.0118\) and represents a first baseline for further comparisons. Next, we apply PI with different resolutions and sigmas, with and without weighting. Results are summarized in Table 1. All results for PI outperform that of PD_AGG. We assume the reason is that PD_AGG neglects the information about the points’ localization, which is preserved in PI. The best result for PI is a DSC of \(0.7335\pm 0.0024\), obtained without weighting. The difference between the best results with and without weighting is statistically significant with a \(p\)-value of 0.006. This result is surprising, as it is contrary to the results of [1], where artificial datasets were used for evaluation. The results in Table 1 further show that PI has a low sensitivity to different resolutions and sigmas.

Table 1. DSC for PI descriptors depending on the sigma of the Gaussian function (\(\sigma \)) and resolution (res). Bold represents the best results for PI with and without weighting.

Next, we evaluate the performance of PD_AGG and PI with CLBP as input representation, see Table 2. The best result for PD_AGG (\(0.6874\pm 0.0030\)) is obtained for the rotation invariant CLBP maps with radius 5 and 16 samples. This improvement is statistically significant, with a \(p\)-value of 0.002 (compared to PD_AGG without CLBP). For PI we do not observe an improvement. This was confirmed by further experiments in which we combined the PI obtained from the original depth map with the PIs on the CLBP maps. The resulting DSC equals \(0.7178\pm 0.0034\). This shows not only that CLBP brings no additional information for PI, but further indicates that it can even confuse the classifier. The expressiveness of PI seems to be at a level where CLBP is not able to add additional information, whereas PD_AGG is less expressive and thus benefits from the additional processing.

Table 2. DSC for PD_AGG and PI descriptors extracted from the CLBP_S and CLBP_M maps. We consider two encodings for CLBP, rotation invariant uniform (riu2) and rotation invariant (ri), and vary the radius r and the number of samples n. Bold numbers represent the best results for PD_AGG and PI.

As a next step we investigate which locations of the PI are most important for classification. For this purpose we computed the Gini importance measure for each location of the PI, see Fig. 3a. The most important pixels are located in the middle of the PI. It is worth noting that there are only a few very important pixels, while the others are more than 10 times less important. Moreover, there are a few important pixels near the center of the diagonal. To get a more complete picture, we also compute the Fisher discriminant for each location of the PI, see Fig. 3b. The result is to a large degree consistent with the Gini measure and confirms our observation.
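The following sketch shows how such per-pixel importance maps can be obtained from the vectorized PIs; as an illustrative assumption it uses a random forest from scikit-learn (whose impurity-based feature importances are Gini importances) rather than our RUSBoost ensemble, and computes the Fisher discriminant ratio per feature.

```python
# Per-pixel importance of vectorized PIs: Gini importance via a random forest
# (stand-in for the ensemble used in our experiments) and the Fisher ratio.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pixel_importances(pi_vectors, labels, res=16):
    """pi_vectors: (n_samples, res*res) vectorized PIs; labels: binary class labels."""
    X, y = np.asarray(pi_vectors), np.asarray(labels)

    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    gini_map = rf.feature_importances_.reshape(res, res)

    c0, c1 = X[y == 0], X[y == 1]
    fisher_map = ((c0.mean(axis=0) - c1.mean(axis=0)) ** 2 /
                  (c0.var(axis=0) + c1.var(axis=0) + 1e-12)).reshape(res, res)
    return gini_map, fisher_map
```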

Fig. 3.

Importance of the PI’s pixels obtained with Gini importance measure and Fisher discriminant.

Table 3. DSC for baseline descriptors (B) and their combination with PD_AGG and PI descriptors (B + PD_AGG and B + PI, respectively). Asterisks (\(^*\)) denote \(p\)-values < 0.01 when comparing B to B + PD_AGG and B + PD_AGG to B + PI.

Finally, we investigate the performance of topological vs. non-topological descriptors and their combinations. The DSC values for the baseline descriptors and for their combinations with PD_AGG and PI are presented in Table 3. Our experiments show that both topological descriptors contribute additional valuable information to the baseline descriptors and improve the classification accuracy. All combinations with PD_AGG are significantly better than the respective baselines. Moreover, PI works significantly better than PD_AGG in combination with all of the baseline descriptors (except for GHS, GHS+SDD, and EGHS+ESDD, where the improvement is not significant).

5 Conclusion

We have presented an investigation of topological descriptors for 3D surface analysis. Our major conclusions are: (i) the aggregation of persistence diagrams removes important information which can be retained by using PI descriptors, (ii) PIs are expressive and robust descriptors that are well-suited to include topological information into ML pipelines, and (iii) topological descriptors are complementary to traditional image descriptors and represent necessary information to obtain peak performance in 3D surface classification. Furthermore, we observed that short intervals in the PD contribute more to classification accuracy than expected. This will be the subject of future research.