1 Introduction

In domains such as chemo- and bioinformatics or social network analysis, large amounts of graph-structured data are becoming increasingly prevalent. Classifying these graphs remains a challenge, as most graph kernels either do not scale to large data sets or are not applicable to all types of graphs. In the following, we briefly summarize related work before discussing our recent progress in the development of efficient and expressive graph kernels.

1.1 Related Work

In recent years, various graph kernels have been proposed. Gärtner et al. [5] and Kashima et al. [8] simultaneously developed graph kernels based on random walks, which count the number of walks two graphs have in common. Since then, random walk kernels have been studied intensively, see, e.g., [7, 10, 13, 19, 21]. Kernels based on shortest paths were introduced by Borgwardt et al. [1] and are computed by performing 1-step walks on the transformed input graphs, where edges are annotated with shortest-path lengths. A drawback of the approaches mentioned above is their high computational cost. Therefore, a different line of research focuses particularly on scalable graph kernels. These kernels are typically computed by explicit feature maps, see, e.g., [17, 18]. This makes it possible to bypass the computation of a Gram matrix of quadratic size by applying fast linear classifiers [2]. Moreover, graph kernels using assignments have been proposed [4] and were recently applied to geometric embeddings of graphs [6].

2 Recent Progress in the Design of Graph Kernels

We give an overview of our recent progress in the development of scalable and expressive graph kernels.

2.1 Hash Graph Kernels

In areas such as chemo- or bioinformatics, edges and vertices of graphs are often annotated with real-valued information, e.g., physical measurements. It has been shown that these attributes can boost classification accuracies [1, 3, 9]. Previous graph kernels that can take these attributes into account are relatively slow and employ the kernel trick [1, 3, 9, 15]. Therefore, these approaches do not scale to large graphs and data sets. In order to overcome this, we introduced the hash graph kernel framework in [14]. The idea is to iteratively turn the continuous attributes of a graph into discrete labels using randomized hash functions. This makes it possible to apply fast explicit graph feature maps, e.g., [17], which are limited to discrete annotations. In each iteration, we sample new hash functions and compute the feature map. Finally, the feature maps of all iterations are combined into one feature map. In order to obtain a meaningful similarity between attributes in \(\mathbb {R}^d\), we require that the probability of collision \(\Pr [h_1(x) = h_2(y)]\) of two independently chosen random hash functions \(h_1, h_2 :\mathbb {R}^d \rightarrow \mathbb {N}\) equals an adequate kernel on \(\mathbb {R}^d\). Equipped with such a hash function, we derived approximation results for several state-of-the-art kernels that can handle continuous information. Moreover, we derived a variant of the Weisfeiler-Lehman subtree kernel that can handle continuous attributes.

Our extensive experimental study showed that instances of the hash graph kernel framework achieve state-of-the-art classification accuracies while being orders of magnitude faster than kernels that were specifically designed to handle continuous information.
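
The iteration scheme described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from [14]: the hash family (a random Gaussian projection followed by bucketing with a hypothetical `width` parameter), the stand-in feature map `label_histogram` (plain vertex label counts that ignore edges), and the averaging over iterations are all choices made here for brevity. A structure-aware feature map for discrete labels such as [17] would take the place of `label_histogram`.

```python
import math
import random
from collections import Counter


def sample_hash(dim, width, rng):
    """Sample one random hash function R^dim -> Z (assumed family:
    random Gaussian projection followed by bucketing)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(x):
        projection = sum(ai * xi for ai, xi in zip(a, x))
        return math.floor((projection + b) / width)

    return h


def label_histogram(labels):
    """Stand-in explicit feature map for discrete labels: label counts."""
    return Counter(labels)


def hash_graph_features(graphs, iterations, width, seed=0):
    """Each graph is given as a list of vertex attribute vectors.
    Returns one sparse feature map (a Counter) per graph."""
    rng = random.Random(seed)
    dim = len(graphs[0][0])
    features = [Counter() for _ in graphs]
    for it in range(iterations):
        h = sample_hash(dim, width, rng)  # fresh hash function per iteration
        for phi, attributes in zip(features, graphs):
            discrete = label_histogram(h(x) for x in attributes)
            for label, count in discrete.items():
                # keying by iteration concatenates the per-iteration maps;
                # the scaling makes the dot product an average (our choice)
                phi[(it, label)] += count / math.sqrt(iterations)
    return features


def dot(phi_a, phi_b):
    """Sparse dot product of two feature maps."""
    return sum(value * phi_b.get(key, 0.0) for key, value in phi_a.items())


g1 = [(0.0, 0.0), (0.1, 0.0)]
g2 = [(0.05, 0.0), (0.0, 0.1)]
g3 = [(1e6, 1e6), (2e6, -1e6)]
f1, f2, f3 = hash_graph_features([g1, g2, g3], iterations=20, width=1.0)
print(dot(f1, f2), dot(f1, f3))
```

Because the result is an explicit sparse vector per graph, it can be passed directly to a linear classifier instead of building a Gram matrix.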

2.2 Explicit Graph Feature Maps

Explicit feature maps of kernels for continuous vectorial data are known for many popular kernels like the Gaussian kernel [16] and are heavily applied in practice. These techniques cannot be used to obtain approximation guarantees in the hash graph kernel framework. Therefore, in a different line of work, we developed explicit feature maps with the goal of lifting the known approximation results for kernels on continuous data to kernels for graphs annotated with continuous data [11]. More specifically, we investigated how general convolution kernels are composed from base kernels and how to construct corresponding feature maps. We applied our results to widely used graph kernels and analyzed for which kernels and graph properties computation by explicit feature maps is feasible and actually more efficient. We derived approximate explicit feature maps for state-of-the-art kernels supporting real-valued attributes. Empirically, we observed that for graph kernels like GraphHopper [3] and Graph Invariant [15], approximate explicit feature maps achieve a classification accuracy close to that of the exact methods based on the kernel trick, but require only a fraction of their running time. For the shortest-path kernel [1], on the other hand, the approach fails, in accordance with our theoretical analysis.
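
As background for the first sentence of the paragraph above, the following sketch shows one well-known approximate explicit feature map for the Gaussian kernel on vectorial data, namely random Fourier features. Whether this is the exact construction meant by [16] is an assumption made here, and the names `sample_rff`, `num_features`, and `sigma` are hypothetical. The sketch covers only the base kernel on \(\mathbb {R}^d\), not the lifting to graph kernels studied in [11].

```python
import math
import random


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sample_rff(dim, num_features, sigma, rng):
    """Sample a random Fourier feature map whose dot product approximates
    the Gaussian kernel with bandwidth sigma."""
    # frequencies are drawn from N(0, sigma^-2 I), phases uniformly from [0, 2 pi)
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
          for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return phi


phi = sample_rff(dim=3, num_features=4000, sigma=1.0, rng=random.Random(0))
x = (0.1, 0.2, 0.3)
y = (0.4, 0.0, 0.5)
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = gaussian_kernel(x, y, 1.0)
print(approx, exact)
```

The approximation error shrinks as `num_features` grows, which is the kind of guarantee that the work summarized above aims to carry over to kernels on attributed graphs.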

Moreover, we investigated the benefits of employing the kernel trick when the number of features used by a kernel is very large [10, 11]. We derived feature maps for random walk and subgraph kernels and applied them to real-world graphs with discrete labels. Experimentally, we observed a phase transition when comparing the running times of both computation schemes with respect to label diversity, walk length, and subgraph size, respectively, confirming our theoretical analysis.
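
To make the two computation schemes concrete, the sketch below computes a simple fixed-length walk kernel in both ways: explicitly, by counting walks per label sequence and taking a dot product, and implicitly, by counting walks in the direct product graph. The kernel definition (walks with exactly `length` edges, vertex labels only) and the graph encoding as a pair `(labels, adjacency lists)` are simplifying assumptions for illustration and are not the exact kernels evaluated in [10, 11].

```python
from collections import Counter


def walk_feature_map(labels, adj, length):
    """Explicit map: count walks with `length` edges by label sequence."""
    # counts[v] maps a label sequence to the number of such walks ending at v
    counts = [Counter({(labels[v],): 1}) for v in range(len(labels))]
    for _ in range(length):
        extended = [Counter() for _ in labels]
        for u, neighbours in enumerate(adj):
            for v in neighbours:
                for sequence, c in counts[u].items():
                    extended[v][sequence + (labels[v],)] += c
        counts = extended
    total = Counter()
    for c in counts:
        total.update(c)
    return total


def explicit_walk_kernel(g, h, length):
    """Dot product of the two explicit feature maps."""
    phi_g = walk_feature_map(*g, length)
    phi_h = walk_feature_map(*h, length)
    return sum(c * phi_h[sequence] for sequence, c in phi_g.items())


def implicit_walk_kernel(g, h, length):
    """Kernel trick: count walks in the direct product graph."""
    (labels_g, adj_g), (labels_h, adj_h) = g, h
    # one product vertex per pair of equally labelled vertices
    walks = {(u, v): 1
             for u in range(len(labels_g))
             for v in range(len(labels_h))
             if labels_g[u] == labels_h[v]}
    for _ in range(length):
        extended = dict.fromkeys(walks, 0)
        for (u, v), c in walks.items():
            for u2 in adj_g[u]:
                for v2 in adj_h[v]:
                    if (u2, v2) in extended:
                        extended[(u2, v2)] += c
        walks = extended
    return sum(walks.values())


# a labelled path and a labelled star, both on four vertices
path = (["a", "a", "b", "b"], [[1], [0, 2], [1, 3], [2]])
star = (["a", "b", "a", "b"], [[1, 2, 3], [0], [0], [0]])
print(explicit_walk_kernel(path, star, 2), implicit_walk_kernel(path, star, 2))
```

Both functions return the same value. They differ in cost: the explicit map grows with the number of distinct label sequences, while the implicit version grows with the size of the product graph, which is consistent with a trade-off that depends on label diversity and walk length.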

2.3 Optimal Assignment Kernels

For non-vectorial data, Fröhlich et al. [4] proposed kernels for graphs derived from an optimal assignment between their vertices, where vertex attributes are compared by a base kernel. However, it was shown that the resulting similarity measure is not necessarily a valid kernel [20, 21]. Hence, in [12], we studied optimal assignment kernels in more detail and investigated which base kernels lead to valid kernels. We characterized a specific class of kernels and showed that it is equivalent to the kernels obtained from a hierarchical partition of their domain. When such a kernel is used as the base kernel, the optimal assignment (i) yields a valid kernel; and (ii) can be computed in linear time by histogram intersection given the hierarchy. We demonstrated the versatility of our results by deriving novel graph kernels based on optimal assignments, which are shown to improve over their convolution-based counterparts. In particular, we proposed the Weisfeiler-Lehman optimal assignment kernel, which performs favorably compared to state-of-the-art graph kernels on a wide range of data sets.
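
The connection between optimal assignments and histogram intersection can be checked on a small example. The sketch below is a simplified illustration and not the implementation from [12]: it assumes that the hierarchy is given by the colors of Weisfeiler-Lehman refinement, that every level has weight 1, and that the base kernel between two vertices is the number of iterations in which their colors agree. The function names and the graph encoding as a pair `(labels, adjacency lists)` are hypothetical. A brute-force search over all assignments is included only for comparison and is restricted to graphs with the same number of vertices.

```python
from collections import Counter
from itertools import permutations


def wl_colors(graphs, iterations):
    """Run Weisfeiler-Lehman refinement jointly on all graphs and return,
    per graph and vertex, the list of colors of iterations 0..iterations."""
    current = [list(labels) for labels, _ in graphs]
    # colors are tagged with their iteration so that levels never clash
    history = [[[(0, c)] for c in colors] for colors in current]
    for it in range(1, iterations + 1):
        table = {}  # shared table keeps colors comparable across graphs
        refined = []
        for (_, adj), colors in zip(graphs, current):
            new_colors = []
            for v, neighbours in enumerate(adj):
                signature = (colors[v],
                             tuple(sorted(colors[u] for u in neighbours)))
                new_colors.append(table.setdefault(signature, len(table)))
            refined.append(new_colors)
        current = refined
        for g, colors in enumerate(current):
            for v, c in enumerate(colors):
                history[g][v].append((it, c))
    return history


def histogram_intersection_kernel(colors_a, colors_b):
    """Optimal assignment value computed via histogram intersection."""
    hist_a = Counter(c for vertex in colors_a for c in vertex)
    hist_b = Counter(c for vertex in colors_b for c in vertex)
    return sum(min(n, hist_b[c]) for c, n in hist_a.items())


def brute_force_assignment(colors_a, colors_b):
    """Reference value: maximize the summed base kernel over all bijections."""
    def base(u, v):
        return sum(cu == cv for cu, cv in zip(u, v))

    n = len(colors_a)
    return max(sum(base(colors_a[i], colors_b[p[i]]) for i in range(n))
               for p in permutations(range(n)))


path = (["a", "a", "b", "b"], [[1], [0, 2], [1, 3], [2]])
star = (["a", "b", "a", "b"], [[1, 2, 3], [0], [0], [0]])
colors_path, colors_star = wl_colors([path, star], iterations=2)
print(histogram_intersection_kernel(colors_path, colors_star),
      brute_force_assignment(colors_path, colors_star))
```

The histogram intersection needs only one pass over the color counts, whereas the brute-force search enumerates all permutations. Both agree here because the refinement colors form a hierarchy: vertices that differ at one iteration also differ at all later ones.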

3 Conclusion

We gave an overview of our recent progress in kernel-based graph classification. Our results show that explicit graph feature maps can provide an efficient computational alternative for many known graph kernels and practical applications. This is the case for kernels supporting graphs with continuous attributes and for those limited to discrete labels, even when the number of features is very large. Assignment kernels, on the other hand, are computed by histogram intersection and thereby again employ the kernel trick. This suggests studying the application of non-linear kernels to explicit graph feature maps in more detail as future work.