Recent Advances in Kernel-Based Graph Classification

Kriege, Nils M.; Morris, Christopher

doi:10.1007/978-3-319-71273-4_37

Nils M. Kriege²² &
Christopher Morris²²

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10536))

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

3089 Accesses

Abstract

We review our recent progress in the development of graph kernels. We discuss the hash graph kernel framework, which makes the computation of kernels for graphs with vertices and edges annotated with real-valued information feasible for large data sets. Moreover, we summarize our general investigation of the benefits of explicit graph feature maps in comparison to using the kernel trick. Our experimental studies on real-world data sets suggest that explicit feature maps often provide sufficient classification accuracy while being computed more efficiently. Finally, we describe how to construct valid kernels from optimal assignments to obtain new expressive graph kernels. These make use of the kernel trick to establish one-to-one correspondences. We conclude by a discussion of our results and their implication for the future development of graph kernels.

You have full access to this open access chapter, Download conference paper PDF

A survey on graph kernels

Article Open access 14 January 2020

An Efficient Entropy-Based Graph Kernel

Extending Local Features with Contextual Information in Graph Kernels

1 Introduction

In various domains such as chemo- and bioinformatics, or social network analysis large amounts of graph structured data is becoming increasingly prevalent. Classification of these graphs remains a challenge as most graph kernels either do not scale to large data sets or are not applicable to all types of graphs. In the following we briefly summarize related work before discussing our recent progress in the development of efficient and expressive graphs kernels.

1.1 Related Work

In recent years, various graph kernels have been proposed. Gärtner et al. [5] and Kashima et al. [8] simultaneously developed graph kernels based on random walks, which count the number of walks two graphs have in common. Since then, random walk kernels have been studied intensively, see, e.g., [7, 10, 13, 19, 21]. Kernels based on shortest paths were introduced by Borgwardt et al. [1] and are computed by performing 1-step walks on the transformed input graphs, where edges are annotated with shortest-path lengths. A drawback of the approaches mentioned above is their high computational cost. Therefore, a different line of research focuses particularly on scalable graph kernels. These kernels are typically computed by explicit feature maps, see, e.g., [17, 18]. This allows to bypass the computation of a gram matrix of quadratic size by applying fast linear classifiers [2]. Moreover, graph kernels using assignments have been proposed [4], and were recently applied to geometric embeddings of graphs [6].

2 Recent Progress in the Design of Graph Kernels

We give an overview of our recent progress in the development of scalable and expressive graph kernels.

2.1 Hash Graph Kernels

In areas such as chemo- or bioinformatics edges and vertices of graphs are often annotated with real-valued information, e.g., physical measurements. It has been shown that these attributes can boost classification accuracies [1, 3, 9]. Previous graph kernels that can take these attributes into account are relatively slow and employ the kernel trick [1, 3, 9, 15]. Therefore, these approaches do not scale to large graphs and data sets. In order to overcome this, we introduced the hash graph kernel framework in [14]. The idea is to iteratively turn the continuous attributes of a graph into discrete labels using randomized hash functions. This allows to apply fast explicit graph feature maps, e.g., [17], which are limited to discrete annotations. In each iteration we sample new hash functions and compute the feature map. Finally, the feature maps of all iterations are combined into one feature map. In order to obtain a meaningful similarity between attributes in \(\mathbb {R}^d\), we require that the probability of collision \(\Pr [h_1(x) = h_2(y)]\) of two independently chosen random hash functions \(h_1, h_2 :\mathbb {R}^d \rightarrow \mathbb {N}\) equals an adequate kernel on \(\mathbb {R}^d\). Equipped with such a hash function, we derived approximation results for several state-of-the-art kernels which can handle continuous information. Moreover, we derived a variant of the Weisfeiler-Lehman subtree kernel which can handle continuous attributes.

Our extensive experimental study showed that instances of the hash graph kernel framework achieve state-of-the-art classification accuracies while being orders of magnitudes faster than kernels that were specifically designed to handle continuous information.

2.2 Explicit Graph Feature Maps

Explicit feature maps of kernels for continuous vectorial data are known for many popular kernels like the Gaussian kernel [16] and are heavily applied in practice. These techniques cannot be used to obtain approximation guarantees in the hash graph kernel framework. Therefore, in a different line of work, we developed explicit feature maps with the goal to lift the known approximation results for kernels on continuous data to kernels for graphs annotated with continuous data [11]. More specifically, we investigated how general convolution kernels are composed from base kernels and how to construct corresponding feature maps. We applied our results to widely used graph kernels and analyzed for which kernels and graph properties computation by explicit feature maps is feasible and actually more efficient. We derived approximative, explicit feature maps for state-of-the-art kernels supporting real-valued attributes. Empirically we observed that for graph kernels like GraphHopper [3] and Graph Invariant [15] approximative explicit feature maps achieve a classification accuracy close to the exact methods based on the kernel trick, but required only a fraction of their running time. For the shortest-path kernel [1] on the other hand the approach fails in accordance to our theoretical analysis.

Moreover, we investigated the benefits of employing the kernel trick when the number of features used by a kernel is very large [10, 11]. We derived feature maps for random walk and subgraph kernels, and applied them to real-world graphs with discrete labels. Experimentally we observed a phase transition when comparing running time with respect to label diversity, walk lengths and subgraph size, respectively, confirming our theoretical analysis.

2.3 Optimal Assignment Kernels

For non-vectorial data, Fröhlich et al. [4] proposed kernels for graphs derived from an optimal assignment between their vertices, where vertex attributes are compared by a base kernel. However, it was shown that the resulting similarity measure is not necessarily a valid kernel [20, 21]. Hence, in [12], we studied optimal assignment kernels in more detail and investigated which base kernels lead to valid kernels. We characterized a specific class of kernels and showed that it is equivalent to the kernels obtained from a hierarchical partition of their domain. When such kernels are used as base kernel the optimal assignment (i) yields a valid kernel; and (ii) can be computed in linear time by histogram intersection given the hierarchy. We demonstrated the versatility of our results by deriving novel graph kernels based on optimal assignments, which are shown to improve over their convolution-based counterparts. In particular, we proposed the Weisfeiler-Lehman optimal assignment kernel, which performs favorable compared to state-of-the-art graph kernels on a wide range of data sets.

3 Conclusion

We gave an overview about our recent progress in kernel-based graph classification. Our results show that explicit graph feature maps can provide an efficient computational alternative for many known graph kernels and practical applications. This is the case for kernels supporting graphs with continuous attributes and for those limited to discrete labels, even when the number of features is very large. Assignment kernels, on the other hand, are computed by histogram intersection and thereby again employ the kernel trick. This suggests to study the application of non-linear kernels to explicit graph feature maps in more detail as future work.

References

Borgwardt, K.M., Kriegel, H.P.: Shortest-path kernels on graphs. In: IEEE International Conference on Data Mining, pp. 74–81 (2005)
Google Scholar
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
MATH Google Scholar
Feragen, A., Kasenburg, N., Petersen, J., Bruijne, M.D., Borgwardt, K.: Scalable kernels for graphs with continuous attributes. In: Advances in Neural Information Processing Systems, pp. 216–224 (2013)
Google Scholar
Fröhlich, H., Wegner, J.K., Sieker, F., Zell, A.: Optimal assignment kernels for attributed molecular graphs. In: 22nd International Conference on Machine Learning, pp. 225–232 (2005)
Google Scholar
Gärtner, T., Flach, P., Wrobel, S.: On graph kernels: hardness results and efficient alternatives. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT-Kernel 2003. LNCS (LNAI), vol. 2777, pp. 129–143. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45167-9_11
Chapter Google Scholar
Johansson, F.D., Dubhashi, D.: Learning with similarity functions on graphs using matchings of geometric embeddings. In: 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 467–476 (2015)
Google Scholar
Kang, U., Tong, H., Sun, J.: Fast random walk graph kernel. In: SIAM International Conference on Data Mining, pp. 828–838 (2012)
Google Scholar
Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernels between labeled graphs. In: 20th International Conference on Machine Learning, pp. 321–328 (2003)
Google Scholar
Kriege, N., Mutzel, P.: Subgraph matching kernels for attributed graphs. In: 29th International Conference on Machine Learning (2012)
Google Scholar
Kriege, N., Neumann, M., Kersting, K., Mutzel, M.: Explicit versus implicit graph feature maps: a computational phase transition for walk kernels. In: IEEE International Conference on Data Mining, pp. 881–886 (2014)
Google Scholar
Kriege, N.M., Neumann, M., Morris, C., Kersting, K., Mutzel, P.: A unifying view of explicit and implicit feature maps for structured data: Systematic studies of graph kernels. CoRR abs/1703.00676 (2017). http://arxiv.org/abs/1703.00676
Kriege, N.M., Giscard, P.-L., Wilson, R.C.: On valid optimal assignment kernels and applications to graph classification. In: Advances in Neural Information Processing Systems, pp. 1615–1623 (2016)
Google Scholar
Mahé, P., Ueda, N., Akutsu, T., Perret, J.L., Vert, J.P.: Extensions of marginalized graph kernels. In: Twenty-First International Conference on Machine Learning, pp. 552–559 (2004)
Google Scholar
Morris, C., Kriege, N.M., Kersting, K., Mutzel, P.: Faster kernel for graphs with continuous attributes via hashing. In: IEEE International Conference on Data Mining, pp. 1095–1100 (2016)
Google Scholar
Orsini, F., Frasconi, P., De Raedt, L.: Graph invariant kernels. In: Twenty-Fourth International Joint Conference on Artificial Intelligence, pp. 3756–3762 (2015)
Google Scholar
Rahimi, A., Recht, B.: Random features for large-scale kernel machines. In: Advances in Neural Information Processing Systems, pp. 1177–1184 (2008)
Google Scholar
Shervashidze, N., Schweitzer, P., van Leeuwen, E.J., Mehlhorn, K., Borgwardt, K.M.: Weisfeiler-Lehman graph kernels. J. Mach. Learn. Res. 12, 2539–2561 (2011)
MathSciNet MATH Google Scholar
Shervashidze, N., Vishwanathan, S.V.N., Petri, T.H., Mehlhorn, K., Borgwardt, K.M.: Efficient graphlet kernels for large graph comparison. In: Twelfth International Conference on Artificial Intelligence and Statistics, pp. 488–495 (2009)
Google Scholar
Sugiyama, M., Borgwardt, K.M.: Halting in random walk kernels. In: Advances in Neural Information Processing Systems, pp. 1639–1647 (2015)
Google Scholar
Vert, J.P.: The optimal assignment kernel is not positive definite. CoRR abs/0801.4061 (2008). http://arxiv.org/abs/0801.4061
Vishwanathan, S.V.N., Schraudolph, N.N., Kondor, R., Borgwardt, K.M.: Graph kernels. J. Mach. Learn. Res. 11, 1201–1242 (2010)
MathSciNet MATH Google Scholar

Download references

Acknowledgement

We would like to thank the co-authors of our publications [10,11,12, 14]. This research was supported by the German Science Foundation (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Data Analysis”, project A6.

Author information

Authors and Affiliations

Department of Computer Science, TU Dortmund University, Dortmund, Germany
Nils M. Kriege & Christopher Morris

Authors

Nils M. Kriege
View author publications
You can also search for this author in PubMed Google Scholar
Christopher Morris
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Nils M. Kriege .

Editor information

Editors and Affiliations

Google Research, Google Inc., Zurich, Switzerland
Yasemin Altun
NASA Ames Research Center, Mountain View, USA
Kamalika Das
Oath, Sunnyvale, USA
Taneli Mielikäinen
Department of Computer Science, University of Bari Aldo Moro, Bari, Italy
Donato Malerba
Institute of Computing Science, Poznan University of Technology, Poznan, Poland
Jerzy Stefanowski
Laboratoire d’ Informatique (LIX), École Polytechnique, Palaiseau, France
Jesse Read
Department of Computer Science, Stanford University, Stanford, USA
Marinka Žitnik
Università degli Studi di Bari Aldo Moro, Bari, Italy
Michelangelo Ceci
Jožef Stefan Institute, Ljubljana, Slovenia
Sašo Džeroski

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Kriege, N.M., Morris, C. (2017). Recent Advances in Kernel-Based Graph Classification. In: Altun, Y., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2017. Lecture Notes in Computer Science(), vol 10536. Springer, Cham. https://doi.org/10.1007/978-3-319-71273-4_37

Download citation

DOI: https://doi.org/10.1007/978-3-319-71273-4_37
Published: 30 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71272-7
Online ISBN: 978-3-319-71273-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Recent Advances in Kernel-Based Graph Classification

Abstract

Similar content being viewed by others

A survey on graph kernels

An Efficient Entropy-Based Graph Kernel

Extending Local Features with Contextual Information in Graph Kernels

1 Introduction

1.1 Related Work

2 Recent Progress in the Design of Graph Kernels

2.1 Hash Graph Kernels

2.2 Explicit Graph Feature Maps

2.3 Optimal Assignment Kernels

3 Conclusion

References

Acknowledgement

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Recent Advances in Kernel-Based Graph Classification

Abstract

Similar content being viewed by others

A survey on graph kernels

An Efficient Entropy-Based Graph Kernel

Extending Local Features with Contextual Information in Graph Kernels

1 Introduction

1.1 Related Work

2 Recent Progress in the Design of Graph Kernels

2.1 Hash Graph Kernels

2.2 Explicit Graph Feature Maps

2.3 Optimal Assignment Kernels

3 Conclusion

References

Acknowledgement

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation