
GTED: Graph Traversal Edit Distance

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2018)

Abstract

Many problems in applied machine learning involve graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm cleanly decouples the learning algorithm from the underlying geometric space, which makes graph kernels important for these applications.

In this paper, we give a new graph kernel, which we call graph traversal edit distance (GTED). We introduce the GTED problem and give the first polynomial-time algorithm for it. Informally, the graph traversal edit distance is the minimum, over all pairs of Eulerian traversals of the two graphs, of the edit distance between the strings spelled by the edge labels of those traversals. GTED is also motivated by, and provides the first mathematical formalism for, sequence co-assembly and de novo variation detection in bioinformatics.
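To make the informal definition concrete, here is a brute-force sketch in Python. It assumes directed, edge-labelled multigraphs given as `(tail, head, label)` triples, treats an "Eulerian traversal" as any trail that uses every edge exactly once (not necessarily closed), and uses unit-cost Levenshtein edit distance; these choices and all function names are our own assumptions. It enumerates traversals exhaustively, so it is exponential and is not the polynomial-time algorithm described here; it only shows which quantity is being minimized.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def eulerian_strings(edges):
    """All label strings spelled by trails that use every edge exactly once.

    `edges` is a list of (tail, head, label) triples of a directed multigraph.
    Exhaustive backtracking: only suitable for tiny toy graphs.
    """
    out_edges = {}
    for idx, (u, _, _) in enumerate(edges):
        out_edges.setdefault(u, []).append(idx)
    results = set()
    used = [False] * len(edges)

    def walk(node, spelled):
        if len(spelled) == len(edges):
            results.add("".join(spelled))
            return
        for idx in out_edges.get(node, []):
            if not used[idx]:
                used[idx] = True
                spelled.append(edges[idx][2])
                walk(edges[idx][1], spelled)
                spelled.pop()
                used[idx] = False

    for start in {u for u, _, _ in edges}:
        walk(start, [])
    return results


def gted_bruteforce(edges1, edges2):
    """Minimum edit distance over all pairs of Eulerian traversal strings."""
    s1, s2 = eulerian_strings(edges1), eulerian_strings(edges2)
    if not s1 or not s2:
        raise ValueError("both graphs must admit an Eulerian traversal")
    return min(levenshtein(a, b) for a, b in product(s1, s2))


if __name__ == "__main__":
    # A cycle a -> b -> a plus a self-loop at a; its traversals spell ACG, CGA, GAC.
    g1 = [("a", "b", "A"), ("b", "a", "C"), ("a", "a", "G")]
    g2 = [(0, 1, "G"), (1, 2, "T"), (2, 3, "C")]  # a path spelling "GTC"
    print(sorted(eulerian_strings(g1)))  # ['ACG', 'CGA', 'GAC']
    print(gted_bruteforce(g1, g2))       # 1 (GAC vs GTC: one substitution)
```

Because the minimum is taken over every pair of traversals, the choice of traversal matters: `g1` compared with a path spelling "GAC" gives 0, even though its traversal "ACG" is at edit distance 2 from "GAC".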

We demonstrate that GTED admits a polynomial-time algorithm using a linear program in the graph product space that is guaranteed to yield an integer solution. To the best of our knowledge, this is the first approach to this problem. We also give a linear programming relaxation that yields a lower bound on GTED. We use GTED as a graph kernel and evaluate it by computing the accuracy of an SVM classifier on a few datasets from the literature. Our results suggest that our kernel outperforms many of the common graph kernels on the tested datasets. In a second set of experiments, we successfully cluster viral genomes by applying GTED to their assembly graphs, obtained from de novo assembly of next-generation sequencing reads. Our GTED implementation can be downloaded from http://chitsazlab.org/software/gted/.
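One way to build intuition for the "graph product space" is the special case in which both graphs are simple labelled paths: each then has a single Eulerian traversal, GTED reduces to ordinary edit distance, and that distance is a shortest path in a product graph whose nodes are vertex pairs. The Python sketch below builds such a product under our own assumptions (diagonal edges for match or substitution, horizontal edges for deletion, vertical edges for insertion, unit cost except for matches) and runs Dijkstra on it. It is not the linear program described here: for general graphs a shortest path does not force every edge of each graph to be used exactly once, which is the kind of coverage constraint an LP over the product space would need to impose. All names are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import count


def product_graph(edges1, edges2, vertices1, vertices2):
    """Alignment-style product of two edge-labelled directed graphs.

    Nodes are pairs (u, v). Edge types (our assumed construction):
      * diagonal (u1,v1)->(u2,v2) for e1=(u1,u2) in G1 and e2=(v1,v2) in G2,
        cost 0 if the labels match, else 1 (substitution);
      * horizontal (u1,v)->(u2,v) for each e1 in G1 and vertex v of G2,
        cost 1 (deletion of e1's label);
      * vertical (u,v1)->(u,v2) for each vertex u of G1 and e2 in G2,
        cost 1 (insertion of e2's label).
    """
    adj = defaultdict(list)
    for u1, u2, a in edges1:
        for v1, v2, b in edges2:
            adj[(u1, v1)].append(((u2, v2), 0 if a == b else 1))
        for v in vertices2:
            adj[(u1, v)].append(((u2, v), 1))
    for v1, v2, _ in edges2:
        for u in vertices1:
            adj[(u, v1)].append(((u, v2), 1))
    return adj


def shortest_path_cost(adj, source, target):
    """Dijkstra over the product graph (all edge costs are non-negative)."""
    dist = {source: 0}
    tie = count()  # tie-breaker so the heap never compares node labels
    heap = [(0, next(tie), source)]
    while heap:
        d, _, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in adj[node]:
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, next(tie), nxt))
    return float("inf")


def path_graph(s):
    """Encode string s as a labelled path 0 -> 1 -> ... -> len(s)."""
    return [(i, i + 1, ch) for i, ch in enumerate(s)], list(range(len(s) + 1))


if __name__ == "__main__":
    e1, v1 = path_graph("kitten")
    e2, v2 = path_graph("sitting")
    adj = product_graph(e1, e2, v1, v2)
    print(shortest_path_cost(adj, (0, 0), (len(v1) - 1, len(v2) - 1)))  # 3
```

For two paths, the product is exactly the familiar edit-distance grid, so the shortest path from `(0, 0)` to the pair of end vertices recovers the Levenshtein distance.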

Author information

Correspondence to Hamidreza Chitsaz.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Boroojeny, A.E., Shrestha, A., Sharifi-Zarchi, A., Gallagher, S.R., Sahinalp, S.C., Chitsaz, H. (2018). GTED: Graph Traversal Edit Distance. In: Raphael, B. (ed.) Research in Computational Molecular Biology. RECOMB 2018. Lecture Notes in Computer Science, vol 10812. Springer, Cham. https://doi.org/10.1007/978-3-319-89929-9_3

  • DOI: https://doi.org/10.1007/978-3-319-89929-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-89928-2

  • Online ISBN: 978-3-319-89929-9

  • eBook Packages: Computer Science, Computer Science (R0)
