An Efficient Algorithm for Chinese Postman Walk on Bi-directed de Bruijn Graphs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6508)

Abstract

Sequence assembly from short reads is an important problem in biology. Solving the sequence assembly problem exactly on a bi-directed de Bruijn graph or a string graph is known to be intractable. However, finding a Shortest Double-stranded DNA string (SDDNA) containing all the k-long words in the reads seems to be a good heuristic for getting close to the original genome. This problem is equivalent to finding a cyclic Chinese Postman (CP) walk on the underlying unweighted bi-directed de Bruijn graph built from the reads. The Chinese Postman walk Problem (CPP) is solved by reducing it to a general bi-directed flow on this graph, which runs in \(O(|E|^2 \log^2(|V|))\) time.
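To make the "all k-long words in the reads" condition concrete, here is a minimal Python sketch. It assumes reads over the alphabet A, C, G, T and treats a k-mer and its reverse complement as the same word, which is one common way to model double-stranded DNA. The function names (`reverse_complement`, `canonical_kmers`, `covers_all_kmers`) are hypothetical and are not taken from the paper; the sketch only checks whether a candidate string covers the k-mers and does not search for a shortest one.

```python
from typing import Iterable, Set

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, read in the 5' to 3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def canonical_kmers(reads: Iterable[str], k: int) -> Set[str]:
    """Collect the canonical form of every k-long word in the reads."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(canonical(read[i:i + k]))
    return kmers


def covers_all_kmers(candidate: str, reads: Iterable[str], k: int) -> bool:
    """True if the candidate, read on either strand, contains every k-mer of the reads."""
    return canonical_kmers(reads, k) <= canonical_kmers([candidate], k)
```

For example, `covers_all_kmers("CGTT", ["AACG"], 3)` holds because "CGTT" is the reverse complement of "AACG", so both strings contain the same words once strands are identified.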

In this paper we show that the cyclic CPP on bi-directed graphs can be solved without reducing it to bi-directed flow. We present a \(\Theta(p(|V|+|E|)\log(|V|) + (d_{max}p)^3)\) time algorithm to solve the cyclic CPP on a weighted bi-directed de Bruijn graph, where \(p = \max\{|\{v \mid d_{in}(v) - d_{out}(v) > 0\}|, |\{v \mid d_{in}(v) - d_{out}(v) < 0\}|\}\) and \(d_{max} = \max\{|d_{in}(v) - d_{out}(v)|\}\). Our algorithm performs asymptotically better than the bi-directed flow algorithm when the number of imbalanced nodes p is much smaller than the number of nodes in the bi-directed graph. In our experiments on various datasets, the value of p/|V| lay between 0.08% and 0.13% with 95% probability.
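The quantities p and d_max in the bound depend only on the per-node difference between in-degree and out-degree, so they can be computed in one pass over the edges. The sketch below assumes a hypothetical edge encoding `(u, u_end, v, v_end)`, where each end is the string `"in"` or `"out"` and gives the orientation of the edge at that endpoint. This encoding and the function name `imbalance_stats` are illustrative choices, not the paper's data structure.

```python
from collections import defaultdict


def imbalance_stats(edges):
    """Return (p, d_max, diff) for a bi-directed graph.

    edges: iterable of (u, u_end, v, v_end), each end being "in" or "out"
           (an assumed encoding of the edge orientation at each endpoint).
    diff maps each node v to d_in(v) - d_out(v).
    """
    d_in = defaultdict(int)
    d_out = defaultdict(int)
    for u, u_end, v, v_end in edges:
        for node, end in ((u, u_end), (v, v_end)):
            if end == "in":
                d_in[node] += 1
            elif end == "out":
                d_out[node] += 1
            else:
                raise ValueError(f"unknown edge end: {end!r}")

    nodes = set(d_in) | set(d_out)
    diff = {v: d_in[v] - d_out[v] for v in nodes}

    # p is the larger of the two counts of imbalanced nodes.
    positive = sum(1 for x in diff.values() if x > 0)
    negative = sum(1 for x in diff.values() if x < 0)
    p = max(positive, negative)

    # d_max is the largest absolute imbalance over all nodes.
    d_max = max((abs(x) for x in diff.values()), default=0)
    return p, d_max, diff
```

With p and d_max in hand, the ratio p/|V| reported in the abstract can be estimated for a given graph as `p / len(diff)`, counting only nodes that have at least one incident edge.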

Many practical bi-directed de Bruijn graphs do not have cyclic CP walks. In such cases it is not clear how the bi-directed flow can be useful in identifying contigs. Our algorithm can handle such situations and identify maximal bi-directed sub-graphs that have CP walks. We also present a \(\Theta((|V| + |E|)\log(|V|))\) time algorithm for the single-source shortest path problem on bi-directed de Bruijn graphs, which may be of independent interest.
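The abstract does not describe how the shortest path algorithm works, so the following is not the authors' method. It is a generic sketch of one standard way to handle bi-directed walks: run Dijkstra's algorithm over states `(node, end arrived through)`, under the assumption that a walk must leave each intermediate node through an edge end of the opposite orientation to the one it arrived through. The edge encoding `(u, u_end, v, v_end, weight)` and the function name are hypothetical, weights are assumed non-negative, and node labels are assumed mutually comparable so that heap ties can be broken.

```python
import heapq
from collections import defaultdict

OPPOSITE = {"in": "out", "out": "in"}


def bidirected_shortest_paths(edges, source):
    """Shortest valid bi-directed walk lengths from source to reachable nodes.

    edges: iterable of (u, u_end, v, v_end, weight) with ends "in"/"out"
           and non-negative weights (an assumed encoding).
    """
    # adj[(v, end_at_v)] lists (w, end_at_w, weight) for edges touching v
    # with the given orientation at v.
    adj = defaultdict(list)
    for u, u_end, v, v_end, weight in edges:
        adj[(u, u_end)].append((v, v_end, weight))
        adj[(v, v_end)].append((u, u_end, weight))

    # The source may be left through either kind of edge end.
    best = {(source, "in"): 0, (source, "out"): 0}
    heap = [(0, source, "in"), (0, source, "out")]

    while heap:
        d, v, arrived = heapq.heappop(heap)
        if d > best.get((v, arrived), float("inf")):
            continue  # stale heap entry
        # Leave v through an end opposite to the one we arrived through.
        for w, w_end, weight in adj[(v, OPPOSITE[arrived])]:
            nd = d + weight
            if nd < best.get((w, w_end), float("inf")):
                best[(w, w_end)] = nd
                heapq.heappush(heap, (nd, w, w_end))

    # Collapse the two states of each node into a single distance.
    dist = {}
    for (v, _), d in best.items():
        dist[v] = min(d, dist.get(v, float("inf")))
    return dist
```

Because each node contributes only two states, a binary-heap implementation like this one stays within O((|V| + |E|) log |V|) time, which matches the order of the bound quoted above without claiming to reproduce the paper's algorithm.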

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kundeti, V., Rajasekaran, S., Dinh, H. (2010). An Efficient Algorithm for Chinese Postman Walk on Bi-directed de Bruijn Graphs. In: Wu, W., Daescu, O. (eds) Combinatorial Optimization and Applications. COCOA 2010. Lecture Notes in Computer Science, vol 6508. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17458-2_16

  • DOI: https://doi.org/10.1007/978-3-642-17458-2_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17457-5

  • Online ISBN: 978-3-642-17458-2

  • eBook Packages: Computer Science, Computer Science (R0)