Parallelizing Big De Bruijn Graph Traversal for Genome Assembly on GPU Clusters

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11448)

Abstract

De Bruijn graph traversal is a critical step in de novo assemblers. It uses the graph structure to analyze genome sequences and is both memory-intensive and time-consuming. To improve its efficiency, we develop ParaGraph, which parallelizes de Bruijn graph traversal on a cluster of GPU-equipped computer nodes. With effective vertex partitioning and fine-grained parallel algorithms, ParaGraph utilizes all cores of each CPU and GPU, all CPUs and GPUs in a computer node, and all computer nodes in a cluster. Our results show that ParaGraph can traverse billion-node graphs within three minutes on a cluster of six GPU-equipped computer nodes. It is an order of magnitude faster than state-of-the-art shared-memory assemblers and more than five times faster than current distributed assemblers.
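The abstract names two ingredients, vertex partitioning and graph traversal, without spelling them out. The following single-process Python sketch illustrates both under stated assumptions; it is not ParaGraph's algorithm. Vertices are assumed to be k-mers joined when they overlap by k-1 bases, reverse complements are ignored, the CRC-32 partitioner `owner` and all function names (`build_partitions`, `adjacency`, `traverse`) are hypothetical, and the traversal runs sequentially rather than across CPUs and GPUs. Traversal here means walking maximal non-branching paths (unitigs) and spelling each as a contig.

```python
from collections import defaultdict
from zlib import crc32


def kmers(read, k):
    """Yield every length-k substring (k-mer) of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def owner(kmer, num_parts):
    """Hypothetical vertex partitioner: map a k-mer to one of num_parts workers.
    crc32 is used instead of hash() because str hashing is randomized per process,
    so every worker would compute the same owner for the same k-mer."""
    return crc32(kmer.encode()) % num_parts


def build_partitions(reads, k, num_parts):
    """Place each distinct k-mer (graph vertex) in exactly one partition."""
    parts = [set() for _ in range(num_parts)]
    for read in reads:
        for km in kmers(read, k):
            parts[owner(km, num_parts)].add(km)
    return parts


def adjacency(vertices):
    """Edge u -> w whenever the (k-1)-suffix of u equals the (k-1)-prefix of w."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u in vertices:
        for base in "ACGT":
            w = u[1:] + base
            if w in vertices:
                succ[u].add(w)
                pred[w].add(u)
    return succ, pred


def traverse(parts):
    """Sequentially spell maximal non-branching paths (unitigs) as contigs."""
    vertices = set().union(*parts)
    succ, pred = adjacency(vertices)
    visited, contigs = set(), []

    def mergeable(u, w):
        # u -> w lies inside a unitig if u has one successor and w one predecessor
        return len(succ[u]) == 1 and len(pred[w]) == 1

    def walk(start):
        path, u = [start], start
        visited.add(start)
        while len(succ[u]) == 1:
            w = next(iter(succ[u]))
            if w in visited or not mergeable(u, w):
                break
            path.append(w)
            visited.add(w)
            u = w
        # consecutive k-mers overlap by k-1 bases, so each one adds a single base
        return path[0] + "".join(v[-1] for v in path[1:])

    # Start a walk at every vertex that cannot be merged into its predecessor.
    for v in sorted(vertices):
        ps = pred[v]
        if not (len(ps) == 1 and mergeable(next(iter(ps)), v)):
            contigs.append(walk(v))
    # Any vertex still unvisited lies on an isolated cycle with no start vertex.
    for v in sorted(vertices):
        if v not in visited:
            contigs.append(walk(v))
    return contigs


if __name__ == "__main__":
    parts = build_partitions(["TTACGGA", "TTACGCC"], k=4, num_parts=3)
    print(sorted(traverse(parts)))  # ['ACGCC', 'ACGGA', 'TTACG']
```

In this sketch the partitioner only decides which worker owns each vertex; a distributed traversal would presumably have each owner extend paths through its local vertices and hand work off at partition boundaries, whereas the code above simply merges all partitions before walking. Real assemblers usually also canonicalize each k-mer with its reverse complement, which is omitted here for brevity.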

Author information

Correspondence to Shuang Qiu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Qiu, S., Feng, Z., Luo, Q. (2019). Parallelizing Big De Bruijn Graph Traversal for Genome Assembly on GPU Clusters. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11448. Springer, Cham. https://doi.org/10.1007/978-3-030-18590-9_68

  • DOI: https://doi.org/10.1007/978-3-030-18590-9_68

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18589-3

  • Online ISBN: 978-3-030-18590-9

  • eBook Packages: Computer Science, Computer Science (R0)