Skip to main content

Gerbil: A Fast and Memory-Efficient k-mer Counter with GPU-Support

  • Conference paper
  • First Online:
Algorithms in Bioinformatics (WABI 2016)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 9838))

Included in the following conference series:

Abstract

A basic task in bioinformatics is the counting of k-mers in genome strings. The k-mer counting problem is to build a histogram of all substrings of length k in a given genome sequence. We present the open source k-mer counting software Gerbil that has been designed for the efficient counting of k-mers for \(k\ge 32\). Given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important. While existing k-mer counting tools suffer from excessive memory resource consumption or degrading performance for large k, Gerbil is able to efficiently support large k without much loss of performance. Our software implements a two-disk approach. In the first step, DNA reads are loaded from disk and distributed to temporary files that are stored at a working disk. In a second step, the temporary files are read again, split into k-mers and counted via a hash table approach. In addition, Gerbil can optionally use GPUs to accelerate the counting step. For large k, we outperform state-of-the-art open source k-mer counting tools by up to a factor of 4 for large genome data sets.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    The tool is named Gerbil because of its modest resource requirements, which it has in common with the name-giving mammal.

References

  1. Chikhi, R., Medvedev, P.: Informed and automated \(k\)-mer size selection for genome assembly. Bioinformatics 30(1), 31–37 (2014)

    Article  Google Scholar 

  2. Deorowicz, S., Debudaj-Grabysz, A., Grabowski, S.: Disk-based \(k\)-mer counting on a PC. BMC Bioinform. 14(1), 1–12 (2013)

    Article  Google Scholar 

  3. Deorowicz, S., Kokot, M., Grabowski, S., Debudaj-Grabysz, A.: KMC 2: fast and resource-frugal \(k\)-mer counting. Bioinformatics 31(10), 1569–1576 (2015)

    Article  Google Scholar 

  4. Li, Y., et al.: MSPKmerCounter: a fast and memory efficient approach for\(k\)-mer counting (2015). arXiv preprint arXiv:1505.06550

  5. Marçais, G., Kingsford, C.: A fast, lock-free approach for efficient parallel counting of occurrences of \(k\)-mers. Bioinformatics 27(6), 764–770 (2011)

    Article  Google Scholar 

  6. Melsted, P., Pritchard, J.K.: Efficient counting of \(k\)-mers in DNA sequences using a bloom filter. BMC Bioinform. 12(1), 1–7 (2011)

    Article  Google Scholar 

  7. Rizk, G., Lavenier, D., Chikhi, R.: DSK: \(k\)-mer counting with very low memory usage. Bioinformatics 29(5), 652–653 (2013)

    Article  Google Scholar 

  8. Roberts, M., Hayes, W., Hunt, B.R., Mount, S.M., Yorke, J.A.: Reducing storage requirements for biological sequence comparison. Bioinformatics 20(18), 3363–3369 (2004)

    Article  Google Scholar 

  9. Roberts, M., Hunt, B.R., Yorke, J.A., Bolanos, R.A., Delcher, A.L.: A preprocessor for shotgun assembly of large genomes. J. Comput. Biol. 11(4), 734–752 (2004)

    Article  Google Scholar 

  10. Roy, R.S., Bhattacharya, D., Schliep, A.: Turtle: identifying frequent \(k\)-mers with cache-efficient algorithms. Bioinformatics 30(14), 1950–1957 (2014)

    Article  Google Scholar 

  11. Sameith, K., Roscito, J.G., Hiller, M.: Iterative error correction of long sequencing reads maximizes accuracy and improves contig assembly. Briefings in Bioinformatics (2016). http://dx.org/10.1093/bib/bbw003

  12. Suzuki, S., Kakuta, M., Ishida, T., Akiyama, Y.: Accelerating identification of frequent \(k\)-mers in DNA sequences with GPU. In: GTC 2014 (2014)

    Google Scholar 

  13. Xavier, B.B., Sabirova, J., Pieter, M., Hernalsteens, J.P., de Greve, H., Goossens, H., Malhotra-Kumar, S.: Employing whole genome mapping for optimal de novo assembly of bacterial genomes. BMC Res. Notes 7(1), 1–4 (2014)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Steffen Rechner .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Erbert, M., Rechner, S., Müller-Hannemann, M. (2016). Gerbil: A Fast and Memory-Efficient k-mer Counter with GPU-Support. In: Frith, M., Storm Pedersen, C. (eds) Algorithms in Bioinformatics. WABI 2016. Lecture Notes in Computer Science(), vol 9838. Springer, Cham. https://doi.org/10.1007/978-3-319-43681-4_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-43681-4_12

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43680-7

  • Online ISBN: 978-3-319-43681-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics