Abstract
The colored de Bruijn graph, an extension of the de Bruijn graph, is routinely applied to variant calling, genotyping, genome assembly, and various other tasks [11]. In this data structure, the edges are labeled with one or more colors from a set \(\{c_1, \dots , c_{\alpha } \}\), and the labels are stored as an \(m \times \alpha \) matrix, where m is the number of edges. Recently, a significant amount of work has gone into developing compact representations of this color matrix, but all existing methods focus on compressing the color matrix itself [3, 10, 12, 14]. In this paper, we explore the problem of recoloring the graph in order to reduce the number of colors, and thus decrease the size of the color matrix. We show that finding the minimum number of colors needed for recoloring is not only NP-hard but also difficult to approximate within a reasonable factor. These hardness results motivate the recoloring heuristic that we present in this paper. Our results show that this heuristic reduces the number of colors by between one and two orders of magnitude. More specifically, when the number of colors is large (>5,000,000), our heuristic reduces the number of colors by a factor of 136. An implementation of this heuristic is publicly available at https://github.com/baharpan/cosmo/tree/Recoloring.
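To make the color matrix and the idea of recoloring concrete, the following is a minimal sketch and not the heuristic from the paper. It assumes a compatibility rule that the abstract does not state: two original colors may share a new color only if no edge carries both of them. Under that assumption recoloring becomes a graph coloring problem on a conflict graph, which the sketch solves with a plain first-fit greedy pass. The names `greedy_recolor` and `recolor_edges` are hypothetical.

```python
from typing import Dict, List, Set


def greedy_recolor(edge_colors: List[Set[int]], num_colors: int) -> Dict[int, int]:
    """Map each original color to a new color with first-fit greedy coloring.

    Assumed rule (not taken from the paper): colors that appear together
    on at least one edge must keep distinct new colors.
    """
    # conflicts[c] holds every color that shares at least one edge with c.
    conflicts: Dict[int, Set[int]] = {c: set() for c in range(num_colors)}
    for colors in edge_colors:
        for c in colors:
            conflicts[c].update(colors - {c})

    mapping: Dict[int, int] = {}
    for c in range(num_colors):
        # New colors already taken by conflicting, previously processed colors.
        used = {mapping[d] for d in conflicts[c] if d in mapping}
        new = 0
        while new in used:
            new += 1
        mapping[c] = new
    return mapping


def recolor_edges(edge_colors: List[Set[int]], mapping: Dict[int, int]) -> List[Set[int]]:
    """Apply a color mapping to every edge label set."""
    return [{mapping[c] for c in colors} for colors in edge_colors]


# Toy input: m = 5 edges, alpha = 4 colors, so the matrix has 5 * 4 = 20 cells.
edges = [{0, 1}, {2}, {1, 2}, {3}, {0}]
mapping = greedy_recolor(edges, num_colors=4)
recolored = recolor_edges(edges, mapping)
new_alpha = len(set(mapping.values()))
print(mapping, recolored, len(edges) * new_alpha)
```

On this toy input the greedy pass merges colors 0, 2, and 3, so the matrix shrinks from 5 × 4 to 5 × 2 cells. First-fit greedy gives no optimality guarantee, which is consistent with the hardness results stated in the abstract.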
Notes
- 1. ZPP is the complexity class of problems solvable by a randomized algorithm in expected polynomial time.
References
The 100,000 Genomes Project Protocol v3 (2017). https://doi.org/10.6084/m9.figshare.4530893.v2
Alipanahi, B., et al.: Resistome SNP calling via read colored de Bruijn graphs. In: RECOMB-Seq (2018)
Almodaresi, F., Pandey, P., Patro, R.: Rainbowfish: a succinct colored de Bruijn graph representation. In: WABI, pp. 251–256 (2017)
Belazzougui, D., Gagie, T., Mäkinen, V., Previtali, M.: Fully dynamic de Bruijn graphs. In: Inenaga, S., Sadakane, K., Sakai, T. (eds.) SPIRE 2016. LNCS, vol. 9954, pp. 145–152. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46049-9_14
Bermond, J.C., Hell, P.: On even factorizations and the chromatic index of the Kautz and de Bruijn digraphs. J. Graph Theory 17(5), 647–655 (1993)
Burrows, M., Wheeler, D.J.: A block sorting lossless data compression algorithm. Technical report 124, Digital Equipment Corporation (1994)
Elias, P.: Efficient storage and retrieval by content and address of static files. J. ACM (JACM) 21(2), 246–260 (1974)
Feige, U., Kilian, J.: Zero knowledge and the chromatic number. In: Conference on Computational Complexity, pp. 278–287 (1996)
Gog, S., Beller, T., Moffat, A., Petri, M.: From theory to practice: plug and play with succinct data structures. In: Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 326–337. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07959-2_28
Holley, G.: Bloom filter Trie: an alignment-free and reference-free data structure for pan-genome storage. Algorithms Mol. Biol. 11, 3 (2016)
Iqbal, Z.: De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat. Genet. 44(2), 226–232 (2012)
Marcus, S.: Splitmem: a graphical algorithm for pan-genome analysis with suffix skips. Bioinformatics 30, 3476–3483 (2014)
Fano, R.M.: On the number of bits required to implement an associative memory. Massachusetts Institute of Technology, Project MAC (1971)
Muggli, M.D., et al.: Succinct colored de Bruijn graphs. Bioinformatics 33, 3181–3187 (2017)
Okanohara, D., Sadakane, K.: Practical entropy-compressed rank/select dictionary. In: Proceedings of the Meeting on Algorithm Engineering & Experiments, pp. 60–70 (2007)
Sánchez-Arroyo, A.: Determining the total colouring number is NP-hard. Discret. Math. 78, 315–319 (1989)
Simpson, J.T., Durbin, R.: Efficient construction of an assembly string graph using the FM-index. Bioinformatics 26(12), i367–i373 (2010)
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Alipanahi, B., Kuhnle, A., Boucher, C. (2018). Recoloring the Colored de Bruijn Graph. In: Gagie, T., Moffat, A., Navarro, G., Cuadros-Vargas, E. (eds) String Processing and Information Retrieval. SPIRE 2018. Lecture Notes in Computer Science, vol 11147. Springer, Cham. https://doi.org/10.1007/978-3-030-00479-8_1
DOI: https://doi.org/10.1007/978-3-030-00479-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00478-1
Online ISBN: 978-3-030-00479-8
eBook Packages: Computer Science (R0)