Abstract
Scaffolding is the final step in assembling Next Generation Sequencing data, in which pre-assembled contiguous regions (“contigs”) are oriented and ordered using information that links them (for example, mappings of paired-end reads). As the genomes of some species are highly repetitive, we allow some contigs to be placed multiple times, thereby generalizing established computational models for this problem. We study the problems that arise when solutions of the model are translated back into actual sequences, proposing models and analyzing the complexity of the resulting computational problems. We identify both polynomial-time solvable and \(\mathcal{NP}\)-hard special cases, for instance under restrictions such as planarity or bounded degree.
Notes
1. Solution graphs differ from scaffold graphs in that they might not abide by the condition that m(uv) equals the smaller of the multiplicities of the contig edges incident with u and v.
2. The “Exponential Time Hypothesis” (ETH) states that boolean satisfiability (SAT) cannot be decided in \(2^{o(n)}\) time, where n is the number of variables in the formula.
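The multiplicity condition in Note 1 can be illustrated with a small check. This is only a sketch under stated assumptions: the dictionary-based graph representation, the vertex names, and the function name `violating_links` are hypothetical and not taken from the paper, and the code assumes that every vertex is a contig extremity lying on exactly one contig edge.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def violating_links(contig_mult: Dict[Edge, int],
                    link_mult: Dict[Edge, int]) -> List[Edge]:
    """Return the non-contig edges uv whose multiplicity m(uv) differs from
    the smaller multiplicity of the contig edges incident with u and v.

    Hypothetical representation (an assumption, not the paper's):
    contig_mult maps each contig edge to its multiplicity,
    link_mult maps each non-contig edge uv to m(uv).
    """
    # Assumption: each vertex lies on exactly one contig edge, so the
    # multiplicity of "the contig edge incident with u" is well defined.
    vertex_mult: Dict[str, int] = {}
    for (a, b), mult in contig_mult.items():
        vertex_mult[a] = mult
        vertex_mult[b] = mult

    return [
        (u, v)
        for (u, v), mult in link_mult.items()
        if mult != min(vertex_mult[u], vertex_mult[v])
    ]


# Three contigs a, b, c with extremities x1 and x2 (made-up example data).
contigs = {("a1", "a2"): 2, ("b1", "b2"): 1, ("c1", "c2"): 3}
links = {("a2", "b1"): 1, ("b2", "c1"): 1, ("a1", "c2"): 3}

bad = violating_links(contigs, links)
```

In this made-up instance, the edge between `a1` and `c2` has multiplicity 3 while the incident contig edges have multiplicities 2 and 3, so it is the only edge reported. A graph with a non-empty result could still be a solution graph in the sense of Note 1, but not a scaffold graph.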
Acknowledgments
This work was supported by the Institut de Biologie Computationnelle (ANR Projet Investissements d’Avenir en bioinformatique IBC).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Weller, M., Chateau, A., Giroudeau, R. (2017). On the Linearization of Scaffolds Sharing Repeated Contigs. In: Gao, X., Du, H., Han, M. (eds) Combinatorial Optimization and Applications. COCOA 2017. Lecture Notes in Computer Science, vol 10628. Springer, Cham. https://doi.org/10.1007/978-3-319-71147-8_38
DOI: https://doi.org/10.1007/978-3-319-71147-8_38
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71146-1
Online ISBN: 978-3-319-71147-8
eBook Packages: Computer Science (R0)