
On the Linearization of Scaffolds Sharing Repeated Contigs

  • Conference paper
Combinatorial Optimization and Applications (COCOA 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10628)

Abstract

Scaffolding is the final step in assembling Next Generation Sequencing data, in which pre-assembled contiguous regions (“contigs”) are oriented and ordered using information that links them (for example, mapping of paired-end reads). As the genomes of some species are highly repetitive, we allow some contigs to be placed multiple times, thereby generalizing established computational models for this problem. We study the problems that arise when solutions of the model are translated back to actual sequences, proposing models and analyzing the complexity of the resulting computational problems. Under restrictions such as planarity or bounded degree, we find both polynomial-time solvable and \(\mathcal{NP}\)-hard special cases.

Notes

  1. Solution graphs differ from scaffold graphs in that they might not abide by the condition that \(m(uv)\) equals the smaller of the multiplicities of the contig edges incident with \(u\) and \(v\).

  2. The “Exponential Time Hypothesis” (ETH) states that boolean satisfiability (SAT) cannot be decided in \(2^{o(n)}\) time, where \(n\) is the number of variables in the formula.
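The multiplicity condition in Note 1 can be illustrated with a minimal sketch. It assumes a hypothetical representation in which every vertex is an extremity of exactly one contig edge, and in which contig edges and linking edges are stored in two dictionaries mapping vertex pairs to multiplicities. The function name `violating_links` and the example data are illustrative only and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def violating_links(contig_mult: Dict[Edge, int],
                    link_mult: Dict[Edge, int]) -> List[Edge]:
    """Return the links uv with m(uv) != min of the contig multiplicities at u and v.

    Assumption (not from the source): each vertex belongs to exactly one
    contig edge, so every vertex inherits a single contig multiplicity.
    """
    # Each extremity of a contig edge inherits that contig's multiplicity.
    vertex_mult: Dict[Hashable, int] = {}
    for (a, b), m in contig_mult.items():
        vertex_mult[a] = m
        vertex_mult[b] = m

    bad: List[Edge] = []
    for (u, v), m in link_mult.items():
        # Condition from Note 1: m(uv) equals the smaller of the two multiplicities.
        if m != min(vertex_mult[u], vertex_mult[v]):
            bad.append((u, v))
    return bad


# Hypothetical toy instance: three contigs with multiplicities 2, 1 and 3.
contigs = {("a1", "a2"): 2, ("b1", "b2"): 1, ("c1", "c2"): 3}
links = {("a2", "b1"): 1, ("a2", "c1"): 2, ("b2", "c1"): 2}

print(violating_links(contigs, links))
```

In this toy instance the link between `b2` and `c1` has multiplicity 2 although the smaller incident contig multiplicity is 1. Under the stated assumptions, such a graph could be a solution graph but would not qualify as a scaffold graph.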

Acknowledgments

This work was supported by the Institut de Biologie Computationnelle (ANR Projet Investissements d’Avenir en bioinformatique IBC).

Corresponding author

Correspondence to Annie Chateau.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Weller, M., Chateau, A., Giroudeau, R. (2017). On the Linearization of Scaffolds Sharing Repeated Contigs. In: Gao, X., Du, H., Han, M. (eds) Combinatorial Optimization and Applications. COCOA 2017. Lecture Notes in Computer Science, vol 10628. Springer, Cham. https://doi.org/10.1007/978-3-319-71147-8_38

  • DOI: https://doi.org/10.1007/978-3-319-71147-8_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-71146-1

  • Online ISBN: 978-3-319-71147-8

  • eBook Packages: Computer Science, Computer Science (R0)
