Detecting Repeat Families in Incompletely Sequenced Genomes

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5251)

Abstract

Repeats form a major class of genomic sequence, with implications for both functional genomics and practical problems. Detecting and analyzing them poses a number of challenges in genomic sequence analysis, especially when the genome is not completely sequenced. The most abundant and evolutionarily active repeats occur as families of long, similar sequences. We present a novel method for detecting and characterizing repeat families in cases where the target genome sequence is not completely known. To this end, we first establish the sequence graph, a compacted version of sparse de Bruijn graphs. By analyzing the structure of this graph and of its connected components after local modifications, we devise two algorithms for repeat family detection. We demonstrate the applicability of the methods on both simulated and real genomic data sets.
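The abstract does not define the sequence graph in detail, so the following is only a minimal sketch of the general idea it names: build a sparse de Bruijn graph (only the k-mers that actually occur in the reads) and compact its unbranched paths into single sequences. The function names `de_bruijn_edges` and `compact`, the choice of k, and the toy reads are assumptions made for illustration; this is not the authors' data structure or either of their two detection algorithms.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a sparse de Bruijn graph: nodes are (k-1)-mers, and each k-mer
    seen in a read contributes one edge from its prefix to its suffix."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            u, v = kmer[:-1], kmer[1:]
            succ[u].add(v)
            pred[v].add(u)
    return succ, pred


def compact(succ, pred):
    """Merge every unbranched path into one sequence.

    A node is 'inner' if it has exactly one predecessor and one successor.
    Paths start at non-inner nodes and are extended through inner nodes.
    Simplification (assumption of this sketch): cycles made up only of
    inner nodes are not reported.
    """
    nodes = set(succ) | set(pred)

    def is_inner(v):
        return len(pred.get(v, ())) == 1 and len(succ.get(v, ())) == 1

    paths = []
    for u in sorted(nodes):
        if is_inner(u):
            continue
        for v in sorted(succ.get(u, ())):
            path = u + v[-1]
            while is_inner(v):
                (w,) = succ[v]      # the single successor of an inner node
                path += w[-1]
                v = w
            paths.append(path)
    return paths


# Two toy reads that share the segment "TCGGT" (a stand-in for a repeat copy).
reads = ["AATCGGTA", "CTTCGGTC"]
succ, pred = de_bruijn_edges(reads, k=4)
paths = compact(succ, pred)
print(paths)
```

In this toy input the shared segment collapses into a single compacted path that two paths enter and two paths leave. Branching of this kind around a collapsed segment is the sort of graph structure a repeat detector can examine, although the criteria the paper actually uses are not stated in the abstract.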

Editor information

Keith A. Crandall, Jens Lagergren

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amgarten Quitzau, J.A., Stoye, J. (2008). Detecting Repeat Families in Incompletely Sequenced Genomes. In: Crandall, K.A., Lagergren, J. (eds) Algorithms in Bioinformatics. WABI 2008. Lecture Notes in Computer Science, vol 5251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87361-7_29

  • DOI: https://doi.org/10.1007/978-3-540-87361-7_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87360-0

  • Online ISBN: 978-3-540-87361-7

  • eBook Packages: Computer Science, Computer Science (R0)
