Abstract
Repeats form a major class of sequence in genomes, with implications for functional genomics and practical problems. Their detection and analysis pose a number of challenges in genomic sequence analysis, especially if the genome is not completely sequenced. The most abundant and evolutionarily active repeats occur as families of long similar sequences. We present a novel method for repeat family detection and characterization in cases where the target genome sequence is not completely known. To this end, we first establish the sequence graph, a compacted version of sparse de Bruijn graphs. By analyzing the structure of this graph and its connected components after local modifications, we devise two algorithms for repeat family detection. The applicability of the methods is shown for both simulated and real genomic data sets.
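The abstract does not spell out how the sequence graph is built, so the following is only a generic illustration of the idea of compacting a sparse de Bruijn graph, not the authors' definition or algorithm. It assumes nodes are the k-mers that occur in a set of reads, edges join consecutive k-mers, and every maximal unbranched path is collapsed into a single labeled node. The function names `build_de_bruijn` and `compact_paths`, the toy reads, and the choice k = 3 are all hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Sparse de Bruijn graph: only k-mers that actually occur in the reads."""
    succ = defaultdict(set)  # k-mer -> set of following k-mers
    pred = defaultdict(set)  # k-mer -> set of preceding k-mers
    nodes = set()
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        nodes.update(kmers)
        for a, b in zip(kmers, kmers[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return nodes, succ, pred


def compact_paths(nodes, succ, pred):
    """Collapse each maximal unbranched path into one sequence label."""

    def mergeable(a, b):
        # Edge a -> b can be merged if it is a's only exit and b's only entry.
        return len(succ.get(a, ())) == 1 and len(pred.get(b, ())) == 1

    def is_inner(v):
        # v lies inside a path if its single incoming edge is mergeable.
        preds = pred.get(v, ())
        return len(preds) == 1 and mergeable(next(iter(preds)), v)

    labels, visited = [], set()
    # Path heads first, then any leftovers (nodes on isolated cycles).
    order = [v for v in sorted(nodes) if not is_inner(v)] + sorted(nodes)
    for start in order:
        if start in visited:
            continue
        visited.add(start)
        label, cur = start, start
        while len(succ.get(cur, ())) == 1:
            nxt = next(iter(succ[cur]))
            if nxt in visited or not mergeable(cur, nxt):
                break
            label += nxt[-1]  # consecutive k-mers overlap by k - 1 characters
            visited.add(nxt)
            cur = nxt
        labels.append(label)
    return labels


# Toy input: two reads that share the substring "GCTT" in different contexts.
reads = ["AAGCTTC", "GGGCTTA"]
nodes, succ, pred = build_de_bruijn(reads, k=3)
compacted = compact_paths(nodes, succ, pred)
print(compacted)
```

In this toy input the shared substring ends up as its own compacted node with two ways in and two ways out, which is the kind of branching structure a repeat leaves in such a graph. How the paper exploits that structure and the connected components is not described in the abstract.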
© 2008 Springer-Verlag Berlin Heidelberg
Amgarten Quitzau, J.A., Stoye, J. (2008). Detecting Repeat Families in Incompletely Sequenced Genomes. In: Crandall, K.A., Lagergren, J. (eds) Algorithms in Bioinformatics. WABI 2008. Lecture Notes in Computer Science, vol 5251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87361-7_29
DOI: https://doi.org/10.1007/978-3-540-87361-7_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87360-0
Online ISBN: 978-3-540-87361-7
eBook Packages: Computer Science, Computer Science (R0)