Abstract
Phylogeny reconstruction on a genome scale remains computationally challenging even for closely related organisms. Here we propose an alignment-free pairwise distance measure, K_r, for genomes separated by less than approximately 0.5 mismatches/nucleotide. We have implemented the computation of K_r based on enhanced suffix arrays in the program kr, which is freely available from guanine.evolbio.mpg.de/kr/. The software is applied to genomes obtained from three sets of taxa: 27 primate mitochondria, eight Streptococcus agalactiae strains, and 12 Drosophila species. Subsequent clustering of the K_r values always recovers phylogenies that are similar or identical to the accepted branching order.
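The abstract names only the ingredients, so here is a minimal sketch of the match-length idea that alignment-free distances of this kind build on. It is not the kr implementation. For each position i of a query sequence, it finds the longest prefix of query[i:] that occurs anywhere in a subject sequence, using a plain suffix array of the subject and binary search. The best match is always one of the two suffixes adjacent to the query suffix's insertion point in sorted order.

The sketch rests on several assumptions. It handles short DNA strings only. It uses a naive suffix sort instead of the enhanced suffix array (with its LCP table) that kr uses. It also stops at the mean match length, because the formula that turns match lengths into K_r is not reproduced here. All function names are hypothetical.

```python
def suffix_array(s: str) -> list[int]:
    """Start positions of the suffixes of s in lexicographic order.

    Naive construction (sorting suffix slices): fine for a sketch, but far
    slower than the suffix array constructions used for whole genomes.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def common_prefix_length(a: str, i: int, b: str, j: int) -> int:
    """Length of the longest common prefix of a[i:] and b[j:]."""
    n = 0
    while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
        n += 1
    return n


def longest_match(query: str, i: int, subject: str, sa: list[int]) -> int:
    """Length of the longest prefix of query[i:] that occurs in subject."""
    target = query[i:]
    # Binary search for the first suffix of subject that is >= target.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if subject[sa[mid]:] < target:
            lo = mid + 1
        else:
            hi = mid
    # The suffix sharing the longest prefix with target is one of the two
    # suffixes adjacent to the insertion point in sorted order.
    best = 0
    for k in (lo - 1, lo):
        if 0 <= k < len(sa):
            best = max(best, common_prefix_length(query, i, subject, sa[k]))
    return best


def mean_match_length(query: str, subject: str) -> float:
    """Average longest-match length over all positions of query.

    Hypothetical helper: larger values mean that query and subject share
    longer exact matches, i.e. that they are more closely related.
    """
    if not query:
        raise ValueError("query must be non-empty")
    sa = suffix_array(subject)
    total = sum(longest_match(query, i, subject, sa) for i in range(len(query)))
    return total / len(query)


if __name__ == "__main__":
    original = "ACGTACGTAC"
    mutated = "ACGAACGAAC"  # same length, a few substitutions
    print(mean_match_length(original, original))  # 5.5
    print(mean_match_length(original, mutated))
```

A sequence compared with itself attains the largest possible mean, since every query suffix occurs in the subject. Substitutions break exact matches and pull the mean down. This is why match lengths can serve as a proxy for divergence between closely related genomes.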
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Haubold, B., Domazet-Lošo, M., Wiehe, T. (2008). An Alignment-Free Distance Measure for Closely Related Genomes. In: Nelson, C.E., Vialette, S. (eds) Comparative Genomics. RECOMB-CG 2008. Lecture Notes in Computer Science, vol 5267. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87989-3_7
DOI: https://doi.org/10.1007/978-3-540-87989-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87988-6
Online ISBN: 978-3-540-87989-3
eBook Packages: Computer Science, Computer Science (R0)