Abstract
During the course of evolution, an organism’s genome can undergo changes that affect the large-scale structure of the genome. These changes include gene gain, loss, duplication, chromosome fusion, fission, and rearrangement. When gene gain and loss occur in addition to other types of rearrangement, breakpoints of rearrangement can exist that are only detectable by comparison of three or more genomes. An arbitrarily large number of these “hidden” breakpoints can exist among genomes that exhibit no rearrangements in pairwise comparisons.
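As a toy illustration of how pairwise comparisons can hide a rearrangement, the sketch below builds three small signed, linear, single-chromosome genomes with unequal gene content. It is not the median-based calculation this paper presents: the genomes, the function names, and the brute-force consistency check are all assumptions made for illustration, and telomere adjacencies are ignored. Each pair of genomes is collinear on its shared genes, yet no single signed gene order contains all three as subsequences.

```python
from itertools import permutations, product

def restrict(genome, genes):
    """Keep only the signed genes whose label is in `genes`."""
    return [g for g in genome if abs(g) in genes]

def adjacencies(genome):
    """Signed adjacencies; (x, y) and (-y, -x) are the same adjacency."""
    return {min((x, y), (-y, -x)) for x, y in zip(genome, genome[1:])}

def pairwise_breakpoints(a, b):
    """Adjacencies of a missing from b, after restricting to shared genes."""
    shared = {abs(g) for g in a} & {abs(g) for g in b}
    return len(adjacencies(restrict(a, shared)) - adjacencies(restrict(b, shared)))

def revcomp(genome):
    """Read the chromosome from the other strand."""
    return [-g for g in reversed(genome)]

def is_subsequence(sub, seq):
    """True if sub can be obtained from seq by deleting elements."""
    it = iter(seq)
    return all(g in it for g in sub)

def has_common_order(genomes):
    """Brute force: is there one signed order containing every genome
    (or its reverse complement) as a subsequence? Exponential; toy sizes only."""
    genes = sorted({abs(g) for genome in genomes for g in genome})
    for order in permutations(genes):
        for signs in product((1, -1), repeat=len(genes)):
            candidate = [s * g for s, g in zip(signs, order)]
            if all(is_subsequence(g, candidate)
                   or is_subsequence(revcomp(g), candidate)
                   for g in genomes):
                return True
    return False

# Hypothetical genomes: each lacks one of the genes 1, 2, 3.
A = [1, 2, 4]
B = [2, 3, 4]
C = [3, 1, 4]

for x, y in ((A, B), (A, C), (B, C)):
    print(pairwise_breakpoints(x, y))      # no pairwise breakpoints
print(has_common_order([A, B, C]))          # no consistent three-way order
```

The three genomes imply the cyclic constraint 1 before 2, 2 before 3, and 3 before 1, which no linear order satisfies; replacing C with `[1, 3, 4]` removes the conflict.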
We present an extension of the multichromosomal breakpoint median problem to genomes that have undergone gene gain and loss. We then demonstrate that the median distance among three genomes can be used to calculate a lower bound on the number of hidden breakpoints present. We provide an implementation of this calculation, including the median distance, along with some practical improvements to the time complexity of the underlying algorithm.
We apply our approach to measure the abundance of hidden breakpoints in simulated data sets under a wide range of evolutionary scenarios. We demonstrate that in simulations the hidden breakpoint counts depend strongly on the relative rates of inversion and gene gain/loss. Finally, we apply current multiple genome aligners to the simulated genomes and show that all aligners introduce a high degree of error in hidden breakpoint counts, and that this error grows with evolutionary distance in the simulation. Our results suggest that hidden breakpoint error may be pervasive in genome alignments.
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kehr, B., Reinert, K., Darling, A.E. (2012). Hidden Breakpoints in Genome Alignments. In: Raphael, B., Tang, J. (eds) Algorithms in Bioinformatics. WABI 2012. Lecture Notes in Computer Science, vol 7534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33122-0_31
DOI: https://doi.org/10.1007/978-3-642-33122-0_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33121-3
Online ISBN: 978-3-642-33122-0
eBook Packages: Computer Science, Computer Science (R0)