Reconstructing the Evolutionary History of Complex Human Gene Clusters

  • Yu Zhang
  • Giltae Song
  • Tomáš Vinař
  • Eric D. Green
  • Adam Siepel
  • Webb Miller
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4955)


Clusters of genes that evolved from single progenitors via repeated segmental duplications present significant challenges to the generation of a truly complete human genome sequence. Such clusters can confound both accurate sequence assembly and downstream computational analysis, yet they are hotbeds of functional innovation and therefore of considerable interest. We have developed an algorithm for reconstructing the evolutionary history of gene clusters using only human genomic sequence data. This method allows the tempo of large-scale evolutionary events in human gene clusters to be estimated, which in turn will facilitate primate comparative sequencing studies that aim to reconstruct the evolutionary history of these clusters more fully.


Keywords: Gene Cluster, Duplication Event, Tandem Duplication, Target Distribution, Atomic Boundary





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Yu Zhang (1, 2)
  • Giltae Song (1)
  • Tomáš Vinař (3)
  • Eric D. Green (4)
  • Adam Siepel (3)
  • Webb Miller (1)

  1. Center for Comparative Genomics and Bioinformatics, 506B Wartik Lab, Penn State University, University Park, USA
  2. Department of Statistics, Penn State University, University Park, USA
  3. Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, USA
  4. Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, USA
