The universe of exons revisited

  • Serge Saxonov
  • Walter Gilbert
Part of the Contemporary Issues in Genetics and Evolution book series (CIGE, volume 10)


We study the distribution of exons in eukaryotic genes to determine whether the reuse of exon sequences can be detected and, if so, to use the frequency of such reuse to estimate how many ancestral exon sequences there might have been. We use two databases of exons. The first contains 56,276 internal exons from putatively unrelated genes (less than 20% sequence identity). The second contains 8917 internal exons from ancient conserved regions (ACRs), that is, regions of these genes that are homologous to and colinear with prokaryotic genes. At the 95% significance level we find 3500 exon-sequence matches in the large database and 500 matches in the ACR database. These matches correspond to groups of similar sequences. The size-rank relationship for these groups follows a power law, with group size falling off as the inverse square root of rank. This form of the power law lets us estimate the size of a possible universe of ancestral exons: using the ACR data, we estimate that universe to contain about 15,000–30,000 exons.
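The two analysis steps named above, collapsing pairwise exon matches into groups and checking that group size S falls off with rank r as S(r) ∝ r^(-1/2), can be illustrated with a short sketch. This is not the authors' procedure. It assumes that groups are the connected components of the match graph (single linkage via union-find) and fits the exponent by ordinary least squares on log(size) versus log(rank), a simple swapped-in estimator. The function names, the integer exon indexing, and the match-pair input format are all hypothetical.

```python
import math
from collections import Counter


def group_matches(n_exons, matches):
    """Merge pairwise exon matches into groups (connected components).

    Assumption: a group is every exon reachable through significant matches.
    n_exons: number of exons, indexed 0..n_exons-1 (hypothetical indexing).
    matches: iterable of (i, j) index pairs judged significant.
    Returns sizes of groups with at least two members, largest first.
    """
    parent = list(range(n_exons))

    def find(x):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in matches:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # union the two groups

    sizes = Counter(find(x) for x in range(n_exons))
    return sorted((s for s in sizes.values() if s > 1), reverse=True)


def fit_rank_exponent(sizes):
    """Least-squares fit of log(size) = log(c) - a * log(rank).

    sizes must be sorted largest first and contain at least two values.
    Returns (a, c); a near 0.5 means size falls off as rank^(-1/2).
    """
    xs = [math.log(r) for r in range(1, len(sizes) + 1)]
    ys = [math.log(s) for s in sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return -slope, math.exp(intercept)


if __name__ == "__main__":
    # Toy match graph: exons 0-1-2 form one group, 3-4 another, 5 is a singleton.
    print(group_matches(6, [(0, 1), (1, 2), (3, 4)]))
    # Synthetic sizes following an exact inverse-square-root law.
    print(fit_rank_exponent([100 * r ** -0.5 for r in range(1, 51)]))
```

On the toy graph the grouping returns `[3, 2]`, and the synthetic sizes recover an exponent of 0.5 and a prefactor of 100 up to floating-point error. Least squares on log-log axes is used here only for transparency; it is a known-biased estimator for noisy power-law data, and the sketch does not reproduce how the chapter converts the fitted law into a universe-size estimate.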

Key words

ACR, BLAST, evolution, exon, gene structure, intron





Copyright information

© Springer Science+Business Media Dordrecht 2003

Authors and Affiliations

  • Serge Saxonov (1)
  • Walter Gilbert (1)
  1. Department of Molecular and Cellular Biology, The Biological Laboratories, Harvard University, Cambridge, USA
