TA, GT and AC are significantly under-represented in open reading frames of prokaryotic and eukaryotic protein-coding genes

  • Yong Wang
  • Zhen Zeng
  • Tian-Lei Liu
  • Ling Sun
  • Qin Yao
  • Ke-Ping Chen
Original Article


Genomes can be considered a combination of 16 dinucleotides. Analysing the relative abundance of different dinucleotides may reveal important features of genome evolution. In the present study, we conducted extensive surveys of the relative abundances of dinucleotides in various genomic components of 28 bacterial, 20 archaeal, 19 fungal, 24 plant and 29 animal species. We found that TA, GT and AC are significantly under-represented in the open reading frames of all organisms and in the intergenic regions and introns of most organisms. The usage of specific dinucleotides varies greatly among codon positions. The significantly low representation of TA, GT and AC is considered the evolutionary consequence of preventing the formation of premature stop codons and of reducing intron-splicing options in candidate primary mRNA sequences. These data suggest that a reduction of TA and GT occurred on both strands of the DNA sequence at an early stage of de novo gene birth. Interestingly, GT and AC are also significantly under-represented in current prokaryotic genomes, suggesting that ancient prokaryotic protein-coding genes might have contained introns. The greatly varied usage of specific dinucleotides at different codon positions is considered an evolutionary accommodation that compensates for the unavailability of specific codons and avoids the formation of premature stop codons. This is the first report to use dinucleotide relative abundance data to indicate the possible existence of spliceosomal introns in ancient prokaryotic genes and to hypothesize early steps of de novo gene birth.
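The abstract does not state how relative abundance was computed. As a minimal sketch, assuming the widely used dinucleotide odds ratio ρ(XY) = f(XY) / (f(X) · f(Y)), where f denotes observed frequency, the calculation could look like the following. The function names and the single-strand counting are illustrative assumptions, not the authors' pipeline. The small reverse-complement helper illustrates why a reduction of GT on both strands also appears as a reduction of AC, while TA is its own reverse complement.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dinucleotide_odds_ratios(seq):
    """Return {dinucleotide: rho}, with rho_XY = f_XY / (f_X * f_Y).

    Hypothetical helper: counts overlapping dinucleotides on the given
    strand only and skips any position containing a non-ACGT symbol.
    """
    seq = seq.upper()
    base_counts = Counter(b for b in seq if b in BASES)
    pair_counts = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_bases = sum(base_counts.values())
    n_pairs = sum(pair_counts.values())

    ratios = {}
    for x, y in product(BASES, repeat=2):
        if n_bases == 0 or n_pairs == 0:
            ratios[x + y] = float("nan")
            continue
        expected = (base_counts[x] / n_bases) * (base_counts[y] / n_bases)
        observed = pair_counts[x + y] / n_pairs
        ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios


if __name__ == "__main__":
    toy_orf = "ATGGCTGCTAAAGGTCTGGAACGTTAA"  # made-up sequence for illustration
    rho = dinucleotide_odds_ratios(toy_orf)
    for dinuc in ("TA", "GT", "AC"):
        print(dinuc, round(rho[dinuc], 3), "reverse complement:", reverse_complement(dinuc))
```

A ratio near 1 means the dinucleotide occurs about as often as its base composition predicts, and values well below 1 indicate under-representation. Any cut-off used to call a dinucleotide "significantly" under-represented would have to come from the full paper and is not assumed here.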


Dinucleotide · Composition · Odds ratio · Gene birth · Genome evolution



This study was supported by the National Natural Science Foundation of China (No. 31572467 and No. 31872425) and the Project Funded by Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions.

Compliance with ethical standards

Conflict of interest

All authors declare that they have no conflict of interest.

Research involving human or animal participants

This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary material

Supplementary material 1: 438_2019_1535_MOESM1_ESM.xlsx (XLSX, 116 KB)
Supplementary material 2: 438_2019_1535_MOESM2_ESM.xlsx (XLSX, 2585 KB)
Supplementary material 3: 438_2019_1535_MOESM3_ESM.xlsx (XLSX, 108 KB)
Supplementary material 4: 438_2019_1535_MOESM4_ESM.xlsx (XLSX, 89 KB)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. School of Food and Biological Engineering, Jiangsu University, Zhenjiang, China
  2. Institute of Life Sciences, Jiangsu University, Zhenjiang, China
