
Bioinformatics and Translation Elongation

  • Xuhua Xia
Chapter

Abstract

Codon usage depends on mutation bias, tRNA-mediated selection, and the need for high efficiency and accuracy in translation. One codon in a synonymous codon family is often strongly overused, especially in highly expressed genes, which often leads to a high dN/dS ratio because dS is very small. Many codon usage indices have been proposed to measure codon usage and codon adaptation. Sense codons can be misread by release factors and stop codons misread by tRNAs, which also contributes to codon usage in rare cases. This chapter outlines the conceptual framework of codon evolution, illustrates codon-specific and gene-specific codon usage indices, and presents their applications. A new index of codon adaptation that accounts for background mutation bias (the index of translation elongation, ITE) is presented and contrasted with the codon adaptation index (CAI), which does not consider background mutation bias. Both indices are used to reanalyze data from a recent paper claiming that translation elongation efficiency matters little in protein production. The reanalysis disproves the claim.

1 Introduction

We will first learn a few key definitions and notations on tRNA, its anticodon, and codon families. We will then outline the conceptual framework of codon adaptation, mediated by mutation and selection. This brings us to indices of codon usage bias, their calculation and interpretation, and factors that may confound their interpretation. There are codon-specific indices such as relative synonymous codon usage (RSCU, Sharp et al. 1986) and gene-specific indices such as the index of translation elongation (ITE, Xia 2015) and the codon adaptation index (CAI, Sharp and Li 1987; Xia 2007c). All these indices are implemented in DAMBE (Xia 2013, 2017d).

ITE takes background mutation bias into consideration, while CAI does not. ITE is reduced to CAI if there is no background mutation bias. I will illustrate the applications of these indices in practical research. Keep in mind that a codon adaptation index is just one variable which will not be particularly interesting until you relate it to other variables and understand their relationships.
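To make the two kinds of index concrete, here is a minimal Python sketch. It assumes the commonly used definitions: RSCU is the observed count of a codon divided by the mean count in its synonymous family, and CAI is the geometric mean, over the codons of a gene, of each codon's count relative to the most used codon of its family in a reference set. The function names are hypothetical and are not DAMBE's interface. The reference counts are the Lys and Glu rows of Table 9.4. ITE is not sketched because its correction for background mutation bias is not defined in this section.

```python
import math

# Codon counts in highly expressed yeast genes (Lys and Glu rows of Table 9.4).
REFERENCE_COUNTS = {
    "Lys": {"AAA": 65, "AAG": 483},
    "Glu": {"GAA": 305, "GAG": 5},
}

def rscu(family_counts):
    """RSCU of each codon: observed count / mean count within the family."""
    mean = sum(family_counts.values()) / len(family_counts)
    return {codon: n / mean for codon, n in family_counts.items()}

def relative_adaptiveness(reference):
    """w of each codon: its count relative to the most used codon of its family."""
    w = {}
    for family_counts in reference.values():
        top = max(family_counts.values())
        for codon, n in family_counts.items():
            w[codon] = n / top
    return w

def cai(codons, w):
    """Geometric mean of w over the codons of a gene.

    Codons absent from w are skipped; a codon with w == 0 would need
    special handling, which this sketch does not attempt.
    """
    logs = [math.log(w[c]) for c in codons if c in w]
    return math.exp(sum(logs) / len(logs))

W = relative_adaptiveness(REFERENCE_COUNTS)
print(rscu(REFERENCE_COUNTS["Lys"]))
print(cai(["AAG", "GAA", "AAG"], W), cai(["AAA", "GAG"], W))
```

A gene made only of the most used codons gets a CAI of 1, and the value falls toward 0 as rarely used codons accumulate.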

Two additional topics are dealt with close to the end of the chapter. The first is how to discriminate between selection for translation efficiency and selection for translation accuracy (Akashi 1994). The second is the effect of amino acid usage on translation elongation efficiency. The general prediction concerning amino acid usage is that highly expressed proteins should maximize the use of amino acids that are abundant and energetically cheap to make (Akashi and Gojobori 2002) and that have many tRNAs to carry them (Xia 1998a). The same argument has been used for transcription, i.e., an mRNA with many A nucleotides will be transcribed faster than one with many C nucleotides because A is in general far more abundant than C and it takes extra ATP to make CTP (Xia 1996; Xia et al. 2006).

1.1 Basic Notations, Definitions, and Abbreviations

Notations, definitions, and abbreviations are essential in science. We are lucky enough to have almost all of them unambiguous. If you were studying social sciences, you would have to come to define what is man and what is woman, and the debate on a proper definition will last forever, eventually with all debaters losing their mind and being called jerks.

1.1.1 tRNA Notation and Identification of tRNA Anticodon

The simplest notation of a tRNA is tRNAAA, where AA is a specific amino acid. For example, tRNAGly refers to all tRNAs that can be charged with the amino acid glycine (Gly). A slightly more complicated notation is tRNAAA/AC, where AC refers to the tRNA anticodon. For example, tRNAGly/GCC refers specifically to tRNAGly with a GCC anticodon. The general notation of a tRNA is AA2-tRNAAA1/AC, where AA1 is the amino acid the tRNA is supposed to carry, AA2 is the amino acid that is actually carried by the tRNA, and AC is the anticodon. In most cases, AA1 and AA2 are the same. However, there are two cases where AA1 and AA2 can differ. The first is modification of AA2 by a biochemist. The second occurs naturally in a number of species across all three domains of life (Sheppard et al. 2008; Yuan et al. 2008), where Gln-tRNAGln, Asn-tRNAAsn, Cys-tRNACys, and Sec-tRNASec are formed indirectly in two steps. Take Gln-tRNAGln and Asn-tRNAAsn, for example. Glu is first misacylated to tRNAGln, and Asp to tRNAAsn, to form Glu-tRNAGln and Asp-tRNAAsn, respectively. The resulting misacylated tRNAs are then converted to Gln-tRNAGln and Asn-tRNAAsn by a group of tRNA-dependent modifying enzymes.

Isoacceptor tRNA is a somewhat confusing term as it may carry two slightly different meanings. It could refer to a single tRNA decoding different synonymous codons, e.g., tRNAGly/GCC decoding GGC and GGU codons. Alternatively, it could refer to a set of different tRNAs that carry the same amino acid but decode different synonymous codons. For example, tRNAGly/GCC, tRNAGly/CCC, and tRNAGly/UCC are isoacceptor tRNAs. They all carry the amino acid Gly but have different anticodons decoding different synonymous Gly codons. Different isoacceptor tRNAs could decode the same codon. For example, tRNAGly/CCC decodes GGG, but tRNAGly/UCC decodes both GGA and GGG, so GGG is decoded by both tRNAGly/CCC and tRNAGly/UCC. Thus, isoacceptor tRNA refers to (1) one tRNA decoding different synonymous codons or (2) a set of tRNAs that carry the same amino acid but decode different sets of synonymous codons. The intersection of different sets of synonymous codons may not be empty. For example, the set of codons decoded by tRNAGly/CCC is {GGG}, and the set of codons decoded by tRNAGly/UCC is {GGA, GGG}. The intersection of the two sets is {GGG}.
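The set view of isoacceptor tRNAs can be written down directly. The Python sketch below uses only the three tRNAGly examples given in the text. The dictionary layout and function name are illustrative choices, not an established data format.

```python
# Codons decoded by each tRNA-Gly isoacceptor, as given in the text.
DECODES = {
    "tRNAGly/GCC": {"GGC", "GGU"},
    "tRNAGly/CCC": {"GGG"},
    "tRNAGly/UCC": {"GGA", "GGG"},
}

def decoders_by_codon(decodes):
    """Invert the tRNA -> codons map into codon -> set of decoding tRNAs."""
    result = {}
    for trna, codons in decodes.items():
        for codon in codons:
            result.setdefault(codon, set()).add(trna)
    return result

BY_CODON = decoders_by_codon(DECODES)
# Codons decoded by both tRNAGly/CCC and tRNAGly/UCC.
SHARED = DECODES["tRNAGly/CCC"] & DECODES["tRNAGly/UCC"]
print(SHARED)
print(sorted(BY_CODON["GGG"]))
```

Inverting the map shows at a glance which codons are read by more than one isoacceptor.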

Related to isoacceptor tRNA is another potentially confusing concept, near-cognate tRNA, which is defined in two ways. The first is based on empirical evidence. If codon XYZ encoding amino acid AA1 can be misread by a tRNA carrying amino acid AA2 (AA1 ≠ AA2), then that tRNA is a near-cognate tRNA for codon XYZ. The second definition is based on nucleotide similarity among codons. A codon XYZ has nine XYZ-like codons that differ from XYZ by a single nucleotide. Some of these XYZ-like codons are synonymous with XYZ and some are not. The tRNAs that can decode any of the nonsynonymous XYZ-like codons are near-cognate tRNAs for codon XYZ because they can “potentially” misread codon XYZ. For example, tRNAAsp is a near-cognate tRNA for codons GAA and GAG because Asp is encoded by GAC and GAU, which are GAA-like and GAG-like codons.
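The second, similarity-based definition is easy to enumerate. The sketch below assumes the standard genetic code, written as the usual 64-letter string with codons ordered U, C, A, G at each position. It lists the nine one-nucleotide neighbours of a codon and the amino acids whose tRNAs would be near-cognate under that definition. Stop-codon neighbours are excluded here because they are read by release factors rather than tRNAs. The function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: codons enumerated in UCAG order at each position.
STANDARD_CODE = {"".join(c): aa
                 for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def neighbours(codon):
    """The nine codons that differ from `codon` at exactly one position."""
    return [codon[:i] + b + codon[i + 1:]
            for i in range(3) for b in BASES if b != codon[i]]

def near_cognate_amino_acids(codon, code=STANDARD_CODE):
    """Amino acids encoded by nonsynonymous sense neighbours of `codon`."""
    own = code[codon]
    return {code[n] for n in neighbours(codon)} - {own, "*"}

print(sorted(near_cognate_amino_acids("GAA")))
```

For GAA (Glu) the result includes D (Asp), in line with the tRNAAsp example above.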

1.1.2 Genetic Codes and Associated Concepts and Definitions

It is through the genetic code that the 64 codons are interpreted as encoding amino acids or translation stop. Nature is superfluous in her creation of genetic codes. There are now 24 known genetic codes, numbered from 1 to 31 (Table 9.1). The standard genetic code was shown previously in Table 2.7.
Table 9.1 The 24 genetic codes, named after representative species, and their translation table (TT) numbers

TT   Name
1    Standard
2    Vertebrate mitochondrial
3    Yeast mitochondrial
4    Mold, protozoan, and coelenterate mitochondrial code and the Mycoplasma/Spiroplasma
5    Invertebrate mitochondrial
6    Ciliate, Dasycladacean, and Hexamita nuclear
9    Echinoderm and flatworm mitochondrial
10   Euplotid nuclear
11   Bacterial, archaeal, and plant plastid
12   Alternative yeast nuclear
13   Ascidian mitochondrial
14   Alternative flatworm mitochondrial
16   Chlorophycean mitochondrial
21   Trematode mitochondrial
22   Scenedesmus obliquus mitochondrial
23   Thraustochytrium mitochondrial
24   Pterobranchia mitochondrial
25   Candidate division SR1 and Gracilibacteria
26   Pachysolen tannophilus nuclear
27   Karyorelict nuclear
28   Condylostoma nuclear
29   Mesodinium nuclear
30   Peritrich nuclear
31   Blastocrithidia nuclear

Some codons do not change their meanings, e.g., Phe (UUY), Tyr (UAY), and Pro (CCN), whereas others change their meaning frequently. Table 9.2 lists the codons with different meanings in different genetic codes. These codons tend to end with a purine, except for CUN. However, even within the CUN codon family, CUR codons are involved in recoding more often than CUY codons (Table 9.2).
Table 9.2 Codons with different meanings in different translation tables (TT)

TT   UUA  UCA  UAA  UAG  UGA  CUU  CUC  CUA  CUG  AUA  AAA  AGA  AGG
1    L    S    *    *    *    L    L    L    L    I    K    R    R
2    L    S    *    *    W    L    L    L    L    M    K    *    *
3    L    S    *    *    W    T    T    T    T    M    K    R    R
4    L    S    *    *    W    L    L    L    L    I    K    R    R
5    L    S    *    *    W    L    L    L    L    M    K    S    S
6    L    S    Q    Q    *    L    L    L    L    I    K    R    R
9    L    S    *    *    W    L    L    L    L    I    N    S    S
10   L    S    *    *    C    L    L    L    L    I    K    R    R
11   L    S    *    *    *    L    L    L    L    I    K    R    R
12   L    S    *    *    *    L    L    L    S    I    K    R    R
13   L    S    *    *    W    L    L    L    L    M    K    G    G
14   L    S    Y    *    W    L    L    L    L    I    N    S    S
16   L    S    *    L    *    L    L    L    L    I    K    R    R
21   L    S    *    *    W    L    L    L    L    M    N    S    S
22   L    *    *    L    *    L    L    L    L    I    K    R    R
23   *    S    *    *    *    L    L    L    L    I    K    R    R
24   L    S    *    *    W    L    L    L    L    I    K    S    K
25   L    S    *    *    G    L    L    L    L    I    K    R    R
26   L    S    *    *    *    L    L    L    A    I    K    R    R
27   L    S    Q    Q    w    L    L    L    L    I    K    R    R
28   L    S    q    q    w    L    L    L    L    I    K    R    R
29   L    S    Y    Y    *    L    L    L    L    I    K    R    R
30   L    S    E    E    *    L    L    L    L    I    K    R    R
31   L    S    e    e    W    L    L    L    L    I    K    R    R

An asterisk (*) denotes a stop codon. A lowercase letter, such as q in translation table 28, means that the corresponding codon can mean either the amino acid (here Q) or a stop codon.

We can build a distance tree from Table 9.2 by counting the pairwise number of reassignment events (i.e., when a codon for one amino acid is reassigned to a different amino acid or a stop codon). The only problem is how to treat reassignment between a sense codon and a stop. Such a change probably should occur less frequently than reassignments involving two sense codons. All pairwise comparisons among the 24 rows (24 genetic codes) generate 609 reassignments involving 2 sense codons and 445 reassignments between a sense codon and a stop codon. However, during the long evolutionary time, the more frequent reassignments will erase each other and the frequencies of their occurrences will be underestimated. So the actual difference between the two numbers must be much greater. If we count each reassignment between a sense codon and a stop codon as equivalent to four reassignments between two sense codons, we obtain a distance-based tree in Fig. 9.1. The topology remains the same if we treat each reassignment between a sense codon and a stop codon as equivalent to two, three, or five reassignments involving two sense codons.
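The weighted count described above can be sketched in a few lines of Python. Only five rows of Table 9.2 are typed in, and the function names are hypothetical. One assumption is made that the text does not state: a lowercase (ambiguous) entry is treated as its sense amino acid. Resolving ambiguous entries differently would change the counts, so this sketch is not expected to reproduce the totals of 609 and 445 exactly.

```python
CODONS = "UUA UCA UAA UAG UGA CUU CUC CUA CUG AUA AAA AGA AGG".split()

# A few rows of Table 9.2 ("*" = stop); the other rows can be added the same way.
TABLES = {
    1:  "L S * * * L L L L I K R R".split(),
    2:  "L S * * W L L L L M K * *".split(),
    3:  "L S * * W T T T T M K R R".split(),
    4:  "L S * * W L L L L I K R R".split(),
    11: "L S * * * L L L L I K R R".split(),
}

def reassignments(row_a, row_b):
    """Return (sense<->sense, sense<->stop) counts of codons that differ."""
    sense = stop = 0
    for a, b in zip(row_a, row_b):
        a, b = a.upper(), b.upper()  # assumption: ambiguous codons count as sense
        if a == b:
            continue
        if "*" in (a, b):
            stop += 1
        else:
            sense += 1
    return sense, stop

def distance(row_a, row_b, stop_weight=4):
    """Pairwise distance with each sense<->stop change weighted by stop_weight."""
    sense, stop = reassignments(row_a, row_b)
    return sense + stop_weight * stop

print(reassignments(TABLES[1], TABLES[2]), distance(TABLES[1], TABLES[2]))
```

Tables 1 and 11 are identical over these 13 codons, so their distance is 0. Tables 1 and 2 differ by one sense-to-sense change (AUA) and three sense-to-stop changes (UGA, AGA, AGG). The resulting pairwise distances could then be passed to any distance-based tree-building method.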
Fig. 9.1

“Phylogenetic tree” of 24 genetic codes with their differences shown in Table 9.2, based on pairwise number of codon reassignments. A reassignment between a sense codon and a stop codon is treated as equivalent to four codon reassignment events between two nonsynonymous sense codons. Leaves labeled with a “MT”-ending are mitochondrial genetic codes

Most bacteria use genetic code 11, which is the same as the standard code except for a difference in start codon usage. The wall-less bacteria, including Mycoplasma and Spiroplasma, use genetic code 4, which is identical to the mitochondrial genetic code used in a number of fungal lineages, red algae, and protozoa. The use of the same genetic code 4 by bacteria and by mitochondria in eukaryotic lineages suggests two alternative hypotheses. The first is convergence. The second is that the ancestor of the mitochondrial lineages in Cluster 3 (Fig. 9.1) was a Mycoplasma-like bacterium, which would imply multiple origins of mitochondrial lineages.

The main arguments for a single origin of mitochondria are (1) extensive phylogenetic reconstruction with rRNA sequences from a diverse array of mitochondrial and bacterial lineages appears to recover mitochondrial lineages as a monophyletic taxon, with its closest phylogenetic relatives being Alphaproteobacteria lineages, especially Rickettsiales (Williams et al. 2007), and (2) all diverse mitochondrial genomes appear to represent reduced forms of the mitochondrial genome of Reclinomonas americana (Lang et al. 1997). In particular, the closest phylogenetic relative of the R. americana mitochondrial genome among bacterial lineages is Ehrlichia muris strain AS145 within Rickettsiales. These lines of evidence, taken together, represent compelling evidence for the single-origin hypothesis of mitochondria.

Genetic codes also differ in start codons (Table 9.3). While AUG is used universally and dominantly as a start codon, other codons are used as well, although no species is known in which a non-AUG codon is used as a start codon more frequently than AUG. In eukaryotic species, where AUG is part of the translation initiation signal such as the Kozak consensus RxxAUGG, non-AUG codons are rarely used. In bacterial species, where the start codon is located by pairing between Shine-Dalgarno (SD) and anti-SD sequences, the requirement for AUG as a start codon is less stringent.
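Table 9.3 The 24 translation tables (TT) differ in start codon usage

TT   TTA  TTG  CTG  ATT  ATC  ATA  ATG  GTG
1    -    M    M    -    -    -    M    -
2    -    -    -    M    M    M    M    M
3    -    -    -    -    -    M    M    -
4    M    M    M    M    M    M    M    M
5    -    M    -    M    M    M    M    M
6    -    -    -    -    -    -    M    -
9    -    -    -    -    -    -    M    M
10   -    -    -    -    -    -    M    -
11   -    M    M    M    M    M    M    M
12   -    -    M    -    -    -    M    -
13   -    M    -    -    -    M    M    M
14   -    -    -    -    -    -    M    -
16   -    -    -    -    -    -    M    -
21   -    -    -    -    -    -    M    M
22   -    -    -    -    -    -    M    -
23   -    -    -    M    -    -    M    M
24   -    M    M    -    -    -    M    M
25   -    M    -    -    -    -    M    M
26   -    -    M    -    -    -    M    -
27   -    -    -    -    -    -    M    -
28   -    -    -    -    -    -    M    -
29   -    -    -    -    -    -    M    -
30   -    -    -    -    -    -    M    -
31   -    -    -    -    -    -    M    -

M marks a codon that can serve as a start codon in the translation table; a hyphen marks a codon that is not listed as a start codon.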

A synonymous codon family refers to all codons coding the same amino acid. For example, GGA, GGC, GGG, and GGU all code for Gly and are collectively referred to as the Gly codon family or just the Gly family. I may use “family” for “synonymous codon family” when there is no confusion. A codon family, such as the Gly family, whose codons differ only at the third codon position is a simple family. In contrast, a codon family whose codons differ not only at the third codon position but also at other codon positions is a compound codon family. For example, in the standard genetic code, Leu is coded by UUR (where R stands for purine) and CUN (where N stands for any nucleotide) codons. Therefore, the Leu codon family is a compound family. Other compound families in the standard code include Ser (coded by UCN and AGY, where Y stands for pyrimidine) and Arg (coded by CGN and AGR). Compound families are often divided into subfamilies. For example, the Ser family is divided into the UCN subfamily and the AGY subfamily.
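The simple/compound distinction can be checked mechanically. The sketch below assumes the standard genetic code, encoded as the usual 64-letter string with codons ordered U, C, A, G at each position. It groups sense codons into families and flags a family as compound when its members do not all share the same first two nucleotides. The function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: codons enumerated in UCAG order at each position.
STANDARD_CODE = {"".join(c): aa
                 for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def codon_families(code=STANDARD_CODE):
    """Group sense codons by the amino acid they encode."""
    families = {}
    for codon, aa in code.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    return families

def is_compound(codons):
    """True if the family members differ at the first or second codon position."""
    return len({codon[:2] for codon in codons}) > 1

FAMILIES = codon_families()
COMPOUND = sorted(aa for aa, codons in FAMILIES.items() if is_compound(codons))
print(COMPOUND)
```

Under the standard code the compound families are L (Leu), R (Arg), and S (Ser), matching the examples in the text.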

The phenomenon that one amino acid may be encoded by multiple codons is called codon degeneracy. This gives rise to 4-fold, 3-fold, 2-fold, and 1-fold (0-fold is a misnomer) degenerate sites. An n-fold site is one that can be occupied by n different nucleotides without changing the encoded amino acid. For example, the third site in the four Gly codons above is fourfold degenerate. In the standard code, AUA, AUC, and AUU all encode the amino acid Ile, so the third codon site is threefold degenerate. AAA and AAG both encode the amino acid Lys, so the third codon site is twofold degenerate. We may also have a twofold degenerate site at the first codon position. For example, both CUA and UUA encode the amino acid Leu, so the first codon site is twofold degenerate. The second codon site of Gly codons is onefold degenerate because replacing it with any other nucleotide will change the encoded amino acid.
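The n-fold definition translates into a short function. The sketch below assumes the standard genetic code and counts how many of the four nucleotides, including the one already present, leave the encoded amino acid unchanged at a given site. The function name and the 0-based site index are illustrative choices.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: codons enumerated in UCAG order at each position.
STANDARD_CODE = {"".join(c): aa
                 for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def degeneracy(codon, site, code=STANDARD_CODE):
    """Number of nucleotides allowed at `site` (0, 1, or 2) of `codon`
    without changing the encoded amino acid (the current one included)."""
    aa = code[codon]
    return sum(code[codon[:site] + b + codon[site + 1:]] == aa for b in BASES)

for codon, site in [("GGA", 2), ("AUU", 2), ("AAA", 2), ("CUA", 0), ("GGA", 1)]:
    print(codon, site + 1, degeneracy(codon, site))
```

The five calls cover the cases discussed above: fourfold, threefold, and twofold third sites, a twofold first site, and a onefold second site.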

A synonymous mutation refers to the change of a codon by another synonymous codon. A nonsynonymous mutation refers to codon replacement involving amino acid replacement. A substitution is a mutation that has spread to all individuals in the population. Synonymous substitutions occur often, but nonsynonymous substitutions occur rarely.

Throughout the text, we will abbreviate highly and lowly expressed genes as HEGs and LEGs. Unless specified otherwise, HEGs and LEGs in this chapter pertain to protein expression, not mRNA expression. One may rank all proteins according to experimentally measured abundance and take the top and bottom 1/3 as HEGs and LEGs, respectively. Non-HEGs are simply all genes in a genome that are not included in the HEGs. Protein abundance values for most model species may be found in PaxDb (Wang et al. 2012).
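The ranking rule can be sketched as follows. The gene names and abundance values are made up for illustration and are not taken from PaxDb, and the function name is hypothetical.

```python
def split_by_abundance(abundance, parts=3):
    """Return (HEGs, LEGs): the top and bottom 1/parts of genes by abundance."""
    ranked = sorted(abundance, key=abundance.get, reverse=True)
    n = len(ranked) // parts
    return ranked[:n], ranked[len(ranked) - n:]

# Hypothetical protein abundance values (arbitrary units).
ABUNDANCE = {"g1": 950.0, "g2": 400.0, "g3": 120.0,
             "g4": 35.0, "g5": 8.0, "g6": 0.5}
HEGS, LEGS = split_by_abundance(ABUNDANCE)
NON_HEGS = [g for g in ABUNDANCE if g not in HEGS]
print(HEGS, LEGS, NON_HEGS)
```

Note that non-HEGs are a larger set than LEGs, because they also contain the middle third.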

1.2 Elongation Efficiency Depends on Amino Acid and Codon Usage

Many unicellular organisms, especially bacterial species, need to grow and replicate rapidly in order not to be outcompeted by others. For example, an E. coli cell replicates once every 20 min with unlimited nutrients. To replicate a cell, not only must the genome be replicated, but a large number of proteins must also be produced, with some proteins produced in nearly half a million copies per E. coli cell. For such highly expressed proteins, it is very important for their coding genes to have an efficient coding strategy that maximizes the rate of translation. Translation involves three subprocesses: initiation, elongation, and termination. The previous chapter illustrates how natural selection can drive evolution toward more efficient translation initiation. This chapter addresses the question of how translation elongation can be improved through codon adaptation.

There are two obvious ways of increasing translation elongation efficiency for mass-produced proteins. The first is to optimize amino acid usage, i.e., to use energetically cheap and typically abundant amino acids as building blocks (Akashi and Gojobori 2002). The second is to maximize the usage of codons that match the anticodon of the most abundant cognate tRNA (Gouy and Gautier 1982; Ikemura 1992; Xia 1998a, 2005, 2009, 2015). For example, the amino acid glycine (Gly) can be coded by GGA, GGC, GGG, and GGU codons, but tRNAGly species that decode GGY codons are more abundant than tRNAGly species that decode GGR codons in E. coli cells. What codons should E. coli use to code glycine? Given the differential tRNA availability, natural selection should favor GGY codons over GGR codons. However, selection and mutation may go in opposite directions, so any study of codon adaptation would be incomplete without considering both selection and mutation.

1.3 Empirical Illustration of Codon-Anticodon Adaptation

Ikemura’s pioneering works established the relationship between differential tRNA abundance and its effect on codon usage in rapidly replicating bacterial species and unicellular eukaryotes (Ikemura 1981a, b, 1982, 1992). Many studies have since demonstrated a strong relationship not only between codon adaptation and gene expression (Coghlan and Wolfe 2000; Comeron and Aguade 1998; Duret and Mouchiroud 1999; Gouy and Gautier 1982; Xia 2007c) but also between experimentally modified codon usage and protein production (Haas et al. 1996; Ngumbela et al. 2008; Robinson et al. 1984; Sorensen et al. 1989). These results have led to the explicit formulation of codon-anticodon coevolution and adaptation theory (e.g., Akashi 1994; Moriyama and Powell 1997; Ran and Higgs 2012; Xia 1998a, 2008) which states that (1) protein production is rate-limited by both translation initiation and elongation efficiency; (2) codon usage and tRNA anticodon coevolve to adapt to each other, resulting in increased production of correctly translated proteins; and (3) the increased elongation efficiency and accuracy represent the driving force for the HEGs to acquire a high degree of codon-anticodon adaptation.

1.3.1 Empirical Illustration of Codon-Anticodon Adaptation in Yeast

The baker’s yeast, Saccharomyces cerevisiae, replicates rapidly and is expected to use codons with many decoding tRNAs and avoid codons with few decoding tRNAs. The earliest association between tRNA and codon usage was empirically demonstrated by Ikemura (1981a, b, 1992). Tables 9.4 and 9.5 show the association between tRNA gene copy number in the genome (T in Tables 9.4 and 9.5) and codon usage in highly expressed yeast genes (F in Tables 9.4 and 9.5). T is a good proxy for tRNA abundance (Percudani et al. 1997).
Table 9.4 Copy number of tRNA genes in the yeast Saccharomyces cerevisiae genome (T) and codon counts (F) in highly expressed yeast protein-coding genes, compiled in the Eyeastcai.cut file distributed with EMBOSS (Rice et al. 2000)

AA(a)  Codon(b)  T   F      AA(a)  Codon(b)  T   F
Arg    AGA       11  314    His    CAC       7   102
Arg    AGG       1   1      His    CAU       0   25
Asn    AAC       10  208    Leu    UUA       7   42
Asn    AAU       0   11     Leu    UUG       10  359
Asp    GAC       16  202    Lys    AAA       7   65
Asp    GAU       0   112    Lys    AAG       14  483
Cys    UGC       4   3      Phe    UUC       10  168
Cys    UGU       0   39     Phe    UUU       0   19
Gln    CAA       9   153    Ser    AGC       2   6
Gln    CAG       1   1      Ser    AGU       0   4
Glu    GAA       14  305    Tyr    UAC       8   141
Glu    GAG       2   5      Tyr    UAU       0   10

Only twofold codon families are included
(a) Amino acid carried by tRNA
(b) Codons forming Watson-Crick base pairs with the anticodon of tRNA

Table 9.5 Copy number of tRNA genes in the yeast Saccharomyces cerevisiae genome (T) and codon counts (F) in highly expressed yeast protein-coding genes, compiled in the Eyeastcai.cut file distributed with EMBOSS (Rice et al. 2000)

AA   Codon  T   F      AA   Codon  T   F
Ala  GCA    5   6      Pro  CCA    10  211
Ala  GCG    0   0      Pro  CCG    0   0
Ala  GCC    0   130    Pro  CCC    0   2
Ala  GCU    11  411    Pro  CCU    2   10
Arg  CGA    0   0      Ser  UCA    3   7
Arg  CGG    1   0      Ser  UCG    1   1
Arg  CGC    0   0      Ser  UCC    0   133
Arg  CGU    6   43     Ser  UCU    11  192
Gly  GGA    3   1      Thr  ACA    4   2
Gly  GGG    2   2      Thr  ACG    1   1
Gly  GGC    16  9      Thr  ACC    0   164
Gly  GGU    0   459    Thr  ACU    11  151
Ile  AUA    2   0      Val  GUA    2   0
Ile  AUC    0   181    Val  GUG    2   5
Ile  AUU    13  149    Val  GUC    0   231
Leu  CUA    3   14     Val  GUU    14  278
Leu  CUG    0   1
Leu  CUC    1   1
Leu  CUU    0   2

Only threefold and fourfold codon families are included. Symbols as in Table 9.4

The association between T and F is obvious in Tables 9.4 and 9.5. Take the two Arg codons AGA and AGG in Table 9.4, for example. There are 11 tRNAArg/UCU genes in the yeast genome, whose anticodon forms perfect Watson-Crick base pairs with AGA, but only one tRNAArg/CCU gene, whose anticodon pairs with AGG. So we expect yeast genes, especially highly expressed ones, to use AGA and avoid AGG, which is true (Table 9.4). The same applies to all other synonymous codon families or subfamilies, except for the Cys codon family. Why the rarely used Cys codon family should be exceptional remains unknown. It is possible that the Cys codon UGC may happen to be followed by a GNN codon, leading to methylation of the C at the third codon position, which then changes to T via spontaneous deamination. Whether the yeast genome has cytosine methylation remains controversial, with evidence both for (Tang et al. 2012) and against (Capuano et al. 2014) the existence of methylation in S. cerevisiae. However, there is significant CpG deficiency and TpG and CpA surplus in the yeast genome, which is consistent with CpG-specific DNA methylation.
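The claim that Cys is the only exception among the twofold families can be checked directly from Table 9.4. The sketch below types in the T and F values of that table and reports every family in which the codon with the most matching tRNA genes is not the most used codon. The function name is hypothetical. Arg, Leu, and Ser here refer to the AGR, UUR, and AGY subfamilies listed in the table.

```python
# (amino acid, codon, tRNA gene copies T, codon count F) from Table 9.4.
TABLE_9_4 = [
    ("Arg", "AGA", 11, 314), ("Arg", "AGG", 1, 1),
    ("Asn", "AAC", 10, 208), ("Asn", "AAU", 0, 11),
    ("Asp", "GAC", 16, 202), ("Asp", "GAU", 0, 112),
    ("Cys", "UGC", 4, 3),    ("Cys", "UGU", 0, 39),
    ("Gln", "CAA", 9, 153),  ("Gln", "CAG", 1, 1),
    ("Glu", "GAA", 14, 305), ("Glu", "GAG", 2, 5),
    ("His", "CAC", 7, 102),  ("His", "CAU", 0, 25),
    ("Leu", "UUA", 7, 42),   ("Leu", "UUG", 10, 359),
    ("Lys", "AAA", 7, 65),   ("Lys", "AAG", 14, 483),
    ("Phe", "UUC", 10, 168), ("Phe", "UUU", 0, 19),
    ("Ser", "AGC", 2, 6),    ("Ser", "AGU", 0, 4),
    ("Tyr", "UAC", 8, 141),  ("Tyr", "UAU", 0, 10),
]

def exceptions(rows):
    """Families in which the codon with more tRNA genes is not the more used one."""
    families = {}
    for aa, codon, t, f in rows:
        families.setdefault(aa, []).append((codon, t, f))
    out = []
    for aa, members in families.items():
        best_t = max(members, key=lambda m: m[1])[0]  # codon with highest T
        best_f = max(members, key=lambda m: m[2])[0]  # codon with highest F
        if best_t != best_f:
            out.append(aa)
    return out

print(exceptions(TABLE_9_4))
```

With these data the only family returned is Cys, where UGC has the matching tRNA genes but UGU is used more often.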

One can obtain tables similar to Tables 9.4 and 9.5 by downloading the yeast genome from GenBank and then using DAMBE to compile the data in three steps. First, read the GenBank files for the yeast chromosome sequences into DAMBE (Xia 2013, 2017d) to extract the coding sequences (CDSs) and tRNA genes. Second, compute ITE (Xia 2015) as a proxy for gene expression, and choose the subset of CDSs with the highest ITE as HEGs. Third, use DAMBE to obtain the codon usage of these HEGs. In this way, a table similar to Table 9.4 can be generated in minutes.

1.3.2 Codon Usage Changes When tRNA Abundance Changes

An evolutionary change in tRNA composition or relative abundance is expected to alter codon-anticodon adaptation. This is not controversial theoretically, but empirically difficult to demonstrate. However, recent studies (Xia 2012c; Xia et al. 2007) have documented that changes in tRNAMet genes (where Met is the amino acid carried by the tRNA) in animal mitochondrial DNA (mtDNA) are associated with changes in Met codon usage.

In mtDNA of most animal species, Met is coded by AUA and AUG codons. In some animal species, e.g., vertebrates, these two codons are translated by a single tRNAMet/CAU species (where CAU is the anticodon in the 5′ to 3′ orientation) with a modified C (i.e., f5C) at the first anticodon position (Grosjean et al. 2010) to allow C/A pairing. In other animal species, e.g., tunicates, an additional tRNAMet/UAU gene is present in the mtDNA. One would expect that, when tRNAMet/UAU is absent, Met should be preferably coded by AUG with a reduced AUA usage. The gain of tRNAMet/UAU would favor more Met to be coded by AUA.

In addition to tunicates, bivalve species also have two tRNAMet genes in their mtDNA. In some bivalve species (e.g., Acanthocardia tuberculata, Crassostrea gigas, C. virginica, Hiatella arctica, Placopecten magellanicus, and Venerupis philippinarum), both tRNAMet genes have a CAU anticodon forming Watson-Crick base pairs with codon AUG. In other bivalve species (e.g., Mytilus edulis, Mytilus galloprovincialis, and Mytilus trossulus), one tRNAMet has a CAU anticodon, and the other has a UAU anticodon forming Watson-Crick base pairs with the AUA codon. One would predict that the latter should be more likely to code Met by AUA than the former, i.e., the proportion of AUA codons within the AUR codon family, designated PAUA, should be greater in the latter, with both a tRNAMet/CAU and a tRNAMet/UAU gene, than in the former, with tRNAMet/CAU genes only (Xia et al. 2007).

One complication in testing the prediction is that AUA usage will increase with genomic AT%. To control for this effect, one may use another A-ending codon, such as UUA, as a reference. Thus, given the same PUUA (the proportion of UUA codons in the UUR codon family), PAUA in the three Mytilus mtDNAs with both a tRNAMet/CAU and a tRNAMet/UAU gene should be higher than that in the six bivalve species without a tRNAMet/UAU gene. This is supported by empirical evidence (ANCOVA test, p = 0.0111, Fig. 9.2a). Thus, the presence of tRNAMet/UAU increases AUA usage significantly.
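The two proportions are simple codon-count ratios. The following sketch computes PAUA and PUUA from an in-frame coding sequence. The function names and the short test sequence are hypothetical, and the ANCOVA itself is not shown.

```python
from collections import Counter

def codon_counts(cds):
    """Count codons in an in-frame coding sequence (DNA or RNA letters)."""
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    return Counter(rna[i:i + 3] for i in range(0, usable, 3))

def proportion(counts, codon, family):
    """Proportion of `codon` among the codons listed in `family`."""
    total = sum(counts[c] for c in family)
    return counts[codon] / total if total else float("nan")

def p_aua_and_p_uua(cds):
    """Return (PAUA, PUUA) for the AUR and UUR codon families."""
    counts = codon_counts(cds)
    return (proportion(counts, "AUA", ("AUA", "AUG")),
            proportion(counts, "UUA", ("UUA", "UUG")))

# Toy sequence: ATA ATG ATA TTA TTG GGC
print(p_aua_and_p_uua("ATAATGATATTATTGGGC"))
```

In practice the counts would be pooled over all protein-coding genes of one mtDNA, giving one (PUUA, PAUA) point per species as in Fig. 9.2.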
Fig. 9.2

Relationship between PAUA and PUUA, highlighting the observation that PAUA is greater when both a tRNAMet/CAU and a tRNAMet/UAU are present than when only tRNAMet/CAU is present in the mtDNA, for bivalve species (a) and chordate species (b). The filled squares are for mtDNA containing both tRNAMet/CAU and tRNAMet/UAU genes, and the open triangles are for mtDNA without a tRNAMet/UAU gene

A similar comparison can be performed between the urochordates (tunicates, with both tRNAMet/CAU and tRNAMet/UAU genes in their mtDNA) and cephalochordates (lancelets, with only a tRNAMet/CAU gene in their mtDNA). Figure 9.2b shows that PAUA is much smaller in lancelets than in tunicates at the same PUUA level. Thus, AUA usage is consistently increased by the gain of a tRNAMet/UAU gene (or consistently decreased by the loss of a tRNAMet/UAU gene) in animal mtDNA.

A gain of a tRNAMet/UAU gene is also associated with a surplus of AUG→AUA substitutions in animal mitochondrial coding sequences (results not shown). Similar associations can also be observed with other gains or losses of tRNA genes in animal mitochondria. In contrast, a gain or loss of tRNA genes in plant mtDNA appears to have little effect on nucleotide substitutions or codon usage, presumably because such events do not significantly alter the tRNA pool in plant cells, where nuclear tRNAs are mass-imported into mitochondria.

1.4 Effect of Biased Mutation on Codon Usage and Some Misconceptions

Biased mutation has long been known to affect codon usage (Muto and Osawa 1987; Sueoka 1964; Xia and Yuen 2005; Xia et al. 2002). The third codon position is the most amenable to mutation bias (Fig. 9.4) because most nucleotide substitutions at the third codon position are synonymous. Nucleotide substitutions are synonymous at some first codon positions but nonsynonymous at all second codon positions. Furthermore, nucleotide substitutions at the second codon position typically involve rather different amino acids and therefore should be subject to strong purifying selection (Xia 1998b; Xia and Li 1998). One would therefore predict that GC% at the third codon position should increase more rapidly with genomic GC% than GC% at the first codon position, which in turn should increase more rapidly with genomic GC% than GC% at the second codon position. The empirical results (Fig. 9.3) strongly support the prediction (Muto and Osawa 1987).
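The quantity compared across genomes is GC% computed separately for each codon position. A minimal sketch, with a hypothetical function name and a made-up test sequence, is:

```python
def gc_by_codon_position(cds):
    """GC% at the first, second, and third codon positions of an in-frame CDS."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    result = []
    for pos in range(3):
        bases = seq[pos:usable:3]     # every third base starting at `pos`
        gc = sum(b in "GC" for b in bases)
        result.append(100.0 * gc / len(bases) if bases else float("nan"))
    return tuple(result)

# Toy sequence: ATG GCC AAA TTT
print(gc_by_codon_position("ATGGCCAAATTT"))
```

Applied to the concatenated CDSs of a genome, the three values give one point per codon position in a plot such as Fig. 9.3.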
Fig. 9.3

Correlation of GC% between genomic DNA and first, second, and third codon positions (Muto and Osawa 1987). While the actual position of the points may be substantially revised with new genomic data (e.g., the GC% for the first, second, and third codon positions for Mycoplasma capricolum is 35.8%, 27.4%, and 8.8% based on all annotated CDSs in the genomic sequence), the general trend remains the same

However, the pattern in Fig. 9.3, while consistent with the mutation hypothesis, has resulted in two misconceptions. First, the pattern shown by the third codon position is often interpreted to reflect mutation bias alone. This interpretation is incorrect because the third codon position is also subject to selection by differential availability of tRNA species (Carullo and Xia 2008; Xia 1998a, 2005, 2008; Xia et al. 2007). We may contrast the GC-rich Streptomyces coelicolor and the GC-poor Mycoplasma capricolum as an illustrative example. M. capricolum has no tRNA with a C or G at the wobble site for fourfold codon families (Ala, Gly, Pro, Thr, and Val), i.e., the translation machinery would be inefficient in translating C-ending or G-ending codons. This implies selection in favor of A-ending or U-ending codons, which will consequently reduce GC% at the third codon position. This most likely has contributed to the low GC% at the third codon position in M. capricolum. In contrast, most of the tRNA genes translating the five fourfold codon families in the GC-rich S. coelicolor have G or C at the wobble site and should favor the use of C-ending or G-ending codons. This most likely has contributed to the high GC% at the third codon position in S. coelicolor. In these two cases, mutation bias and tRNA-mediated selection act in the same direction to drive GC% at the third codon position up or down. The same pattern is observed for twofold codon families. The most conspicuous one is the Gln codon family (CAA and CAG). There is only one tRNAGln gene in M. capricolum, with a UUG anticodon favoring the CAA codon. In contrast, there are two tRNAGln genes in S. coelicolor, both with a CUG anticodon favoring the CAG codon. Thus, the high slope for the third codon position in Fig. 9.3 is at least partially attributable to tRNA-mediated selection. The relative contribution of mutation and tRNA-mediated selection to codon usage has been evaluated in several recent studies (Carullo and Xia 2008; Xia 2005, 2008; Xia et al. 2007).

The second misconception arising from Fig. 9.3 is that the frequency of G-ending and C-ending codons will increase and A-ending and U-ending codons decrease, with genomic GC% or GC-biased mutation (Kliman and Bernal 2005). This is not generally true (Palidwor et al. 2010). Take the arginine codons, for example. Given the transition probability matrix for the six synonymous codons shown in Table 9.6, the equilibrium frequencies (π) for the six codons are
$$
\begin{aligned}
\pi_{\mathrm{AGA}} &= \frac{1}{2k^{2}+3k+1} \\
\pi_{\mathrm{AGG}} = \pi_{\mathrm{CGA}} = \pi_{\mathrm{CGT}} &= \frac{k}{2k^{2}+3k+1} \\
\pi_{\mathrm{CGC}} = \pi_{\mathrm{CGG}} &= \frac{k^{2}}{2k^{2}+3k+1}
\end{aligned}
\tag{9.1}
$$

The three solutions correspond to the number of GC in the codon, with AGA having one, AGG, CGA and CGT having two, and CGC and CGG having three G or C. One may note that the G-ending codon AGG has the same equilibrium frequency as that of the A-ending CGA and the T-ending CGT. Thus, we should not expect A-ending or T-ending codons to always decrease or G-ending and C-ending codons always increase, with increasing genomic GC% or GC-biased mutation. In fact, according to the solutions in Eq. (9.1), πAGG, πCGA, and πCGT will first increase with k until k reaches \( \sqrt{2}/2 \) and will then decrease with k when k > \( \sqrt{2}/2 \) (Palidwor et al. 2010).
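The turning point follows from differentiating k/(2k² + 3k + 1) with respect to k, which gives (1 − 2k²)/(2k² + 3k + 1)². This is positive for k < √2/2 and negative for k > √2/2. The short Python sketch below evaluates Eq. (9.1) numerically to illustrate this; the function name is hypothetical.

```python
import math

def arg_equilibrium(k):
    """Equilibrium frequencies of the six Arg codons from Eq. (9.1)."""
    denom = 2 * k**2 + 3 * k + 1
    # Number of G or C nucleotides in each codon.
    gc_count = {"AGA": 1, "AGG": 2, "CGA": 2, "CGT": 2, "CGC": 3, "CGG": 3}
    return {codon: k ** (n - 1) / denom for codon, n in gc_count.items()}

K_PEAK = math.sqrt(2) / 2
for k in (0.25, K_PEAK, 1.0, 3.0):
    pi = arg_equilibrium(k)
    print(f"k={k:.3f}  AGA={pi['AGA']:.3f}  AGG={pi['AGG']:.3f}  CGC={pi['CGC']:.3f}")
```

The printed values show the frequency of AGG (and likewise CGA and CGT) rising up to k = √2/2 and falling afterwards, while AGA decreases and CGC increases monotonically with k.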

1.5 Two Hypotheses on Translation Elongation Efficiency

The degree to which protein production is limited by translation elongation remains controversial. Early theoretical considerations (Andersson and Kurland 1983; Bulmer 1990, 1991; Liljenstrom and von Heijne 1987) tend to favor the argument that translation elongation is not rate-limiting in protein production, but translation initiation is. This hypothesis does not deny the existence of codon adaptation, but it asserts that codon-anticodon adaptation and increased elongation efficiency are not related to protein production. Instead, the benefit of codon adaptation and increased elongation efficiency is to increase ribosomal availability for global translation. This hypothesis was explicitly formulated and empirically tested only recently (Kudla et al. 2009).

We thus have two alternative hypotheses attributing different benefits to codon-anticodon adaptation. The first assumes that protein production is rate-limited by both initiation and elongation, so that codon-anticodon adaptation would result in higher elongation efficiency and more efficient and accurate protein production, especially for HEGs. The second claims that protein production is rate-limited only by initiation efficiency but that improved codon adaptation, and the consequently increased elongation efficiency, have the benefit of increasing ribosomal availability for global translation.
Table 9.6

Transition probability matrix for the six synonymous arginine codons, with α for transitions (C↔T and A↔G), β for transversions, and k modeling AT-biased mutation (0 ≤ k ≤ 1) or GC-biased mutation (k > 1)

| | CGT | CGC | CGA | CGG | AGA | AGG |
|---|---|---|---|---|---|---|
| CGT | | αk | β | βk | 0 | 0 |
| CGC | α | | β | β | 0 | 0 |
| CGA | β | βk | | αk | β | 0 |
| CGG | β | β | α | | 0 | β |
| AGA | 0 | 0 | βk | 0 | | αk |
| AGG | 0 | 0 | 0 | βk | α | |

We ignore nonsynonymous substitutions because the nonsynonymous substitution rate is often negligibly low compared to the synonymous rate. The diagonal is constrained by the row sum being equal to 1

How should we go about testing these two hypotheses? Note that the two hypotheses make different predictions about the relationship among three variables: (1) translation initiation efficiency, (2) translation elongation efficiency, and (3) protein production. Before we can test these two hypotheses, we need to understand how these variables can be measured. The previous chapter outlines a few factors contributing to translation initiation efficiency. Here we first learn a few indices of codon usage bias as a proxy for translation elongation efficiency and then include them in the test of the two hypotheses in the section illustrating the application of index of translation elongation (Xia 2015).

1.6 Wobble Hypothesis and Its Extensions

The wobble hypothesis was proposed to explain how a limited set of tRNA molecules can decode all sense codons, which are much larger in number. The wobble-pairing rules are specified in Fig. 9.4, together with the numbering system used here for individual codon and anticodon sites that is more precise than, but different from, the conventional one. The original wobble hypothesis (Crick 1966), with its extended codon-anticodon base pairs (Fig. 9.4), played a crucial role in understanding the working of the translation machinery. It explains why tRNAIle/IAU, where I in IAU is inosine derived from A, is able to translate all three Ile codons (AUC, AUU, and, albeit inefficiently, AUA), why a tRNA with a GI can translate Y-ending codons (where Y stands for C or U), and why a tRNA with a UI can translate R-ending codons (where R stands for A or G). The hypothesis also explains the lack of AI in tRNA genes for decoding twofold Y-ending codon families because such a tRNA, when its AI is modified to II, would misread the near-cognate R-ending codons.
Fig. 9.4

Base pairs between nucleotides at the first anticodon site (which can have I, G, C, U but rarely A) and the third codon site. The inset shows the site numbering system of codon and anticodon, with codon sites subscripted with 1, 2, and 3 and anticodon sites subscripted with I, II, and III, which is illustrated by the pairing of II/C3, GII/C2, and CIII/G1.

Wobble pairing reduces the number of tRNAs needed for translation and simplifies the translation machinery. As an example of parsimonious tRNA usage, the Y-ending codons, be they in twofold or fourfold codon families, are decoded by tRNAs with either a II or a GI, but never both. This rule is obeyed in all three kingdoms of life. Almost all fourfold codon families in Mycoplasma pulmonis (including the Ser UCN codon family and Leu CUN codon family) are decoded by a single tRNA species with a UI, except for the Thr ACN and Arg CGN codon families which are each decoded by two tRNA species, one with a UI and the other with a GI. The most dramatic simplification of the tRNome is observed in vertebrate mitochondrial genomes, which contain only 22 tRNA genes, with each tRNA species decoding a codon family. Instead of the separate initiation tRNAiMet/CAU and elongation tRNAeMet/CAU present in all nuclear genomes, a single tRNAMet/CAU, with a modified CI, decodes both the initiation AUG codon and internal Met AUR codons. Each Y-ending codon family is decoded by a single tRNA species with a wobble GI and each R-ending codon family by a single tRNA with a wobble UI which is modified to prevent its pairing with U or C. All fourfold codon families are decoded by a tRNA with a wobble UI which is not modified.

Wobble pairing is not without cost as it often reduces translation efficiency and accuracy and is generally avoided (Xia 2008). For example, an II/A3 pair is bulky because it involves two purines (Fig. 9.4) in contrast to other base pairs which typically involve a large purine and a small pyrimidine. For this reason, Ile is rarely coded by AUA except for certain viruses with a strong A-biased mutation (van Weringh et al. 2011). Among a set of highly expressed genes in the yeast (Saccharomyces cerevisiae), AUA is not used at all (Table 9.5). Similarly, a tRNA with a UI can translate A-ending codons better than G-ending codons (Grosjean et al. 2010; Xia 2008). Most of the yeast tRNAArg have a UI, and only one AGG codon is found in contrast to 314 AGA codons in highly expressed yeast genes (Table 9.4). Yeast genomic data also suggest that a tRNA with a GI can translate C-ending codons better than U-ending codons. For example, the yeast tRNAAsn genes translating the Asn AAY codon family all have a GI. Among 219 Asn codons in highly expressed yeast genes, only 11 are AAU codons, suggesting strong selection against AAU codons in favor of AAC codons (Table 9.4). Note that the yeast genome is strongly AT-biased. If there were no selection against AAU codons, we would expect more AAU codons than AAC codons, which is contrary to the observed frequencies. However, the selection against the GI/U3 pair is in general much weaker than that against the UI/G3 pair. In fungal mitochondrial genomes, there is no avoidance of the GI/U3 pair in favor of the GI/C3 pair, although the UI/G3 pair is strongly avoided in favor of the UI/A3 pair (Xia 2008).
The weak, or lack of, selection against GI/U3 can explain several puzzling counterexamples against the codon-anticodon adaptation theory (Bulmer 1991; Ikemura 1981b; Xia 1998a) which states that the most frequently used codon in each synonymous codon family should form Watson-Crick base pairing with the anticodon of the most abundant tRNA species to reduce translation error and increase translation efficiency. For example, Cys codons (UGY) are translated by tRNACys/GCA in both cytoplasm and mitochondria in the yeast, yet most Cys codons have U3. If there is little selection against the GI/U3 pair (i.e., the GI/U3 pair is as efficient and accurate as the GI/C3 pair), then the frequencies of UGC and UGU will be mostly determined by AT-bias. Because the yeast nuclear and mitochondrial genomes are both AT-rich, we have more UGU codons than UGC codons, in spite of GI in tRNACys. The weak selection against GI/U3 but strong selection against UI/G3 also explains why Y-ending codons are typically translated by a tRNA with a GI, whereas R-ending codons are typically translated by two different tRNAs, one with a UI and the other with a CI (Xia 2008).

The wobble hypothesis points to the necessity of nucleotide modification in tRNA to either increase or decrease the wobble versatility to improve accuracy and efficiency of translation. The observation that an unmodified UI can pair with all N3 in many mitochondrial genomes suggests that UI in tRNA for twofold R-ending codon families needs to be modified to restrict its wobble versatility to avoid misreading the near-cognate Y-ending codons. Chemical modification of UI to restrict its pairing versatility to R3 in twofold R-ending codon families is universal in all three kingdoms of life and in organelles (Grosjean et al. 2010; Lim 1994). On the other hand, the tRNAMet/CAU in vertebrate mitochondria needs to read both the initiation AUG codon and the internal AUG and AUA codons, and its CI is modified to f5CI to increase its wobble versatility so as to form an f5CI/A3 pairing between the anticodon and the AUA codon. Nucleotide modification in tRNA has been extensively reviewed (Grosjean et al. 2010) and chemically detailed in MODOMICS (Czerwoniec et al. 2009).

Wobble pairing implies the theoretical possibility of adding new base pairs of novel nucleotides to protein-coding genes to increase the coding capacity (Hirao and Kimoto 2010). A single novel base pair, involving two novel nucleotides, would increase the number of codons from 64 to 216 (\( =6^3 \)), and one can then use these extra codons, together with engineered tRNAs to recognize these codons and to carry new amino acid analogs, to produce novel proteins.

The wobble hypothesis can be extended to explain the lack of UCG anticodon in Arg CGN codon family in a large number of evolutionary lineages. A tRNA species with a wobble UI is almost always present among tRNA species decoding fourfold codon families and twofold R-ending codon families, with most exceptions observed in the Arg CGN codon family. In the mitochondrial genomes of Caenorhabditis elegans (metazoan), Marchantia polymorpha (plant), Pichia canadensis (fungus), and Saccharomyces cerevisiae (fungus), there is no tRNAArg/UCG, and Arg CGN codon family is decoded by tRNAArg/ACG (Xia 2005). The lack of tRNAArg/UCG in the mitochondrial genome of these diverse taxa suggests that the lack is an ancestral state and that the presence of tRNAArg/UCG in vertebrate mitochondria is a derived state. This is substantiated by the fact that almost all eubacterial species, from which the mitochondrion was originally derived, lack tRNAArg/UCG (Grosjean et al. 2010).

The expanded wobble hypothesis for the lack of tRNAArg/UCG requires an extension of the wobble hypothesis by invoking wobble pairing between the third anticodon site (NIII) and the first codon site (N1), conditional on a CII/G2 or GII/C2 with three hydrogen bonds. Thus, the anticodon UCG would wobble-pair with stop codon UGA through a wobble GIII/U1 pair and should therefore be strongly selected against (Carullo and Xia 2008). This explains not only the absence of tRNAArg/UCG in diverse evolutionary lineages but in particular why tRNAArg/UCG is absent in most eubacterial species and ancestral mitochondrial lineages where UGA is used as a stop codon and why it is present in derived mitochondrial lineages such as vertebrate mitochondrial genomes where UGA is no longer used as a stop codon.
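
The pairing logic described above can be sketched in a few lines. This is a deliberately simplified illustration, not a model of real decoding: it assumes unmodified nucleotides, requires a Watson-Crick pair at the anticodon site I/codon site 3 position, and applies the extended rule only as stated in the text (a G/U wobble at site III/site 1, conditional on a C/G or G/C pair in the middle). All function names are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
GU_WOBBLE = {("G", "U"), ("U", "G")}


def pairs(anticodon, codon):
    """Pair anticodon sites I, II, III with codon sites 3, 2, 1 (antiparallel).

    Both strings are written 5' to 3'.
    """
    return list(zip(anticodon, reversed(codon)))


def is_cognate(anticodon, codon):
    """True if all three positions form Watson-Crick pairs."""
    return all(p in WATSON_CRICK for p in pairs(anticodon, codon))


def misreads_by_site_iii_wobble(anticodon, codon):
    """True if the codon is read through a G/U wobble at site III/site 1.

    Simplifying assumptions: site I/site 3 is Watson-Crick, and the middle
    pair must be C/G or G/C (three hydrogen bonds), as stated in the text.
    """
    (n_i, c3), (n_ii, c2), (n_iii, c1) = pairs(anticodon, codon)
    strong_middle = (n_ii, c2) in {("C", "G"), ("G", "C")}
    return (n_i, c3) in WATSON_CRICK and strong_middle and (n_iii, c1) in GU_WOBBLE


if __name__ == "__main__":
    print(is_cognate("UCG", "CGA"))                   # cognate Arg codon
    print(misreads_by_site_iii_wobble("UCG", "UGA"))  # stop codon UGA
```

Under these assumptions, anticodon UCG is cognate to CGA and also satisfies the extended rule for UGA, which is the misreading risk the hypothesis invokes.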

2 Commonly Used Codon Usage Indices

There are two key factors contributing to codon usage bias: the mutation bias (Osawa et al. 1987) and the tRNA-mediated selection (Ikemura 1981a, 1982, 1992; Xia 1998a, 2015). There are also two types of codon usage indices, but they do not correspond to the two factors shaping codon usage. The first type is codon-specific and is best represented by relative synonymous codon usage (RSCU, Sharp et al. 1986), which measures deviation of codon usage from equal usage. The second type is gene-specific, with several well-known representatives including effective number of codons (ENC, Sun et al. 2013; Wright 1990), codon adaptation index (CAI, Sharp and Li 1987; Xia 2007c), codon bias index (CBI, Bennetzen and Hall 1982), frequency of optimal codons (Fop, Ikemura 1985), tRNA adaptation index (tAI, dos Reis et al. 2004), and index of translation elongation (ITE, Xia 2015).

ENC aims to measure deviation of codon usage from equal usage and may be considered as the gene-specific equivalent of the codon-specific RSCU. They are both descriptive and do not distinguish between mutation bias and tRNA-mediated selection in their contribution to codon usage bias. All other gene-specific indices aim to measure the intensity of the tRNA-mediated selection on codon usage bias. A gene encoding a mass-produced (highly expressed) protein is expected to be under stronger selection to optimize its codon usage corresponding to differential tRNA availability than a gene encoding a lowly expressed protein, and we expect CAI, CBI, tAI, and ITE to be greater for the highly expressed gene than for the lowly expressed gene. However, CAI, CBI, and tAI ignore background mutation bias. ITE is a generalization of CAI that incorporates background mutation bias and is reduced to CAI when there is no background mutation bias (Xia 2015).

Codon indices that aim to measure tRNA-mediated selection (i.e., CAI, CBI, Fop, tAI, and ITE) all define a translationally optimal codon (TOC) within each codon family, and the codon usage index value will be the highest if all codons in a gene are TOCs. However, TOC is defined differently among these indices. CBI, Fop, and tAI define a TOC mainly as one that corresponds to the most abundant isoacceptor tRNA, with CBI incorporating gene expression information as well. CAI defines a TOC as one in its codon family that is used most frequently in HEGs. ITE defines a TOC as one in its codon family that is used most frequently in HEGs after adjustment for mutation bias reflected in LEGs. Comparative studies (Coghlan and Wolfe 2000; Comeron and Aguade 1998) suggest that CAI is better than ENC, CBI, and Fop in predicting gene expression levels, tAI is better than CAI (dos Reis et al. 2004; Tuller et al. 2010), and ITE is better than CAI and tAI (Xia 2015). However, such comparison depends not only on the methods but also on the quality of the software that implements the methods. A method could be conceptually sound but implemented erroneously and generate poor results. Moreover, the same index could be implemented differently. For example, one implementation could group all synonymous codons into one family so that some amino acids could have six or even eight synonymous codons (the trematode mitochondrial code has eight Ser codons: UCN and AGN), whereas another implementation would break all compound codon families, such as the Leu, Ser, and Arg codon families, into separate fourfold and twofold codon families.

2.1 RSCU (Relative Synonymous Codon Usage)

RSCU measures codon usage bias for each codon within each codon family. It is essentially a normalized codon frequency so that the expectation is 1 when there is no codon usage bias. A codon is overused if its RSCU value is greater than 1 and underused if its RSCU value is less than 1. It is computed directly from input sequences.

2.1.1 Calculation of RSCU

The general equation for computing RSCU is
$$ {\mathrm{RSCU}}_{ij}=\frac{{\mathrm{CodFreq}}_{ij}}{\frac{\left(\sum \limits_{j=1}^{{\mathrm{NumCodon}}_i}{\mathrm{CodFreq}}_{ij}\right)}{{\mathrm{NumCodon}}_i}} $$
(9.2)
where i refers to a codon family and j to a specific codon within the family. For example, i may refer to the alanine codon family with four codons (GCU, GCC, GCA, and GCG) and j to a specific codon such as GCU. In this case, the numerator is the frequency of GCU, and the denominator is the summation of the four codon frequencies divided by the number of codons in the codon family, i.e., 4.
For biology students, it is always easier to learn by numerical examples. Suppose we counted the codon frequencies of one particular protein-coding sequence and have obtained the codon frequencies (Table 9.7). The RSCU for the GCU codon is computed, according to Eq. (9.2), as
$$ {\mathrm{RSCU}}_{\mathrm{GCU}}=\frac{52}{\frac{\left(52+91+103+2\right)}{4}}=0.84 $$
(9.3)
which is displayed in Table 9.7. Biology students are recommended to cover up the last column in Table 9.7 and finish the computation of the rest of the RSCU values.
Table 9.7

Data for illustrating the calculation of RSCU

| Codon | AA | N | RSCU |
|---|---|---|---|
| GCU | Ala | 52 | 0.84 |
| GCC | Ala | 91 | 1.47 |
| GCA | Ala | 103 | 1.66 |
| GCG | Ala | 2 | 0.03 |
| GAA | Glu | 78 | 1.64 |
| GAG | Glu | 17 | 0.36 |

AA amino acid, N codon frequency
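
The calculation in Eq. (9.2) can be reproduced with a short script. This is a minimal sketch using the codon counts of Table 9.7; the function name `rscu` and the data layout are illustrative choices and do not reflect how DAMBE implements the index.

```python
from collections import defaultdict

# Codon -> (amino acid, observed count), taken from Table 9.7.
TABLE_9_7 = {
    "GCU": ("Ala", 52),
    "GCC": ("Ala", 91),
    "GCA": ("Ala", 103),
    "GCG": ("Ala", 2),
    "GAA": ("Glu", 78),
    "GAG": ("Glu", 17),
}


def rscu(codon_counts):
    """Return RSCU for each codon: count / (family total / family size)."""
    family_total = defaultdict(int)
    family_size = defaultdict(int)
    for amino_acid, count in codon_counts.values():
        family_total[amino_acid] += count
        family_size[amino_acid] += 1
    return {
        codon: count / (family_total[amino_acid] / family_size[amino_acid])
        for codon, (amino_acid, count) in codon_counts.items()
    }


if __name__ == "__main__":
    for codon, value in rscu(TABLE_9_7).items():
        print(f"{codon}\t{value:.2f}")
```

Within each codon family the RSCU values sum to the number of codons in the family, which is a convenient sanity check on any implementation.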

2.1.2 Illustration of RSCU Applications

As I mentioned earlier, a variable such as RSCU is often not interesting by itself, but it becomes more interesting when you relate it to other variables. Figure 9.5 shows the correlation between RSCU for Escherichia coli genes and that for the E. coli double-stranded DNA (dsDNA) phage TLS. This strong and positive correlation suggests adaptation of phage codon usage to the host tRNA pool. The adaptation of the phage genes and the host genes to the same tRNA pool in E. coli cells, and the resulting evolution of very similar codon usage patterns, is an example of convergent evolution, i.e., phylogenetically remote organisms evolving similar features not due to coancestry, but in response to the same selection regime induced by the same environment.
Fig. 9.5

Correlation in RSCU between Escherichia coli and its double-stranded DNA phage TLS

What explanation would you offer if we find little correlation in RSCU between a phage and its host? There are in fact a large number of cases in which a virus and its host share little similarity in codon usage. Will such cases invalidate our convergent evolution explanation for the strong and positive correlation between phage TLS and its E. coli host? Science thrives in questions, and such questions immediately drive us to search for answers, and the answers enrich our explanatory conceptual framework. Ronald Fisher once said that “No aphorism is more frequently repeated in connection with field trials, than that we must ask Nature few questions, or ideally, one question at a time. The writer is convinced that this view is wholly mistaken. Nature, he suggests, will respond to a logical and carefully thought-out questionnaire; indeed, if we ask her a single question, she will often refuse to answer until some other topic has been discussed” (Fisher 1926).

There are at least six factors that will weaken the correlation in RSCU between a virus and its host. First, some dsDNA phages carry many tRNA genes on their own genomes, and the transcription of these tRNA genes would modify the host tRNA pool. For example, another dsDNA E. coli phage, enterobacteria phage WV8, carries 20 tRNA genes on its genome. In such cases, the phage genes would adapt to the modified tRNA pool which may be different from the tRNA pool where E. coli mRNAs are translated normally (i.e., without phage infection). Partly for this reason, the correlation in RSCU between enterobacteria phage WV8 and its E. coli host is much weaker than that shown in Fig. 9.5 (Chithambaram et al. 2014a). Phage TLS (Fig. 9.5) happens to have a genome that does not encode any tRNA genes of its own, so it depends entirely on the host tRNA pool to decode the codons of its genes.

Second, codon usage adaptation takes time. If a phage having adapted to one host has switched to a new host, and if the original host and the new host differ in their tRNA pools, then the phage codon usage will be more similar to that of the original host than the new host. This may be applicable to phage PRD1 which belongs to the peculiar Tectiviridae family with members parasitizing both gram-negative and gram-positive bacteria. Phage PRD1 is the only species in the family known to parasitize gram-negative bacteria, with other members of the family, i.e., phages PR3, PR4, PR5, L17, and PR772, parasitizing gram-positive bacteria (Bamford et al. 1995; Grahn et al. 2006). It is reasonably safe to assume that the phage PRD1 lineage has switched host from gram-positive to gram-negative bacteria. Furthermore, there is only one amino acid difference in the coat protein between phages PRD1 and PR4 (Bamford et al. 1995). This suggests that PRD1 is phylogenetically close to its relatives parasitizing gram-positive bacteria, i.e., the host-switching may have occurred quite recently. In fact, codon usage in phage PRD1 is more similar to that in gram-positive bacteria than in gram-negative bacteria (Chithambaram et al. 2014b). Among 87 bacterial genomes covering major groups of bacterial species, the host species with codon usage most similar to that of phage PRD1 are strains in the gram-positive Geobacillus (NC_014206, NC_012793, NC_014650, NC_014915, NC_013411).

Third, a phage with a wide range of host species may face diverse tRNA pools that would represent fluctuating selection with different optima. Phage PRD1 mentioned above does have a variety of gram-negative bacteria as hosts, including Salmonella, Pseudomonas, Escherichia, Proteus, Vibrio, Acinetobacter, and Serratia species (Bamford et al. 1995; Grahn et al. 2006). However, this diverse array of hosts actually has rather similar codon usage, so host variability is not a good explanation for the lack of similarity in codon usage between PRD1 and E. coli (Chithambaram et al. 2014b).

Fourth, the tRNA-mediated selection differs in its effectiveness between temperate phages (i.e., those with lysogeny) and virulent phages (i.e., those without lysogeny). The lysogenic phase effectively hides protein-coding genes of the phage from tRNA-mediated selection, and the phage codon usage will be at the mercy of mutation bias in the host genome. In contrast, virulent phages have their codon usage under tRNA-mediated selection every time they enter the host cell. For this reason, one would expect better codon usage adaptation in virulent phages than in temperate phages, which is true (Prabhakaran et al. 2015).

Fifth, mass translation of phage mRNA often occurs in the late infection phase when the host cellular environment has already been dramatically altered, presumably with a quite different tRNA pool in the late phase from that in the early phase. In vaccinia virus, the degradation of host mRNA appears nearly complete 6 h after the viral infection as no host poly(A) mRNA is detectable at/after this time (Katsafanas and Moss 2007). Shutdown or drastic alteration of host protein and RNA expression implies that many tRNA species are no longer sequestered for host translation, which would dramatically alter availability of different tRNA species. Many other viruses, including hepatitis C (Chan and Egan 2009), SARS (Minakshi et al. 2009), Japanese encephalitis virus (Su et al. 2002), and coxsackie B2 virus (Zhang et al. 2010), can induce stress responses such as the UPR (unfolded protein response) in late phase. UPR often results in the shutdown of transcription of ribosomal RNAs as well as repression of translation via phosphorylation of eukaryotic translation initiation factor eIF-2α (DuRose et al. 2009). All these suggest that the tRNA pool in the late phase differs from that in the normal cell. If codon usage of phage genes adapts to the altered tRNA pool in the late phase, whereas that of host genes adapts to the tRNA pool in normal cells, then we should not expect the parasite and the host to share high similarity in codon usage. Interestingly, HIV-1 early genes have RSCU positively correlated with RSCU of human genes, but HIV-1 late genes have RSCU values negatively correlated with RSCU of human genes (van Weringh et al. 2011).

Sixth, if mutation bias is in different direction from tRNA-mediated selection, e.g., if tRNA-mediated selection favors Y-ending codons whereas mutation bias favors R-ending codons (where Y and R stand for pyrimidine and purine, respectively), then strong mutation bias will disrupt selection. This may well be the case for the poor codon adaptation in HIV-1. According to a recent compilation of tRNAs in human genome (Chan and Lowe 2009), the AUC codon can be translated by 17 tRNAIle species (14 tRNAIle/IAU and 3 tRNAIle/GAU) and AUU can be translated by 14 tRNAIle/IAU species, whereas AUA can be translated by only 5 tRNAIle/UAU species. In agreement with the tRNA-mediated selection, human genes code Ile mostly by AUC and least by AUA. In contrast, HIV-1 genes code Ile mostly by AUA and least by AUC (Haas et al. 1996; Nakamura et al. 2000). The poor codon adaptation of HIV-1 (Fig. 9.6a) reduces the translation efficiency of HIV-1 genes. Modifying HIV-1 codon usage according to host codon usage has been shown to increase the production of viral proteins (Haas et al. 1996; Ngumbela et al. 2008). The high frequency of maladaptive AUA codons in HIV-1 genes is due to high A-biased mutation at the third codon position of HIV-1 genes (Jenkins and Holmes 2003). The A-bias is mediated by the error-prone reverse transcriptase (Martinez et al. 1994; Vartanian et al. 2002) and the human APOBEC3 protein (Yu et al. 2004). The frequency of A can reach up to 40% in some HIV-1 genomes (Vartanian et al. 2002), resulting in a preponderance of A-ending codons which are typically rarely used in the human HEGs (Kypr and Mrazek 1987; Sharp 1986).
Fig. 9.6

Relative synonymous codon usage (RSCU) of HIV-1 (a) and HTLV-1 (b) plotted against RSCU of highly expressed human genes. Modified from van Weringh et al. (2011)

One would predict a better correlation in RSCU between HIV-1 genes and highly expressed human genes if HIV-1 were not subject to such strongly A-biased mutation. One viral species that may shed light on this prediction is HTLV-1 which infects the same type of host cell as HIV-1. Both HIV-1 and HTLV-1 are retroviruses with RNA genomes, but HTLV-1 is exceptional in that it does not have a strong A-biased mutation (Van Dooren et al. 2004; van Hemert and Berkhout 1995). HTLV-1 relies for the most part on the host polymerase to replicate through clonal expansion of infected cells rather than undergoing iterative replication cycles like HIV-1 (Strebel 2005). The substitution rate of HTLV-1 is consequently lower, about \( 5.2 \times 10^{-6} \) substitutions/site/year (Hanada et al. 2004; Van Dooren et al. 2004), whereas that of HIV-1 is around \( 2.5 \times 10^{-3} \) substitutions/site/year (Hanada et al. 2004). Thus, although HTLV-1 infects the same cells as HIV-1, i.e., human CD4+ T cells (Rimsky et al. 1988), and both viruses are therefore subject to the same selective pressures on codon usage by the host tRNA pool, mutations are less likely to disrupt codon-anticodon adaptation in HTLV-1 than in HIV-1 as they occur at a lower rate in the former. The positive correlation in RSCU between HTLV-1 and highly expressed human genes (Fig. 9.6b) is highly significant (Pearson r = 0.4982, p < 0.0001, Spearman r = 0.4688, p = 0.0002).

2.2 CAI (Codon Adaptation Index)

CAI has been used extensively in biological research. Other than its primary use for measuring the efficiency of translation elongation, it has contributed to the finding that functionally related genes are conserved in their expression across different microbial species (Lithwick and Margalit 2005), to the prediction of protein production (Futcher et al. 1999; Gygi et al. 1999), and to the optimization of DNA vaccines (Ruiz et al. 2006).

2.2.1 Calculation of CAI

While RSCU characterizes codon usage bias in each codon family, CAI quantifies the codon usage bias in one gene. It is based on (1) the codon frequencies of the gene and (2) the codon frequencies of a set of known HEGs (often referred to as the reference set). The reference set of genes is used to generate a column of w values computed as

$$ {w}_{ij}=\frac{{\mathrm{RefCodFreq}}_{ij}}{{\mathrm{RefCodFreq}}_{i.\max }} $$
(9.4)
where RefCodFreqij is the frequency of codon j in synonymous codon family i and RefCodFreqi.max is the maximum codon frequency in synonymous codon family i. For example, if the four alanine codons GCA, GCC, GCG, and GCU have frequencies 20, 4, 4, and 2, respectively, then their associated w values are 1, 0.2, 0.2, and 0.1, respectively. The codon whose frequency is RefCodFreqi.max is often referred to as the major codon (whose w is 1), and the other codons in the synonymous codon family are referred to as minor codons. The major codon is assumed to be the translationally optimal codon.

It is easy to see the relationship between wij and RSCU. The former is obtained by dividing each RSCU by the largest RSCU value within each codon family. With the w values for a particular species, we can now compute the CAI value of any protein-coding sequence from the species by using the following equation:

$$ \mathrm{CAI}={e}^{\left(\frac{\sum \limits_{i=1}^n\left[{\mathrm{CodFreq}}_i\ln \left({w}_i\right)\right]}{\sum \limits_{i=1}^n{\mathrm{CodFreq}}_i}\right)} $$
(9.5)
where n is the number of sense codons (excluding codon families with a single codon, e.g., AUG for methionine and UGG for tryptophan in the standard genetic code). Note that the exponent is simply a weighted average of ln(w). Because the maximum of w is 1, ln(w) will never be greater than 0. Consequently, the exponent will never be greater than 0. Thus, the maximum CAI value is 1. The minimum CAI depends on the w values for minor codons in each codon family. If the minor codons all have w values close to zero, then the minimum CAI will also be very close to zero.
The calculation of CAI is numerically illustrated in Table 9.8 for a gene whose observed codon frequencies are in column “ObsFreq.” The codon frequency of the highly expressed reference set is in column “RefCodFreq.” The column “w” is obtained by dividing RefCodFreq values by the largest value in the codon family. For example, the first w value in the table, 0.606, is obtained by dividing the RefCodFreq value 195 by the largest RefCodFreq value in the alanine codon family, i.e., 322. We take a weighted average of ln(w) as shown in Eq. (9.5) and then exponentiate it to obtain CAI.
Table 9.8

Illustration of CAI calculation for a gene whose observed codon frequencies are in column “ObsFreq”

| Codon | AA | ObsFreq | RefCodFreq | w |
|---|---|---|---|---|
| GCA | A | 1 | 195 | 0.606 |
| GCU | A | 15 | 322 | 1.000 |
| GCG | A | 0 | 81 | 0.252 |
| GCC | A | 8 | 242 | 0.752 |
| UGC | C | 3 | 123 | 1.000 |
| UGU | C | 3 | 112 | 0.911 |
| GAU | D | 9 | 69 | 1.000 |
| GAC | D | 11 | 40 | 0.580 |
| GAG | E | 11 | 289 | 0.863 |
| GAA | E | 14 | 335 | 1.000 |
| UUU | F | 3 | 118 | 0.554 |
| UUC | F | 9 | 213 | 1.000 |

The codon frequency of the highly expressed reference set is in column “RefCodFreq.” The column “w” is obtained by dividing RefCodFreq values by the largest value in the codon family
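
Equations (9.4) and (9.5) can be applied to the data in Table 9.8 as follows. This is a minimal sketch that uses only the 12 codons listed in the table; a full calculation would include every multi-codon family, and the function names are illustrative rather than those of any existing program.

```python
import math
from collections import defaultdict

# (codon, amino acid, ObsFreq, RefCodFreq), taken from Table 9.8.
TABLE_9_8 = [
    ("GCA", "A", 1, 195), ("GCU", "A", 15, 322),
    ("GCG", "A", 0, 81), ("GCC", "A", 8, 242),
    ("UGC", "C", 3, 123), ("UGU", "C", 3, 112),
    ("GAU", "D", 9, 69), ("GAC", "D", 11, 40),
    ("GAG", "E", 11, 289), ("GAA", "E", 14, 335),
    ("UUU", "F", 3, 118), ("UUC", "F", 9, 213),
]


def relative_adaptiveness(rows):
    """Eq. (9.4): w = RefCodFreq / largest RefCodFreq in the codon family."""
    family_max = defaultdict(int)
    for _, amino_acid, _, ref in rows:
        family_max[amino_acid] = max(family_max[amino_acid], ref)
    return {codon: ref / family_max[aa] for codon, aa, _, ref in rows}


def cai(rows):
    """Eq. (9.5): exponentiated weighted average of ln(w)."""
    w = relative_adaptiveness(rows)
    total = sum(obs for _, _, obs, _ in rows)
    # Codons with ObsFreq = 0 contribute nothing to the weighted sum.
    log_sum = sum(obs * math.log(w[codon]) for codon, _, obs, _ in rows if obs > 0)
    return math.exp(log_sum / total)


if __name__ == "__main__":
    for codon, value in relative_adaptiveness(TABLE_9_8).items():
        print(f"{codon}\t{value:.3f}")
    print(f"CAI = {cai(TABLE_9_8):.3f}")
```

Restricted to these 12 codons, the script gives a CAI of about 0.87, reflecting a gene that uses mostly major codons in the Ala and Phe families but many minor GAC and GAG codons.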

The way w is calculated implies that, if a protein contains only methionine and tryptophan, both encoded by a single codon (AUG and UGG, respectively, in standard code), then the gene will have the highest CAI value of 1 because w values are 1 for such codons. Similarly, a gene with many AUG and UGG codons would have high CAI values even if it is not under any tRNA-mediated selection. For this reason, a good implementation of CAI should exclude single-member codon families from CAI calculation.
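
The inflation caused by single-member codon families is easy to demonstrate. In the sketch below, the w values for GAU and GAC come from Table 9.8, whereas the codon counts are hypothetical and chosen only to make the effect visible.

```python
import math


def cai(counts, w):
    """Eq. (9.5) for a dict of codon counts and a dict of w values."""
    total = sum(counts.values())
    log_sum = sum(n * math.log(w[codon]) for codon, n in counts.items())
    return math.exp(log_sum / total)


SINGLE_CODON_FAMILIES = {"AUG", "UGG"}  # Met and Trp in the standard code

# w for GAU/GAC from Table 9.8; AUG and UGG trivially have w = 1.
w = {"GAU": 1.000, "GAC": 0.580, "AUG": 1.0, "UGG": 1.0}

# Hypothetical gene: mostly minor Asp codons, plus many Met and Trp codons.
counts = {"GAU": 5, "GAC": 15, "AUG": 20, "UGG": 10}

cai_with_singles = cai(counts, w)
cai_without_singles = cai(
    {c: n for c, n in counts.items() if c not in SINGLE_CODON_FAMILIES}, w
)

print(f"including AUG/UGG: {cai_with_singles:.3f}")
print(f"excluding AUG/UGG: {cai_without_singles:.3f}")
```

The Asp codon usage is identical in both calculations, yet including AUG and UGG raises CAI simply because those codons add zeros to the numerator of the exponent while increasing its denominator.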

I have previously mentioned that codon usage indices such as CAI can be implemented differently with different classification of codon families, so gene A could have a higher CAI value than gene B according to one program, but the opposite according to another. I wish to illustrate this so that readers can better interpret their results.

In highly expressed yeast genes (e.g., compiled in the Eyeastcai.cut in EMBOSS distribution), CGU is by far the most frequent codon in the CGN (coding for arginine) codon family. The overuse of CGU and the avoidance of CGG, CGA, and CGC codons in highly expressed yeast genes make sense because the yeast genome contains six tRNAArg genes with anticodon ACG forming Watson-Crick base pairing with the CGU codon, but no other tRNAArg gene forming Watson-Crick base pairing with the other three CGN codons (the nucleotide A in anticodon ACG is modified to inosine but still pairs with U better than with other nucleotides). While this illustrates well the codon-anticodon adaptation, it causes practical problems with computing CAI.

Suppose we now use a sequence consisting entirely of CGU codons and expect the resulting CAI to be 1 by using the Eyeastcai.cut reference set. The resulting CAI value from the EMBOSS.cai program is 0.140 instead of 1. It turns out that arginine is coded by two codon subfamilies, the CGN codon family we have mentioned and the AGR codon family. The largest codon frequency among these six codons is 314 (for the AGA codon) in Eyeastcai.cut. So the w value for CGU is not 1 (43/43) as we might have thought but is only 0.1369 (= 43/314). For this reason, some CAI-calculating programs, e.g., DAMBE (Xia 2013, 2017d), may separate compound codon families such as the arginine family into two separate families, one twofold and one fourfold.
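The effect of the codon family classification on w can be sketched in a few lines of Python. This is only an illustration and not the code of EMBOSS or DAMBE: the counts for CGU (43) and AGA (314) are the ones quoted above, whereas the other four counts and the function names are hypothetical placeholders.

```python
import math

# Reference-set counts for the six arginine codons.
# Only CGU = 43 and AGA = 314 come from the text; the remaining four
# counts are hypothetical placeholders smaller than those two values.
ref_counts = {
    "CGU": 43, "CGC": 1, "CGA": 1, "CGG": 1,  # CGN subfamily
    "AGA": 314, "AGG": 1,                      # AGR subfamily
}

def relative_adaptiveness(counts, families):
    """Return w = count / (largest count within the codon's family)."""
    w = {}
    for family in families:
        top = max(counts[c] for c in family)
        for codon in family:
            w[codon] = counts[codon] / top
    return w

def cai(codons, w):
    """Geometric mean of the w values of the codons in a gene."""
    logs = [math.log(w[c]) for c in codons]
    return math.exp(sum(logs) / len(logs))

compound = [["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"]]   # one sixfold family
split = [["CGU", "CGC", "CGA", "CGG"], ["AGA", "AGG"]]    # fourfold + twofold

w_compound = relative_adaptiveness(ref_counts, compound)
w_split = relative_adaptiveness(ref_counts, split)

all_cgu_gene = ["CGU"] * 10
cai_compound = cai(all_cgu_gene, w_compound)  # about 0.1369 (= 43/314)
cai_split = cai(all_cgu_gene, w_split)        # 1.0
print(round(cai_compound, 4), cai_split)
```

With the sixfold classification the all-CGU sequence is penalized by the frequent AGA codon, whereas splitting the family gives the value of 1 that one would intuitively expect.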

2.2.2 Illustration of CAI Applications

The most obvious application of CAI or related codon usage indices is to optimize codon usage in order to increase protein expression. Many experiments have demonstrated increased protein production by optimizing codon usage and decreased protein production if codons are replaced by rarely used ones (Haas et al. 1996; Kaishima et al. 2016; Ngumbela et al. 2008; Robinson et al. 1984; Sorensen et al. 1989). There are claims that codon optimization does not increase protein production (e.g., Kudla et al. 2009), but these claims were found to be due to wrong data analysis (Tuller et al. 2010; Xia 2015) and will be dealt with in a later section on ITE (Xia 2015). Below I list two less obvious applications of CAI.

2.2.2.1 Does High Mutation Rate Prevent HIV-1 Genes from Evolving Codon Adaptation?

I have mentioned in the section on RSCU that the lack of concordance in codon usage between HIV-1 and human genes was conventionally explained by high mutation rate in HIV-1, based on the observations that (1) the HIV-1 genome is known to experience strongly A-biased mutations, (2) usage of A-ending codons in HIV-1 genes is particularly different from that of the host genes, and (3) HTLV-1, which parasitizes the same human CD4+ T cells but has a reduced mutation rate, does have codon usage similar to human genes (Fig. 9.6b). Thus, the lack of concordance in codon usage between HIV-1 and human genes is interpreted as poor codon adaptation caused by high mutation rate disrupting codon adaptation.

However, van Weringh et al. (2011) objected to this interpretation. They argued that the lack of concordance in codon usage between HIV-1 and human genes is not due to poor codon adaptation on the part of HIV-1 genes, but because HIV-1 genes, especially the late genes, have adapted to a tRNA pool that is fundamentally different from that in a normal human CD4+ T cell. What originally prompted them to formulate this hypothesis is the observation that CAI for HIV-1 early genes is significantly greater than CAI for HIV-1 late genes when highly expressed human genes are used as reference genes. These late genes encode mass-translated HIV-1 structural proteins and are typically expected to have higher CAI than the relatively lowly expressed early genes. It is thus a surprise to see late genes having smaller CAI than early genes, unless the mass-translated late genes adapt to a tRNA pool different from that of the early genes.

van Weringh et al. (2011) investigated experimentally measured tRNA abundance in the human cell when the late HIV-1 genes are translated and HIV-1 virions are produced. The tRNA pool for the late genes is indeed different in the expected direction, supporting their hypothesis that the lack of concordance in codon usage between HIV-1 and human genes is not due to poor codon adaptation in HIV-1 genes but because HIV-1 genes, especially the late genes, have adapted to a tRNA pool different from the one with which highly expressed human genes are translated (van Weringh et al. 2011).

2.2.2.2 Detecting Horizontally Transferred Genes
CAI has also been used jointly with a reformulated effective number of codons (Nc, Sun et al. 2013) to detect horizontally transferred genes. E. coli genes with a strong codon usage bias typically have high CAI values. However, three genes (yagF, yagG, and yagH) from the defective CP 4–6 prophages of E. coli (Wang et al. 2010) have strongly biased codon usage (small Nc values) but relatively small CAI values. This codon usage pattern sets the three genes apart from the rest of E. coli genes (Fig. 9.7), which highlights the value of using the “Nc versus CAI” plot to detect recently horizontally transferred genes. These genes have been “naturalized” in the E. coli genome and contribute to E. coli survival and growth (Wang et al. 2010).
Fig. 9.7

Plot of CAI against a reformulated effective number of codons (Nc, Sun et al. 2013) for E. coli genes facilitates the detection of newly “immigrant” genes that exhibit codon usage bias different from the “native” genes. Three E. coli genes (yagF, yagG, and yagH) from the defective CP 4–6 prophages of E. coli (Wang et al. 2010) have strongly biased codon usage (relatively small Nc) but relatively poor codon adaptation (mediocre CAI values). The red points represent 179 annotated E. coli pseudogenes (NC_000913) that have not accumulated frameshifting mutations

The largest mucin gene (mucin 14A) in Drosophila melanogaster also exhibits strong codon usage bias (Nc = 38.6), but in the direction opposite to those highly expressed D. melanogaster genes. Its CAI value is equal to 0.1277, which is the second smallest among all D. melanogaster genes. It is unknown how and why the gene has evolved to have such a peculiar feature.

The distribution of CAI values for the 179 annotated pseudogenes is indicated in red. These pseudogenes have not accumulated frameshifting mutations and presumably were pseudogenized only recently. They tend to be clustered at the lower end of the CAI distribution, suggesting that genes with high CAI values require tRNA-mediated selection to maintain the high CAI values.

The gene with the smallest CAI is mgtL, which has only 17 sense codons and is a bacterial mRNA leader that controls the expression of the downstream mgtA (Park et al. 2010). The low CAI is not a stochastic fluctuation caused by the small number of codons; rather, almost all codons used are minor codons. This may represent a real case of a gene preferring minor codons to facilitate its regulatory function.

2.2.3 Problems with CAI and Other Gene-Specific Codon Usage Indices

There are major problems with CAI and other commonly used codon usage indices. While some minor problems have been addressed before (Xia 2007c), the key issue of properly inferring translationally optimal codons (TOCs) remains unresolved. These gene-specific codon usage indices all need to infer TOCs, using one of two types of information. The first, represented by tAI (dos Reis et al. 2004), uses the most abundant tRNA and its anticodon to infer the TOC within each codon family, i.e., the codon that base-pairs best with the most abundant tRNA is the TOC. The second, represented by CAI, considers the most frequent codon in HEGs as the TOC within each codon family. I will outline the problems to pave the way for the presentation of a new index of translation elongation in the next section (ITE, Xia 2015).

2.2.3.1 Problem with Codon Usage Indices Using tRNA Abundance to Infer TOCs

For indices such as tAI that use tRNA abundance information to define TOCs, the main problem is that TOCs cannot be inferred reliably from tRNA gene copy numbers or experimentally measured tRNA abundance. For example, inosine is expected to pair best with C and U, less with A (partly because of the bulky I/A pairing involving two purines), and not with G. However, tRNAVal/IAC from rabbit liver pairs better with GUG codon than with other synonymous codons (Jank et al. 1977; Mitra et al. 1977). No one would have identified GUG as the best codon for tRNAVal/IAC without actually seeing the experimental result.

Similarly, the Bacillus subtilis genome codes tRNAAla/GGC for decoding GCY codons. One would have thought that the GCC codon, which forms Watson-Crick base pairing with the anticodon, would be translationally more optimal than GCU. However, relative to LEGs, GCU is used much more frequently than GCC in B. subtilis HEGs. We have encountered a similar example in Table 9.4 involving Cys codon usage in HEGs. There are four tRNACys genes with the same anticodon GCA forming Watson-Crick base pairs with the UGC codon, but no tRNACys gene with an anticodon forming Watson-Crick base pairs with the alternative UGU codon. We would have taken UGC as the TOC. However, relative to LEGs, UGU is used far more frequently than UGC in highly expressed yeast genes. In short, in all these cases we would be wrong to use the most abundant tRNA species and its matching codon to infer the TOC.

There is one more reason why tRNA abundance cannot reliably predict TOCs. What matters in translation elongation is not the abundance of transcribed tRNAs but the availability of charged tRNAs. It is tedious to determine the level of charged tRNAs, and researchers typically would use transcriptionally determined tRNAs or even the number of tRNA genes in the genome as a proxy of charged tRNAs. Unfortunately, the abundance of tRNAs often does not reflect the abundance of charged tRNAs (Elf et al. 2003).

Furthermore, codon-anticodon base pairing is known to be context-dependent (Lustig et al. 1989). For example, a wobble cmo5U in the anticodon of tRNAPro, tRNAAla, and tRNAVal can read all four synonymous codons in the respective codon family, but the same cmo5U in tRNAThr cannot read C-ending codons (Nasvall et al. 2007). For this reason, the optimal codon usage is likely better approximated by the codon usage of HEGs than by what we can infer from codon-anticodon pairing. Consistent with this proposition, CAI, which is based on the codon usage of HEGs, performs better in predicting protein production or abundance than other indices based on tRNAs (Coghlan and Wolfe 2000; Comeron and Aguade 1998; Duret and Mouchiroud 1999).

2.2.3.2 Problem with Using Codon Usage of HEGs to Infer TOCs

Codon usage indices such as CAI that use codon usage of HEGs to infer TOCs also have problems. In addition to those previously outlined (Xia 2007c), this approach often leads to wrong interpretations of tRNA-mediated selection. I illustrate this problem here with the Ala codon subfamily GCR (where R stands for either A or G). The frequencies of GCA and GCG in E. coli HEGs, as compiled and distributed with EMBOSS (Rice et al. 2000), are 1973 and 2654, respectively, which may lead one to think that the E. coli translation machinery prefers GCG over GCA. However, the codon frequencies of GCA and GCG for E. coli non-HEGs are 25,511 and 43,261, respectively. Thus, GCA is relatively more frequent in E. coli HEGs than in E. coli non-HEGs. This suggests that mutation bias favors GCG, but tRNA-mediated selection favors GCA. The battle between mutation bias and tRNA-mediated selection leads to increased usage of GCA in E. coli HEGs relative to LEGs, although GCA is still not as frequent as GCG in HEGs. This interpretation is corroborated by the E. coli genome encoding three tRNAAla genes for GCR codons, all with a UGC anticodon forming perfect Watson-Crick base pairs with codon GCA.

The example above illustrates the point that mutation bias is reflected in the codon usage of lowly expressed genes, which can therefore serve as a background against which selection in HEGs is measured. This is what has driven the formulation, development, and implementation of a new codon usage index, ITE (Xia 2015).

2.3 ITE (Index of Translation Elongation)

2.3.1 Illustration of ITE Calculation

ITE is implemented in DAMBE (Xia 2013, 2017d). There are in fact four different implementations of ITE in DAMBE, depending on how one would classify codons into codon families. The first implementation is the most extreme (unconventional) and classifies all sense codons into NNR or NNY codon families or subfamilies. For example, the fourfold alanine codon family is broken into GCR and GCY subfamilies. For codon i in such an NNR or NNY codon family or subfamily, we first define Pi.HEG and Pi.non-HEG as the proportions of codon i within its R-ending or Y-ending family in E. coli HEGs and non-HEGs, respectively. Take data for codons GCA and GCG in Table 9.9, for example:
$$ {\displaystyle \begin{array}{l}{P}_{\mathrm{GCA}.\mathrm{HEG}}=\frac{N_{\mathrm{GCA}.\mathrm{HEG}}}{N_{\mathrm{GCR}.\mathrm{HEG}}}=\frac{1973}{1973+2654}=0.42641\\ {}{P}_{\mathrm{GCA}.\mathrm{non}\hbox{-} \mathrm{HEG}}=\frac{N_{\mathrm{GCA}.\mathrm{non}\hbox{-} \mathrm{HEG}}}{N_{\mathrm{GCR}.\mathrm{non}\hbox{-} \mathrm{HEG}}}=\frac{25511}{25511+43261}=0.37095\end{array}} $$
(9.6)
$$ {\displaystyle \begin{array}{l}{S}_{\mathrm{GCA}}=\frac{P_{\mathrm{GCA}.\mathrm{HEG}}}{P_{\mathrm{GCA}.\mathrm{non}\hbox{-} \mathrm{HEG}}}=1.1495\\ {}{S}_{\mathrm{GCG}}=\frac{P_{\mathrm{GCG}.\mathrm{HEG}}}{P_{\mathrm{GCG}.\mathrm{non}\hbox{-} \mathrm{HEG}}}=0.9118\end{array}} $$
(9.7)
where SGCA and SGCG may be viewed as relative codon frequencies of HEGs corrected for the “background” non-HEGs. Codon i is considered selected for if Si > 1 and against if Si < 1. Thus, codon GCA is considered selected for because, according to Eq. (9.7), SGCA > 1. This insight would be obscured if we used codon frequency data from E. coli HEGs only, which would have suggested that codon GCA is selected against. The Si values for the four alanine codons in E. coli are listed in Table 9.9.
Table 9.9

Codon frequency (CF) for E. coli highly expressed genes (HEGs) and non-HEGs, as well as the computed Si values according to Eq. (9.7)

| AA | Codon | CFHEG | CFnon-HEG | Si |
| --- | --- | --- | --- | --- |
| A | GCA | 1973 | 25,511 | 1.1495 |
| A | GCG | 2654 | 43,261 | 0.9118 |
| A | GCC | 1306 | 33,463 | 0.5646 |
| A | GCU | 2288 | 18,526 | 1.7865 |
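The Si values in Table 9.9 can be reproduced from the codon counts with a short Python sketch of Eqs. (9.6) and (9.7). The function and variable names are illustrative only and are not taken from DAMBE.

```python
# Codon counts for the four alanine codons, copied from Table 9.9.
heg = {"GCA": 1973, "GCG": 2654, "GCC": 1306, "GCU": 2288}
non_heg = {"GCA": 25511, "GCG": 43261, "GCC": 33463, "GCU": 18526}

# NNR and NNY subfamilies of the fourfold alanine family.
subfamilies = [("GCA", "GCG"), ("GCC", "GCU")]

def selection_ratios(heg_counts, non_heg_counts, families):
    """S_i = P_i.HEG / P_i.non-HEG, with proportions taken within each subfamily."""
    s = {}
    for family in families:
        heg_total = sum(heg_counts[c] for c in family)
        non_total = sum(non_heg_counts[c] for c in family)
        for codon in family:
            p_heg = heg_counts[codon] / heg_total
            p_non = non_heg_counts[codon] / non_total
            s[codon] = p_heg / p_non
    return s

S = selection_ratios(heg, non_heg, subfamilies)
for codon, value in S.items():
    print(codon, round(value, 4))
```

Codons with a ratio above 1 (GCA and GCU here) are the ones used more in HEGs than the non-HEG background would predict.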

We now compute wi as follows:
$$ {\displaystyle \begin{array}{l}{w}_i=\frac{S_i}{\operatorname{Max}\left({S}_i\right)},\mathrm{e}.\mathrm{g}.,\\ {}{w}_{\mathrm{GCA}}=\frac{1.1495}{1.1495}=1;{w}_{\mathrm{GCG}}=\frac{0.9118}{1.1495}=0.7932\end{array}} $$
(9.8)
The index of translation elongation (ITE) is then calculated in the same way as CAI except that, in this particular codon family classification, the computation is applied to NNR and NNY codon subfamilies:
$$ {I}_{\mathrm{TE}}={e}^{\frac{\sum \limits_{i=1}^{N_s}{F}_i\ln {w}_i}{\sum \limits_{i=1}^{N_s}{F}_i}} $$
(9.9)
where Fi is the frequency of codon i in the gene and Ns is the number of sense codons (excluding those in single-codon families). For example, AUG for methionine, AUA for isoleucine, and UGG for tryptophan in the standard genetic code are excluded from computing ITE. Just like CAI, tAI, and Nc, ITE is a gene-specific index of codon usage bias.
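Equations (9.8) and (9.9) can be sketched in Python as follows. The Si values are those of Table 9.9, so only alanine codons receive a w value here; the codon counts of the toy gene and all function names are hypothetical and serve only to illustrate the arithmetic.

```python
import math

# S_i values for the four alanine codons, copied from Table 9.9.
S = {"GCA": 1.1495, "GCG": 0.9118, "GCC": 0.5646, "GCU": 1.7865}
SUBFAMILIES = [("GCA", "GCG"), ("GCC", "GCU")]  # NNR and NNY subfamilies

def relative_weights(s, subfamilies):
    """Eq. (9.8): w_i = S_i / max(S) within each subfamily."""
    w = {}
    for family in subfamilies:
        top = max(s[c] for c in family)
        for codon in family:
            w[codon] = s[codon] / top
    return w

def index_of_translation_elongation(codon_counts, w):
    """Eq. (9.9). Codons without a w value (e.g., single-codon families) are skipped."""
    used = {c: f for c, f in codon_counts.items() if c in w and f > 0}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no scorable codons")
    log_sum = sum(f * math.log(w[c]) for c, f in used.items())
    return math.exp(log_sum / total)

w = relative_weights(S, SUBFAMILIES)

# Hypothetical codon counts for a toy gene; AUG is ignored in the calculation.
toy_gene = {"GCA": 3, "GCG": 1, "GCU": 4, "GCC": 2, "AUG": 1}
toy_ite = index_of_translation_elongation(toy_gene, w)
print(round(w["GCG"], 4))  # 0.7932, as in Eq. (9.8)
print(toy_ite)
```

A gene using only the codons with w = 1 (GCA and GCU in this example) reaches the maximum of 1, and any use of GCG or GCC lowers the index.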

One may note that CAI is a special case of ITE when there is absolutely no codon usage bias in non-HEGs in all codon subfamilies, that is, when NGCA.non-HEG = NGCG.non-HEG, NGCC.non-HEG = NGCU.non-HEG, and so on. The range of ITE is the same as that of CAI, i.e., between 0 and 1.
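The reduction can be written out for a twofold subfamily such as GCR, as a sketch that assumes CAI is computed with the same codon family classification. If non-HEGs show no bias, then Pi.non-HEG = 1/2 for both codons, and Eqs. (9.7) and (9.8) give

$$ {S}_i=\frac{P_{i.\mathrm{HEG}}}{1/2}=2{P}_{i.\mathrm{HEG}},\kern1em {w}_i=\frac{S_i}{\operatorname{Max}\left({S}_i\right)}=\frac{P_{i.\mathrm{HEG}}}{\operatorname{Max}\left({P}_{i.\mathrm{HEG}}\right)}=\frac{N_{i.\mathrm{HEG}}}{\operatorname{Max}\left({N}_{i.\mathrm{HEG}}\right)} $$

The last expression is the w of CAI, i.e., the codon frequency in HEGs divided by the largest frequency in the codon family, so Eq. (9.9) then returns the CAI value.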

Readers may demand a justification for the extreme classification of all sense codons into NNR and NNY codon families. The main reason is that, for genes encoded by the nuclear genome, the R-ending codons are typically decoded by two types of tRNA species (one with a wobble C and the other with a wobble U), whereas the Y-ending codons are decoded typically by a single type of tRNA species with either a wobble G or a wobble A modified to inosine, but never by both (Grosjean et al. 2007; Marck and Grosjean 2002). For this reason, the R-ending and Y-ending codons, even within a single fourfold codon family, are subject to different tRNA-mediated selection and therefore should be treated separately. Such implementation is also relevant for certain experimental settings that induce mutation almost exclusively in NNY codons, which is the case in Kudla et al. (2009). However, for comparative purposes, I have included two alternative ITE implementations in DAMBE (Xia 2013, 2017d): (1) with compound sixfold and eightfold codon families broken into twofold and fourfold codon families and (2) lumping all synonymous codons into one codon family. One may access the function by clicking “Seq.Analysis|Codon usage|Index of translation elongation” and then choosing the desired implementation.

2.3.2 A Major Controversy Resolved by the Application of ITE

Highly expressed genes in bacteria and unicellular eukaryotes overuse codons that match the anticodon of the most abundant tRNA (Ikemura 1981a, b, 1982, 1992). When such codons are replaced by rarely used codons, protein production is reduced (Robinson et al. 1984; Sorensen et al. 1989). Similarly, when codon usage is optimized, protein production is increased (Haas et al. 1996; Kaishima et al. 2016; Ngumbela et al. 2008). However, the degree to which translation elongation is rate-limiting has been controversial. Early theoretical considerations (Andersson and Kurland 1983; Bulmer 1990, 1991; Liljenstrom and von Heijne 1987) tend to favor the argument that translation elongation is not rate-limiting in protein production, but translation initiation is. This hypothesis states that codon-anticodon adaptation and increased elongation efficiency are not related to protein production. Instead, the benefit of codon adaptation and increased elongation efficiency is to increase ribosomal availability for global translation and timely response to environmental perturbations.

To test these two alternative hypotheses, Kudla et al. (2009) engineered a synthetic library of 154 genes, all encoding the same green fluorescent protein in Escherichia coli, but differing in synonymous sites (and consequently the degree of codon adaptation, as measured by codon adaptation index or CAI). All sequences share an identical 5′ UTR 144 nt long, so there is no variation in the Shine-Dalgarno sequence. Because the engineered genes all encode the same protein, it is justifiable to use protein abundance as a proxy for protein production (assuming that protein molecules sharing the same amino acid sequence have the same degradation rate).

Kudla et al. (2009) used minimum folding energy (MFE), computed from sites −4 to +37 (where ribosomes position themselves at the initiation codon), as a proxy for initiation efficiency. The rationale for using MFE as a measure of translation initiation is that an initiation codon would be inaccessible if it is embedded in a strong secondary structure and that accessibility of the initiation codon is a key determinant of translation initiation efficiency (Nakamoto 2006). Stable secondary structure in sequences positioned at or before the start codon has been experimentally shown to inhibit translation initiation (Osterman et al. 2013), presumably because it embeds the SD sequence and the start codon in a structural stem and consequently hides these signals from ribosomes. The previous chapter on translation initiation has already highlighted the point that mRNAs in bacteria and unicellular eukaryotes tend to have much weaker secondary structure near the start codon than elsewhere, especially those from highly expressed genes.

Kudla et al. interpreted CAI as a proxy of translation elongation. If both translation initiation and elongation contribute to translation efficiency, then protein production is expected to depend on both MFE and CAI. If only translation initiation is important, then protein production will depend on MFE only. They found that MFE accounts for 44% of the variation in protein production but CAI is essentially unrelated to protein production. They concluded consequently that “translation initiation, not elongation, is rate-limiting for gene expression.”

The conclusion by Kudla et al. (2009), however, is based on two critical assumptions. First, MFE and CAI are good proxies of translation initiation and elongation efficiencies, respectively. Second, the effect of translation elongation is independent of translation initiation. The problem with the second assumption has been pointed out by Supek and Smuc (2010) and Tuller et al. (2010), who reanalyzed the data and provided an overwhelming amount of additional empirical evidence to demonstrate the joint effect of translation initiation and elongation on protein production. In short, protein production rate is expected to increase with elongation efficiency only when translation initiation is efficient. If translation initiation is slow, then increasing elongation rate is not expected to increase protein production. Kudla et al. (2009) ignored the dependence of the elongation effect on translation initiation.

Xia (2015) reanalyzed the experimental data in Kudla et al. (2009) with two improvements: replacing CAI by ITE and incorporating translation initiation and elongation into one model. Three points are worth highlighting in Fig. 9.8a. First, in contrast to a nonsignificant relationship between protein abundance and CAI, protein abundance and ITE are highly significantly correlated (p = 0.0001, Fig. 9.8a). Second, when ITE is small (e.g., ITE < 0), protein abundance is generally low, suggesting that translation elongation is limiting. Third, a large ITE (efficient translation elongation) does not imply high protein production, e.g., when translation initiation is very slow. One expects a large ITE to be associated with increased protein production only when translation initiation is efficient.
Fig. 9.8

Relationship between protein abundance (measured by GFP normalized fluorescence; data kindly provided by Dr. Plotkin) and translation elongation efficiency (ITE). (a) Without considering translation initiation. (b) The relationship between protein abundance and ITE is characterized separately for four groups of data, with MFE1, MFE2, MFE3, and MFE4 corresponding to groups of genes with increasing translation initiation efficiency. (Modified from Xia 2015)

Xia (2015) binned MFE into four categories, from strong to weak secondary structure: (−15.3, −11), (−10.9, −9), (−8.7, −6.2), and (−6, −3.5). They represent translation initiation efficiency from the lowest to the highest and are designated as MFE1-MFE4 (Fig. 9.8b). The intervals are chosen in such a way that all MFE values fall into four roughly equal-sized groups with within-group variation in MFE being as small as possible. The benefit of binning is that one can exclude the MFE variable so that the effect of ITE can be modeled more explicitly. It is for the same reason that Tuller et al. (2010) also used binned analysis for this data set.
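The binning step can be sketched as follows. The interval boundaries are those listed above; treating them as closed intervals, the function names, and the record layout are assumptions made for illustration, and no data from Kudla et al. (2009) are reproduced here.

```python
# MFE intervals (lower bound, upper bound) for the four groups listed in the text.
# Closed intervals are an assumption of this sketch.
MFE_BINS = [
    ("MFE1", -15.3, -11.0),  # strongest secondary structure, slowest initiation
    ("MFE2", -10.9, -9.0),
    ("MFE3", -8.7, -6.2),
    ("MFE4", -6.0, -3.5),    # weakest secondary structure, fastest initiation
]

def mfe_group(mfe):
    """Return the group label for an MFE value, or None if it falls in no interval."""
    for label, low, high in MFE_BINS:
        if low <= mfe <= high:
            return label
    return None

def group_by_mfe(records):
    """Split (mfe, i_te, protein) records into per-group lists of (i_te, protein)."""
    groups = {label: [] for label, _, _ in MFE_BINS}
    for mfe, i_te, protein in records:
        label = mfe_group(mfe)
        if label is not None:
            groups[label].append((i_te, protein))
    return groups

print(mfe_group(-12.0), mfe_group(-4.0))  # MFE1 MFE4
```

Within each group, protein abundance can then be regressed on ITE without MFE as an explicit variable.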

In the MFE1 group, translation initiation efficiency is the lowest, and we should expect little increase of protein production with translation elongation efficiency (ITE). This is consistent with the empirical result (Fig. 9.8b) where the relationship between ITE and protein abundance is not statistically significant in the MFE1 group (b = 67.545, p = 0.4213, Fig. 9.8b), with ITE accounting for only 2% of total variation in ranked protein abundance (rProt). In contrast, when translation initiation is more efficient in groups MFE2-MFE4, rProt increases significantly with ITE, with the simple linear model consistently accounting for about 17% of the total variation in rProt (Fig. 9.8b, with b varying from 216.60 to 263.87). Thus, the contribution of translation elongation (ITE) to protein production is much greater than previously documented for this data set, i.e., absent (Kudla et al. 2009) or less than 3% of the total variation in protein production (Tuller et al. 2010). Readers may consult Xia (2015) for more explicit modeling of the protein abundance on translation initiation and elongation.

One might wonder why previous studies, although not taking translation initiation into consideration, almost always show a positive relationship between translation efficiency and codon adaptation. There are two explanations. First, previous experimental studies were carried out typically on highly expressed genes with efficient translation initiation. Such studies are equivalent to excluding the MFE1 group in Fig. 9.8b. Second, for correlational studies, nature generally does not generate bacterial genes with high translation initiation efficiency but poor codon adaptation or low translation initiation efficiency with high codon adaptation. However, the experiment by Kudla et al. (2009) generated both of these unnatural associations, leading to a lack of positive association between protein production and codon adaptation. This example highlights the point that a well-intended and well-done experiment can mislead us. It represents another illustration of Simpson’s Paradox in which a wrong conclusion is reached when one omits a contributing variable.

3 Translation Elongation Efficiency and Accuracy

Given a fixed translation initiation efficiency, our conceptual model for the relationship between codon adaptation (CA) and tRNA-mediated selection, in its simplest form, is
$$ \mathrm{CA}=\alpha +\beta {\mathrm{S}}_{\mathrm{E}} $$
(9.10)
where CA is tRNA-mediated codon adaptation often measured by CAI or ITE (Xia 2015) and SE is selection for translation efficiency (in units of protein produced per mRNA molecule). The slope β is typically positive, i.e., stronger selection for translation efficiency leads to better codon adaptation. Many studies have demonstrated a strong relationship between codon adaptation and gene expression (Coghlan and Wolfe 2000; Duret and Mouchiroud 1999; Gouy and Gautier 1982).
One key deficiency in Eq. (9.10) is that it does not distinguish between selection for translation efficiency and selection for translation accuracy (Akashi 1994). Take Asn codons AAC and AAU in E. coli, for example. AAC is a major codon (heavily used by highly expressed genes and decoded by the most abundant isoacceptor tRNA), whereas AAU is a rarely used minor codon. A major codon is typically translated faster than a minor codon, and highly expressed E. coli genes use AAC almost exclusively to code for Asn, so one could argue that the overuse of AAC is driven by SE. However, AAC and AAU also differ in misreading rate, in particular by tRNALys which ideally should decode only AAA and AAG codons but does misread AAC and AAU, leading to Asn replaced by Lys. This misreading error rate is six times greater for AAU than for AAC, with the error ratio maintained in both Asn-starved and Asn-non-starved conditions (Johnston et al. 1984) or with streptomycin used to inhibit translation (Johnston and Parker 1985). Thus, the overuse of AAC could be driven either by selection for increased translation efficiency or increased translation accuracy or both. Designating SA as selection for translation accuracy, we have three alternative hypotheses expressed, in the simplest form, as
$$ \mathrm{CA}=\alpha +{\beta}_1{\mathrm{S}}_{\mathrm{E}} $$
(9.11)
$$ \mathrm{CA}=\alpha +{\beta}_1{\mathrm{S}}_{\mathrm{A}} $$
(9.12)
$$ \mathrm{CA}=\alpha +{\beta}_1{\mathrm{S}}_{\mathrm{E}}+{\beta}_2{\mathrm{S}}_{\mathrm{A}}+{\beta}_3{\mathrm{S}}_{\mathrm{E}}{\mathrm{S}}_{\mathrm{A}} $$
(9.13)

Akashi (1994) classified amino acid sites into conserved sites (assumed to be functionally important with high SA) and variable sites (assumed to experience low SA). He reasoned that, if codon adaptation is due to selection for translation efficiency, then all codons in the gene should be subject to similar selection regardless of whether the codon is in a functionally important or unimportant site. In contrast, if codon adaptation is driven by selection for translation accuracy, then the selection is stronger in functionally important sites than in functionally unimportant sites. So we should observe greater codon usage bias in functionally important codon sites than in functionally unimportant codon sites. He found greater codon adaptation in conserved amino acid sites than in variable amino acid sites and concluded that this difference between the conserved and variable sites resulted from selection for accuracy.

There is a problem with the conclusion. Take lysine codons (AAA and AAG) and glutamate codons (GAA and GAG), for example. Suppose that the AAA codon is favored by selection in the lysine codon family and GAG is favored in the glutamate codon family. Also suppose that an ancestral gene has good codon adaptation with lysine coded by AAA and glutamate coded by GAG. Now suppose some lysine sites experienced nonsynonymous substitutions from AAA to GAA. These sites are now designated as variable sites and are occupied by a minor codon GAA. This would result in an association between “poor codon adaptation” and variable sites that has little to do with translation accuracy. Akashi (1994) was aware of this problem but did not provide a definitive solution.

4 Amino Acid Usage and Translation Elongation Efficiency

There are at least four factors contributing to amino acid usage. The first two are related to selection for translation elongation efficiency, the third related to number of synonymous codons, and the fourth related to genomic mutation bias.

4.1 Factors Related to Selection for Translation Elongation Efficiency

Some amino acids are abundant and energetically cheap to make, i.e., consuming few ATPs in their production, whereas others are rare and energetically expensive, so mass-produced proteins should maximize the use of abundant and cheap amino acids (Akashi and Gojobori 2002). However, such a hypothesis, without considering other factors, often does not produce easily testable predictions. For example, we expect highly expressed proteins to maximize the use of energetically cheap amino acids and avoid the use of the expensive ones. However, many ribosomal proteins are highly expressed, yet the need for many of them to bind to the negatively charged mRNA demands the usage of positively charged amino acids such as Lys and Arg that are typically energetically expensive to make in the cell. This would lead to an association between high expression and energetically expensive amino acids, thus confounding the prediction that highly expressed genes should maximize the use of cheap amino acids. Furthermore, amino acid availability changes with environment, and the same amino acid may be manufactured differently with different energy consumption in different organisms. So it is not easy to measure the energetic cost of amino acids in different organisms. One could, however, turn the question around and ask how one can characterize energetic costs of amino acids by bioinformatic means. For example, in the ideal situation when all other factors affecting amino acid usage have been controlled for, we may infer that the avoided amino acid is perhaps rare or energetically expensive to make. This type of inference is of course not very satisfactory and is often derogatively termed the backdoor smuggling approach because one does not present direct evidence for energetic cost.

The other factor related to translation elongation is the tRNA abundance, and one expects mass-produced proteins to use amino acids with many tRNAs to carry them. Designating the proportion of tRNAs carrying amino acid i as Pi, and the frequency of amino acid i in highly expressed genes as Ni, Xia (1998a) analytically derived an equation with Pi linearly increasing with the square root of Ni. The relationship was well substantiated with data from E. coli, Salmonella typhimurium, and Saccharomyces cerevisiae (Xia 1998a).

Single-stranded DNA (ssDNA) bacteriophages do not carry their own tRNA and depend entirely on the host tRNA pool for decoding their codons. So one would predict that amino acid usage in these phages should be correlated with the abundance of tRNAs in the host cell. This prediction was tested in a study (Chithambaram et al. 2014b) of phages infecting E. coli, by using tRNA gene copy number in E. coli as a proxy of tRNA abundance (Fig. 9.9). An amino acid carried by more tRNAs is used more frequently than one carried by fewer tRNAs.
Fig. 9.9

Amino acid usage in single-stranded DNA phages infecting E. coli increases with the abundance of isoaccepting tRNA

4.2 Number of Synonymous Codons

In the absence of any selection, we would expect amino acid usage to increase with the number of synonymous codons (Fig. 9.10). However, this relationship is confounded with the number of tRNAs carrying each amino acid in the cell. If we designate the number of tRNAs carrying amino acid i as Ni.tRNA and the number of synonymous codons for amino acid i as Ni.syn codon, then amino acid usage depends on both. Ni.tRNA and Ni.syn codon are also positively correlated.
Fig. 9.10

Amino acid count in all coding sequences in E. coli I12 (NC_000913) increases with number of synonymous codons
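The trend in Fig. 9.10 can be reproduced from the codon families and the E. coli counts listed in Table 9.10. The following is a minimal sketch under that assumption: the data are copied from the table, while the helper `pearson` and the variable names are ours and not part of DAMBE or of the original analysis.

```python
from math import sqrt

# Codon families and E. coli K12 amino acid counts, copied from Table 9.10.
TABLE_9_10 = {
    "Ala": ("GCT,GCC,GCA,GCG", 125332),
    "Arg": ("CGT,CGC,CGA,CGG,AGA,AGG", 72502),
    "Asn": ("AAT,AAC", 51075),
    "Asp": ("GAT,GAC", 67349),
    "Cys": ("TGT,TGC", 15188),
    "Gln": ("CAA,CAG", 58360),
    "Glu": ("GAA,GAG", 75786),
    "Gly": ("GGT,GGC,GGA,GGG", 96701),
    "His": ("CAT,CAC", 29751),
    "Ile": ("ATT,ATC,ATA", 78845),
    "Leu": ("TTG,TTA,CTT,CTC,CTA,CTG", 140571),
    "Lys": ("AAA,AAG", 57620),
    "Met": ("ATG", 37093),
    "Phe": ("TTT,TTC", 51131),
    "Pro": ("CCT,CCC,CCA,CCG", 58293),
    "Ser": ("TCT,TCC,TCA,TCG,AGT,AGC", 75661),
    "Thr": ("ACT,ACC,ACA,ACG", 70494),
    "Trp": ("TGG", 20060),
    "Tyr": ("TAT,TAC", 37134),
    "Val": ("GTT,GTC,GTA,GTG", 93061),
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


# Number of synonymous codons per amino acid (N_i.syncodon in the text).
n_syn = {aa: len(codons.split(",")) for aa, (codons, _) in TABLE_9_10.items()}
counts = {aa: count for aa, (_, count) in TABLE_9_10.items()}

aas = sorted(TABLE_9_10)
r_syn = pearson([n_syn[aa] for aa in aas], [counts[aa] for aa in aas])
print(f"Pearson r between number of synonymous codons and count: {r_syn:.2f}")
```

A positive correlation here says nothing about cause. As the text notes, the number of tRNAs per amino acid is correlated with the number of synonymous codons, so the two effects cannot be separated from this calculation alone.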

4.3 Genomic Mutation Bias

E. coli genomes have roughly equal nucleotide frequencies. A more AT-rich or GC-rich genome tends to have more AT-rich or GC-rich codons, and hence more of the amino acids they encode. For example, AT-rich genomes of bacterial pathogens tend to encode many more lysine residues (encoded by AAA and AAG) than less AT-rich genomes (Xia and Palidwor 2005). The effect is visible even with a mild difference in genomic AT content. Yeast (Saccharomyces cerevisiae) is only mildly AT-rich (nucleotide frequencies of 0.3090, 0.1917, 0.1913, and 0.3080 for A, C, G, and T, respectively), yet it clearly uses more amino acids encoded by AT-rich codons and fewer amino acids encoded by GC-rich codons than E. coli does (Table 9.10).
Table 9.10 Amino acid usage in E. coli K12 (NC_000913) and S. cerevisiae (NC_001133-NC_001148) coding sequences

| AA | Codon | E. coli | Yeast | E. coli% | Yeast% |
|---|---|---|---|---|---|
| Ala | GCT,GCC,GCA,GCG | 125,332 | 160,810 | 9.5527 | 5.4966 |
| Arg | CGT,CGC,CGA,CGG,AGA,AGG | 72,502 | 130,068 | 5.5260 | 4.4458 |
| Asn | AAT,AAC | 51,075 | 179,836 | 3.8929 | 6.1469 |
| Asp | GAT,GAC | 67,349 | 171,072 | 5.1333 | 5.8473 |
| Cys | TGT,TGC | 15,188 | 37,093 | 1.1576 | 1.2679 |
| Gln | CAA,CAG | 58,360 | 115,741 | 4.4481 | 3.9561 |
| Glu | GAA,GAG | 75,786 | 191,267 | 5.7763 | 6.5376 |
| Gly | GGT,GGC,GGA,GGG | 96,701 | 145,433 | 7.3705 | 4.9710 |
| His | CAT,CAC | 29,751 | 63,505 | 2.2676 | 2.1706 |
| Ile | ATT,ATC,ATA | 78,845 | 191,677 | 6.0095 | 6.5516 |
| Leu | TTG,TTA,CTT,CTC,CTA,CTG | 140,571 | 277,988 | 10.7142 | 9.5017 |
| Lys | AAA,AAG | 57,620 | 214,842 | 4.3917 | 7.3434 |
| Met | ATG | 37,093 | 60,672 | 2.8272 | 2.0738 |
| Phe | TTT,TTC | 51,131 | 129,516 | 3.8972 | 4.4269 |
| Pro | CCT,CCC,CCA,CCG | 58,293 | 128,177 | 4.4430 | 4.3811 |
| Ser | TCT,TCC,TCA,TCG,AGT,AGC | 75,661 | 263,096 | 5.7668 | 8.9927 |
| Thr | ACT,ACC,ACA,ACG | 70,494 | 173,084 | 5.3730 | 5.9161 |
| Trp | TGG | 20,060 | 30,387 | 1.5290 | 1.0386 |
| Tyr | TAT,TAC | 37,134 | 98,746 | 2.8303 | 3.3752 |
| Val | GTT,GTC,GTA,GTG | 93,061 | 162,642 | 7.0930 | 5.5592 |

Amino acids encoded by AT-rich codons are in bold, and those encoded by GC-rich codons are italicized
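The contrast in Table 9.10 can be summarized numerically. The sketch below recomputes the percentage columns from the counts and relates the yeast-to-E. coli usage ratio to the AT content of each codon family. The counts and codons are copied from the table. The "AT fraction" of a codon family (the share of A and T among all nucleotides of its codons) is our own crude summary, assumed here for illustration, and is not a measure defined in the chapter.

```python
from math import log2, sqrt

# (codons, E. coli count, yeast count), copied from Table 9.10.
TABLE = {
    "Ala": ("GCT,GCC,GCA,GCG", 125332, 160810),
    "Arg": ("CGT,CGC,CGA,CGG,AGA,AGG", 72502, 130068),
    "Asn": ("AAT,AAC", 51075, 179836),
    "Asp": ("GAT,GAC", 67349, 171072),
    "Cys": ("TGT,TGC", 15188, 37093),
    "Gln": ("CAA,CAG", 58360, 115741),
    "Glu": ("GAA,GAG", 75786, 191267),
    "Gly": ("GGT,GGC,GGA,GGG", 96701, 145433),
    "His": ("CAT,CAC", 29751, 63505),
    "Ile": ("ATT,ATC,ATA", 78845, 191677),
    "Leu": ("TTG,TTA,CTT,CTC,CTA,CTG", 140571, 277988),
    "Lys": ("AAA,AAG", 57620, 214842),
    "Met": ("ATG", 37093, 60672),
    "Phe": ("TTT,TTC", 51131, 129516),
    "Pro": ("CCT,CCC,CCA,CCG", 58293, 128177),
    "Ser": ("TCT,TCC,TCA,TCG,AGT,AGC", 75661, 263096),
    "Thr": ("ACT,ACC,ACA,ACG", 70494, 173084),
    "Trp": ("TGG", 20060, 30387),
    "Tyr": ("TAT,TAC", 37134, 98746),
    "Val": ("GTT,GTC,GTA,GTG", 93061, 162642),
}


def at_fraction(codons):
    """Share of A and T among all nucleotides of a comma-separated codon list."""
    bases = codons.replace(",", "")
    return sum(base in "AT" for base in bases) / len(bases)


def percent(column):
    """Percentage usage of each amino acid for one count column (1 or 2)."""
    total = sum(row[column] for row in TABLE.values())
    return {aa: 100 * row[column] / total for aa, row in TABLE.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


ecoli_pct = percent(1)
yeast_pct = percent(2)

# log2 ratio > 0 means the amino acid is used more in yeast than in E. coli.
log_ratio = {aa: log2(yeast_pct[aa] / ecoli_pct[aa]) for aa in TABLE}
at = {aa: at_fraction(TABLE[aa][0]) for aa in TABLE}

aas = sorted(TABLE)
r_at = pearson([at[aa] for aa in aas], [log_ratio[aa] for aa in aas])

for aa in sorted(aas, key=lambda k: log_ratio[k]):
    print(f"{aa}  AT fraction {at[aa]:.2f}  log2(yeast/E. coli) {log_ratio[aa]:+.2f}")
print(f"Pearson r between AT fraction and log2 ratio: {r_at:.2f}")
```

The correlation is a coarse summary only. It ignores the other factors discussed in this section, such as tRNA abundance and energetic cost, which may explain amino acids that depart from the trend.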

In summary, amino acid usage (U) is a function of four factors:
$$ U=F\left(E,{N}_{\mathrm{tRNA}},{N}_{\mathrm{syncodon}},\mathrm{GC}\%\right) $$
(9.14)
where E is energetic cost, NtRNA and Nsyncodon are the number of tRNAs and the number of synonymous codons for an amino acid (Ni.tRNA and Ni.syn codon in Sect. 4.2), and GC% is the genomic GC content, which reflects mutation bias. One needs to include all these factors in a model in order to reach a reasonable understanding of the determinants of amino acid usage.

References

  1. Abdel-Hameed EA, Ji H, Shata MT (2016) HIV-induced epigenetic alterations in host cells. Adv Exp Med Biol 879:27–38
  2. Abolbaghaei A, Silke JR, Xia X (2017) How changes in anti-SD sequences would affect SD sequences in Escherichia coli and Bacillus subtilis. G3 (Bethesda, Md) 7(5):1607–1615
  3. Abraham EP, Chain E (1940) An enzyme from bacteria able to destroy penicillin. Rev Infect Dis 10(4):677–678
  4. Abraham EP, Chain E, Fletcher CM, Florey HW, Gardner AD, Heatley NG, Jennings MA (1941) Further observations on penicillin. Lancet 238(6155):177–189
  5. Abraham JM, Feagin JE, Stuart K (1988) Characterization of cytochrome c oxidase III transcripts that are edited only in the 3′ region. Cell 55(2):267–272
  6. Adamski FM, McCaughan KK, Jorgensen F, Kurland CG, Tate WP (1994) The concentration of polypeptide chain release factors 1 and 2 at different growth rates of Escherichia coli. J Mol Biol 238(3):302–308
  7. Aerts S, Van Loo P, Thijs G, Mayer H, de Martin R, Moreau Y, De Moor B (2005) TOUCAN 2: the all-inclusive open source workbench for regulatory sequence analysis. Nucleic Acids Res 33(Web Server):W393–W396
  8. Aerts S, van Helden J, Sand O, Hassan BA (2007) Fine-tuning enhancer models to predict transcriptional targets across multiple genomes. PLoS One 2(11):e1115
  9. Ahn BY, Jones EV, Moss B (1990) Identification of the vaccinia virus gene encoding an 18-kilodalton subunit of RNA polymerase and demonstration of a 5′ poly(A) leader on its early transcript. J Virol 64(6):3019–3024
  10. Aird WC, Parvin JD, Sharp PA, Rosenberg RD (1994) The interaction of GATA-binding proteins and basal transcription factors with GATA box-containing core promoters. A model of tissue-specific gene expression. J Biol Chem 269(2):883–889
  11. Akaike H (1973) Information theory and an extension of maximum likelihood principle. In: Petrov BN, Csaki F (eds) Second international symposium on information theory. Akademiai Kiado, Budapest, pp 267–281
  12. Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19:716–723
  13. Akashi H (1994) Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136(3):927–935
  14. Akashi H, Gojobori T (2002) Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc Natl Acad Sci USA 99(6):3695–3700
  15. Alatortsev VS, Cruz-Reyes J, Zhelonkina AG, Sollner-Webb B (2008) Trypanosoma brucei RNA editing: coupled cycles of U deletion reveal processive activity of the editing complex. Mol Cell Biol 28(7):2437–2445
  16. Alderwick LJ, Seidel M, Sahm H, Besra GS, Eggeling L (2006) Identification of a novel arabinofuranosyltransferase (AftA) involved in cell wall arabinan biosynthesis in Mycobacterium tuberculosis. J Biol Chem 281(23):15653–15661
  17. Allen A, Flemstrom G, Garner A, Kivilaakso E (1993) Gastroduodenal mucosal protection. Physiol Rev 73(4):823–857
  18. Alm RA, Trust TJ (1999) Analysis of the genetic diversity of Helicobacter pylori: the tale of two genomes. J Mol Med 77(12):834–846
  19. Alm RA, Ling LS, Moir DT, King BL, Brown ED, Doig PC, Smith DR, Noonan B, Guild BC, deJonge BL et al (1999) Genomic-sequence comparison of two unrelated isolates of the human gastric pathogen Helicobacter pylori. Nature 397(6715):176–180
  20. Alm RA, Bina J, Andrews BM, Doig P, Hancock RE, Trust TJ (2000) Comparative genomics of Helicobacter pylori: analysis of the outer membrane protein families. Infect Immun 68(7):4155–4168
  21. Althaus E, Caprara A, Lenhof HP, Reinert K (2002) Multiple sequence alignment with arbitrary gap costs: computing an optimal solution using polyhedral combinatorics. Bioinformatics 18(Suppl 2):S4–S16
  22. Altschul SF (1996) Local alignment statistics. Meth Enzymol 274:460–480
  23. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
  24. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
  25. Anderson KP, Crable SC, Lingrel JB (1998) Multiple proteins binding to a GATA-E box-GATA motif regulate the erythroid Kruppel-like factor (EKLF) gene. J Biol Chem 273(23):14347–14354
  26. Andersson DI, Kurland CG (1983) Ram ribosomes are defective proofreaders. Mol Gen Genet 191(3):378–381
  27. Arava Y, Wang Y, Storey JD, Liu CL, Brown PO, Herschlag D (2003) Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae. Proc Natl Acad Sci USA 100(7):3889–3894
  28. Arbibe L, Sansonetti PJ (2007) Epigenetic regulation of host response to LPS: causing tolerance while avoiding toll errancy. Cell Host Microbe 1(4):244–246
  29. Arnqvist G (2006) Sensory exploitation and sexual conflict. Philos Trans R Soc Lond Ser B Biol Sci 361(1466):375–386
  30. Arvaniti E, Moulos P, Vakrakou A, Chatziantoniou C, Chadjichristos C, Kavvadas P, Charonis A, Politis PK (2016) Whole-transcriptome analysis of UUO mouse model of renal fibrosis reveals new molecular players in kidney diseases. Sci Rep 6:26235
  31. Ast G (2004) How did alternative splicing evolve? Nat Rev Genet 5(10):773–782
  32. Auch AF, Henz SR, Holland BR, Goker M (2006) Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences. BMC Bioinform 7:350
  33. Awan AR, Manfredo A, Pleiss JA (2013) Lariat sequencing in a unicellular yeast identifies regulated alternative splicing of exons that are evolutionarily conserved with humans. Proc Natl Acad Sci USA 110(31):12762–12767
  34. Axon AT (1999) Are all helicobacters equal? Mechanisms of gastroduodenal pathology and their clinical implications. Gut 45(Suppl 1):I1–I4
  35. Bablanian R, Banerjee AK (1986) Poly(riboadenylic acid) preferentially inhibits in vitro translation of cellular mRNAs compared with vaccinia virus mRNAs: possible role in vaccinia virus cytopathology. Proc Natl Acad Sci USA 83(5):1290–1294
  36. Bablanian R, Coppola G, Masters PS, Banerjee AK (1986) Characterization of vaccinia virus transcripts involved in selective inhibition of host protein synthesis. Virology 148(2):375–380
  37. Bablanian R, Goswami SK, Esteban M, Banerjee AK (1987) Selective inhibition of protein synthesis by synthetic and vaccinia virus-core synthesized poly(riboadenylic acids). Virology 161(2):366–373
  38. Bablanian R, Scribani S, Esteban M (1993) Amplification of polyadenylated nontranslated small RNA sequences (POLADS) during superinfection correlates with the inhibition of viral and cellular protein synthesis. Cell Mol Biol Res 39(3):243–255
  39. Bag J (2001) Feedback inhibition of poly(A)-binding protein mRNA translation. A possible mechanism of translation arrest by stalled 40 S ribosomal subunits. J Biol Chem 276(50):47352–47360
  40. Bag J, Bhattacharjee RB (2010) Multiple levels of post-transcriptional control of expression of the poly(A)-binding protein. RNA Biol 7(1):5–12
  41. Baik SC, Kim KM, Song SM, Kim DS, Jun JS, Lee SG, Song JY, Park JU, Kang HL, Lee WK et al (2004) Proteomic analysis of the sarcosine-insoluble outer membrane fraction of Helicobacter pylori strain 26695. J Bacteriol 186(4):949–955
  42. Bailey TL, Williams N, Misleh C, Li WW (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34(Web Server issue):W369–W373
  43. Baird SD, Turcotte M, Korneluk RG, Holcik M (2006) Searching for IRES. RNA 12(10):1755–1785
  44. Baird SD, Lewis SM, Turcotte M, Holcik M (2007) A search for structurally similar cellular internal ribosome entry sites. Nucleic Acids Res 35(14):4664–4677
  45. Baldi P, Brunak S (2001) Bioinformatics: the machine learning approach. The MIT Press, Cambridge, MA
  46. Bamford DH, Caldentey J, Bamford JK (1995) Bacteriophage PRD1: a broad host range DSDNA tectivirus with an internal membrane. Adv Virus Res 45:281–319
  47. Bao J, Bedford MT (2016) Epigenetic regulation of the histone-to-protamine transition during spermiogenesis. Reproduction 151(5):R55–R70
  48. Baron D, Cocquet J, Xia X, Fellous M, Guiguen Y, Veitia RA (2004) An evolutionary and functional analysis of FoxL2 in rainbow trout gonad differentiation. J Mol Endocrinol 33:705–715
  49. Bastianelli G, Bouillon A, Nguyen C, Crublet E, Petres S, Gorgette O, Le-Nguyen D, Barale JC, Nilges M (2011) Computational reverse-engineering of a spider-venom derived peptide active against Plasmodium falciparum SUB1. PLoS One 6(7):e21812
  50. Bauerfeind P, Garner R, Dunn BE, Mobley HL (1997) Synthesis and activity of Helicobacter pylori urease and catalase at low pH. Gut 40(1):25–30
  51. Baumgartner HK, Montrose MH (2004) Regulated alkali secretion acts in tandem with unstirred layers to regulate mouse gastric surface pH. Gastroenterology 126(3):774–783
  52. Beier H, Grimm M (2001) Misreading of termination codons in eukaryotes by natural nonsense suppressor tRNAs. Nucleic Acids Res 29(23):4767–4782
  53. Bell D, Bell AH, Bondaruk J, Hanna EY, Weber RS (2016) In-depth characterization of the salivary adenoid cystic carcinoma transcriptome with emphasis on dominant cell type. Cancer 122(10):1513–1522
  54. Ben-Gal I, Shani A, Gohr A, Grau J, Arviv S, Shmilovici A, Posch S, Grosse I (2005) Identification of transcription factor binding sites with variable-order Bayesian networks. Bioinformatics 21(11):2657–2666
  55. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 57(1):289–300
  56. Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple hypothesis testing under dependency. Ann Stat 29:1165–1188
  57. Bennetzen JL, Hall BD (1982) Codon selection in yeast. J Biol Chem 257(6):3026–3031
  58. Benoit G, Lemaitre C, Lavenier D, Drezen E, Dayris T, Uricaru R, Rizk G (2015) Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. BMC Bioinform 16:288
  59. Benzer S, Champe SP (1962) A change from nonsense to sense in the genetic code. Proc Natl Acad Sci USA 48:1114–1121
  60. Berg JM, Tymoczko JL, Stryer L (2002) Biochemistry. W. H. Freeman and Co, New York
  61. Berger MF, Levin JZ, Vijayendran K, Sivachenko A, Adiconis X, Maguire J, Johnson LA, Robinson J, Verhaak RG, Sougnez C et al (2010) Integrative analysis of the melanoma transcriptome. Genome Res 20(4):413–427
  62. Bergsten E, Uutela M, Li X, Pietras K, Ostman A, Heldin CH, Alitalo K, Eriksson U (2001) PDGF-D is a specific, protease-activated ligand for the PDGF beta-receptor. Nat Cell Biol 3(5):512–516
  63. Bertholet C, Van Meir E, ten Heggeler-Bordier B, Wittek R (1987) Vaccinia virus produces late mRNAs by discontinuous synthesis. Cell 50(2):153–162
  64. Besemer J, Borodovsky M (2005) GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res 33(Web Server issue):W451–W454
  65. Bestor TH, Coxon A (1993) The pros and cons of DNA methylation. Curr Biol 6:384–386
  66. Betney R, de Silva E, Krishnan J, Stansfield I (2010) Autoregulatory systems controlling translation factor expression: thermostat-like control of translational accuracy. RNA 16(4):655–663
  67. Beznoskova P, Gunisova S, Valasek LS (2016) Rules of UGA-N decoding by near-cognate tRNAs and analysis of readthrough on short uORFs in yeast. RNA 22(3):456–466
  68. Bhagwat M, Aravind L (2007) PSI-BLAST tutorial. Methods Mol Biol 395:177–186
  69. Bhatia B, Ponia SS, Solanki AK, Dixit A, Garg LC (2014) Identification of glutamate ABC-transporter component in Clostridium perfringens as a putative drug target. Bioinformation 10(7):401–405
  70. Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, Delano D, Zhang L, Schroth GP, Gunderson KL et al (2011) High density DNA methylation array with single CpG site resolution. Genomics 98(4):288–295
  71. Bickel DR (2003) Robust cluster analysis of microarray gene expression data with the number of clusters determined biologically. Bioinformatics 19(7):818–824
  72. Bierne H, Hamon M, Cossart P (2012) Epigenetics and bacterial infections. Cold Spring Harb Perspect Med 2(12):a010272
  73. Bigaud E, Corrales FJ (2016) Methylthioadenosine (MTA) regulates liver cells proteome and methylproteome: implications in liver biology and disease. Mol Cell Proteomics 15(5):1498–1510
  74. Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE et al (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447(7146):799–816
  75. Bjorkholm B, Lundin A, Sillen A, Guillemin K, Salama N, Rubio C, Gordon JI, Falk P, Engstrand L (2001) Comparison of genetic divergence and fitness between two subclones of Helicobacter pylori. Infect Immun 69(12):7832–7838
  76. Bjornsson A, Isaksson LA (1996) Accumulation of a mRNA decay intermediate by ribosomal pausing at a stop codon. Nucleic Acids Res 24(9):1753–1757
  77. Blackburne BP, Whelan S (2013) Class of multiple sequence alignment algorithm affects genomic analysis. Mol Biol Evol 30(3):642–653
  78. Blakqori G, van Knippenberg I, Elliott RM (2009) Bunyamwera orthobunyavirus S-segment untranslated regions mediate poly(A) tail-independent translation. J Virol 83(8):3637–3646
  79. Blanchet S, Cornu D, Argentini M, Namy O (2014) New insights into the incorporation of natural suppressor tRNAs at stop codons in Saccharomyces cerevisiae. Nucleic Acids Res 42(15):10061–10072
  80. Blanchette M, Tompa M (2002) Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res 12(5):739–748
  81. Blanchette M, Bataille AR, Chen X, Poitras C, Laganiere J, Lefebvre C, Deblois G, Giguere V, Ferretti V, Bergeron D et al (2006) Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res 6(5):656–668
  82. Boehringer D, Thermann R, Ostareck-Lederer A, Lewis JD, Stark H (2005) Structure of the hepatitis C virus IRES bound to the human 80S ribosome: remodeling of the HCV IRES. Structure 13(11):1695
  83. Bogenhagen DF, Clayton DA (2003) The mitochondrial DNA replication bubble has not burst. Trends Biochem Sci 28(7):357–360
  84. Bolden JE, Peart MJ, Johnstone RW (2006) Anticancer activities of histone deacetylase inhibitors. Nat Rev Drug Discov 5(9):769–784
  85. Borodovsky M, McIninch J (1993) GENMARK: parallel gene recognition for both DNA strands. Comput Chem 17:123–133
  86. Bossi L (1983) Context effects: translation of UAG codon by suppressor tRNA is affected by the sequence following UAG in the message. J Mol Biol 164(1):73–87
  87. Bossi L, Ruth JR (1980) The influence of codon context on genetic code translation. Nature 286(5769):123–127
  88. Brauch H, Weirich G, Brieger J, Glavac D, Rodl H, Eichinger M, Feurer M, Weidt E, Puranakanitstha C, Neuhaus C et al (2000) VHL alterations in human clear cell renal cell carcinoma: association with advanced tumor stage and a novel hot spot mutation. Cancer Res 60(7):1942–1948
  89. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC et al (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29(4):365–371
  90. Britten RJ (1986) Rates of DNA sequence evolution differ between taxonomic groups. Science 231:1393–1398
  91. Brooks DR, McLennan DA (1991) Phylogeny, ecology and behavior: a research program in comparative biology. University of Chicago Press, Chicago
  92. Brown CM, Stockwell PA, Trotman CN, Tate WP (1990) Sequence analysis suggests that tetra-nucleotides signal the termination of protein synthesis in eukaryotes. Nucleic Acids Res 18(21):6339–6345
  93. Brown M, Hughey R, Krogh A, Mian IS, Sjolander K, Haussler D (1993) Using Dirichlet mixture priors to derive hidden Markov models for protein families. Proc Int Conf Intell Syst Mol Biol 1:47–55
  94. Brown TA, Cecconi C, Tkachuk AN, Bustamante C, Clayton DA (2005) Replication of mitochondrial DNA occurs by strand displacement with alternative light-strand origins, not via a strand-coupled mechanism. Genes Dev 19(20):2466–2476
  95. Brumme ZL, Dong WW, Yip B, Wynhoven B, Hoffman NG, Swanstrom R, Jensen MA, Mullins JI, Hogg RS, Montaner JS et al (2004) Clinical and immunological impact of HIV envelope V3 sequence variation after starting initial triple antiretroviral therapy. AIDS 18(4):F1–F9
  96. Bucklew JA (1990) Large deviation techniques in decision, simulation, and estimation. Wiley, New York
  97. Bulmer M (1990) The effect of context on synonymous codon usage in genes with low codon usage bias. Nucleic Acids Res 18(10):2869–2873
  98. Bulmer M (1991) The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897–907
  99. Bumann D, Aksu S, Wendland M, Janek K, Zimny-Arndt U, Sabarth N, Meyer TF, Jungblut PR (2002) Proteome analysis of secreted proteins of the gastric pathogen Helicobacter pylori. Infect Immun 70(7):3396–3403
  100. Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94
  101. Burge CB, Karlin S (1998) Finding the genes in genomic DNA. Curr Opin Struct Biol 8(3):346–354
  102. Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York
  103. Bury-Mone S, Skouloubris S, Labigne A, De Reuse H (2001) The Helicobacter pylori UreI protein: role in adaptation to acidity and identification of residues essential for its activity and for acid activation. Mol Microbiol 42(4):1021–1034
  104. Calderone TL, Stevens RD, Oas TG (1996) High-level misincorporation of lysine for arginine at AGA codons in a fusion protein expressed in Escherichia coli. J Mol Biol 262(4):407–412
  105. Cao Y, Janke A, Waddell PJ, Westerman M, Takenaka O, Murata S, Okada N, Paabo S, Hasegawa M (1998) Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders. J Mol Evol 47(3):307–322
  106. Capecchi MR (1967) Polypeptide chain termination in vitro: isolation of a release factor. Proc Natl Acad Sci USA 58(3):1144–1151
  107. Capuano F, Mulleder M, Kok R, Blom HJ, Ralser M (2014) Cytosine DNA methylation is found in Drosophila melanogaster but absent in Saccharomyces cerevisiae, Schizosaccharomyces pombe, and other yeast species. Anal Chem 86(8):3697–3702
  108. Cardon LR, Burge C, Clayton DA, Karlin S (1994) Pervasive CpG suppression in animal mitochondrial genomes. Proc Natl Acad Sci USA 91:3799–3803
  109. Carlini DB (2005) Context-dependent codon bias and messenger RNA longevity in the yeast transcriptome. Mol Biol Evol 22(6):1403–1411
  110. Carroll J, Fearnley IM, Shannon RJ, Hirst J, Walker JE (2003) Analysis of the subunit composition of complex I from bovine heart mitochondria. Mol Cell Proteomics 2(2):117–126
  111. Carullo M, Xia X (2008) An extensive study of mutation and selection on the wobble nucleotide in tRNA anticodons in fungal mitochondrial genomes. J Mol Evol 66(5):484–493
  112. Censini S, Lange C, Xiang Z, Crabtree JE, Ghiara P, Borodovsky M, Rappuoli R, Covacci A (1996) Cag, a pathogenicity island of Helicobacter pylori, encodes type I-specific and disease-associated virulence factors. Proc Natl Acad Sci USA 93(25):14648–14653
  113. Cesar Sanchez J, Padron G, Santana H, Herrera L (1998) Elimination of an HuIFN alpha 2b readthrough species, produced in Escherichia coli, by replacing its natural translational stop signal. J Biotechnol 63(3):179–186
  114. Chakrabarti S, Lanczycki CJ (2007) Analysis and prediction of functionally important sites in proteins. Protein Sci 16(1):4–13
  115. Chakraborty R (1977) Estimation of time of divergence from phylogenetic studies. Can J Genet Cytol 19:217–223
  116. Chambaud I, Heilig R, Ferris S, Barbe V, Samson D, Galisson F, Moszer I, Dybvig K, Wroblewski H, Viari A et al (2001) The complete genome sequence of the murine respiratory pathogen Mycoplasma pulmonis. Nucleic Acids Res 29(10):2145–2153
  117. Chan S-W, Egan P (2009) Effects of hepatitis C virus envelope glycoprotein unfolded protein response activation on translation and transcription. Arch Virol 154(10):1631–1640
  118. Chan PP, Lowe TM (2009) GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res 37(Database issue):D93–D97
  119. Chang SY, McGary EC, Chang S (1989) Methionine aminopeptidase gene of Escherichia coli is essential for cell growth. J Bacteriol 171(7):4071–4072
  120. Charig CR, Webb DR, Payne SR, Wickham JE (1986) Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy. Br Med J (Clin Res Ed) 292(6524):879–882
  121. Chen JJ, Peck K, Hong TM, Yang SC, Sher YP, Shih JY, Wu R, Cheng JL, Roffler SR, Wu CW et al (2001) Global analysis of gene expression in invasion by a lung cancer model. Cancer Res 61(13):5223–5230
  122. Chen Q, Yan M, Cao Z, Li X, Zhang Y, Shi J, Feng GH, Peng H, Zhang X, Qian J et al (2016) Sperm tsRNAs contribute to intergenerational inheritance of an acquired metabolic disorder. Science 351(6271):397–400
  123. Chilingaryan A, Gevorgyan N, Vardanyan A, Jones D, Szabo A (2002) Multivariate approach for selecting sets of differentially expressed genes. Math Biosci 176(1):59–69
  124. Chithambaram S, Prabhakaran R, Xia X (2014a) Differential codon adaptation between dsDNA and ssDNA phages in Escherichia coli. Mol Biol Evol 31(6):1606–1617
  125. Chithambaram S, Prabhakaran R, Xia X (2014b) The effect of mutation and selection on codon adaptation in Escherichia coli bacteriophage. Genetics 197(1):301–315
  126. Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ et al (1998) A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2(1):65–73
  127. Chou PY, Fasman GD (1978a) Empirical predictions of protein conformation. Annu Rev Biochem 47:251–276
  128. Chou PY, Fasman GD (1978b) Prediction of the secondary structure of proteins from their amino acid sequence. Adv Enzymol Relat Areas Mol Biol 47:45–148
  129. Chu C, Qu K, Zhong FL, Artandi SE, Chang HY (2011) Genomic maps of long noncoding RNA occupancy reveal principles of RNA-chromatin interactions. Mol Cell 44(4):667–678
  130. Chu C, Quinn J, Chang HY (2012) Chromatin isolation by RNA purification (ChIRP). J Vis Exp 61:e3912
  131. Chuang SE, Daniels DL, Blattner FR (1993) Global regulation of gene expression in Escherichia coli. J Bacteriol 175(7):2026–2036
  132. Clark AT (2015) DNA methylation remodeling in vitro and in vivo. Curr Opin Genet Dev 34:82–87
  133. Claverie JM (1994) Some useful statistical properties of position-weight matrices. Comput Chem 18(3):287–294
  134. Claverie JM, Audic S (1996) The statistical significance of nucleotide position-weight matrix matches. Comput Appl Biosci 12(5):431–439
  135. Clayton DA (1982) Replication of animal mitochondrial DNA. Cell 28(4):693–705
  136. Clayton DA (2000) Transcription and replication of mitochondrial DNA. Hum Reprod 15(Suppl 2):11–17
  137. Cocquet J, De Baere E, Gareil M, Pannetier M, Xia X, Fellous M, Veitia RA (2003) Structure, evolution and expression of the FOXL2 transcription unit. Cytogenet Genome Res 101:206–211
  138. Coessens B, Thijs G, Aerts S, Marchal K, De Smet F, Engelen K, Glenisson P, Moreau Y, Mathys J, De Moor B (2003) INCLUSive: a web portal and service registry for microarray and regulatory sequence analysis. Nucleic Acids Res 31(13):3468–3470
  139. Coghlan A, Wolfe KH (2000) Relationship of codon bias to mRNA concentration and protein length in Saccharomyces cerevisiae. Yeast 16(12):1131–1145
  140. Comeron JM, Aguade M (1998) An evaluation of measures of synonymous codon usage bias. J Mol Evol 47(3):268–274
  141. Correa P (1997) Helicobacter pylori as a pathogen and carcinogen. J Physiol Pharmacol 48(Suppl 4):19–24
  142. Cottrell JS (1994) Protein identification by peptide mass fingerprinting. Pept Res 7(3):115–124
  143. Cottrell JS, Sutton CW (1996) The identification of electrophoretically separated proteins by peptide mass fingerprinting. Methods Mol Biol 61:67–82
  144. Covacci A, Falkow S, Berg DE, Rappuoli R (1997) Did the inheritance of a pathogenicity island modify the virulence of Helicobacter pylori? Trends Microbiol 5(5):205–208
  145. Covell DG, Wallqvist A, Rabow AA, Thanki N (2003) Molecular classification of cancer: unsupervised self-organizing map analysis of gene expression microarray data. Mol Cancer Ther 2(3):317–332
  146. Cox SS, van der Giezen M, Tarr SJ, Crompton MR, Tovar J (2006) Evidence from bioinformatics, expression and inhibition studies of phosphoinositide-3 kinase signalling in Giardia intestinalis. BMC Microbiol 6:45
  147. Craigen WJ, Caskey CT (1986) Expression of peptide chain release factor 2 requires high-efficiency frameshift. Nature 322(6076):273–275
  148. Craigen WJ, Caskey CT (1987) The function, structure and regulation of E. coli peptide chain release factors. Biochimie 69(10):1031–1041
  149. Craigen WJ, Cook RG, Tate WP, Caskey CT (1985) Bacterial peptide chain release factors: conserved primary structure and possible frameshift regulation of release factor 2. Proc Natl Acad Sci USA 82(11):3616–3620
  150. Craigen WJ, Lee CC, Caskey CT (1990) Recent advances in peptide chain termination. Mol Microbiol 4(6):861–865
  151. Crick FH (1966) Codon—anticodon pairing: the wobble hypothesis. J Mol Biol 19(2):548–555
  152. Curran JF, Yarus M (1988) Use of tRNA suppressors to probe regulation of Escherichia coli release factor 2. J Mol Biol 203(1):75–83
  153. Czerwoniec A, Dunin-Horkawicz S, Purta E, Kaminska KH, Kasprzak JM, Bujnicki JM, Grosjean H, Rother K (2009) MODOMICS: a database of RNA modification pathways. 2008 update. Nucleic Acids Res 37(Database issue):D118–D121
  154. Danchin A (2002) The Delphic boat: what genomes tell us. Harvard University Press, Cambridge, MA
  155. David E, Tramontin T, Zemmel R (2009) Pharmaceutical R&D: the road to positive returns. Nat Rev Drug Discov 8(8):609–610
  156. Davies J, Jones DS, Khorana HG (1966) A further study of misreading of codons induced by streptomycin and neomycin using ribopolynucleotides containing two nucleotides in alternating sequence as templates. J Mol Biol 18(1):48–57
  157. Dayhoff MO, Schwartz RM, Orcutt BC (1978) A model of evolutionary change in proteins. In: Dayhoff MO (ed) Atlas of protein sequence and structure. National Biomedical Research Foundation, Washington, DC, pp 345–352Google Scholar
  158. Delorenzi M, Speed T (2002) An HMM model for coiled-coil domains and a comparison with PSSM-based predictions. Bioinformatics 18(4):617–625CrossRefPubMedGoogle Scholar
  159. Deng R, Huang M, Wang J, Huang Y, Yang J, Feng J, Wang X (2006) PTreeRec: phylogenetic tree reconstruction based on genome BLAST distance. Comput Biol Chem 30(4):300–302CrossRefPubMedGoogle Scholar
  160. Deng W, Lee J, Wang H, Miller J, Reik A, Gregory PD, Dean A, Blobel GA (2012) Controlling long-range genomic interactions at a native locus by targeted tethering of a looping factor. Cell 149(6):1233–1244PubMedPubMedCentralCrossRefGoogle Scholar
  161. Deng Q, Ramskold D, Reinius B, Sandberg R (2014a) Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343(6167):193–196CrossRefPubMedGoogle Scholar
  162. Deng W, Rupon JW, Krivega I, Breda L, Motta I, Jahn KS, Reik A, Gregory PD, Rivella S, Dean A et al (2014b) Reactivation of developmentally silenced globin genes by forced chromatin looping. Cell 158(4):849–860PubMedPubMedCentralCrossRefGoogle Scholar
  163. Desper R, Gascuel O (2002) Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. J Comput Biol 9(5):687–705CrossRefPubMedGoogle Scholar
  164. Dewey CN, Rogozin IB, Koonin EV (2006) Compensatory relationship between splice sites and exonic splicing signals depending on the length of vertebrate introns. BMC Genomics 7:311PubMedPubMedCentralCrossRefGoogle Scholar
  165. Diehn M, Eisen MB, Botstein D, Brown PO (2000) Large-scale identification of secreted and membrane-associated gene products using DNA microarrays. Nat Genet 25(1):58–62PubMedPubMedCentralCrossRefGoogle Scholar
  166. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15–21PubMedPubMedCentralCrossRefGoogle Scholar
  167. Dobzhansky T (1973) Nothing in biology makes sense except in the light of evolution. Am Biol Teach 35:125–129CrossRefGoogle Scholar
  168. Donly BC, Edgar CD, Adamski FM, Tate WP (1990) Frameshift autoregulation in the gene for Escherichia coli release factor 2: partly functional mutants result in frameshift enhancement. Nucleic Acids Res 18(22):6517–6522PubMedPubMedCentralCrossRefGoogle Scholar
  169. Doolittle RF, Hunkapiller MW, Hood LE, Devare SG, Robbins KC, Aaronson SA, Antoniades HN (1983) Simian sarcoma virus onc gene, v-sis, is derived from the gene (or genes) encoding a platelet-derived growth factor. Science 221(4607):275–277PubMedPubMedCentralCrossRefGoogle Scholar
  170. Dorokhov YL, Skulachev MV, Ivanov PA, Zvereva SD, Tjulkina LG, Merits A, Gleba YY, Hohn T, Atabekov JG (2002) Polypurine (A)-rich sequences promote cross-kingdom conservation of internal ribosome entry. Proc Natl Acad Sci USA 99(8):5301–5306PubMedPubMedCentralCrossRefGoogle Scholar
  171. dos Reis M, Savva R, Wernisch L (2004) Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res 32(17):5036–5044 Print 2004PubMedPubMedCentralCrossRefGoogle Scholar
  172. Doudna JA, Sarnow P (2007) Translation initiation by viral internal ribosome entry sites. In: Mathews MB, Sonenberg N, Hershey J (eds) Translational control in biology and medicine. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, pp 129–154Google Scholar
  173. Drews J, Ryser S (1997) The role of innovation in drug development. Nat Biotechnol 15(13):1318–1319PubMedPubMedCentralCrossRefGoogle Scholar
  174. Drouin G, Daoud H, Xia J (2008) Relative rates of synonymous substitutions in the mitochondrial, chloroplast and nuclear genomes of seed plants. Mol Phylogenet Evol 49(3):827–831PubMedPubMedCentralCrossRefGoogle Scholar
  175. Drummond A, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7(1):214PubMedPubMedCentralCrossRefGoogle Scholar
  176. Drummond A, Rodrigo AG (2000) Reconstructing genealogies of serial samples under the assumption of a molecular clock using serial-sample UPGMA. Mol Biol Evol 17(12):1807–1815PubMedCrossRefGoogle Scholar
  177. Drummond A, Forsberg R, Rodrigo AG (2001) The inference of stepwise changes in substitution rates using serial sequence samples. Mol Biol Evol 18(7):1365–1371PubMedCrossRefGoogle Scholar
  178. Drummond AJ, Pybus OG, Rambaut A, Forsberg R, Rodrigo AG (2003a) Measurably evolving populations. Trends Ecol Evol 18(9):481–488CrossRefGoogle Scholar
  179. Drummond A, Pybus OG, Rambaut A (2003b) Inference of viral evolutionary rates from molecular sequences. Adv Parasitol 54:331–358PubMedCrossRefGoogle Scholar
  180. Durbin R (1998) Biological sequence analysis : probabilistic models of proteins and nucleic acids. Cambridge University Press, CambridgeCrossRefGoogle Scholar
  181. Duret L, Mouchiroud D (1999) Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc Natl Acad Sci USA 96(8):4482–4487PubMedCrossRefGoogle Scholar
  182. DuRose JB, Scheuner D, Kaufman RJ, Rothblum LI, Niwa M (2009) Phosphorylation of eukaryotic translation initiation factor 2alpha coordinates rRNA transcription and translation inhibition during endoplasmic reticulum stress. Mol Cell Biol 29(15):4295–4307PubMedPubMedCentralCrossRefGoogle Scholar
  183. Duval M, Korepanov A, Fuchsbauer O, Fechter P, Haller A, Fabbretti A, Choulier L, Micura R, Klaholz BP, Romby P et al (2013) Escherichia coli Ribosomal protein S1 unfolds structured mRNAs onto the ribosome for active translation initiation. PLoS Biol 11(12):e1001731PubMedPubMedCentralCrossRefGoogle Scholar
  184. Eckhardt F, Lewin J, Cortese R, Rakyan VK, Attwood J, Burger M, Burton J, Cox TV, Davies R, Down TA et al (2006) DNA methylation profiling of human chromosomes 6, 20 and 22. Nat Genet 38(12):1378–1385PubMedPubMedCentralCrossRefGoogle Scholar
  185. Eddy SR (1996) Hidden Markov models. Curr Opin Struct Biol 6(3):361–365PubMedPubMedCentralCrossRefGoogle Scholar
  186. Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14(9):755–763PubMedPubMedCentralCrossRefGoogle Scholar
  187. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792–1797PubMedPubMedCentralCrossRefGoogle Scholar
  188. Edgar RC, Batzoglou S (2006) Multiple sequence alignment. Curr Opin Struct Biol 16(3):368–373PubMedCrossRefGoogle Scholar
  189. Efron B (1982) The jackknife, the bootstrap and other resampling plans. Society for Industrial and Applied Mathematics, PhiladelphiaCrossRefGoogle Scholar
  190. Ehnman M, Missiaglia E, Folestad E, Selfe J, Strell C, Thway K, Brodin B, Pietras K, Shipley J, Ostman A et al (2013) Distinct effects of ligand-induced PDGFRalpha and PDGFRbeta signaling in the human rhabdomyosarcoma tumor cell and stroma cell compartments. Cancer Res 73(7):2139–2149PubMedPubMedCentralCrossRefGoogle Scholar
  191. Ehrenberg M, Tenson T (2002) A new beginning of the end of translation. Nat Struct Biol 9(2):85–87PubMedPubMedCentralCrossRefGoogle Scholar
  192. Einstein A, Russell B, Dewey J, Millikan RA, Dreiser T, Wells HG, Nansen F, Jeans SJ, Babbitt I, Keith SA et al (1931) Living philosophies. Simon and Schuster, New YorkGoogle Scholar
  193. Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95(25):14863–14868PubMedPubMedCentralCrossRefGoogle Scholar
  194. Elf J, Nilsson D, Tenson T, Ehrenberg M (2003) Selective charging of tRNA isoacceptors explains patterns of codon usage. Science 300(5626):1718–1722PubMedCrossRefGoogle Scholar
  195. Elroy-Stein O, Merrick W (2007) Translation initiation via cellular internal ribosome entry sites. In: Mathews MB, Sonenberg N, Hershey J (eds) Translational control in biology and medicine. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, pp 155–172Google Scholar
  196. Engel E, Peskoff A, Kauffman GL Jr, Grossman MI (1984) Analysis of hydrogen ion concentration in the gastric gel mucus layer. Am J Phys 247(4 Pt 1):G321–G338Google Scholar
  197. Engelberg-Kulka H (1981) UGA suppression by normal tRNA Trp in Escherichia coli: codon context effects. Nucleic Acids Res 9(4):983–991PubMedPubMedCentralCrossRefGoogle Scholar
  198. Epstein CB, Butow RA (2000) Microarray technology – enhanced versatility, persistent challenge. Curr Opin Biotechnol 11(1):36–41PubMedPubMedCentralCrossRefGoogle Scholar
  199. Eswarappa SM, Potdar AA, Koch WJ, Fan Y, Vasu K, Lindner D, Willard B, Graham LM, DiCorleto PE, Fox PL (2014) Programmed translational readthrough generates antiangiogenic VEGF-Ax. Cell 157(7):1605–1618PubMedPubMedCentralCrossRefGoogle Scholar
  200. Evans T, Felsenfeld G, Reitman M (1990) Control of globin gene transcription. Annu Rev Cell Biol 6:95–124CrossRefPubMedGoogle Scholar
  201. Eyre-Walker A (1996) The close proximity of Escherichia coli genes: consequences for stop codon and synonymous codon use. J Mol Evol 42(2):73–78PubMedPubMedCentralCrossRefGoogle Scholar
  202. Eyre-Walker A, Bulmer M (1993) Reduced synonymous substitution rate at the start of enterobacterial genes. Nucleic Acids Res 21:4599–4603PubMedPubMedCentralCrossRefGoogle Scholar
  203. Ezzell C (2002) Proteins rule. Sci Am 286(4):40–47CrossRefPubMedGoogle Scholar
  204. Farazi TA, Waksman G, Gordon JI (2001) The biology and enzymology of protein N-myristoylation. J Biol Chem 276(43):39501–39504PubMedPubMedCentralCrossRefGoogle Scholar
  205. Farnham PJ, Platt T (1981) Rho-independent termination: dyad symmetry in DNA causes RNA polymerase to pause during transcription in vitro. Nucleic Acids Res 9(3):563–577PubMedPubMedCentralCrossRefGoogle Scholar
  206. Fasman GD, Chou PY (1974) Prediction of protein conformation: consequences and aspirations. In: Blout ER, Bovey FA, Goodman M, Latan N (eds) Peptides, polypeptides and proteins. Wiley, New York, pp 114–125Google Scholar
  207. Fatemi M, Hermann A, Pradhan S, Jeltsch A (2001) The activity of the murine DNA methyltransferase Dnmt1 is controlled by interaction of the catalytic domain with the N-terminal part of the enzyme leading to an allosteric activation of the enzyme after binding to methylated DNA. J Mol Biol 309(5):1189–1199PubMedPubMedCentralCrossRefGoogle Scholar
  208. Felsenstein J (1973) Maximum-likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Syst Zool 22:240–249CrossRefGoogle Scholar
  209. Felsenstein J (1978a) Cases in which parsimony and compatibility methods will be positively misleading. Syst Zool 27:401–410CrossRefGoogle Scholar
  210. Felsenstein J (1978b) The number of evolutionary trees. Syst Zool 27:27–33CrossRefGoogle Scholar
  211. Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376CrossRefPubMedGoogle Scholar
  212. Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791CrossRefGoogle Scholar
  213. Felsenstein J (2004) Inferring phylogenies. Sinauer, SunderlandGoogle Scholar
  214. Felsenstein J, Churchill GA (1996) A Hidden Markov Model approach to variation among sites in rate of evolution. Mol Biol Evol 13(1):93–104CrossRefPubMedGoogle Scholar
  215. Feng DF, Doolittle RF (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol 25(4):351–360CrossRefPubMedGoogle Scholar
  216. Feng DF, Doolittle RF (1990) Progressive alignment and phylogenetic tree construction of protein sequences. Methods Enzymol 183:375–387PubMedPubMedCentralCrossRefGoogle Scholar
  217. Fernandez-Pinar R, Lo Sciuto A, Rossi A, Ranucci S, Bragonzi A, Imperi F (2015) In vitro and in vivo screening for novel essential cell-envelope proteins in Pseudomonas aeruginosa. Sci Rep 5:17593PubMedPubMedCentralCrossRefGoogle Scholar
  218. Fickett JW (1996) Quantitative discrimination of MEF2 sites. Mol Cell Biol 16(1):437–441PubMedPubMedCentralCrossRefGoogle Scholar
  219. Figeys D (2002) Adapting arrays and lab-on-a-chip technology for proteomics. Proteomics 2(4):373–382CrossRefPubMedGoogle Scholar
  220. Figeys D (2003a) Novel approaches to map protein interactions. Curr Opin Biotechnol 14(1):119–125CrossRefPubMedGoogle Scholar
  221. Figeys D (2003b) Proteomics in 2002: a year of technical development and wide-ranging applications. Anal Chem 75(12):2891–2905CrossRefPubMedGoogle Scholar
  222. Fisher RA (1926) The arrangement of field experiments. J Minist Agric 33:503–513Google Scholar
  223. Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugenics 7:179–188CrossRefGoogle Scholar
  224. Fitch WM (1971) Toward defining the course of evolution: minimum change for a specific tree topology. Syst Zool 20:406–416CrossRefGoogle Scholar
  225. Fitch WM, Margoliash E (1967) Construction of phylogenetic trees. Science 155:279–284PubMedPubMedCentralCrossRefGoogle Scholar
  226. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM et al (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269(5223):496–512CrossRefPubMedPubMedCentralGoogle Scholar
  227. Fong TC, Emerson BM (1992) The erythroid-specific protein cGATA-1 mediates distal enhancer activity through a specialized beta-globin TATA box. Genes Dev 6(4):521–532CrossRefPubMedGoogle Scholar
  228. Forde CE, McCutchen-Maloney SL (2002) Characterization of transcription factors by mass spectrometry and the role of SELDI-MS. Mass Spectrom Rev 21(6):419–439PubMedPubMedCentralCrossRefGoogle Scholar
  229. Forrester WC, Epner E, Driscoll MC, Enver T, Brice M, Papayannopoulou T, Groudine M (1990) A deletion of the human beta-globin locus activation region causes a major alteration in chromatin structure and replication across the entire beta-globin locus. Genes Dev 4(10):1637–1649PubMedPubMedCentralCrossRefGoogle Scholar
  230. Frank C, Makkonen H, Dunlop TW, Matilainen M, Vaisanen S, Carlberg C (2005) Identification of pregnane X receptor binding sites in the regulatory regions of genes involved in bile acid homeostasis. J Mol Biol 346(2):505–519CrossRefPubMedGoogle Scholar
  231. Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, Bult CJ, Kerlavage AR, Sutton G, Kelley JM et al (1995) The minimal gene complement of Mycoplasma genitalium. Science 270(5235):397–403CrossRefPubMedPubMedCentralGoogle Scholar
  232. Frederico LA, Kunkel TA, Shaw BR (1990) A sensitive genetic assay for the detection of cytosine deamination: determination of rate constants and the activation energy. Biochemistry (Mosc) 29(10):2532–2537CrossRefGoogle Scholar
  233. Frishman D, Mironov A, Mewes HW, Gelfand M (1998) Combining diverse evidence for gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res 26(12):2941–2947PubMedPubMedCentralCrossRefGoogle Scholar
  234. Frolova LY, Tsivkovskii RY, Sivolobova GF, Oparina NY, Serpinsky OI, Blinov VM, Tatkov SI, Kisselev LL (1999) Mutations in the highly conserved GGQ motif of class 1 polypeptide release factors abolish ability of human eRF1 to trigger peptidyl-tRNA hydrolysis. RNA 5(8):1014–1020PubMedPubMedCentralCrossRefGoogle Scholar
  235. Frottin F, Martinez A, Peynot P, Mitra S, Holz RC, Giglione C, Meinnel T (2006) The proteomics of N-terminal methionine cleavage. Mol Cell Proteomics 5(12):2336–2349PubMedPubMedCentralCrossRefGoogle Scholar
  236. Furukawa R, Hachiya T, Ohmomo H, Shiwa Y, Ono K, Suzuki S, Satoh M, Hitomi J, Sobue K, Shimizu A (2016) Intraindividual dynamics of transcriptome and genome-wide stability of DNA methylation. Sci Rep 6:26424PubMedPubMedCentralCrossRefGoogle Scholar
  237. Futcher B, Latter GI, Monardo P, McLaughlin CS, Garrels JI (1999) A sampling of the yeast proteome. Mol Cell Biol 19(11):7357–7368PubMedPubMedCentralCrossRefGoogle Scholar
  238. Gaasterland T, Bekiranov S (2000) Making the most of microarray data [news]. Nat Genet 24(3):204–206PubMedPubMedCentralCrossRefGoogle Scholar
  239. Gallie DR, Tanguay R (1994) Poly(A) binds to initiation factors and increases cap-dependent translation in vitro. J Biol Chem 269(25):17166–17173PubMedPubMedCentralGoogle Scholar
  240. Gal-Mor O, Finlay BB (2006) Pathogenicity islands: a molecular toolbox for bacterial virulence. Cell Microbiol 8(11):1707–1719CrossRefPubMedGoogle Scholar
  241. Galtier N, Lobry JR (1997) Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J Mol Evol 44(6):632–636CrossRefPubMedGoogle Scholar
  242. Gao L, Qi J (2007) Whole genome molecular phylogeny of large dsDNA viruses using composition vector method. BMC Evol Biol 7:41PubMedPubMedCentralCrossRefGoogle Scholar
  243. Gapp K, Jawaid A, Sarkies P, Bohacek J, Pelczar P, Prados J, Farinelli L, Miska E, Mansuy IM (2014) Implication of sperm RNAs in transgenerational inheritance of the effects of early trauma in mice. Nat Neurosci 17(5):667–669PubMedPubMedCentralCrossRefGoogle Scholar
  244. Gascuel O, Steel M (2006) Neighbor-joining revealed. Mol Biol Evol 23(11):1997–2000PubMedPubMedCentralCrossRefGoogle Scholar
  245. Ge Y, Sealfon SC, Speed TP (2008) Some step-down procedures controlling the false discovery rate under dependence. Stat Sin 18(3):881–904PubMedPubMedCentralGoogle Scholar
  246. Geller AI, Rich A (1980) A UGA termination suppression tRNATrp active in rabbit reticulocytes. Nature 283(5742):41–46PubMedPubMedCentralCrossRefGoogle Scholar
  247. Geman S, Geman D (1984) Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6:721–741CrossRefPubMedGoogle Scholar
  248. Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A, Dephoure N, O’Shea EK, Weissman JS (2003) Global analysis of protein expression in yeast. Nature 425(6959):737–741PubMedPubMedCentralCrossRefGoogle Scholar
  249. Gibbs JB (2000) Mechanism-based target identification and drug discovery in cancer research. Science 287(5460):1969–1973CrossRefPubMedGoogle Scholar
  250. Giglione C, Vallon O, Meinnel T (2003) Control of protein life-span by N-terminal methionine excision. EMBO J 22(1):13–23PubMedPubMedCentralCrossRefGoogle Scholar
  251. Giglione C, Boularot A, Meinnel T (2004) Protein N-terminal methionine excision. Cell Mol Life Sci 61(12):1455–1474PubMedPubMedCentralCrossRefGoogle Scholar
  252. Gilbert WV (2010) Alternative ways to think about cellular internal ribosome entry. J Biol Chem 285(38):29033–29038PubMedPubMedCentralCrossRefGoogle Scholar
  253. Gilbert WV, Zhou K, Butler TK, Doudna JA (2007) Cap-independent translation is required for starvation-induced differentiation in yeast. Science 317(5842):1224–1227CrossRefPubMedGoogle Scholar
  254. Gillespie JH (1991) The causes of molecular evolution. Oxford University Press, OxfordGoogle Scholar
  255. Gojobori T, Li WH, Graur D (1982) Patterns of nucleotide substitution in pseudogenes and functional genes. J Mol Evol 18(5):360–369PubMedPubMedCentralCrossRefGoogle Scholar
  256. Gonzalez B, Ceciliani F, Galizzi A (2003) Growth at low temperature suppresses readthrough of the UGA stop codon during the expression of Bacillus subtilis flgM gene in Escherichia coli. J Biotechnol 101(2):173–180PubMedPubMedCentralCrossRefGoogle Scholar
  257. Gorodkin J, Heyer LJ, Brunak S, Stormo GD (1997) Displaying the information contents of structural RNA alignments: the structure logos. Comput Appl Biosci 13(6):583–586PubMedGoogle Scholar
  258. Goto M, Washio T, Tomita M (2000) Causal analysis of CpG suppression in the Mycoplasma genome. Microb Comp Genomics 5(1):51–58PubMedPubMedCentralCrossRefGoogle Scholar
  259. Gotoh O (1982) An improved algorithm for matching biological sequences. J Mol Biol 162(3):705–708CrossRefPubMedGoogle Scholar
  260. Gould SJ, Vrba ES (1982) Exaptation – a missing term in the science of form. Paleobiology 8:4–15CrossRefGoogle Scholar
  261. Gouy M (1987) Codon contexts in enterobacterial and coliphage genes. Mol Biol Evol 4(4):426–444PubMedPubMedCentralGoogle Scholar
  262. Gouy M, Gautier C (1982) Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res 10:7055–7064PubMedPubMedCentralCrossRefGoogle Scholar
  263. Gowri-Shankar V, Rattray M (2007) A reversible jump method for Bayesian phylogenetic inference with a nonhomogeneous substitution model. Mol Biol Evol 24(6):1286–1299CrossRefPubMedGoogle Scholar
  264. Grahn AM, Butcher SJ, Bamford JKH, Bamford DH (2006) PRD1: dissecting the genome, structure and entry. In: Calendar R (ed) The bacteriophages. Oxford University Press, Oxford, pp 176–185Google Scholar
  265. Gramm J, Niedermeier R (2002) Breakpoint medians and breakpoint phylogenies: a fixed-parameter approach. Bioinformatics 18(Suppl 2):S128–S139PubMedPubMedCentralCrossRefGoogle Scholar
  266. Grantham R (1974) Amino acid difference formula to help explain protein evolution. Science 185:862–864CrossRefPubMedGoogle Scholar
  267. Graveley BR (2005) Mutually exclusive splicing of the insect Dscam pre-mRNA directed by competing intronic RNA secondary structures. Cell 123(1):65–73PubMedPubMedCentralCrossRefGoogle Scholar
  268. Grech B, Maetschke S, Mathews S, Timms P (2007) Genome-wide analysis of chlamydiae for promoters that phylogenetically footprint. Res Microbiol 158(8–9):685–693CrossRefPubMedGoogle Scholar
  269. Grigg GW (1996) Sequencing 5-methylcytosine residues by the bisulphite method. DNA Seq 6(4):189–198PubMedPubMedCentralCrossRefGoogle Scholar
  270. Grigg G, Clark S (1994) Sequencing 5-methylcytosine residues in genomic DNA. BioEssays 16(6):431–436PubMedPubMedCentralCrossRefGoogle Scholar
  271. Grosjean H, Marck C, de Crecy-Lagard V (2007) The various strategies of codon decoding in organisms of the three domains of life: evolutionary implications. Nucleic Acids Symp Ser (Oxf) 51:15–16CrossRefGoogle Scholar
  272. Grosjean H, de Crecy-Lagard V, Marck C (2010) Deciphering synonymous codons in the three domains of life: co-evolution with specific tRNA modification enzymes. FEBS Lett 584(2):252–264PubMedPubMedCentralCrossRefGoogle Scholar
  273. Grossi de Sa MF, Standart N, Martins de Sa C, Akhayat O, Huesca M, Scherrer K (1988) The poly(A)-binding protein facilitates in vitro translation of poly(A)-rich mRNA. Eur J Biochem 176(3):521–526PubMedPubMedCentralCrossRefGoogle Scholar
  274. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59(3):307–321CrossRefGoogle Scholar
  275. Gumbel EJ (1958) Statistics of extremes. Columbia University Press, New YorkGoogle Scholar
  276. Gupta SK, Kececioglu JD, Schaffer AA (1995) Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment. J Comput Biol 2(3):459–472CrossRefPubMedGoogle Scholar
  277. Gusfield D (1997) Algorithms on strings, trees, and sequences : computer science and computational biology. Cambridge University Press, CambridgeCrossRefGoogle Scholar
  278. Gygi SP, Rochon Y, Franza BR, Aebersold R (1999) Correlation between protein and mRNA abundance in yeast. Mol Cell Biol 19(3):1720–1730PubMedPubMedCentralCrossRefGoogle Scholar
  279. Haas J, Park E-C, Seed B (1996) Codon usage limitation in the expression of HIV-1 envelope glycoprotein. Curr Biol 6(3):315–324PubMedPubMedCentralCrossRefGoogle Scholar
  280. Hacker J, Kaper JB (2000) Pathogenicity islands and the evolution of microbes. Annu Rev Microbiol 54:641–679CrossRefPubMedGoogle Scholar
  281. Hacker J, Blum-Oehler G, Muhldorfer I, Tschape H (1997) Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Mol Microbiol 23(6):1089–1097CrossRefPubMedGoogle Scholar
  282. Hamajima N, Goto Y, Nishio K, Tanaka D, Kawai S, Sakakibara H, Kondo T (2004) Helicobacter pylori eradication as a preventive tool against gastric cancer. Asian Pac J Cancer Prev 5(3):246–252PubMedPubMedCentralGoogle Scholar
  283. Hanada K, Suzuki Y, Gojobori T (2004) A large variation in the rates of synonymous substitution for RNA viruses and its relationship to a diversity of viral infection and transmission modes. Mol Biol Evol 21(6):1074–1080PubMedPubMedCentralCrossRefGoogle Scholar
  284. Hartigan JA (1975) Clustering algorithms. Wiley, New YorkGoogle Scholar
  285. Hasegawa M, Kishino H (1989) Heterogeneity of tempo and mode of mitochondrial DNA evolution among mammalian orders. Jpn J Genet 64(4):243–258PubMedPubMedCentralCrossRefGoogle Scholar
  286. Hasegawa M, Kishino H, Yano T (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22(2):160–174PubMedPubMedCentralCrossRefGoogle Scholar
  287. Haustead DJ, Stevenson A, Saxena V, Marriage F, Firth M, Silla R, Martin L, Adcroft KF, Rea S, Day PJ et al (2016) Transcriptome analysis of human ageing in male skin shows mid-life period of variability and central role of NF-kappaB. Sci Rep 6:26846PubMedPubMedCentralCrossRefGoogle Scholar
  288. Hayes WS, Borodovsky M (1998) How to interpret an anonymous bacterial genome: machine learning approach to gene identification. Genome Res 8(11):1154–1171PubMedPubMedCentralCrossRefGoogle Scholar
  289. Heath JR, Ribas A, Mischel PS (2016) Single-cell analysis tools for drug discovery and development. Nat Rev Drug Discov 15(3):204–216PubMedPubMedCentralCrossRefGoogle Scholar
  290. Hein J (1990) A unified approach to phylogenies and alignments. Methods Enzymol 183:625–644Google Scholar
  291. Hein J (1994) TreeAlign. Methods Mol Biol 25:349–364PubMedGoogle Scholar
  292. Hendy MD, Penny D (1982) Branch and bound algorithms to determine minimal evolutionary trees. Math Biosci 60:133–142CrossRefGoogle Scholar
  293. Hendy MD, Penny D (1989) A framework for the quantitative study of evolutionary trees. Syst Zool 38:297–309CrossRefGoogle Scholar
  294. Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A 89:10915–10919PubMedPubMedCentralCrossRefGoogle Scholar
  295. Henz SR, Huson DH, Auch AF, Nieselt-Struwe K, Schuster SC (2005) Whole-genome prokaryotic phylogeny. Bioinformatics 21(10):2329–2335PubMedPubMedCentralCrossRefGoogle Scholar
  296. Herman JL, Challis CJ, Novak A, Hein J, Schmidler SC (2014) Simultaneous Bayesian estimation of alignment and phylogeny under a joint model of protein sequence and structure. Mol Biol Evol 31(9):2251–2266PubMedPubMedCentralCrossRefGoogle Scholar
  297. Hernández G (2008) Was the initiation of translation in early eukaryotes IRES-driven? Trends Biochem Sci 33(2):58PubMedPubMedCentralCrossRefGoogle Scholar
  298. Hernandez G, Vazquez-Pianzola P, Sierra JM, Rivera-Pomar R (2004) Internal ribosome entry site drives cap-independent translation of reaper and heat shock protein 70 mRNAs in Drosophila embryos. RNA 10(11):1783–1797PubMedPubMedCentralCrossRefGoogle Scholar
  299. Herniou EA, Luque T, Chen X, Vlak JM, Winstanley D, Cory JS, O’Reilly DR (2001) Use of whole genome sequence data to infer baculovirus phylogeny. J Virol 75(17):8117–8126PubMedPubMedCentralCrossRefGoogle Scholar
  300. Hertz GZ, Stormo GD (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15(7–8):563–577CrossRefPubMedGoogle Scholar
  301. Hertz GZ, Hartzell GW 3rd, Stormo GD (1990) Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Comput Appl Biosci 6(2):81–92
  302. Hertzberg L, Izraeli S, Domany E (2007) STOP: searching for transcription factor motifs using gene expression. Bioinformatics 23(14):1737–1743
  303. Hiard S, Maree R, Colson S, Hoskisson PA, Titgemeyer F, van Wezel GP, Joris B, Wehenkel L, Rigali S (2007) PREDetector: a new tool to identify regulatory elements in bacterial genomes. Biochem Biophys Res Commun 357(4):861–864
  304. Hickson RE, Simon C, Perrey SW (2000) The performance of several multiple-sequence alignment programs in relation to secondary-structure features for an rRNA sequence. Mol Biol Evol 17(4):530–539
  305. Higashi K, Kashiwagi K, Taniguchi S, Terui Y, Yamamoto K, Ishihama A, Igarashi K (2006) Enhancement of +1 frameshift by polyamines during translation of polypeptide release factor 2 in Escherichia coli. J Biol Chem 281(14):9527–9537
  306. Higgins DG (1994) CLUSTAL V: multiple alignment of DNA and protein sequences. Methods Mol Biol 25:307–318
  307. Higgs PG, Attwood TK (2005) Bioinformatics and molecular evolution. Blackwell, Malden
  308. Higgs PG, Ran W (2008) Coevolution of codon usage and tRNA genes leads to alternative stable states of biased codon usage. Mol Biol Evol 25(11):2279–2291
  309. Hiller K, Grote A, Scheer M, Munch R, Jahn D (2004) PrediSi: prediction of signal peptides and their cleavage positions. Nucleic Acids Res 32(Web Server issue):W375–W379
  310. Hirao I, Kimoto M (2010) Expansion of the genetic alphabet in nucleic acids by creating new base pairs. In: Mayer G (ed) The chemical biology of nucleic acids. Wiley, Chichester, pp 39–62
  311. Hirsh D, Gold L (1971) Translation of the UGA triplet in vitro by tryptophan transfer RNA’s. J Mol Biol 58(2):459–468
  312. Hirst JD, Sternberg MJ (1991) Prediction of ATP/GTP-binding motif: a comparison of a perceptron type neural network and a consensus sequence method [corrected]. Protein Eng 4(6):615–623
  313. Hoagland MB, Stephenson ML, Scott JF, Hecht LI, Zamecnik PC (1958) A soluble ribonucleic acid intermediate in protein synthesis. J Biol Chem 231(1):241–257
  314. Hobolth A, Christensen OF, Mailund T, Schierup MH (2007) Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model. PLoS Genet 3(2):e7
  315. Hofacker IL (2003) Vienna RNA secondary structure server. Nucleic Acids Res 31(13):3429–3431
  316. Hofacker IL, Fekete M, Stadler PF (2002) Secondary structure prediction for aligned RNA sequences. J Mol Biol 319(5):1059–1066
  317. Hofer A, Steverding D, Chabes A, Brun R, Thelander L (2001) Trypanosoma brucei CTP synthetase: a target for the treatment of African sleeping sickness. Proc Natl Acad Sci U S A 98(11):6412–6416
  318. Hogeweg P, Hesper B (1984) The alignment of sets of sequences and the construction of phylogenetic trees: an integrated method. J Mol Evol 20:175–186
  319. Holmes I, Bruno WJ (2001) Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics 17(9):803–820
  320. Holstege FC, Jennings EG, Wyrick JJ, Lee TI, Hengartner CJ, Green MR, Golub TR, Lander ES, Young RA (1998) Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95(5):717–728. Transcriptomic data at http://web.wi.mit.edu/young/pub/data/orf_transcriptome.txt
  321. Hou C, Zhao H, Tanimoto K, Dean A (2008) CTCF-dependent enhancer-blocking by alternative chromatin loop formation. Proc Natl Acad Sci U S A 105(51):20398–20403
  322. Hua S, Sun Z (2001) Support vector machine approach for protein subcellular localization prediction. Bioinformatics 17(8):721–728
  323. Hudson RR (1992) Gene trees, species trees and the segregation of ancestral alleles. Genetics 131(2):509–513
  324. Huelsenbeck JP, Larget B, Alfaro ME (2004) Bayesian phylogenetic model selection using reversible jump Markov chain Monte Carlo. Mol Biol Evol 21(6):1123–1133
  325. Hughes D (1987) Mutant forms of tufA and tufB independently suppress nonsense mutations. J Mol Biol 197(4):611–615
  326. Hui A, de Boer HA (1987) Specialized ribosome system: preferential translation of a single mRNA species by a subpopulation of mutated ribosomes in Escherichia coli. Proc Natl Acad Sci U S A 84(14):4762–4766
  327. Hunt RH (2004) Will eradication of Helicobacter pylori infection influence the risk of gastric cancer? Am J Med 117(Suppl 5A):86S–91S
  328. Hurst LD, Merchant AR (2001) High guanine-cytosine content is not an adaptation to high temperature: a comparative analysis amongst prokaryotes. Proc R Soc Lond B 268:493–497
  329. Huynen M, Dandekar T, Bork P (1998) Differential genome analysis applied to the species-specific features of Helicobacter pylori. FEBS Lett 426(1):1–5
  330. Hwang S, Gou Z, Kuznetsov IB (2007) DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins. Bioinformatics 23(5):634–636
  331. Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinform 11:119
  332. Igarashi K, Kashiwagi K (2006) Polyamine Modulon in Escherichia coli: genes involved in the stimulation of cell growth by polyamines. J Biochem 139(1):11–16
  333. Ikemura T (1981a) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J Mol Biol 146:1–21
  334. Ikemura T (1981b) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E coli translational system. J Mol Biol 151:389–409
  335. Ikemura T (1982) Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J Mol Biol 158(4):573–597
  336. Ikemura T (1985) Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 2:13–34
  337. Ikemura T (1992) Correlation between codon usage and tRNA content in microorganisms. In: Hatfield DL, Lee BJ, Pirtle RM (eds) Transfer RNA in protein synthesis. CRC Press, Boca Raton, pp 87–111
  338. Ilkow CS, Mancinelli V, Beatch MD, Hobman TC (2008) Rubella virus capsid protein interacts with poly(A)-binding protein and inhibits translation. J Virol 82(9):4284–4294
  339. Ingolia NT (2010) Genome-wide translational profiling by ribosome footprinting. Methods Enzymol 470:119–142
  340. Ingolia NT (2014) Ribosome profiling: new views of translation, from single codons to genome scale. Nat Rev Genet 15(3):205–213
  341. Ingolia NT (2016) Ribosome footprint profiling of translation throughout the genome. Cell 165(1):22–33
  342. Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS (2009) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324(5924):218–223
  343. Ingolia NT, Lareau LF, Weissman JS (2011) Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147(4):789–802
  344. Ingolia NT, Brar GA, Stern-Ginossar N, Harris MS, Talhouarne GJ, Jackson SE, Wills MR, Weissman JS (2014) Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep 8(5):1365–1379
  345. Ingram VM (1956) A specific chemical difference between the globins of normal human and sickle-cell anaemia haemoglobin. Nature 178(4537):792–794
  346. Ingram VM (1957) Gene mutations in human haemoglobin: the chemical difference between normal and sickle cell haemoglobin. Nature 180(4581):326–328
  347. Ingrosso D, Perna AF (2009) Epigenetics in hyperhomocysteinemic states. A special focus on uremia. Biochim Biophys Acta 1790(9):892–899
  348. Ingrosso D, Cimmino A, Perna AF, Masella L, De Santo NG, De Bonis ML, Vacca M, D’Esposito M, D’Urso M, Galletti P et al (2003) Folate treatment and unbalanced methylation and changes of allelic expression induced by hyperhomocysteinaemia in patients with uraemia. Lancet 361(9370):1693–1699
  349. Ink BS, Pickup DJ (1990) Vaccinia virus directs the synthesis of early mRNAs containing 5′ poly(A) sequences. Proc Natl Acad Sci U S A 87(4):1536–1540
  350. Insinga A, Minucci S, Pelicci PG (2005a) Mechanisms of selective anticancer action of histone deacetylase inhibitors. Cell Cycle 4(6):741–743
  351. Insinga A, Monestiroli S, Ronzoni S, Gelmetti V, Marchesi F, Viale A, Altucci L, Nervi C, Minucci S, Pelicci PG (2005b) Inhibitors of histone deacetylases induce tumor-selective apoptosis through activation of the death receptor pathway. Nat Med 11(1):71–76
  352. Ito T, Bulger M, Pazin MJ, Kobayashi R, Kadonaga JT (1997) ACF, an ISWI-containing and ATP-utilizing chromatin assembly and remodeling factor. Cell 90(1):145–155
  353. Ito K, Uno M, Nakamura Y (2000) A tripeptide ‘anticodon’ deciphers stop codons in messenger RNA. Nature 403(6770):680–684
  354. Jackson RJ, Hellen CU, Pestova TV (2010) The mechanism of eukaryotic translation initiation and principles of its regulation. Nat Rev Mol Cell Biol 11(2):113–127
  355. Jacob F (1982) The possible and the actual. University of Washington Press, Seattle, p 70
  356. Jacob F (1988) The statue within: an autobiography. Basic Books, Inc., New York
  357. Jacob F, Monod J (1961) Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 3:318–356
  358. Jacobson A, Favreau M (1983) Possible involvement of poly(A) in protein synthesis. Nucleic Acids Res 11(18):6353–6368
  359. James P, Quadroni M, Carafoli E, Gonnet G (1994) Protein identification in DNA databases by peptide mass fingerprinting. Protein Sci 3(8):1347–1350
  360. Jan E, Sarnow P (2002) Factorless ribosome assembly on the internal ribosome entry site of cricket paralysis virus. J Mol Biol 324(5):889–902
  361. Jan E, Thompson SR, Wilson JE, Pestova TV, Hellen CU, Sarnow P (2001) Initiator Met-tRNA-independent translation mediated by an internal ribosome entry site element in cricket paralysis virus-like insect viruses. Cold Spring Harb Symp Quant Biol 66:285–292
  362. Janin L, Schulz-Trieglaff O, Cox AJ (2014) BEETL-fastq: a searchable compressed archive for DNA reads. Bioinformatics 30(19):2796–2801
  363. Jank P, Shindo-Okada N, Nishimura S, Gross HJ (1977) Rabbit liver tRNA1Val:I. Primary structure and unusual codon recognition. Nucleic Acids Res 4(6):1999–2008
  364. Jayaswal V, Jermiin LS, Robinson J (2005) Estimation of phylogeny using a general Markov model. Evol Bioinform Online 1:62–80
  365. Jenkins GM, Holmes EC (2003) The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Res 92(1):1–7
  366. Jensen JL, Hein J (2005) Gibbs sampler for statistical multiple alignment. Stat Sin 15:889–907
  367. Jia W, Higgs PG (2008) Codon usage in mitochondrial genomes: distinguishing context-dependent mutation from translational selection. Mol Biol Evol 25(2):339–351
  368. Jin P, Alisch RS, Warren ST (2004a) RNA and microRNAs in fragile X mental retardation. Nat Cell Biol 6(11):1048–1053
  369. Jin VX, Leu YW, Liyanarachchi S, Sun H, Fan M, Nephew KP, Huang TH, Davuluri RV (2004b) Identifying estrogen receptor alpha target genes using integrated computational genomics and chromatin immunoprecipitation microarray. Nucleic Acids Res 32(22):6627–6635
  370. Jin VX, O’Geen H, Iyengar S, Green R, Farnham PJ (2007) Identification of an OCT4 and SRY regulatory module using integrated computational and experimental genomics approaches. Genome Res 17(6):807–817
  371. Johnston TC, Parker J (1985) Streptomycin-induced, third-position misreading of the genetic code. J Mol Biol 181(2):313–315
  372. Johnston TC, Borgia PT, Parker J (1984) Codon specificity of starvation induced misreading. Mol Gen Genet MGG 195(3):459–465
  373. Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8:275–282
  374. Jorgensen F, Adamski FM, Tate WP, Kurland CG (1993) Release factor-dependent false stops are infrequent in Escherichia coli. J Mol Biol 230(1):41–50
  375. Josse J, Kaiser AD, Kornberg A (1961) Enzymatic synthesis of deoxyribonucleic acid VII. Frequencies of nearest neighbor base-sequences in deoxyribonucleic acid. J Biol Chem 236:864–875
  376. Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro HN (ed) Mammalian protein metabolism. Academic, New York, pp 21–123
  377. Kaishima M, Ishii J, Matsuno T, Fukuda N, Kondo A (2016) Expression of varied GFPs in Saccharomyces cerevisiae: codon optimization yields stronger than expected expression and fluorescence intensity. Sci Rep 6:35932
  378. Kamalakaran S, Radhakrishnan SK, Beck WT (2005) Identification of estrogen-responsive genes using a genome-wide analysis of promoter elements for transcription factor binding sites. J Biol Chem 280(22):21491–21497
  379. Kanehisa M (2013) Molecular network analysis of diseases and drugs in KEGG. Methods Mol Biol 939:263–275
  380. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M (2016) KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res 44(D1):D457–D462
  381. Kaneko T, Tanaka A, Sato S, Kotani H, Sazuka T, Miyajima N, Sugiura M, Tabata S (1995) Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. I. Sequence features in the 1 Mb region from map positions 64% to 92% of the genome. DNA Res 2(4):153–166, 191–198
  382. Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S et al (1996) Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res 3(3):109–136
  383. Karlin S, Burge C (1995) Dinucleotide relative abundance extremes: a genomic signature. Trends Genet 11(7):283–290
  384. Katsafanas GC, Moss B (2007a) Colocalization of transcription and translation within cytoplasmic poxvirus factories coordinates viral expression and subjugates host functions. Cell Host Microbe 2(4):221
  385. Karlin S, Mrazek J (1996) What drives codon choices in human genes. J Mol Biol 262:459–472
  386. Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90(430):773–795
  387. Katoh K, Toh H (2008) Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform 9(4):286–298
  388. Katoh K, Toh H (2010) Parallelization of the MAFFT multiple sequence alignment program. Bioinformatics 26(15):1899–1900
  389. Katoh K, Kuma K, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33(2):511–518
  390. Katoh K, Asimenos G, Toh H (2009) Multiple alignment of DNA sequences with MAFFT. Methods Mol Biol 537:39–64
  391. Katsafanas GC, Moss B (2007b) Colocalization of transcription and translation within cytoplasmic poxvirus factories coordinates viral expression and subjugates host functions. Cell Host Microbe 2(4):221
  392. Kawashima T, Douglass S, Gabunilas J, Pellegrini M, Chanfreau GF (2014) Widespread use of non-productive alternative splice sites in Saccharomyces cerevisiae. PLoS Genet 10(4):e1004249
  393. Kazan K (2003) Alternative splicing and proteome diversity in plants: the tip of the iceberg has just emerged. Trends Plant Sci 8(10):468–471
  394. Keeling PJ, Doolittle WF (1996) A non-canonical genetic code in an early diverging eukaryotic lineage. EMBO J 15(9):2285–2290
  395. Kersulyte D, Chalkauskas H, Berg DE (1999) Emergence of recombinant strains of Helicobacter pylori during human infection. Mol Microbiol 31(1):31–43
  396. Kim H, Park H (2004) Prediction of protein relative solvent accessibility with support vector machines and long-range interaction 3D local descriptor. Proteins 54(3):557–562
  397. Kim DW, Lee KH, Lee D (2005) Detecting clusters of different geometrical shapes in microarray gene expression data. Bioinformatics 21(9):1927–1934
  398. Kimura M (1968) Evolutionary rate at the molecular level. Nature 217:624–626
  399. Kimura M (1977) Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature 267:275–276
  400. Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111–120
  401. Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press, Cambridge
  402. Kimura M, Ohta T (1972) On the stochastic model for estimation of mutational distance between homologous proteins. J Mol Evol 2:87–90
  403. King MC, Jukes TH (1969) Non-Darwinian evolution. Science 164:788–798
  404. Kingsford C, Patro R (2015) Reference-based compression of short-read sequences using path encoding. Bioinformatics 31(12):1920–1928
  405. Kioussis D, Vanin E, deLange T, Flavell RA, Grosveld FG (1983) Beta-globin gene inactivation by DNA translocation in gamma beta-thalassaemia. Nature 306(5944):662–666
  406. Kishino H, Hasegawa M (1989) Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol 29:170–179
  407. Kishino H, Hasegawa M (1990) Converting distance to time: application to human evolution. Methods Enzymol 183:550–570
  408. Kjer KM (1995) Use of ribosomal-RNA secondary structure in phylogenetic studies to identify homologous positions – an example of alignment and data presentation from the frogs. Mol Phylogenet Evol 4(3):314–330
  409. Kliman RM, Bernal CA (2005) Unusual usage of AGG and TTG codons in humans and their viruses. Gene 352:92
  410. Kobayashi H, Akitomi J, Fujii N, Kobayashi K, Altaf-Ul-Amin M, Kurokawa K, Ogasawara N, Kanaya S (2007) The entire organization of transcription units on the Bacillus subtilis genome. BMC Genomics 8:197
  411. Kodama Y, Shumway M, Leinonen R (2012) The sequence read archive: explosive growth of sequencing data. Nucleic Acids Res 40(Database issue):D54–D56
  412. Kohonen T (2001) Self-organizing maps. Springer, Berlin
  413. Komar AA, Hatzoglou M (2005) Internal ribosome entry sites in cellular mRNAs: mystery of their existence. J Biol Chem 280(25):23425–23428
  414. Korenke GC, Fuchs S, Krasemann E, Doerr HG, Wilichowski E, Hunneman DH, Hanefeld F (1996) Cerebral adrenoleukodystrophy (ALD) in only one of monozygotic twins with an identical ALD genotype. Ann Neurol 40(2):254–257
  415. Korkmaz G, Holm M, Wiens T, Sanyal S (2014) Comprehensive analysis of stop codon usage in bacteria and its correlation with release factor abundance. J Biol Chem 289(44):30334–30342
  416. Kornblihtt AR (2005) Promoter usage and alternative splicing. Curr Opin Cell Biol 17(3):262–268
  417. Kozak M (1978) How do eucaryotic ribosomes select initiation regions in messenger RNA? Cell 15(4):1109–1123
  418. Kozak M (1980a) Evaluation of the “scanning model” for initiation of protein synthesis in eucaryotes. Cell 22(1 Pt 1):7–8
  419. Kozak M (1980b) Influence of mRNA secondary structure on binding and migration of 40S ribosomal subunits. Cell 19(1):79–90
  420. Kozak M (1981) Possible role of flanking nucleotides in recognition of the AUG initiator codon by eukaryotic ribosomes. Nucleic Acids Res 9(20):5233–5252
  421. Kozak M (1986) Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44(2):283–292
  422. Kozak M (1991) Effects of long 5′ leader sequences on initiation by eukaryotic ribosomes in vitro. Gene Expr 1(2):117–125
  423. Kozak M (1997) Recognition of AUG and alternative initiator codons is augmented by G in position +4 but is not generally affected by the nucleotides in positions +5 and +6. EMBO J 16(9):2482–2492
  424. Kozak M (1999) Initiation of translation in prokaryotes and eukaryotes. Gene 234(2):187–208
  425. Kozak M (2005) A second look at cellular mRNA sequences said to function as internal ribosome entry sites. Nucleic Acids Res 33(20):6593–6602
  426. Kozak M (2007) Some thoughts about translational regulation: forward and backward glances. J Cell Biochem 102(2):280–290
  427. Krasemann EW, Meier V, Korenke GC, Hunneman DH, Hanefeld F (1996) Identification of mutations in the ALD-gene of 20 families with adrenoleukodystrophy/adrenomyeloneuropathy. Hum Genet 97(2):194–197
  428. Kreutzer DA, Essigmann JM (1998) Oxidized, deaminated cytosines are a source of C → T transitions in vivo. Proc Natl Acad Sci U S A 95(7):3578–3582
  429. Krogh A, Mian IS, Haussler D (1994) A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res 22(22):4768–4778
  430. Kudla G, Murray AW, Tollervey D, Plotkin JB (2009) Coding-sequence determinants of gene expression in Escherichia coli. Science 324(5924):255–258
  431. Kullback S (1959) Information theory and statistics. Wiley, New York
  432. Kullback S (1987) The Kullback-Leibler distance. Am Stat 41:340–341
  433. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
  434. Kumar S, Filipski A (2007) Multiple sequence alignment: in pursuit of homologous DNA positions. Genome Res 17(2):127–135
  435. Kumar KK, Shelokar PS (2008) An SVM method using evolutionary information for the identification of allergenic proteins. Bioinformation 2(6):253–256
  436. Kumar S, Stecher G, Tamura K (2016) MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol 33(7):1870–1874
  437. Kungulovski G, Jeltsch A (2016) Epigenome editing: state of the art, concepts, and perspectives. Trends Genet 32(2):101–113
  438. Kurland CG (1987) Strategies for efficiency and accuracy in gene expression. Trends Biochem Sci 12:126
  439. Kutlar A (2007) Sickle cell disease: a multigenic perspective of a single gene disorder. Hemoglobin 31(2):209–224
  440. Kuznetsov IB, Gou Z, Li R, Hwang S (2006) Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins. Proteins 64(1):19–27
  441. Kypr J, Mrazek J (1987) Unusual codon usage of HIV. Nature 327(6117):20
  442. Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157:105–132
  443. Lacerda R, Menezes J, Romao L (2016) More than just scanning: the importance of cap-independent mRNA translation initiation for cellular stress response and cancer. Cell Mol Life Sci 74(9):1659–1680
  444. Laemmli UK (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227:680–685
  445. Lake JA (1994) Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc Natl Acad Sci U S A 91:1455–1459
  446. Lamendola DE, Duan Z, Yusuf RZ, Seiden MV (2003) Molecular description of evolving paclitaxel resistance in the SKOV-3 human ovarian carcinoma cell line. Cancer Res 63(9):2200–2205
  447. Lamond AI (1988) RNA editing and the mysterious undercover genes of trypanosomatid mitochondria. Trends Biochem Sci 13(8):283–284
  448. Lanave C, Preparata G, Saccone C, Serio G (1984) A new method for calculating evolutionary substitution rates. J Mol Evol 20(1):86–93
  449. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W et al (2001) Initial sequencing and analysis of the human genome. Nature 409(6822):860–921
  450. Lang BF, Burger G, O’Kelly CJ, Cedergren R, Golding GB, Lemieux C, Sankoff D, Turmel M, Gray MW (1997) An ancestral mitochondrial DNA resembling a eubacterial genome in miniature. Nature 387(6632):493–497
  451. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9(4):357–359
  452. Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL (2009a) Searching for SNPs with cloud computing. Genome Biol 10(11):R134
  453. Langmead B, Trapnell C, Pop M, Salzberg SL (2009b) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25
  454. Langmead B, Hansen KD, Leek JT (2010) Cloud-scale RNA-sequencing differential expression analysis with Myrna. Genome Biol 11(8):R83
  455. Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262(5131):208–214
  456. Lee C, Wang Q (2005) Bioinformatics analysis of alternative splicing. Brief Bioinform 6(1):23–33
  457. Leinonen R, Sugawara H, Shumway M (2011) The sequence read archive. Nucleic Acids Res 39(Database):D19–D21
  458. Lemay DG, Hwang DH (2006) Genome-wide identification of peroxisome proliferator response elements using integrated computational genomics. J Lipid Res 47(7):1583–1587
  459. Lesk AM (2004) Introduction to protein science: architecture, function and genomics. Oxford University Press, New York
  460. Li CC (1976) First course in population genetics. The Boxwood Press, Pacific Grove
  461. Li W-H (1983) Evolution of duplicate genes and pseudogenes. Sinauer, Sunderland
  462. Li W-H (1997) Molecular evolution. Sinauer, Sunderland
  463. Li X, Chang YH (1995) Amino-terminal protein processing in Saccharomyces cerevisiae is an essential function that requires two distinct methionine aminopeptidases. Proc Natl Acad Sci U S A 92(26):12357–12361
  464. Li GL, Leong TY (2005) Feature selection for the prediction of translation initiation sites. Genomics Proteomics Bioinformatics 3(2):73–83
  465. Li W-H, Tanimura M (1987) The molecular clock runs more slowly in man than in apes and monkeys. Nature 326:93–96
  466. Li WH, Wu CI (1987) Rates of nucleotide substitution are evidently higher in rodents than in man. Mol Biol Evol 4(1):74–82
  467. Li WH, Gojobori T, Nei M (1981) Pseudogenes as a paradigm of neutral evolution. Nature 292(5820):237–239
  468. Li W-H, Wolfe KH, Sourdis J, Sharp PM (1987) Reconstruction of phylogenetic trees and estimation of divergence times under nonconstant rates of evolution. Cold Spring Harb Symp Quant Biol 52:847–856
  469. Li F, Ge P, Hui WH, Atanasov I, Rogers K, Guo Q, Osato D, Falick AM, Zhou ZH, Simpson L (2009) Structure of the core editing complex (L-complex) involved in uridine insertion/deletion RNA editing in trypanosomatid mitochondria. Proc Natl Acad Sci U S A 106(30):12306–12310
  470. Liang KC, Wang X, Anastassiou D (2008) A profile-based deterministic sequential Monte Carlo algorithm for motif discovery. Bioinformatics 24(1):46–55
  471. Liberman N, Gandin V, Svitkin YV, David M, Virgili G, Jaramillo M, Holcik M, Nagar B, Kimchi A, Sonenberg N (2015) DAP5 associates with eIF2beta and eIF4AI to promote Internal Ribosome Entry Site driven translation. Nucleic Acids Res 43(7):3764–3775
  472. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO et al (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326(5950):289–293
  473. Liebler DC (2002) Introduction to proteomics: tools for the new biology. Humana Press, Totowa
  474. Liljenstrom H, von Heijne G (1987) Translation rate modification by preferential codon usage: intragenic position effects. J Theor Biol 124(1):43–55
  475. Lim VI (1994) Analysis of action of wobble nucleoside modifications on codon-anticodon pairing within the ribosome. J Mol Biol 240(1):8–19
  476. Lin JP, Aker M, Sitney KC, Mortimer RK (1986) First position wobble in codon-anticodon pairing: amber suppression by a yeast glutamine tRNA. Gene 49(3):383–388
  477. Lin HC, Tsai K, Chang BL, Liu J, Young M, Hsu W, Louie S, Nicholas HB Jr, Rosenquist GL (2003) Prediction of tyrosine sulfation sites in animal viruses. Biochem Biophys Res Commun 312(4):1154–1158
  478. Lin GN, Cai Z, Lin G, Chakraborty S, Xu D (2009) ComPhy: prokaryotic composite distance phylogenies inferred from whole-genome gene sets. BMC Bioinform 10(Suppl 1):S5
  479. Lindahl T (1993) Instability and decay of the primary structure of DNA. Nature 362:709–715
  480. Lipman DJ, Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227(4693):1435–1441
  481. Lipman DJ, Altschul SF, Kececioglu JD (1989) A tool for multiple sequence alignment. Proc Natl Acad Sci U S A 86(12):4412–4415
  482. Lipscombe D (2005) Neuronal proteins custom designed by alternative splicing. Curr Opin Neurobiol 15(3):358–363
  483. Lithwick G, Margalit H (2005) Relative predicted protein levels of functionally associated proteins are conserved across organisms. Nucleic Acids Res 33(3):1051–1057
  484. Liu J, Louie S, Hsu W, Yu KM, Nicholas HB Jr, Rosenquist GL (2008) Tyrosine sulfation is prevalent in human chemokine receptors important in lung disease. Am J Respir Cell Mol Biol 38(6):738–743
  485. Liu X, Jiang H, Gu Z, Roberts JW (2013) High-resolution view of bacteriophage lambda gene expression by ribosome profiling. Proc Natl Acad Sci U S A 110(29):11928–11933
  486. Livesey R (2002) Have microarrays failed to deliver for developmental biology? Genome Biol 3(9):comment2009
  487. Lobry JR (1996) Asymmetric substitution patterns in the two DNA strands of bacteria. Mol Biol Evol 13(5):660–665
  488. Lockhart PJ, Steel MA, Hendy MD, Penny D (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Mol Biol Evol 11:605–612
  489. Lodish HF, Nathan DG (1972) Regulation of hemoglobin synthesis. Preferential inhibition of and globin synthesis. J Biol Chem 247(23):7822–7829PubMedPubMedCentralGoogle Scholar
  490. Lopez P, Philippe H, Myllykallio H, Forterre P (1999) Identification of putative chromosomal origins of replication in Archaea. Mol Microbiol 32(4):883–886PubMedPubMedCentralCrossRefGoogle Scholar
  491. Lowry JA, Atchley WR (2000) Molecular evolution of the GATA family of transcription factors: conservation within the DNA-binding domain. J Mol Evol 50(2):103–115CrossRefPubMedGoogle Scholar
  492. Lu C, Bablanian R (1996) Characterization of small nontranslated polyadenylylated RNAs in vaccinia virus-infected cells. Proc Natl Acad Sci U S A 93(5):2037–2042PubMedPubMedCentralCrossRefGoogle Scholar
  493. Lunter G, Rocco A, Mimouni N, Heger A, Caldeira A, Hein J (2008) Uncertainty in homology inferences: assessing and improving genomic sequence alignment. Genome Res 18(2):298–309PubMedPubMedCentralCrossRefGoogle Scholar
  494. Lustig F, Boren T, Guindy YS, Elias P, Samuelsson T, Gehrke CW, Kuo KC, Lagerkvist U (1989) Codon discrimination and anticodon structural context. Proc Natl Acad Sci U S A 86(18):6873–6877PubMedPubMedCentralCrossRefGoogle Scholar
  495. Ma B, Nussinov R (2004) Release factors eRF1 and RF2: a universal mechanism controls the large conformational changes. J Biol Chem 279(51):53875–53885PubMedPubMedCentralCrossRefGoogle Scholar
  496. Ma P, Xia X (2011) Factors affecting splicing strength of yeast genes. Comp Funct Genomics:Article ID 212146, 13 pagesGoogle Scholar
  497. Ma S, Musa T, Bag J (2006) Reduced stability of mitogen-activated protein kinase kinase-2 mRNA and phosphorylation of poly(A)-binding protein (PABP) in cells overexpressing PABP. J Biol Chem 281(6):3145–3156PubMedPubMedCentralCrossRefGoogle Scholar
  498. MacKay VL, Li X, Flory MR, Turcott E, Law GL, Serikawa KA, Xu XL, Lee H, Goodlett DR, Aebersold R et al (2004) Gene expression analyzed by high-resolution state array analysis and quantitative proteomics: response of yeast to mating pheromone. Mol Cell Proteomics 3(5):478–489CrossRefPubMedGoogle Scholar
  499. Madden SL, Galella EA, Zhu J, Bertelsen AH, Beaudry GA (1997) SAGE transcript profiles for p53-dependent growth regulation. Oncogene 15(9):1079–1085CrossRefPubMedGoogle Scholar
  500. Maher CA, Kumar-Sinha C, Cao X, Kalyana-Sundaram S, Han B, Jing X, Sam L, Barrette T, Palanisamy N, Chinnaiyan AM (2009) Transcriptome sequencing to detect gene fusions in cancer. Nature 458(7234):97–101PubMedPubMedCentralCrossRefGoogle Scholar
  501. Mannella CA, Neuwald AF, Lawrence CE (1996) Detection of likely transmembrane beta strand regions in sequences of mitochondrial pore proteins using the Gibbs sampler. J Bioenerg Biomembr 28(2):163–169CrossRefPubMedGoogle Scholar
  502. Marck C, Grosjean H (2002) tRNomics: analysis of tRNA genes from 50 genomes of Eukarya, Archaea, and Bacteria reveals anticodon-sparing strategies and domain-specific features. RNA 8(10):1189–1232PubMedPubMedCentralCrossRefGoogle Scholar
  503. Marin A, Xia X (2008) GC skew in protein-coding genes between the leading and lagging strands in bacterial genomes: new substitution models incorporating strand bias. J Theor Biol 253(3):508–513PubMedPubMedCentralCrossRefGoogle Scholar
  504. Martinez MA, Vartanian J-P, Simon W-H (1994) Hypermutagenesis of RNA using human immunodeficiency virus type 1 reverse transcriptase and biased dNTP concentrations. Proc Natl Acad Sci U S A 91(25):11787–11791PubMedPubMedCentralCrossRefGoogle Scholar
  505. Matin A, Zychlinsky E, Keyhan M, Sachs G (1996) Capacity of Helicobacter pylori to generate ionic gradients at low pH is similar to that of bacteria which grow under strongly acidic conditions. Infect Immun 64(4):1434–1436PubMedPubMedCentralGoogle Scholar
  506. McNulty DE, Claffee BA, Huddleston MJ, Porter ML, Cavnar KM, Kane JF (2003) Mistranslational errors associated with the rare arginine codon CGG in Escherichia coli. Protein Expr Purif 27(2):365–374CrossRefPubMedGoogle Scholar
  507. McPherson DT (1988) Codon preference reflects mistranslational constraints: a proposal. Nucleic Acids Res 16(9):4111–4120PubMedPubMedCentralCrossRefGoogle Scholar
  508. Medawar PB, Medawar JS (1983) Aristotle to zoos: a philosophical dictionary of biology. Harvard University Press, Cambridge, MAGoogle Scholar
  509. Meinnel T, Mechulam Y, Blanquet S (1993) Methionine as translation start signal: a review of the enzymes of the pathway in Escherichia coli. Biochimie 75(12):1061–1075PubMedPubMedCentralCrossRefGoogle Scholar
  510. Melo EO, de Melo Neto OP, Martins de Sa C (2003a) Adenosine-rich elements present in the 5′-untranslated region of PABP mRNA can selectively reduce the abundance and translation of CAT mRNAs in vivo. FEBS Lett 546(2–3):329–334PubMedPubMedCentralCrossRefGoogle Scholar
  511. Melo EO, Dhalia R, Martins de Sa C, Standart N, de Melo Neto OP (2003b) Identification of a C-terminal poly(A)-binding protein (PABP)-PABP interaction domain: role in cooperative binding to poly (A) and efficient cap distal translational repression. J Biol Chem 278(47):46357–46368PubMedPubMedCentralCrossRefGoogle Scholar
  512. Menaker RJ, Sharaf AA, Jones NL (2004) Helicobacter pylori infection and gastric cancer: host, bug, environment, or all three? Curr Gastroenterol Rep 6(6):429–435PubMedPubMedCentralCrossRefGoogle Scholar
  513. Mendz GL, Hazell SL (1996) The urea cycle of Helicobacter pylori. Microbiology 142(Pt 10):2959–2967PubMedPubMedCentralCrossRefGoogle Scholar
  514. Meng SY, Hui JO, Haniu M, Tsai LB (1995) Analysis of translational termination of recombinant human methionyl-neurotrophin 3 in Escherichia coli. Biochem Biophys Res Commun 211(1):40–48PubMedPubMedCentralCrossRefGoogle Scholar
  515. Metropolis N (1987) The beginnning of the Monte Carlo method. Los Alamos Sci 15(Special issue):125–130Google Scholar
  516. Meyer IM, Durbin R (2004) Gene structure conservation aids similarity based gene prediction. Nucleic Acids Res 32(2):776–783PubMedPubMedCentralCrossRefGoogle Scholar
  517. Miller JH, Albertini AM (1983) Effects of surrounding sequence on the suppression of nonsense codons. J Mol Biol 164(1):59–71PubMedPubMedCentralCrossRefGoogle Scholar
  518. Miller CG, Kukral AM, Miller JL, Movva NR (1989) pepM is an essential gene in Salmonella typhimurium. J Bacteriol 171(9):5215–5217PubMedPubMedCentralCrossRefGoogle Scholar
  519. Milman G, Goldstein J, Scolnick E, Caskey T (1969) Peptide chain termination. 3. Stimulation of in vitro termination. Proc Natl Acad Sci U S A 63(1):183–190PubMedPubMedCentralCrossRefGoogle Scholar
  520. Min Jou W, Haegeman G, Ysebaert M, Fiers W (1972) Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein. Nature 237(5350):82–88PubMedPubMedCentralCrossRefGoogle Scholar
  521. Minakshi R, Padhan K, Rani M, Khan N, Ahmad F, Jameel S (2009) The SARS coronavirus 3a protein causes endoplasmic reticulum stress and induces ligand-independent downregulation of the type 1 interferon receptor. PLoS One 4(12):e8342PubMedPubMedCentralCrossRefGoogle Scholar
  522. Mine T, Muraoka H, Saika T, Kobayashi I (2005) Characteristics of a clinical isolate of urease-negative Helicobacter pylori and its ability to induce gastric ulcers in Mongolian gerbils. Helicobacter 10(2):125–131PubMedPubMedCentralCrossRefGoogle Scholar
  523. Mitra SK, Lustig F, Akesson B, Lagerkvist U (1977) Codon-acticodon recognition in the valine codon family. J Biol Chem 252(2):471–478PubMedPubMedCentralGoogle Scholar
  524. Miura F, Kawaguchi N, Sese J, Toyoda A, Hattori M, Morishita S, Ito T (2006) A large-scale full-length cDNA analysis to explore the budding yeast transcriptome. Proc Natl Acad Sci 103(47):17846–17851PubMedPubMedCentralCrossRefGoogle Scholar
  525. Miyata T, Yasunaga T (1980) Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences and its application. J Mol Evol 16(1):23–36PubMedPubMedCentralCrossRefGoogle Scholar
  526. Miyata T, Miyazawa S, Yasunaga T (1979) Two types of amino acid substitutions in protein evolution. J Mol Evol 12(3):219–236CrossRefPubMedGoogle Scholar
  527. Mlera L, Lam J, Offerdahl DK, Martens C, Sturdevant D, Turner CV, Porcella SF, Bloom ME (2016) Transcriptome analysis reveals a signature profile for tick-borne Flavivirus persistence in HEK 293T cells. MBio 7(3):e00314–e00316PubMedPubMedCentralCrossRefGoogle Scholar
  528. Mobley HL, Hu LT, Foxal PA (1991) Helicobacter pylori urease: properties and role in pathogenesis. Scand J Gastroenterol 187(Supplement):39–46CrossRefGoogle Scholar
  529. Moerschell RP, Hosokawa Y, Tsunasawa S, Sherman F (1990) The specificities of yeast methionine aminopeptidase and acetylation of amino-terminal methionine in vivo. Processing of altered iso-1-cytochromes c created by oligonucleotide transformation. J Biol Chem 265(32):19638–19643PubMedGoogle Scholar
  530. Moffat JG, Rudolph J, Bailey D (2014) Phenotypic screening in cancer drug discovery – past, present and future. Nat Rev Drug Discov 13(8):588–602CrossRefPubMedGoogle Scholar
  531. Moi P, Loudianos G, Lavinha J, Murru S, Cossu P, Casu R, Oggiano L, Longinotti M, Cao A, Pirastu M (1992) Delta-thalassemia due to a mutation in an erythroid-specific binding protein sequence 3′ to the delta-globin gene. Blood 79(2):512–516PubMedGoogle Scholar
  532. Monteiro PT, Mendes ND, Teixeira MC, d’Orey S, Tenreiro S, Mira NP, Pais H, Francisco AP, Carvalho AM, Lourenco AB et al (2008) YEASTRACT-DISCOVERER: new tools to improve the analysis of transcriptional regulatory associations in Saccharomyces cerevisiae. Nucleic Acids Res 36(Database issue):D132–D136PubMedGoogle Scholar
  533. Mora L, Heurgue-Hamard V, Champ S, Ehrenberg M, Kisselev LL, Buckingham RH (2003) The essential role of the invariant GGQ motif in the function and stability in vivo of bacterial release factors RF1 and RF2. Mol Microbiol 47(1):267–275PubMedPubMedCentralCrossRefGoogle Scholar
  534. Mora L, Heurgue-Hamard V, de Zamaroczy M, Kervestin S, Buckingham RH (2007) Methylation of bacterial release factors RF1 and RF2 is required for normal translation termination in vivo. J Biol Chem 282(49):35638–35645PubMedPubMedCentralCrossRefGoogle Scholar
  535. Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T, McDonald H, Varhol R, Jones S, Marra M (2008a) Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. BioTechniques 45(1):81–94PubMedPubMedCentralCrossRefGoogle Scholar
  536. Morin RD, O’Connor MD, Griffith M, Kuchenbauer F, Delaney A, Prabhu AL, Zhao Y, McDonald H, Zeng T, Hirst M et al (2008b) Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res 18(4):610–621PubMedPubMedCentralCrossRefGoogle Scholar
  537. Morita M, Shimozawa N, Kashiwayama Y, Suzuki Y, Imanaka T (2011) ABC subfamily D proteins and very long chain fatty acid metabolism as novel targets in adrenoleukodystrophy. Curr Drug Targets 12(5):694–706CrossRefPubMedGoogle Scholar
  538. Moriyama EN, Powell JR (1997) Codon usage bias and tRNA abundance in Drosophila. J Mol Evol 45(5):514–523PubMedPubMedCentralCrossRefGoogle Scholar
  539. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5(7):621–628CrossRefPubMedPubMedCentralGoogle Scholar
  540. Mottagui-Tabar S, Isaksson LA (1997) Only the last amino acids in the nascent peptide influence translation termination in Escherichia coli genes. FEBS Lett 414(1):165–170PubMedPubMedCentralCrossRefGoogle Scholar
  541. Moult J, Hubbard T, Fidelis K, Pedersen JT (1999) Critical assessment of methods of protein structure prediction (CASP): round III. Proteins 37(Suppl 3):2–6CrossRefGoogle Scholar
  542. Muller HJ, Altenburg E (1930) The frequency of translocations produced by X-rays in Drosophila. Genetics 15(4):283–311PubMedPubMedCentralGoogle Scholar
  543. Murphy J, Mahony J, Ainsworth S, Nauta A, van Sinderen D (2013) Bacteriophage orphan DNA methyltransferases: insights from their bacterial origin, function, and occurrence. Appl Environ Microbiol 79(24):7547–7555PubMedPubMedCentralCrossRefGoogle Scholar
  544. Murtagh F (1984) Complexities of hierarchic clustering algorithms: state of the art. Comput Stat Q 1:101–113Google Scholar
  545. Muto A, Osawa S (1987) The guanine and cytosine content of genomic DNA and bacterial evolution. Proc Natl Acad Sci U S A 84:166–169PubMedPubMedCentralCrossRefGoogle Scholar
  546. Nachman MW, Crowell SL (2000) Estimate of the mutation rate per nucleotide in humans. Genetics 156(1):297–304PubMedPubMedCentralGoogle Scholar
  547. Nakai K, Horton P (1999) PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem Sci 24(1):34–36PubMedPubMedCentralCrossRefGoogle Scholar
  548. Nakamoto T (2006) A unified view of the initiation of protein synthesis. Biochem Biophys Res Commun 341(3):675–678PubMedPubMedCentralCrossRefGoogle Scholar
  549. Nakamura Y, Ito K, Matsumura K, Kawazu Y, Ebihara K (1995) Regulation of translation termination: conserved structural motifs in bacterial and eukaryotic polypeptide release factors. Biochem Cell Biol 73(11–12):1113–1122CrossRefPubMedGoogle Scholar
  550. Nakamura Y, Ito K, Isaksson LA (1996) Emerging understanding of translation termination. Cell 87(2):147–150PubMedPubMedCentralCrossRefGoogle Scholar
  551. Nakamura Y, Gojobori T, Ikemura T (2000) Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res 28(1):292PubMedPubMedCentralCrossRefGoogle Scholar
  552. Nakashima H, Fukuchi S, Nishikawa K (2003) Compositional changes in RNA, DNA and proteins for bacterial adaptation to higher and lower temperatures. J Biochem (Tokyo) 133(4):507–513CrossRefGoogle Scholar
  553. Nasvall SJ, Chen P, Bjork GR (2007) The wobble hypothesis revisited: uridine-5-oxyacetic acid is critical for reading of G-ending codons. RNA 13(12):2151–2164PubMedPubMedCentralCrossRefGoogle Scholar
  554. Needleman SB, Wunsch CD (1970) A general method applicable to the search of similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453CrossRefPubMedGoogle Scholar
  555. Nei M (1996) Phylogenetic analysis in molecular evolutionary genetics. Annu Rev Genet 30:371–403PubMedPubMedCentralCrossRefGoogle Scholar
  556. Nei M, Kumar S (2000) Molecular evolution and phylogenetics. Oxford University Press, New YorkGoogle Scholar
  557. Neuwald AF, Liu JS, Lawrence CE (1995) Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Sci 4(8):1618–1632PubMedPubMedCentralCrossRefGoogle Scholar
  558. Ngumbela KC, Ryan KP, Sivamurthy R, Brockman MA, Gandhi RT, Bhardwaj N, Kavanagh DG (2008) Quantitative effect of suboptimal codon usage on translational efficiency of mRNA encoding HIV-1 gag in intact T cells. PLoS One 3(6):e2356PubMedPubMedCentralCrossRefGoogle Scholar
  559. Nicholas HB Jr, Chan SS, Rosenquist GL (1999) Reevaluation of the determinants of tyrosine sulfation. Endocrine 11(3):285–292CrossRefPubMedGoogle Scholar
  560. Nichols T, Hayasaka S (2003) Controlling the familywise error rate in functional neuroimaging: a comparative review. Stat Meth Med Res 12(5):419–446CrossRefGoogle Scholar
  561. Nicolae M, Pathak S, Rajasekaran S (2015) LFQC: a lossless compression algorithm for FASTQ files. Bioinformatics 31(20):3276–3281PubMedPubMedCentralCrossRefGoogle Scholar
  562. Nishimura S, Takahashi S, Kuroha T, Suwabe N, Nagasawa T, Trainor C, Yamamoto M (2000) A GATA box in the GATA-1 gene hematopoietic enhancer is a critical element in the network of GATA factors and sites that regulate this gene. Mol Cell Biol 20(2):713–723PubMedPubMedCentralCrossRefGoogle Scholar
  563. Nissen P, Kjeldgaard M, Thirup S, Polekhina G, Reshetnikova L, Clark BF, Nyborg J (1995) Crystal structure of the ternary complex of Phe-tRNAPhe, EF-Tu, and a GTP analog. Science 270(5241):1464–1472PubMedPubMedCentralCrossRefGoogle Scholar
  564. Noedl H, Se Y, Schaecher K, Smith BL, Socheat D, Fukuda MM (2008) Evidence of artemisinin-resistant malaria in western Cambodia. N Engl J Med 359(24):2619–2620CrossRefPubMedGoogle Scholar
  565. Noedl H, Socheat D, Satimai W (2009) Artemisinin-resistant malaria in Asia. N Engl J Med 361(5):540–541CrossRefPubMedGoogle Scholar
  566. Noedl H, Se Y, Sriwichai S, Schaecher K, Teja-Isavadharm P, Smith B, Rutvisuttinunt W, Bethell D, Surasri S, Fukuda MM et al (2010) Artemisinin resistance in Cambodia: a clinical trial designed to address an emerging problem in Southeast Asia. Clin Infect Dis 51(11):e82–e89CrossRefPubMedGoogle Scholar
  567. Nomenclature Committee of the International Union of Biochemistry (1985) Nomenclature for incompletely specified bases in nucleic acid sequences. Recommendations 1984. Eur J Biochem 150:1–5CrossRefGoogle Scholar
  568. Notredame C, O’Brien EA, Higgins DG (1997) RAGA: RNA sequence alignment by genetic algorithm. Nucleic Acids Res 25(22):4570–4580PubMedPubMedCentralCrossRefGoogle Scholar
  569. Numanagic I, Bonfield JK, Hach F, Voges J, Ostermann J, Alberti C, Mattavelli M, Sahinalp SC (2016) Comparison of high-throughput sequencing data compression tools. Nat Methods 13(12):1005–1008PubMedPubMedCentralCrossRefGoogle Scholar
  570. Nur I, Szyf M, Razin A, Glaser G, Rottem S, Razin S (1985) Procaryotic and eucaryotic traits of DNA methylation in spiroplasmas (mycoplasmas). J Bacteriol 164(1):19–24PubMedPubMedCentralGoogle Scholar
  571. Nussinov R (1984) Doublet frequencies in evolutionary distinct groups. Nucleic Acids Res 12(3):1749–1763PubMedPubMedCentralCrossRefGoogle Scholar
  572. O’Brien JD, She ZS, Suchard MA (2008) Dating the time of viral subtype divergence. BMC Evol Biol 8:172PubMedPubMedCentralCrossRefGoogle Scholar
  573. Obenauer JC, Cantley LC, Yaffe MB (2003) Scansite 2.0: proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res 31(13):3635–3641PubMedPubMedCentralCrossRefGoogle Scholar
  574. Ohta T, Gray TA, Rogan PK, Buiting K, Gabriel JM, Saitoh S, Muralidhar B, Bilienska B, Krajewska-Walasek M, Driscoll DJ et al (1999) Imprinting-mutation mechanisms in Prader-Willi syndrome. Am J Hum Genet 64(2):397–413PubMedPubMedCentralCrossRefGoogle Scholar
  575. Ordway JM, Fenster SD, Ruan H, Curran T (2005) A transcriptome map of cellular transformation by the fos oncogene. Mol Cancer 4(1):19PubMedPubMedCentralCrossRefGoogle Scholar
  576. Orkin SH (1990) Globin gene regulation and switching: circa 1990. Cell 63(4):665–672CrossRefGoogle Scholar
  577. Orkin SH (1992) GATA-binding transcription factors in hematopoietic cells. Blood 80(3):575–581PubMedGoogle Scholar
  578. Osawa S, Jukes TH, Muto A, Yamao F, Ohama T, Andachi Y (1987) Role of directional mutation pressure in the evolution of the eubacterial genetic code. Cold Spring Harb Symp Quant Biol 52:777–789PubMedPubMedCentralCrossRefGoogle Scholar
  579. Osterman IA, Evfratov SA, Sergiev PV, Dontsova OA (2013) Comparison of mRNA features affecting translation initiation and reinitiation. Nucleic Acids Res 41(1):474–486PubMedPubMedCentralCrossRefGoogle Scholar
  580. Ostrin EJ, Li Y, Hoffman K, Liu J, Wang K, Zhang L, Mardon G, Chen R (2006) Genome-wide identification of direct targets of the Drosophila retinal determination protein Eyeless. Genome Res 16(4):466–476PubMedPubMedCentralCrossRefGoogle Scholar
  581. Ota S, Li WH (2000) NJML: a hybrid algorithm for the neighbor-joining and maximum-likelihood methods. Mol Biol Evol 17(9):1401–1409PubMedPubMedCentralCrossRefGoogle Scholar
  582. Ota S, Li WH (2001) NJML+: an extension of the NJML method to handle protein sequence data and computer software implementation. Mol Biol Evol 18(11):1983–1992PubMedPubMedCentralCrossRefGoogle Scholar
  583. Otu HH, Sayood K (2003) A new sequence distance measure for phylogenetic tree construction. Bioinformatics 19(16):2122–2130PubMedPubMedCentralCrossRefGoogle Scholar
  584. Palidwor GA, Perkins TJ, Xia X (2010) A general model of codon bias due to GC mutational bias. PLoS One 5(10):e13431PubMedPubMedCentralCrossRefGoogle Scholar
  585. Palstra RJ, Tolhuis B, Splinter E, Nijmeijer R, Grosveld F, de Laat W (2003) The beta-globin nuclear compartment in development and erythroid differentiation. Nat Genet 35(2):190–194PubMedPubMedCentralCrossRefGoogle Scholar
  586. Pandey RR, Mondal T, Mohammad F, Enroth S, Redrup L, Komorowski J, Nagano T, Mancini-Dinardo D, Kanduri C (2008) Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Mol Cell 32(2):232–246PubMedPubMedCentralCrossRefGoogle Scholar
  587. Pappin DJ, Hojrup P, Bleasby AJ (1993) Rapid identification of proteins by peptide-mass fingerprinting. Curr Biol 3(6):327–332PubMedPubMedCentralCrossRefGoogle Scholar
  588. Park SY, Cromie MJ, Lee EJ, Groisman EA (2010) A bacterial mRNA leader that employs different mechanisms to sense disparate intracellular signals. Cell 142(5):737–748PubMedPubMedCentralCrossRefGoogle Scholar
  589. Parker J (1989) Errors and alternatives in reading the universal genetic code. Microbiol Rev 53(3):273–298PubMedPubMedCentralGoogle Scholar
  590. Patel GP, Bag J (2006) IMP1 interacts with poly(A)-binding protein (PABP) and the autoregulatory translational control element of PABP-mRNA through the KH III-IV domain. FEBS J 273(24):5678–5690PubMedPubMedCentralCrossRefGoogle Scholar
  591. Patel GP, Ma S, Bag J (2005) The autoregulatory translational control element of poly(A)-binding protein mRNA forms a heteromeric ribonucleoprotein complex. Nucleic Acids Res 33(22):7074–7089PubMedPubMedCentralCrossRefGoogle Scholar
  592. Pauling L, Itano HA, Singer SJ, Wells IC (1949) Sickle cell anemia a molecular disease. Science 110(2865):543–548PubMedPubMedCentralCrossRefGoogle Scholar
  593. Pazin MJ, Kamakaka RT, Kadonaga JT (1994) ATP-dependent nucleosome reconfiguration and transcriptional activation from preassembled chromatin templates. Science 266(5193):2007–2011PubMedPubMedCentralCrossRefGoogle Scholar
  594. Pazin MJ, Sheridan PL, Cannon K, Cao Z, Keck JG, Kadonaga JT, Jones KA (1996) NF-kappa B-mediated chromatin reconfiguration and transcriptional activation of the HIV-1 enhancer in vitro. Genes Dev 10(1):37–49PubMedPubMedCentralCrossRefGoogle Scholar
  595. Pazin MJ, Hermann JW, Kadonaga JT (1998) Promoter structure and transcriptional activation with chromatin templates assembled in vitro. A single Gal4-VP16 dimer binds to chromatin or to DNA with comparable affinity. J Biol Chem 273(51):34653–34660CrossRefPubMedGoogle Scholar