Abstract
Codon usage depends on mutation bias, tRNA-mediated selection, and the need for high efficiency and accuracy in translation. One codon in a synonymous codon family is often strongly over-used, especially in highly expressed genes, which often leads to a high dN/dS ratio because dS is very small. Many different codon usage indices have been proposed to measure codon usage and codon adaptation. Sense codons can be misread by release factors and stop codons by tRNAs, which also contributes to codon usage in rare cases. This chapter outlines the conceptual framework of codon evolution, illustrates codon-specific and gene-specific codon usage indices, and presents their applications. A new index for codon adaptation that accounts for background mutation bias (Index of Translation Elongation) is presented and contrasted with the codon adaptation index (CAI), which does not consider background mutation bias. They are used to re-analyze data from a recent paper claiming that translation elongation efficiency matters little in protein production. The reanalysis disproves the claim.
1 Introduction
We will first learn a few key definitions and notations on tRNA, its anticodon, and codon families. We will then outline the conceptual framework of codon adaptation, mediated by mutation and selection. This brings us to indices of codon usage bias, their calculation and interpretations, and factors that may confound their interpretations. There are codon-specific indices such as relative synonymous codon usage (RSCU, Sharp et al. 1986) and gene-specific indices such as index of translation elongation (ITE, Xia 2015) and codon adaptation index (CAI, Sharp and Li 1987; Xia 2007c). All these indices are implemented in DAMBE (Xia 2013, 2017d).
ITE takes background mutation bias into consideration, while CAI does not. ITE is reduced to CAI if there is no background mutation bias. I will illustrate the applications of these indices in practical research. Keep in mind that a codon adaptation index is just one variable which will not be particularly interesting until you relate it to other variables and understand their relationships.
Two additional topics are dealt with close to the end of the chapter. The first involves how to discriminate between selection for translation efficiency and accuracy (Akashi 1994). The second is on the effect of amino acid usage on translation elongation efficiency. The general prediction concerning amino acid usage is that highly expressed proteins should maximize the use of amino acids that are abundant and energetically cheap (Akashi and Gojobori 2002) to make and have many tRNAs to carry them (Xia 1998a). The same argument has been used for transcription, i.e., an mRNA with many A nucleotides will be transcribed faster than one with many C nucleotides because A is in general far more abundant than C and it takes extra ATP to make CTP (Xia 1996; Xia et al. 2006).
1.1 Basic Notations, Definitions, and Abbreviations
Notations, definitions, and abbreviations are essential in science. We are lucky enough to have almost all of them unambiguous. If you were studying social sciences, you would have to come to define what is man and what is woman, and the debate on a proper definition will last forever, eventually with all debaters losing their mind and being called jerks.
1.1.1 tRNA Notation and Identification of tRNA Anticodon
The simplest notation of a tRNA is tRNAAA, where AA is a specific amino acid. For example, tRNAGly refers to all tRNAs that can be charged with amino acid glycine (Gly). A slightly more complicated notation is tRNAAA/AC, where AC refers to tRNA anticodon. For example, tRNAGly/GCC refers specifically to tRNAGly with a GCC anticodon. The general notation of a tRNA is AA2-tRNAAA1/AC, where AA1 is the amino acid the tRNA is supposed to carry, AA2 is the amino acid that is actually carried by the tRNA, and AC is the anticodon. In most cases, AA1 and AA2 are the same. However, there are two cases where AA1 and AA2 can be different. The first is modification of AA2 by a biochemist. The second occurs naturally in a number of species across all three domains of life (Sheppard et al. 2008; Yuan et al. 2008), where Gln-tRNAGln, Asn-tRNAAsn, Cys-tRNACys, and Sec-tRNASec are formed indirectly in two steps. Take Gln-tRNAGln and Asn-tRNAAsn, for example. Glu is first misacylated to tRNAGln, and Asp to tRNAAsn, to form Glu-tRNAGln and Asp-tRNAAsn, respectively. The resulting misacylated tRNAs are then converted to Gln-tRNAGln and Asn-tRNAAsn by a group of tRNA-dependent modifying enzymes.
Isoacceptor tRNA is a somewhat confusing term as it may carry two slightly different meanings. It could refer to a single tRNA decoding different synonymous codons, e.g., tRNAGly/GCC decoding GGC and GGU codons. Alternatively, it could refer to a set of different tRNAs that carry the same amino acid but decode different synonymous codons. For example, tRNAGly/GCC, tRNAGly/CCC, and tRNAGly/UCC are isoacceptor tRNAs. They all carry amino acid Gly but with different anticodons decoding different synonymous Gly codons. Different isoacceptor tRNAs could decode the same codon. For example, tRNAGly/CCC decodes GGG, but tRNAGly/UCC decodes both GGA and GGG, so GGG is decoded by both tRNAGly/CCC and tRNAGly/UCC. Thus, isoacceptor tRNA refers to (1) one tRNA decoding different synonymous codons or (2) a set of tRNAs that carry the same amino acid but decode different sets of synonymous codons. The intersection of different sets of synonymous codons may not be empty. For example, the set of codons decoded by tRNAGly/CCC is {GGG}, and the set of codons decoded by tRNAGly/UCC is {GGA, GGG}. The intersection of the two sets is {GGG}.
Related to isoacceptor tRNA is another potentially confusing concept, near-cognate tRNA, which is defined in two ways. The first is based on empirical evidence. If codon XYZ encoding amino acid AA1 can be misread by a tRNA carrying amino acid AA2 (AA1 ≠ AA2), then that tRNA is a near-cognate tRNA for codon XYZ. The second definition is based on nucleotide similarity among codons. A codon XYZ has nine XYZ-like codons which differ from XYZ by a single nucleotide. Some of these XYZ-like codons are synonymous to XYZ and some not. The set of tRNAs that can decode any of those nonsynonymous XYZ-like codons are near-cognate tRNAs for codon XYZ because they can “potentially” misread codon XYZ. For example, tRNAAsp is a near-cognate for codons GAA and GAG because Asp is encoded by GAC and GAU which are GAA-like and GAG-like codons.
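The second (similarity-based) definition can be sketched in a few lines of Python. This is only an illustration under the standard genetic code; the names `CODE`, `one_step_neighbours`, and `near_cognate_amino_acids` are hypothetical helpers introduced here, and stop codons are simply excluded from the result.

```python
from itertools import product

# Standard genetic code, codons enumerated in UCAG order ("*" marks a stop).
BASES = "UCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

def one_step_neighbours(codon):
    """Return the nine codons that differ from `codon` at exactly one site."""
    return [codon[:pos] + b + codon[pos + 1:]
            for pos in range(3) for b in BASES if b != codon[pos]]

def near_cognate_amino_acids(codon):
    """Amino acids encoded by nonsynonymous one-step neighbours of `codon`."""
    own = CODE[codon]
    return sorted({CODE[n] for n in one_step_neighbours(codon)} - {own, "*"})

# tRNAs carrying any of these amino acids are near-cognate for GAA
# in the similarity-based sense.
print(near_cognate_amino_acids("GAA"))
```

For GAA (Glu), the list includes D (Asp), in line with the tRNAAsp example above.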
1.1.2 Genetic Codes and Associated Concepts and Definitions
It is through genetic code that the 64 codons are interpreted as encoding amino acids or translation stop. Nature is superfluous in her creation of genetic code. There are now 24 known genetic codes listed from 1 to 31 (Table 9.1). The standard genetic code is shown previously in Table 2.7.
Some codons do not change their meanings, e.g., Phe (UUY), Tyr (UAY), and Pro (CCN), whereas some others change their meaning frequently. Table 9.2 lists those codons with different meanings in different genetic codes. These codons tend to end with a purine, except for CUN. However, even within the CUN codon family, CUR codons are involved in recoding more often than CUY codons (Table 9.2).
We can build a distance tree from Table 9.2 by counting the pairwise number of reassignment events (i.e., when a codon for one amino acid is reassigned to a different amino acid or a stop codon). The only problem is how to treat reassignment between a sense codon and a stop. Such a change probably should occur less frequently than reassignments involving two sense codons. All pairwise comparisons among the 24 rows (24 genetic codes) generate 609 reassignments involving 2 sense codons and 445 reassignments between a sense codon and a stop codon. However, during the long evolutionary time, the more frequent reassignments will erase each other and the frequencies of their occurrences will be underestimated. So the actual difference between the two numbers must be much greater. If we count each reassignment between a sense codon and a stop codon as equivalent to four reassignments between two sense codons, we obtain a distance-based tree in Fig. 9.1. The topology remains the same if we treat each reassignment between a sense codon and a stop codon as equivalent to two, three, or five reassignments involving two sense codons.
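The weighting scheme can be sketched as follows. Table 9.2 is not reproduced here, so the example uses only a small, illustrative subset of differences from the standard code as listed in the NCBI translation tables (codes 2, 3, and 4); the dictionary `CODE_DIFFS` and the function `weighted_distance` are hypothetical names, and the resulting numbers are not those behind Fig. 9.1.

```python
from itertools import product

BASES = "UCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

# Differences from the standard code (illustrative subset, "*" = stop).
CODE_DIFFS = {
    1: {},                                                    # standard
    2: {"AGA": "*", "AGG": "*", "AUA": "M", "UGA": "W"},      # vertebrate mt
    3: {"AUA": "M", "CUU": "T", "CUC": "T", "CUA": "T",
        "CUG": "T", "UGA": "W"},                              # yeast mt
    4: {"UGA": "W"},                                          # Mycoplasma etc.
}

def weighted_distance(a, b, stop_weight=4):
    """Count reassignments between codes a and b; a sense/stop reassignment
    counts as `stop_weight` sense/sense reassignments."""
    dist = 0
    for codon in set(CODE_DIFFS[a]) | set(CODE_DIFFS[b]):
        x = CODE_DIFFS[a].get(codon, STANDARD[codon])
        y = CODE_DIFFS[b].get(codon, STANDARD[codon])
        if x != y:
            dist += stop_weight if "*" in (x, y) else 1
    return dist

print(weighted_distance(1, 2), weighted_distance(3, 4))
```

A matrix of such pairwise distances could then be passed to any distance-based tree-building method; changing `stop_weight` to 2, 3, or 5 mirrors the sensitivity check described above.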
Most bacteria use genetic code 11, which is the same as the standard code except for the difference in start codon usage. The wall-less bacteria including Mycoplasma and Spiroplasma use genetic code 4, which is identical to the mitochondrial genetic code used in a number of fungal lineages, red algae, and protozoa. The use of the same genetic code 4 by bacteria and mitochondria in eukaryotic lineages suggests two alternative hypotheses. First, it is convergence. Second, the ancestor of mitochondrial lineages in Cluster 3 (Fig. 9.1) is a Mycoplasma-like bacterium. This would imply multiple origins of mitochondrial lineages.
The main arguments for a single origin of mitochondria are (1) extensive phylogenetic reconstruction with rRNA sequences from a diverse array of mitochondrial and bacterial lineages appears to recover mitochondrial lineages as a monophyletic taxon, with its closest phylogenetic relative being in Alphaproteobacteria lineages, especially Rickettsiales (Williams et al. 2007), and (2) all diverse mitochondrial genomes appear to represent a reduced form of the mitochondrial genome from Reclinomonas americana (Lang et al. 1997). In particular, the closest phylogenetic relative for the mitochondrial genome from R. americana among bacterial lineages is Ehrlichia muris strain AS145 within Rickettsiales. These lines of evidence, taken together, represent compelling evidence for the single-origin hypothesis of mitochondria.
Genetic codes also differ in start codons (Table 9.3). While AUG is used universally and dominantly as a start codon, other codons are used as well, although there has been no species in which a non-AUG codon is used as a start codon more frequently than AUG. For eukaryotic species where AUG is part of translation initiation signal such as in the Kozak consensus RxxAUGG, non-AUG codons are rarely used. In bacterial species where start codon is localized by pairing of Shine-Dalgarno (SD) sequences and anti-SD sequences, the requirement for AUG as a start codon is less stringent.
A synonymous codon family refers to all codons coding the same amino acid. For example, GGA, GGC, GGG, and GGU codons all code Gly and are collectively referred to as the Gly codon family or just Gly family. I may use “family” for “synonymous codon family” when there is no confusion. A codon family whose members differ only at the third codon position, such as the Gly family, is a simple family. In contrast, a codon family whose members differ not only at the third codon position but also at other codon positions is a compound codon family. For example, in the standard genetic code, Leu is coded by UUR (where R stands for purine) and CUN (where N stands for any nucleotide) codons. Therefore, the Leu codon family is a compound family. Other compound families in the standard code include Ser (coded by UCN and AGY, where Y stands for pyrimidine) and Arg (coded by CGN and AGR). Compound families are often divided into subfamilies. For example, the Ser family is broken into the UCN subfamily and the AGY subfamily.
The phenomenon that one amino acid may be encoded by multiple codons is called codon degeneracy. This gives rise to 4-fold, 3-fold, 2-fold, and 1-fold (0-fold is a misnomer) degenerate sites. An n-fold site is one that can be occupied by n different nucleotides without changing the meaning of the encoded amino acid. For example, the third site in the four Gly codons above is fourfold degenerate. In the standard code, AUA, AUC, and AUU all encode amino acid Ile, so that the third codon site is threefold degenerate. AAA and AAG both encode amino acid Lys, so that the third codon site is twofold degenerate. We may also have a twofold degenerate site at the first codon site. For example, both CUA and UUA encode amino acid Leu, so the first codon site is twofold degenerate. The second codon site of Gly codons is onefold degenerate because replacing it by any other nucleotide will change the meaning of the encoded amino acid.
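The definition of an n-fold degenerate site translates directly into a small function. The sketch below assumes the standard genetic code; `CODE` and `degeneracy` are illustrative names, and sites are numbered 0, 1, 2 for the first, second, and third codon positions.

```python
from itertools import product

BASES = "UCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

def degeneracy(codon, site):
    """Number of nucleotides that can occupy `site` (0-based) of `codon`
    without changing the encoded meaning; the original nucleotide counts."""
    meaning = CODE[codon]
    return sum(CODE[codon[:site] + b + codon[site + 1:]] == meaning
               for b in BASES)

for codon, site in [("GGA", 2), ("AUA", 2), ("AAA", 2), ("CUA", 0), ("GGA", 1)]:
    print(codon, "site", site + 1, "->", degeneracy(codon, site), "fold")
```

Because the original nucleotide is always counted, the minimum value is 1, which is why "0-fold" is a misnomer.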
A synonymous mutation refers to the change of a codon by another synonymous codon. A nonsynonymous mutation refers to codon replacement involving amino acid replacement. A substitution is a mutation that has spread to all individuals in the population. Synonymous substitutions occur often, but nonsynonymous substitutions occur rarely.
Throughout the text, we will abbreviate highly and lowly expressed genes as HEGs and LEGs. Unless specified otherwise, HEGs and LEGs in this chapter pertain to protein expression, not mRNA expression. One may rank all proteins according to experimentally measured abundance and take the top and bottom 1/3 as HEGs and LEGs, respectively. Non-HEGs are simply all genes from a genome that are not included in HEGs. Protein abundance values for most model species may be found in PaxDb (Wang et al. 2012).
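The ranking rule can be sketched as below. The gene names and abundance values are made up for illustration, and `split_by_abundance` is a hypothetical helper, not part of PaxDb or DAMBE.

```python
def split_by_abundance(abundance, parts=3):
    """Rank genes by protein abundance and return (HEGs, LEGs, non-HEGs),
    where HEGs and LEGs are the top and bottom 1/parts of the ranking."""
    ranked = sorted(abundance, key=abundance.get, reverse=True)
    n = len(ranked) // parts
    hegs = ranked[:n]
    legs = ranked[len(ranked) - n:]
    non_hegs = ranked[n:]
    return hegs, legs, non_hegs

# Hypothetical abundance values (arbitrary units).
abundance = {"g1": 950.0, "g2": 12.0, "g3": 430.0,
             "g4": 3.5, "g5": 88.0, "g6": 0.7}
hegs, legs, non_hegs = split_by_abundance(abundance)
print(hegs, legs)
```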
1.2 Elongation Efficiency Depends on Amino Acid and Codon Usage
Many unicellular organisms, especially bacterial species, need to grow and replicate the cell rapidly in order not to be outcompeted by others. For example, an E. coli cell replicates once every 20 min with unlimited nutrients. To replicate a cell, not only the genome needs to be replicated, but a large amount of proteins have to be produced, with some proteins produced in nearly half a million copies in an E. coli cell. For such highly expressed proteins, it is very important for their coding genes to have efficient coding strategy to maximize the rate of translation. Translation involves three sub-processes, initiation, elongation, and termination. The previous chapter illustrates how natural selection can drive evolution toward more efficient translation initiation. This chapter addresses the question of how translation elongation can be improved through codon adaptation.
There are two obvious ways of increasing translation elongation efficiency for mass-produced proteins. The first is to optimize amino acid usage, i.e., to use energetically cheap and typically abundant amino acids as building blocks (Akashi and Gojobori 2002). The second is to maximize the usage of codons that match the anticodon of the most abundant cognate tRNA (Gouy and Gautier 1982; Ikemura 1992; Xia 1998a, 2005, 2009, 2015). For example, the amino acid glycine (Gly) can be coded by GGA, GGC, GGG, and GGU codons, but tRNAGly species that decode GGY codons are more abundant than tRNAGly species that decode GGR codons in E. coli cells. What codons should E. coli use to code glycine? Obviously natural selection should favor those that maximize the usage of GGY codons against GGR codons given the differential tRNA availability. However, selection and mutation may go in opposite directions, so any study of codon adaptation would be incomplete without considering both selection and mutation.
1.3 Empirical Illustration of Codon-Anticodon Adaptation
Ikemura’s pioneering works established the relationship between differential tRNA abundance and its effect on codon usage in rapidly replicating bacterial species and unicellular eukaryotes (Ikemura 1981a, b, 1982, 1992). Many studies have since demonstrated a strong relationship not only between codon adaptation and gene expression (Coghlan and Wolfe 2000; Comeron and Aguade 1998; Duret and Mouchiroud 1999; Gouy and Gautier 1982; Xia 2007c) but also between experimentally modified codon usage and protein production (Haas et al. 1996; Ngumbela et al. 2008; Robinson et al. 1984; Sorensen et al. 1989). These results have led to the explicit formulation of codon-anticodon coevolution and adaptation theory (e.g., Akashi 1994; Moriyama and Powell 1997; Ran and Higgs 2012; Xia 1998a, 2008) which states that (1) protein production is rate-limited by both translation initiation and elongation efficiency; (2) codon usage and tRNA anticodon coevolve to adapt to each other, resulting in increased production of correctly translated proteins; and (3) the increased elongation efficiency and accuracy represent the driving force for the HEGs to acquire a high degree of codon-anticodon adaptation.
1.3.1 Empirical Illustration of Codon-Anticodon Adaptation in Yeast
The baker’s yeast, Saccharomyces cerevisiae, replicates rapidly and is expected to use codons with many decoding tRNAs and avoid codons with few decoding tRNAs. The earliest association between tRNA and codon usage was empirically demonstrated by Ikemura (1981a, b, 1992). Tables 9.4 and 9.5 show the association between tRNA gene copy number (T in Tables 9.4 and 9.5) in the genome and codon usage in highly expressed yeast genes (F in Tables 9.4 and 9.5). T is a good proxy for tRNA abundance (Percudani et al. 1997).
The association between T and F is obvious in Tables 9.4 and 9.5. Take the two Arg codons AGA and AGG in Table 9.4, for example. There are 11 tRNAArg/UCU genes in the yeast genome that form perfect Watson-Crick base pairs with AGA but only one tRNAArg/CCU that does so with AGG. So we expect yeast genes, especially highly expressed ones, to use AGA and avoid AGG, which is true (Table 9.4). The same applies to all other synonymous codon families or subfamilies, except for the Cys codon family. Why the rarely used Cys codon family should be exceptional remains unknown. It is possible that Cys codon UGC may happen to be followed by a GNN codon, leading to methylation of C at the third codon position which then changes to T via spontaneous deamination. Whether the yeast genome has cytosine methylation remains controversial, with evidence both for (Tang et al. 2012) and against (Capuano et al. 2014) the existence of methylation in S. cerevisiae. However, there is significant CpG deficiency and TpG and CpA surplus in the genome, which is consistent with CpG-specific DNA methylation.
One can obtain tables similar to Tables 9.4 and 9.5 by downloading the yeast genome from GenBank and then using DAMBE to compile the data in three steps. First, read the GenBank files for yeast chromosome sequences into DAMBE (Xia 2013, 2017d) to extract the coding sequences (CDSs) and tRNA genes. Second, compute ITE (Xia 2015) as a proxy of gene expression, and choose a subset of CDSs with highest ITE as HEGs. Third, use DAMBE to obtain codon usage of these HEGs. In this way, a table similar to Table 9.4 can be generated in minutes.
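For readers who want to see what the third step amounts to, the following Python sketch tallies codon usage from a list of CDSs and converts the counts to RSCU (observed count divided by the mean count of the codons in the same family). It is not DAMBE's implementation; `codon_counts` and `rscu` are hypothetical helpers, the standard code is assumed, each amino acid is treated as a single family (compound families are not split into subfamilies), and the input sequence is a toy example.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

def codon_counts(cds_list):
    """Tally in-frame codons over a list of coding sequences (DNA or RNA)."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("T", "U")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODE:          # skip codons with ambiguous bases
                counts[codon] += 1
    return counts

def rscu(counts):
    """RSCU = count * (family size) / (total count of the family)."""
    families = {}
    for codon, aa in CODE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for fam in families.values():
        total = sum(counts[c] for c in fam)
        if total:
            for c in fam:
                values[c] = counts[c] * len(fam) / total
    return values

counts = codon_counts(["AGAAGAAGAAGG"])   # toy CDS: AGA x3, AGG x1
print(rscu(counts)["AGA"], rscu(counts)["AGG"])
```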
1.3.2 Codon Usage Changes When tRNA Abundance Changes
An evolutionary change in tRNA composition or relative abundance is expected to alter codon-anticodon adaptation. This is not controversial theoretically, but empirically difficult to demonstrate. However, recent studies (Xia 2012c; Xia et al. 2007) have documented that changes in tRNAMet genes (where Met is the amino acid carried by the tRNA) in animal mitochondrial DNA (mtDNA) are associated with changes in Met codon usage.
In mtDNA of most animal species, Met is coded by AUA and AUG codons. In some animal species, e.g., vertebrates, these two codons are translated by a single tRNAMet/CAU species (where CAU is the anticodon in the 5′ to 3′ orientation) with a modified C (i.e., f5C) at the first anticodon position (Grosjean et al. 2010) to allow C/A pairing. In other animal species, e.g., tunicates, an additional tRNAMet/UAU gene is present in the mtDNA. One would expect that, when tRNAMet/UAU is absent, Met should be preferably coded by AUG with a reduced AUA usage. The gain of tRNAMet/UAU would favor more Met to be coded by AUA.
In addition to tunicates, mtDNA in bivalve species also has two tRNAMet genes. In some bivalve species (e.g., Acanthocardia tuberculata, Crassostrea gigas, C. virginica, Hiatella arctica, Placopecten magellanicus, and Venerupis philippinarum), both tRNAMet genes have a CAU anticodon forming Watson-Crick base pairs with codon AUG. In some other bivalve species (e.g., Mytilus edulis, Mytilus galloprovincialis, and Mytilus trossulus), one tRNAMet has a CAU anticodon, and the other has a UAU anticodon forming Watson-Crick base pairs with the AUA codon. One would predict that the latter should be more likely to code Met by AUA than the former, i.e., the proportion of AUA codon within the AUR codon family, designated PAUA, should be greater in the latter with both a tRNAMet/CAU and a tRNAMet/UAU gene than in the former with tRNAMet/CAU genes only (Xia et al. 2007).
One complication in testing the prediction is that AUA usage will increase with genomic AT%. To control for this effect, one may use another A-ending codon, such as UUA as a reference. Thus, given the same PUUA (the proportion of UUA codon in the UUR codon family), PAUA in the three Mytilus mtDNA with both a tRNAMet/CAU and a tRNAMet/UAU gene should be higher than that in the six bivalve species without a tRNAMet/UAU gene. This is supported by empirical evidence (ANCOVA test, p = 0.0111, Fig. 9.2a). Thus, the presence of tRNAMet/UAU increases AUA usage significantly.
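The two proportions are simple ratios of codon counts within a codon family. A minimal sketch, with made-up counts and a hypothetical helper `proportion` (the ANCOVA itself is not reproduced here):

```python
def proportion(counts, codon, family):
    """Proportion of `codon` among all codons of `family` in `counts`."""
    total = sum(counts.get(c, 0) for c in family)
    return counts.get(codon, 0) / total if total else float("nan")

# Hypothetical codon counts from the coding sequences of one mtDNA.
counts = {"AUA": 30, "AUG": 10, "UUA": 60, "UUG": 20}

p_aua = proportion(counts, "AUA", ("AUA", "AUG"))   # P_AUA within AUR
p_uua = proportion(counts, "UUA", ("UUA", "UUG"))   # P_UUA within UUR
print(p_aua, p_uua)
```

Computing one (PUUA, PAUA) pair per species yields the points that are then compared between groups at the same PUUA level.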
A similar comparison can be performed between the urochordates (tunicates, with both tRNAMet/CAU and tRNAMet/UAU genes in their mtDNA) and cephalochordates (lancelets, with only a tRNAMet/CAU gene in their mtDNA). Figure 9.2b shows that PAUA is much smaller in lancelets than in tunicates at the same PUUA level. Thus, AUA usage is consistently increased by the gain of a tRNAMet/UAU gene (or consistently decreased by the loss of a tRNAMet/UAU gene) in animal mtDNA.
A gain of a tRNAMet/UAU gene is also associated with a surplus of AUG→AUA substitutions in animal mitochondrial coding sequences (results not shown). Similar associations can also be observed with other gains/losses of tRNA genes in animal mitochondria. In contrast, a gain/loss of tRNA genes in plant mtDNA appears to have little effect on nucleotide substitutions or codon usage, presumably because such gain/loss events do not significantly alter the tRNA pool in plant cells where nuclear tRNAs are mass-imported into plant mitochondria.
1.4 Effect of Biased Mutation on Codon Usage and Some Misconceptions
Biased mutation has long been known to affect codon usage (Muto and Osawa 1987; Sueoka 1964; Xia and Yuen 2005; Xia et al. 2002). The third codon position is the most amenable to mutation bias (Fig. 9.4) because most nucleotide substitutions at the third codon position are synonymous. Nucleotide substitutions are synonymous at some first codon positions but nonsynonymous at all second codon positions. Furthermore, nucleotide substitutions at the second codon position typically involve rather different amino acids and therefore should be subject to strong purifying selection (Xia 1998b; Xia and Li 1998). One therefore would predict that GC% at the third codon position should increase more rapidly with the genomic GC% than GC% at the first codon position, which in turn should increase more rapidly with the genomic GC% than GC% at the second codon position. The empirical results (Fig. 9.3) strongly support the prediction (Muto and Osawa 1987).
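The quantities compared in this prediction are the GC% values at the three codon positions. A minimal sketch of how they could be computed from a coding sequence is given below; `gc_by_codon_position` is a hypothetical helper, the sequence is a toy example, and the input is assumed to contain at least one complete codon.

```python
def gc_by_codon_position(cds):
    """Return GC% at codon positions 1, 2, and 3 of an in-frame CDS."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3        # ignore a trailing partial codon
    percentages = []
    for pos in range(3):
        sites = seq[pos:usable:3]           # every third base from `pos`
        gc = sum(base in "GC" for base in sites)
        percentages.append(100.0 * gc / len(sites))
    return tuple(percentages)

print(gc_by_codon_position("ATGGCCGCGTAA"))
```

Plotting these three values against genomic GC% across many genomes gives the kind of comparison summarized in Fig. 9.3.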
However, the pattern in Fig. 9.3, while consistent with the mutation hypothesis, has resulted in two misconceptions. First, the pattern shown by the third codon position is often interpreted to reflect mutation bias. This interpretation is incorrect because the third codon position is subject to selection by differential availability of tRNA species (Carullo and Xia 2008; Xia 1998a, 2005, 2008; Xia et al. 2007). We may contrast a GC-rich Streptomyces coelicolor and a GC-poor Mycoplasma capricolum as an illustrative example. M. capricolum has no tRNA with a C or G at the wobble site for fourfold codon families (Ala, Gly, Pro, Thr, and Val), i.e., the translation machinery would be inefficient in translating C-ending or G-ending codons. This implies selection in favor of A-ending or U-ending codons and will consequently reduce GC% at the third codon position. This most likely has contributed to the low GC% at the third codon position in M. capricolum. In contrast, most of the tRNA genes translating the five fourfold codon families in the GC-rich S. coelicolor have G or C at the wobble site, and should favor the use of C-ending or G-ending codons. This most likely has contributed to the high GC% at the third codon position in S. coelicolor. In these two cases, mutation bias and tRNA-mediated selection are in the same direction to drive up or down GC% at the third codon position. The same pattern is observed for twofold codon families. The most conspicuous one is the Gln codon family (CAA and CAG). There is only one tRNAGln gene in M. capricolum with a UUG anticodon favoring the CAA codon. In contrast, there are two tRNAGln in S. coelicolor, both with a CUG anticodon favoring the CAG codon. Thus, the high slope for the third codon position in Fig. 9.3 is at least partially attributable to the tRNA-mediated selection. 
Relative contribution of mutation and tRNA-mediated selection to codon usage has been evaluated in several recent studies (Carullo and Xia 2008; Xia 2005, 2008; Xia et al. 2007).
The second misconception arising from Fig. 9.3 is that the frequency of G-ending and C-ending codons will increase and A-ending and U-ending codons decrease, with genomic GC% or GC-biased mutation (Kliman and Bernal 2005). This is not generally true (Palidwor et al. 2010). Take the arginine codons, for example. Given the transition probability matrix for the six synonymous codons shown in Table 9.6, the equilibrium frequencies (π) for the six codons are
The three solutions correspond to the number of GC in the codon, with AGA having one, AGG, CGA and CGT having two, and CGC and CGG having three G or C. One may note that the G-ending codon AGG has the same equilibrium frequency as that of the A-ending CGA and the T-ending CGT. Thus, we should not expect A-ending or T-ending codons to always decrease or G-ending and C-ending codons always increase, with increasing genomic GC% or GC-biased mutation. In fact, according to the solutions in Eq. (9.1), πAGG, πCGA, and πCGT will first increase with k until k reaches \( \sqrt{2}/2 \) and will then decrease with k when k > \( \sqrt{2}/2 \) (Palidwor et al. 2010).
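Equation (9.1) and Table 9.6 are not reproduced in this text. As a hedged reconstruction only, suppose k measures the rate of AT→GC mutations relative to GC→AT mutations, so that the equilibrium frequency of each codon is proportional to k raised to its number of G or C nucleotides. Under that assumption, the three solutions would take the form

\[ \pi_{AGA} = \frac{1}{1+3k+2k^{2}}, \qquad \pi_{AGG} = \pi_{CGA} = \pi_{CGT} = \frac{k}{1+3k+2k^{2}}, \qquad \pi_{CGC} = \pi_{CGG} = \frac{k^{2}}{1+3k+2k^{2}} \]

which sum to 1 over the six codons. Differentiating the middle solution gives

\[ \frac{d}{dk}\left(\frac{k}{1+3k+2k^{2}}\right) = \frac{1-2k^{2}}{\left(1+3k+2k^{2}\right)^{2}} \]

which is positive for k < \( \sqrt{2}/2 \), zero at k = \( \sqrt{2}/2 \), and negative for k > \( \sqrt{2}/2 \). This form is therefore consistent with the behavior of πAGG, πCGA, and πCGT described above, but it should be checked against the original Eq. (9.1) before being relied upon.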
1.5 Two Hypotheses on Translation Elongation Efficiency
It is controversial to what degree protein production is limited by translation elongation. Early theoretical considerations (Andersson and Kurland 1983; Bulmer 1990, 1991; Liljenstrom and von Heijne 1987) tend to favor the argument that translation elongation is not rate-limiting in protein production, but translation initiation is. This hypothesis does not deny the existence of codon adaptation, but it asserts that codon-anticodon adaptation and increased elongation efficiency are not related to protein production. Instead, the benefit of codon adaptation and increased elongation efficiency is to increase ribosomal availability for global translation. This hypothesis was explicitly formulated and empirically tested only recently (Kudla et al. 2009).
We thus have two alternative hypotheses attributing different benefits to codon-anticodon adaptation. The first assumes that protein production is rate-limited by both initiation and elongation and codon-anticodon adaptation would result in higher elongation efficiency and more efficient and accurate protein production, especially for HEGs. The second claims that protein production is rate-limited only by initiation efficiency but improved codon adaptation and consequently increased elongation efficiency have the benefit of increasing ribosomal availability for global translation.
How should we go about testing these two hypotheses? Note that the two hypotheses make different predictions about the relationship among three variables: (1) translation initiation efficiency, (2) translation elongation efficiency, and (3) protein production. Before we can test these two hypotheses, we need to understand how these variables can be measured. The previous chapter outlines a few factors contributing to translation initiation efficiency. Here we first learn a few indices of codon usage bias as a proxy for translation elongation efficiency and then include them in the test of the two hypotheses in the section illustrating the application of index of translation elongation (Xia 2015).
1.6 Wobble Hypothesis and Its Extensions
The wobble hypothesis was proposed to explain how a set of tRNA molecules can decode all sense codons, which are much larger in number. The wobble-pairing rules are specified in Fig. 9.4, together with the numbering system used here for individual codon and anticodon sites that is more precise than, but different from, the conventional one. The original wobble hypothesis (Crick 1966), with its extended codon-anticodon base pairs (Fig. 9.4), played a crucial role in understanding the working of the translation machinery. It explains why tRNAIle/IAU, where I in IAU is inosine derived from A, is able to translate all three Ile codons (AUC, AUU, and, albeit inefficiently, AUA), why a tRNA with a GI can translate Y-ending codons (where Y stands for C or U), and why a tRNA with a UI can translate R-ending codons (where R stands for A or G). The hypothesis also explains the lack of AI in tRNA genes for decoding twofold Y-ending codon families because such a tRNA, when its AI is modified to II, would misread the near-cognate R-ending codons.
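The decoding logic can be made concrete with a short sketch. It encodes only Crick's original wobble rules for the first anticodon base (G pairs with C or U, U with A or G, I with U, C, or A, C with G, A with U); the extended pairs in Fig. 9.4 and nucleotide modifications are not modeled, and `WOBBLE`, `WATSON_CRICK`, and `decoded_codons` are illustrative names.

```python
# Codon third-position bases that each anticodon first base can pair with
# under Crick's original wobble rules (I = inosine).
WOBBLE = {"G": "CU", "U": "AG", "I": "UCA", "C": "G", "A": "U"}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

def decoded_codons(anticodon):
    """Codons (5'->3') read by an anticodon written 5'->3'.

    The anticodon's third and second bases pair by Watson-Crick rules with
    the codon's first and second bases; its first base wobbles."""
    first, second, third = anticodon
    prefix = WATSON_CRICK[third] + WATSON_CRICK[second]
    return sorted(prefix + base for base in WOBBLE[first])

for anticodon in ("GCC", "UCC", "CCC", "IAU"):
    print(anticodon, decoded_codons(anticodon))
```

The sets returned for the anticodons CCC and UCC share GGG, which is the overlap between isoacceptor tRNAs noted in Sect. 1.1.1.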
Wobble pairing reduces the number of tRNAs needed for translation and simplifies the translation machinery. As an example of parsimonious tRNA usage, the Y-ending codons, be they in twofold or fourfold codon families, are decoded by tRNAs with either a II or a GI, but never both. This rule is obeyed in all three domains of life. Almost all fourfold codon families in Mycoplasma pulmonis (including the Ser UCN codon family and Leu CUN codon family) are decoded by a single tRNA species with a UI, except for the Thr ACN and Arg CGN codon families which are each decoded by two tRNA species, one with a UI and the other with a GI. The most dramatic simplification of the tRNome is observed in vertebrate mitochondrial genomes, which contain only 22 tRNA genes, with each tRNA species decoding a codon family. Instead of separate initiation tRNAiMet/CAU and elongation tRNAeMet/CAU present in all nuclear genomes, a single tRNAMet/CAU, with a modified CI, decodes both the initiation AUG codon and internal Met AUR codons. Each Y-ending codon family is decoded by a single tRNA species with a wobble GI and each R-ending codon family by a single tRNA with a wobble UI which is modified to prevent its pairing with U or C. All fourfold codon families are decoded by a tRNA with a wobble UI which is not modified.
Wobble pairing is not without cost as it often reduces translation efficiency and accuracy and is generally avoided (Xia 2008). For example, an II/A3 pair is bulky because it involves two purines (Fig. 9.4) in contrast to other base pairs which typically involve a large purine and a small pyrimidine. For this reason, Ile is rarely coded by AUA except for certain viruses with a strong A-biased mutation (van Weringh et al. 2011). Among a set of highly expressed genes in the yeast (Saccharomyces cerevisiae), AUA is not used at all (Table 9.5). Similarly, a tRNA with a UI can translate A-ending codons better than G-ending codons (Grosjean et al. 2010; Xia 2008). Most of the yeast tRNAArg have a UI, and only one AGG codon is found in contrast to 314 AGA codons in highly expressed yeast genes (Table 9.4). Yeast genomic data also suggest that a tRNA with a GI can translate C-ending codons better than U-ending codons. For example, the yeast tRNAAsn genes translating the Asn AAY codon family all have a GI. Among 219 Asn codons in highly expressed yeast genes, only 11 are AAU codons, suggesting strong selection against AAU codons in favor of AAC codons (Table 9.4). Note that the yeast genome is strongly AT-biased. If there were no selection against AAU codons, we would expect more AAU codons than AAC codons, which is contrary to the observed frequencies. However, the selection against the GI/U3 pair is in general much weaker than that against the UI/G3 pair. In fungal mitochondrial genomes, there is no avoidance of the GI/U3 pair in favor of the GI/C3 pair, although the UI/G3 pair is strongly avoided in favor of the UI/A3 pair (Xia 2008).
The weak, or absent, selection against GI/U3 can explain several puzzling counterexamples against the codon-anticodon adaptation theory (Bulmer 1991; Ikemura 1981b; Xia 1998a), which states that the most frequently used codon in each synonymous codon family should form Watson-Crick base pairing with the anticodon of the most abundant tRNA species to reduce translation error and increase translation efficiency. For example, Cys codons (UGY) are translated by tRNACys/GCA in both cytoplasm and mitochondria in the yeast, yet most Cys codons have U3. If there is little selection against the GI/U3 pair (i.e., the GI/U3 pair is as efficient and accurate as the GI/C3 pair), then the frequencies of UGC and UGU will be mostly determined by AT-bias. Because the yeast nuclear and mitochondrial genomes are both AT-rich, we have more UGU codons than UGC codons, in spite of GI in tRNACys. The weak selection against GI/U3 but strong selection against UI/G3 also explains why Y-ending codons are typically translated by a tRNA with a GI, whereas R-ending codons are typically translated by two different tRNAs, one with a UI and the other with a CI (Xia 2008).
The wobble hypothesis points to the necessity of nucleotide modification in tRNA to either increase or decrease the wobble versatility to improve accuracy and efficiency of translation. The observation that an unmodified UI can pair with all N3 in many mitochondrial genomes suggests that UI in tRNA for twofold R-ending codon families needs to be modified to restrict its wobble versatility to avoid misreading the near-cognate Y-ending codons. Chemical modification of UI to restrict its pairing versatility to R3 in twofold R-ending codon families is universal in all three kingdoms of life and in organelles (Grosjean et al. 2010; Lim 1994). On the other hand, the tRNAMet/CAU in vertebrate mitochondria needs to read both the initiation AUG codon and the internal AUG and AUA codons, and its CI is modified to f5CI to increase its wobble versatility so as to form a f5CI/A3 pairing between the anticodon and the AUA codon. Nucleotide modification in tRNA has been extensively reviewed (Grosjean et al. 2010) and chemically detailed in MODOMICS (Czerwoniec et al. 2009).
Wobble pairing implies the theoretical possibility of adding new base pairs of novel nucleotides to protein-coding genes to increase the coding capacity (Hirao and Kimoto 2010). A single novel base pair, involving two novel nucleotides, would increase the number of codons from 64 ($= 4^3$) to 216 ($= 6^3$), and one can then use these extra codons, together with engineered tRNAs to recognize these codons and to carry new amino acid analogs, to produce novel proteins.
The wobble hypothesis can be extended to explain the lack of UCG anticodon in Arg CGN codon family in a large number of evolutionary lineages. A tRNA species with a wobble UI is almost always present among tRNA species decoding fourfold codon families and twofold R-ending codon families, with most exceptions observed in the Arg CGN codon family. In the mitochondrial genomes of Caenorhabditis elegans (metazoan), Marchantia polymorpha (plant), Pichia canadensis (fungus), and Saccharomyces cerevisiae (fungus), there is no tRNAArg/UCG, and Arg CGN codon family is decoded by tRNAArg/ACG (Xia 2005). The lack of tRNAArg/UCG in the mitochondrial genome of these diverse taxa suggests that the lack is an ancestral state and that the presence of tRNAArg/UCG in vertebrate mitochondria is a derived state. This is substantiated by the fact that almost all eubacterial species, from which the mitochondrion was originally derived, lack tRNAArg/UCG (Grosjean et al. 2010).
The expanded wobble hypothesis for the lack of tRNAArg/UCG requires an extension of the wobble hypothesis by invoking wobble pairing between the third anticodon site (NIII) and the first codon site (N1), conditional on a CII/G2 or GII/C2 pair with three hydrogen bonds. Thus, the anticodon UCG would wobble-pair with stop codon UGA through a wobble GIII/U1 pair and should therefore be strongly selected against (Carullo and Xia 2008). This explains not only the absence of tRNAArg/UCG in diverse evolutionary lineages but in particular why tRNAArg/UCG is absent in most eubacterial species and ancestral mitochondrial lineages where UGA is used as a stop codon and why it is present in derived mitochondrial lineages such as vertebrate mitochondrial genomes where UGA is no longer used as a stop codon.
2 Commonly Used Codon Usage Indices
There are two key factors contributing to codon usage bias: the mutation bias (Osawa et al. 1987) and the tRNA-mediated selection (Ikemura 1981a, 1982, 1992; Xia 1998a, 2015). There are also two types of codon usage indices, but they do not correspond to the two factors shaping codon usage. The first type of codon usage indices is codon-specific, best represented by relative synonymous codon usage (RSCU, Sharp et al. 1986), which measures deviation of codon usage from equal usage. The second type of codon usage indices is gene-specific, with several well-known representatives including effective number of codons (ENC, Sun et al. 2013; Wright 1990), codon adaptation index (CAI, Sharp and Li 1987; Xia 2007c), codon bias index (CBI, Bennetzen and Hall 1982), frequency of optimal codons (Fop, Ikemura 1985), tRNA adaptation index (tAI, dos Reis et al. 2004), and index of translation elongation (ITE, Xia 2015).
ENC aims to measure deviation of codon usage from equal usage and may be considered as the gene-specific equivalent of the codon-specific RSCU. They are both descriptive and do not distinguish between mutation bias and tRNA-mediated selection in their contribution to codon usage bias. All other gene-specific indices aim to measure the intensity of the tRNA-mediated selection on codon usage bias. A gene encoding a mass-produced (highly expressed) protein is expected to be under stronger selection to optimize its codon usage according to differential tRNA availability than a gene encoding a lowly expressed protein, and we expect CAI, CBI, tAI, and ITE to be greater for the highly expressed gene than for the lowly expressed gene. However, CAI, CBI, and tAI ignore background mutation bias. ITE is a generalization of CAI that incorporates background mutation bias, and it is reduced to CAI when there is no background mutation bias (Xia 2015).
Codon indices that aim to measure tRNA-mediated selection (i.e., CAI, CBI, Fop, tAI, and ITE) all define a translationally optimal codon (TOC) within each codon family, and the codon usage index value will be the highest if all codons in a gene are TOCs. However, TOC is defined differently among these indices. CBI, Fop, and tAI define a TOC mainly as one that corresponds to the most abundant isoacceptor tRNA, with CBI incorporating gene expression information as well. CAI defines a TOC as the codon in its codon family that is used most frequently in highly expressed genes (HEGs). ITE defines a TOC as the codon in its codon family that is used most frequently in HEGs after adjustment for the mutation bias reflected in lowly expressed genes (LEGs). Comparative studies (Coghlan and Wolfe 2000; Comeron and Aguade 1998) suggest that CAI is better than ENC, CBI, and Fop in predicting gene expression levels, tAI is better than CAI (dos Reis et al. 2004; Tuller et al. 2010), and ITE is better than CAI and tAI (Xia 2015). However, such comparison depends not only on the methods but also on the quality of the software that implements the methods. A method could be conceptually sound but implemented erroneously and so generate poor results. Moreover, the same index could be implemented differently. For example, one implementation could lump all synonymous codons into one family so that some amino acids could have six or even eight synonymous codons (the trematode mitochondrial code has eight Ser codons: UCN and AGN), whereas another implementation would break all compound codon families, such as the Leu, Ser, and Arg codon families, into separate fourfold and twofold codon families.
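The two ways of grouping synonymous codons described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard genetic code (not the trematode mitochondrial code), and the function name `codon_families` and its `split_compound` flag are hypothetical, not taken from DAMBE, EMBOSS, or any other program.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code listed in UCAG order; '*' marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def codon_families(split_compound):
    """Group sense codons into synonymous families (hypothetical helper).

    split_compound=False: one family per amino acid (Arg has six codons).
    split_compound=True: compound families are split by the first two
    codon positions (Arg becomes CGN plus AGR).
    """
    families = defaultdict(list)
    for codon, aa in GENETIC_CODE.items():
        if aa == "*":
            continue  # stop codons belong to no family
        key = (aa, codon[:2]) if split_compound else (aa,)
        families[key].append(codon)
    return dict(families)

merged = codon_families(split_compound=False)
split = codon_families(split_compound=True)
print(len(merged), len(split))   # number of families under each scheme
print(sorted(merged[("R",)]))    # all six Arg codons in one family
print(sorted(split[("R", "AG")]), sorted(split[("R", "CG")]))
```

Under the standard code only Leu, Ser, and Arg are affected by the choice, but any index that takes a within-family maximum (such as the w values used by CAI) can change for every codon in those three families.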
2.1 RSCU (Relative Synonymous Codon Usage)
RSCU measures codon usage bias for each codon within each codon family. It is essentially a normalized codon frequency so that the expectation is 1 when there is no codon usage bias. A codon is overused if its RSCU value is greater than 1 and underused if its RSCU value is less than 1. It is computed directly from input sequences.
2.1.1 Calculation of RSCU
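The general equation for computing RSCU is

$$\mathrm{RSCU}_{ij} = \frac{\mathrm{CodFreq}_{ij}}{\left(\sum_{j=1}^{\mathrm{NumCodon}_i} \mathrm{CodFreq}_{ij}\right) / \mathrm{NumCodon}_i} \tag{9.2}$$

where i refers to a codon family, j to a specific codon within the family, $\mathrm{CodFreq}_{ij}$ to the frequency of codon j in family i, and $\mathrm{NumCodon}_i$ to the number of codons in family i. For example, i may refer to the alanine codon family with four codons (GCU, GCC, GCA, and GCG) and j to a specific codon such as GCU. In this case, the numerator is the frequency of GCU, and the denominator is the summation of the four codon frequencies divided by the number of codons in the codon family, i.e., 4.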
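The RSCU definition just described (a codon's frequency divided by the mean frequency of the codons in its family) can be sketched as follows. The function name `rscu` and the alanine counts are illustrative assumptions; the counts are made up for this example and are not the values in Table 9.7.

```python
def rscu(codon_counts, families):
    """Return {codon: RSCU} given codon counts and a {family: [codons]} map."""
    result = {}
    for codons in families.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # family absent from the sequence: RSCU is undefined
        expected = total / len(codons)  # count per codon under equal usage
        for c in codons:
            result[c] = codon_counts.get(c, 0) / expected
    return result

# Hypothetical alanine counts, chosen only to keep the arithmetic simple.
counts = {"GCU": 20, "GCC": 10, "GCA": 6, "GCG": 4}
families = {"Ala": ["GCU", "GCC", "GCA", "GCG"]}
values = rscu(counts, families)
for codon in families["Ala"]:
    print(codon, round(values[codon], 3))
```

Within a family the RSCU values always sum to the number of codons in that family, which is a convenient check on any implementation.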
For biology students, it is always easier to learn by numerical examples. Suppose we counted the codon frequencies of one particular protein-coding sequence and have obtained the codon frequencies (Table 9.7). The RSCU for the GCU codon is computed, according to Eq. (9.2), as
which is displayed in Table 9.7. Biology students are recommended to cover up the last column in Table 9.7 and finish the computation of the rest of the RSCU values.
2.1.2 Illustration of RSCU Applications
As I mentioned earlier, a variable such as RSCU is often not interesting by itself, but it becomes more interesting when you relate the variable to some other variables. Figure 9.5 shows the correlation between RSCU for Escherichia coli genes and that for the E. coli double-stranded DNA (dsDNA) phage TLS. This strong and positive correlation suggests adaptation of the phage codon usage to the host tRNA pool. The adaptation of the phage genes and the host genes to the same tRNA pool in E. coli cells, and the resulting evolution of very similar codon usage patterns, is an example of convergent evolution, i.e., phylogenetically remote organisms evolving similar features not due to coancestry, but in response to the same selection regime induced by the same environment.
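A comparison of this kind reduces to correlating two vectors of RSCU values over the codons they share. The sketch below computes a Pearson correlation in plain Python; the function names and the RSCU values are hypothetical toy inputs for illustration only and are not the data behind Fig. 9.5.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def rscu_correlation(rscu_a, rscu_b):
    """Correlate two {codon: RSCU} maps over the codons present in both."""
    shared = sorted(set(rscu_a) & set(rscu_b))
    return pearson_r([rscu_a[c] for c in shared],
                     [rscu_b[c] for c in shared])

# Toy values for one codon family (made up, not measured).
host = {"GCU": 0.6, "GCC": 1.0, "GCA": 0.8, "GCG": 1.6}
phage = {"GCU": 0.7, "GCC": 1.1, "GCA": 0.7, "GCG": 1.5}
r = rscu_correlation(host, phage)
print(round(r, 3))
```

In a real analysis the vectors would cover all codons in multi-codon families, and a rank-based coefficient could be reported alongside Pearson's r.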
What explanation would you offer if we find little correlation in RSCU between a phage and its host? There are in fact a large number of cases in which a virus and its host share little similarity in codon usage. Will such cases invalidate our convergent evolution explanation for the strong and positive correlation between phage TLS and its E. coli host? Science thrives on questions, and such questions immediately drive us to search for answers, and the answers enrich our explanatory conceptual framework. Ronald Fisher once said that “No aphorism is more frequently repeated in connection with field trials, than that we must ask Nature few questions, or ideally, one question at a time. The writer is convinced that this view is wholly mistaken. Nature, he suggests, will respond to a logical and carefully thought-out questionnaire; indeed, if we ask her a single question, she will often refuse to answer until some other topic has been discussed” (Fisher 1926).
There are at least six factors that will weaken the correlation in RSCU between a virus and its host. First, some dsDNA phages carry many tRNA genes in their own genome, and the transcription of these tRNA genes would modify the host tRNA pool. For example, another dsDNA E. coli phage, enterobacteria phage WV8, carries 20 tRNA genes in its genome. In such cases, the phage genes would adapt to the modified tRNA pool, which may be different from the tRNA pool in which E. coli mRNAs are translated normally (i.e., without phage infection). Partly for this reason, the correlation in RSCU between enterobacteria phage WV8 and its E. coli host is much weaker than that shown in Fig. 9.5 (Chithambaram et al. 2014a). Phage TLS (Fig. 9.5) happens to have a genome that does not encode any tRNA genes of its own, so it depends entirely on the host tRNA pool to decode the codons of its genes.
Second, codon usage adaptation takes time. If a phage having adapted to one host has switched to a new host, and if the original host and the new host differ in their tRNA pools, then the phage codon usage will be more similar to that of the original host than to that of the new host. This may be applicable to phage PRD1 which belongs to the peculiar Tectiviridae family with members parasitizing both gram-negative and gram-positive bacteria. Phage PRD1 is the only species in the family known to parasitize gram-negative bacteria, with other members of the family, i.e., phages PR3, PR4, PR5, L17, and PR772, parasitizing gram-positive bacteria (Bamford et al. 1995; Grahn et al. 2006). It is reasonably safe to assume that the phage PRD1 lineage has switched host from gram-positive to gram-negative bacteria. Furthermore, there is only one amino acid difference in the coat protein between phages PRD1 and PR4 (Bamford et al. 1995). This suggests that PRD1 is phylogenetically close to its relatives parasitizing gram-positive bacteria, i.e., the host-switching may have occurred quite recently. In fact, codon usage in phage PRD1 is more similar to that in gram-positive bacteria than to that in gram-negative bacteria (Chithambaram et al. 2014b). Among 87 bacterial genomes covering major groups of bacterial species, the host species with codon usage most similar to that of phage PRD1 are strains in the gram-positive Geobacillus (NC_014206, NC_012793, NC_014650, NC_014915, NC_013411).
Third, a phage with a wide range of host species may encounter diverse tRNA pools that would represent fluctuating selection with different optima. Phage PRD1 mentioned above does have a variety of gram-negative bacteria as hosts, including Salmonella, Pseudomonas, Escherichia, Proteus, Vibrio, Acinetobacter, and Serratia species (Bamford et al. 1995; Grahn et al. 2006). However, this diverse array of hosts actually has rather similar codon usage, so host variability is not a good explanation for the lack of similarity in codon usage between PRD1 and E. coli (Chithambaram et al. 2014b).
Fourth, the tRNA-mediated selection differs in its effectiveness between temperate phages (i.e., those with lysogeny) and virulent phages (i.e., those without lysogeny). The lysogenic phase effectively hides protein-coding genes of the phage from tRNA-mediated selection, and the phage codon usage will be at the mercy of mutation bias in the host genome. In contrast, virulent phages have their codon usage under tRNA-mediated selection every time they enter the host cell. For this reason, one would expect better codon usage adaptation in virulent phages than in temperate phages, which is true (Prabhakaran et al. 2015).
Fifth, mass translation of phage mRNA often occurs in the late infection phase when the host cellular environment has already been dramatically altered, presumably with a quite different tRNA pool in the late phase from that in the early phase. In vaccinia virus, the degradation of host mRNA appears nearly complete 6 h after the viral infection as no host poly(A) mRNA is detectable at/after this time (Katsafanas and Moss 2007). Shutdown or drastic alteration of host protein and RNA expression implies that many tRNA species are no longer sequestered for host translation, which would dramatically alter availability of different tRNA species. Many other viruses, including hepatitis C (Chan and Egan 2009), SARS (Minakshi et al. 2009), Japanese encephalitis virus (Su et al. 2002), and coxsackie B2 virus (Zhang et al. 2010), can induce stress responses such as the UPR (unfolded protein response) in late phase. UPR often results in the shutdown of transcription of ribosomal RNAs as well as repression of translation via phosphorylation of eukaryotic translation initiation factor eIF-2α (DuRose et al. 2009). All these suggest that the tRNA pool in the late phase differs from that in the normal cell. If codon usage of viral genes adapts to the altered tRNA pool in the late phase, whereas that of host genes adapts to the tRNA pool in normal cells, then we should not expect the parasite and the host to share high similarity in codon usage. Interestingly, HIV-1 early genes have RSCU positively correlated with RSCU of human genes, but HIV-1 late genes have RSCU values negatively correlated with RSCU of human genes (van Weringh et al. 2011).
Sixth, if mutation bias is in different direction from tRNA-mediated selection, e.g., if tRNA-mediated selection favors Y-ending codons whereas mutation bias favors R-ending codons (where Y and R stand for pyrimidine and purine, respectively), then strong mutation bias will disrupt selection. This may well be the case for the poor codon adaptation in HIV-1. According to a recent compilation of tRNAs in human genome (Chan and Lowe 2009), the AUC codon can be translated by 17 tRNAIle species (14 tRNAIle/IAU and 3 tRNAIle/GAU) and AUU can be translated by 14 tRNAIle/IAU species, whereas AUA can be translated by only 5 tRNAIle/UAU species. In agreement with the tRNA-mediated selection, human genes code Ile mostly by AUC and least by AUA. In contrast, HIV-1 genes code Ile mostly by AUA and least by AUC (Haas et al. 1996; Nakamura et al. 2000). The poor codon adaptation of HIV-1 (Fig. 9.6a) reduces the translation efficiency of HIV-1 genes. Modifying HIV-1 codon usage according to host codon usage has been shown to increase the production of viral proteins (Haas et al. 1996; Ngumbela et al. 2008). The high frequency of maladaptive AUA codons in HIV-1 genes is due to high A-biased mutation at the third codon position of HIV-1 genes (Jenkins and Holmes 2003). The A-bias is mediated by the error-prone reverse transcriptase (Martinez et al. 1994; Vartanian et al. 2002) and the human APOBEC3 protein (Yu et al. 2004). The frequency of A can reach up to 40% in some HIV-1 genomes (Vartanian et al. 2002), resulting in a preponderance of A-ending codons which are typically rarely used in the human HEGs (Kypr and Mrazek 1987; Sharp 1986).
If strong A-biased mutation is what disrupts codon adaptation in HIV-1, one would predict a better correlation in RSCU between viral genes and highly expressed human genes in a related virus without such mutation bias. One viral species that may shed light on this prediction is HTLV-1 which infects the same type of host cell as HIV-1. Both HIV-1 and HTLV-1 are retroviruses with RNA genomes, but HTLV-1 is exceptional in that it does not have a strong A-biased mutation (Van Dooren et al. 2004; van Hemert and Berkhout 1995). HTLV-1 relies for the most part on the host polymerase to replicate through clonal expansion of infected cells rather than undergoing iterative replication cycles like HIV-1 (Strebel 2005). The substitution rate of HTLV-1 is consequently lower, about $5.2 \times 10^{-6}$ substitutions/site/year (Hanada et al. 2004; Van Dooren et al. 2004), whereas that of HIV-1 is around $2.5 \times 10^{-3}$ substitutions/site/year (Hanada et al. 2004). Thus, although HTLV-1 infects the same cells as HIV-1, i.e., human CD4+ T cells (Rimsky et al. 1988), and both viruses are therefore subject to the same selective pressures on codon usage by the host tRNA pool, mutations are less likely to disrupt codon-anticodon adaptation in HTLV-1 than in HIV-1 as they occur at a lower rate in the former. The positive correlation in RSCU between HTLV-1 and highly expressed human genes (Fig. 9.6b) is highly significant (Pearson r = 0.4982, p < 0.0001, Spearman r = 0.4688, p = 0.0002).
2.2 CAI (Codon Adaptation Index)
CAI has been used extensively in biological research. Other than its primary use for measuring the efficiency of translation elongation, it has contributed to the finding that functionally related genes are conserved in their expression across different microbial species (Lithwick and Margalit 2005), to the prediction of protein production (Futcher et al. 1999; Gygi et al. 1999), and to the optimization of DNA vaccines (Ruiz et al. 2006).
2.2.1 Calculation of CAI
While RSCU characterizes codon usage bias in each codon family, CAI quantifies the codon usage bias in one gene. It is based on (1) the codon frequencies of the gene and (2) the codon frequencies of a set of known HEGs (often referred to as the reference set). The reference set of genes is used to generate a column of w values computed as

$$w_{ij} = \frac{\mathrm{RefCodFreq}_{ij}}{\mathrm{RefCodFreq}_{i.\max}}$$

where $\mathrm{RefCodFreq}_{ij}$ is the frequency of codon j in synonymous codon family i and $\mathrm{RefCodFreq}_{i.\max}$ is the maximum codon frequency in synonymous codon family i. For example, if the four alanine codons GCA, GCC, GCG, and GCU have frequencies 20, 4, 4, and 2, respectively, then their associated w values are 1, 0.2, 0.2, and 0.1, respectively. The codon whose frequency is $\mathrm{RefCodFreq}_{i.\max}$ is often referred to as the major codon (whose w is 1), and the other codons in the synonymous codon family are referred to as minor codons. The major codon is assumed to be the translationally optimal codon.
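The alanine example above can be reproduced with a short sketch. The function name `relative_adaptiveness` is a hypothetical label for the w calculation; the frequencies are the ones quoted in the text.

```python
def relative_adaptiveness(ref_counts, families):
    """Return {codon: w}, dividing each reference count by its family maximum."""
    w = {}
    for codons in families.values():
        family_max = max(ref_counts.get(c, 0) for c in codons)
        if family_max == 0:
            continue  # family not observed in the reference set
        for c in codons:
            w[c] = ref_counts.get(c, 0) / family_max
    return w

# Reference frequencies from the alanine example in the text.
ref_counts = {"GCA": 20, "GCC": 4, "GCG": 4, "GCU": 2}
families = {"Ala": ["GCA", "GCC", "GCG", "GCU"]}
w = relative_adaptiveness(ref_counts, families)
print(w)  # GCA is the major codon (w = 1); the others are minor codons
```

A codon that never occurs in the reference set receives w = 0 in this sketch, which would make ln(w) undefined in the CAI calculation; practical implementations need some rule for that case, and the choice is left open here.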
It is easy to see the relationship between $w_{ij}$ and RSCU. The former is obtained by dividing each RSCU by the largest RSCU value within each codon family. With the w values for a particular species, we can now compute the CAI value of any protein-coding sequence from the species by using the following equation:

$$\mathrm{CAI} = \exp\left(\frac{\sum_{i=1}^{n} F_i \ln w_i}{\sum_{i=1}^{n} F_i}\right) \tag{9.5}$$

where n is the number of sense codons (excluding codon families with a single codon, e.g., AUG for methionine and UGG for tryptophan in the standard genetic code), $F_i$ is the frequency of codon i in the gene, and $w_i$ is the w value of codon i. Note that the exponent is simply a weighted average of ln(w). Because the maximum of w is 1, ln(w) will never be greater than 0. Consequently, the exponent will never be greater than 0. Thus, the maximum CAI value is 1. The minimum CAI depends on the w values for minor codons in each codon family. If the minor codons all have w values close to zero, then the minimum CAI will also be very close to zero.
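The description of CAI as the exponential of a frequency-weighted average of ln(w) translates directly into code. The sketch below is a minimal illustration, not the DAMBE or EMBOSS implementation: the function name `cai` is hypothetical, codons without a w value (for example AUG and UGG) are simply skipped, and every supplied w is assumed to be greater than zero.

```python
from math import exp, log

def cai(obs_counts, w):
    """CAI as exp of the count-weighted mean of ln(w).

    obs_counts: {codon: count} for the gene of interest.
    w: {codon: w value}, all assumed > 0; codons missing from w are ignored.
    """
    weighted_sum = 0.0
    total = 0
    for codon, count in obs_counts.items():
        if codon not in w or count == 0:
            continue
        weighted_sum += count * log(w[codon])
        total += count
    if total == 0:
        raise ValueError("no codons with a defined w value")
    return exp(weighted_sum / total)

# w values from the alanine example in the text.
w = {"GCA": 1.0, "GCC": 0.2, "GCG": 0.2, "GCU": 0.1}
print(cai({"GCA": 5}, w))             # only the major codon is used
print(cai({"GCA": 1, "GCU": 1}, w))   # geometric mean of 1 and 0.1
print(cai({"GCA": 3, "AUG": 10}, w))  # AUG has no w value and is ignored
```

Because the exponent is an average of logarithms, CAI is the geometric mean of the w values of the codons in the gene, so a few codons with very small w pull the index down sharply.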
The calculation of CAI is numerically illustrated in Table 9.8 for a gene whose observed codon frequency is in column “ObsFreq.” The codon frequency of the highly expressed reference set is in column “RefCodFreq.” The column “w” is obtained by dividing RefCodFreq values by the largest value in the codon family. For example, the first w value in the table, 0.606, is obtained by dividing RefCodFreq value 195 by the largest RefCodFreq value in the alanine codon family, i.e., 322. We take a weighted average of ln(w) as shown in Eq. (9.5) and then exponentiate it to obtain CAI.
The way w is calculated implies that, if a protein contains only methionine and tryptophan, both encoded by a single codon (AUG and UGG, respectively, in standard code), then the gene will have the highest CAI value of 1 because w values are 1 for such codons. Similarly, a gene with many AUG and UGG codons would have high CAI values even if it is not under any tRNA-mediated selection. For this reason, a good implementation of CAI should exclude single-member codon families from CAI calculation.
I have previously mentioned that codon usage indices such as CAI can be implemented differently with different classifications of codon families, so gene A could have a higher CAI value than gene B according to one program, but a lower one according to another. I wish to illustrate this so that readers can better interpret their results.
In highly expressed yeast genes (e.g., compiled in the Eyeastcai.cut in EMBOSS distribution), CGU is by far the most frequent codon in the CGN (coding for arginine) codon family. The overuse of CGU and the avoidance of CGG, CGA, and CGC codons in highly expressed yeast genes make sense because the yeast genome contains six tRNAArg genes with anticodon ACG forming Watson-Crick base pairing with the CGU codon, but no other tRNAArg gene forming Watson-Crick base pairing with the other three CGN codons (the nucleotide A in anticodon ACG is modified to inosine but still pairs with U better than with other nucleotides). While this illustrates well the codon-anticodon adaptation, it causes practical problems with computing CAI.
Suppose we now use a sequence consisting entirely of CGU codons and expect the resulting CAI to be 1 by using the Eyeastcai.cut reference set. The resulting CAI value from the EMBOSS.cai program is 0.140 instead of 1. It turns out that amino acid arginine is coded by two codon subfamilies, the CGN codon family we have mentioned and the AGR codon family. The largest codon frequency among these six codons is 314 (for the AGA codon) in Eyeastcai.cut. So the w value for CGU is not 1 (43/43) as we have thought but is only 0.1369 (= 43/314). For this reason, some CAI-calculating programs, e.g., DAMBE (Xia 2013, 2017d), may separate compound codon families such as the arginine family into two separate families, one twofold and one fourfold.
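The effect of the family definition on w can be checked with the two frequencies quoted above. In this sketch only CGU (43) and AGA (314) are entered, because the text states that they are the most frequent codons in the CGN family and among all six arginine codons, respectively; the remaining arginine frequencies are omitted since they do not affect either maximum. The helper name `w_value` is hypothetical.

```python
# Reference frequencies quoted in the text for highly expressed yeast genes.
REF_COUNTS = {"CGU": 43, "AGA": 314}

def w_value(codon, family_codons, ref_counts):
    """w of one codon relative to the most frequent codon in its family.

    Codons without an entry in ref_counts are ignored; this is safe here
    only because the omitted codons are known not to be the family maximum.
    """
    family_max = max(ref_counts[c] for c in family_codons if c in ref_counts)
    return ref_counts[codon] / family_max

SIXFOLD_ARG = ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"]
FOURFOLD_CGN = ["CGU", "CGC", "CGA", "CGG"]

w_merged = w_value("CGU", SIXFOLD_ARG, REF_COUNTS)  # Arg as one family
w_split = w_value("CGU", FOURFOLD_CGN, REF_COUNTS)  # CGN treated separately
print(round(w_merged, 4), w_split)
```

For a sequence made up entirely of CGU codons, CAI equals the w value of CGU, so the same gene scores about 0.137 under the merged definition and 1 under the split definition.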
2.2.2 Illustration of CAI Applications
The most obvious application of CAI or related codon usage indices is to optimize codon usage to improve protein expression. Many experiments have demonstrated increased protein production by optimizing codon usage and decreased protein production if codons are replaced by rarely used ones (Haas et al. 1996; Kaishima et al. 2016; Ngumbela et al. 2008; Robinson et al. 1984; Sorensen et al. 1989). There are claims that codon optimization does not increase protein production (e.g., Kudla et al. 2009), but these claims were found to be due to wrong data analysis (Tuller et al. 2010; Xia 2015) and will be dealt with in a later section on ITE (Xia 2015). Below I list two less obvious applications of CAI.
2.2.2.1 Does High Mutation Rate Prevent HIV-1 Genes from Evolving Codon Adaptation?
I have mentioned in the section on RSCU that the lack of concordance in codon usage between HIV-1 and human genes was conventionally explained by high mutation rate in HIV-1, based on the observation that (1) HIV-1 genome is known to experience strongly A-biased mutations, (2) usage of A-ending codons in HIV-1 genes is particularly different from that of the host genes, and (3) HTLV-1 that parasitizes the same human CD4+ T cells but has reduced mutation rate does have codon usage similar to human genes (Fig. 9.6b). Thus, the lack of concordance in codon usage between HIV-1 and human genes is interpreted as poor codon adaptation caused by high mutation rate disrupting codon adaptation.
However, van Weringh et al. (2011) objected to this interpretation. They argued that the lack of concordance in codon usage between HIV-1 and human genes is not due to poor codon adaptation on the part of HIV-1 genes, but because HIV-1 genes, especially the late genes, have adapted to a tRNA pool that is fundamentally different from that in a normal human CD4+ T cell. What originally prompted them to formulate this hypothesis is the observation that CAI for HIV-1 early genes is significantly greater than CAI for HIV-1 late genes when highly expressed human genes are used as reference genes. These late genes encode mass-translated HIV-1 structural proteins and are typically expected to have higher CAI than the relatively lowly expressed early genes. It is thus a surprise to see late genes having smaller CAI than early genes, unless the mass-translated late genes adapt to a tRNA pool different from that of the early genes.
van Weringh et al. (2011) investigated experimentally measured tRNA abundance in the human cell when the late HIV-1 genes are translated and HIV-1 virions are produced. The tRNA pool for the late genes is indeed different in the expected direction, supporting their hypothesis that the lack of concordance in codon usage between HIV-1 and human genes is not due to poor codon adaptation in HIV-1 genes but because HIV-1 genes, especially the late genes, have adapted to a tRNA pool different from the one with which highly expressed human genes are translated (van Weringh et al. 2011).
2.2.2.2 Detecting Horizontally Transferred Genes
CAI has also been used jointly with a reformulated effective number of codons (Nc, Sun et al. 2013) to detect horizontally transferred genes. E. coli genes with a strong codon usage bias typically have high CAI values. However, three genes (yagF, yagG, and yagH) from the defective CP 4–6 prophages of E. coli (Wang et al. 2010) have strongly biased codon usage (small Nc values) but relatively small CAI values. This codon usage pattern sets the three genes apart from the rest of E. coli genes (Fig. 9.7), which highlights the value of using the “Nc versus CAI” plot to detect recently horizontally transferred genes. These genes have been “naturalized” in the E. coli genome and contribute to E. coli survival and growth (Wang et al. 2010).
The largest mucin gene (mucin 14A) in Drosophila melanogaster also exhibits strong codon usage bias (Nc = 38.6), but in the direction opposite to those highly expressed D. melanogaster genes. Its CAI value is equal to 0.1277, which is the second smallest among all D. melanogaster genes. It is unknown how and why the gene has evolved to have such a peculiar feature.
The distribution of CAI values for the 179 annotated pseudogenes is indicated in red. These pseudogenes have not accumulated frameshifting mutations and presumably were pseudogenized only recently. They tend to be clustered at the lower end of the CAI distribution, suggesting that genes with high CAI values require tRNA-mediated selection to maintain the high CAI values.
The gene with the smallest CAI is mgtL, which has only 17 sense codons and is a bacterial mRNA leader that controls the expression of the downstream mgtA (Park et al. 2010). The low CAI is not a stochastic fluctuation caused by the small number of codons but arises because almost all codons used are minor codons. This may represent a real case of a gene preferring minor codons to facilitate its regulatory function.
2.2.3 Problems with CAI and Other Gene-Specific Codon Usage Indices
There are major problems with CAI and other commonly used codon usage indices. While some minor problems have been addressed before (Xia 2007c), the key issue of properly inferring translationally optimal codons (TOCs) remains unresolved. These gene-specific codon usage indices all need to infer TOCs, by using two types of information. The first, represented by tAI (dos Reis et al. 2004), uses the most abundant tRNA and its anticodon to infer TOC within each codon family, i.e., the codon that base-pairs best with the most abundant tRNA is the TOC. The second, represented by CAI, considers the most frequent codon in HEGs as the TOC within each codon family. I will outline the problems to pave the way for the presentation of a new index of translation elongation in the next section (ITE , Xia 2015).
2.2.3.1 Problem with Codon Usage Indices Using tRNA Abundance to Infer TOCs
For indices such as tAI that use tRNA abundance information to define TOCs, the main problem is that TOCs cannot be inferred reliably from tRNA gene copy numbers or experimentally measured tRNA abundance. For example, inosine is expected to pair best with C and U, less with A (partly because of the bulky I/A pairing involving two purines), and not with G. However, tRNAVal/IAC from rabbit liver pairs better with GUG codon than with other synonymous codons (Jank et al. 1977; Mitra et al. 1977). No one would have identified GUG as the best codon for tRNAVal/IAC without actually seeing the experimental result.
Similarly, the Bacillus subtilis genome encodes tRNAAla/GGC for decoding GCY codons. One would have thought that the GCC codon, which forms Watson-Crick base pairing with the anticodon, would be translationally more optimal than GCU. However, the usage of GCU relative to GCC is much higher in HEGs than in LEGs in B. subtilis. We have encountered a similar example in Table 9.4 involving Cys codon usage in HEGs. There are four tRNACys genes with the same anticodon GCA forming Watson-Crick base pairs with the UGC codon, but no tRNACys gene with an anticodon forming Watson-Crick base pairs with the alternative UGU codon. We would have taken UGC as the TOC. However, UGU is used far more frequently than UGC in highly expressed yeast genes relative to LEGs. In short, in all these cases we would be wrong to use the most abundant tRNA species and its matching codon to infer the TOC.
There is one more reason why tRNA abundance cannot reliably predict TOCs. What matters in translation elongation is not the abundance of transcribed tRNAs but the availability of charged tRNAs. It is tedious to determine the level of charged tRNAs, and researchers typically would use transcriptionally determined tRNAs or even the number of tRNA genes in the genome as a proxy of charged tRNAs. Unfortunately, the abundance of tRNAs often does not reflect the abundance of charged tRNAs (Elf et al. 2003).
Furthermore, codon-anticodon base pairing is known to be context-dependent (Lustig et al. 1989). For example, a wobble cmo5U in the anticodon of tRNAPro, tRNAAla, and tRNAVal can read all four synonymous codons in the respective codon family, but the same cmo5U in tRNAThr cannot read C-ending codons (Nasvall et al. 2007). For this reason, the optimal codon usage is likely better approximated by the codon usage of HEGs than by what we can infer from codon-anticodon pairing. Consistent with this proposition, CAI, which is based on the codon usage of HEGs, performs better in predicting protein production or abundance than other indices based on tRNAs (Coghlan and Wolfe 2000; Comeron and Aguade 1998; Duret and Mouchiroud 1999).
2.2.3.2 Problem with Using Codon Usage of HEGs to Infer TOCs
Codon usage indices such as CAI that use codon usage of HEGs to infer TOCs also have problems. Other than those previously outlined (Xia 2007c), they often lead to wrong interpretations of tRNA-mediated selection. I illustrate this problem here with the Ala codon subfamily GCR (where R stands for either A or G). The frequencies of GCA and GCG in E. coli HEGs, as compiled and distributed with EMBOSS (Rice et al. 2000), are 1973 and 2654, respectively, which may lead one to think that the E. coli translation machinery prefers GCG over GCA. However, the codon frequencies of GCA and GCG for E. coli non-HEGs are 25,511 and 43,261, respectively. Thus, GCA is relatively more frequent in E. coli HEGs than in E. coli non-HEGs. This suggests that mutation bias favors GCG, but tRNA-mediated selection favors GCA. The battle between mutation bias and tRNA-mediated selection leads to increased usage of GCA in E. coli HEGs relative to LEGs, although GCA is still not as frequent as GCG in HEGs. This interpretation is corroborated by the E. coli genome encoding three tRNAAla genes for GCR codons, all with a UGC anticodon forming perfect Watson-Crick base pairs with codon GCA.
The example above illustrates the point that mutation bias is reflected in the codon usage of lowly expressed genes. This is what has driven the formulation, development, and implementation of a new codon usage index, ITE (Xia 2015).
2.3 ITE (Index of Translation Elongation)
2.3.1 Illustration of ITE Calculation
ITE is implemented in DAMBE (Xia 2013, 2017d). There are in fact four different implementations of ITE in DAMBE, depending on how one would classify codons into codon families. The first implementation is the most extreme (unconventional) and classifies all sense codons into NNR or NNY codon families or subfamilies. For example, the fourfold alanine codon family is broken into GCR and GCY subfamilies. For a codon i in such an NNR or NNY codon family or subfamily, we first define Pi.HEG and Pi.non-HEG as the proportion of codon i within its R-ending or Y-ending family for E. coli HEGs and non-HEGs, respectively. Take data for codons GCA and GCG in Table 9.9, for example:
where SGCA and SGCG may be viewed as relative codon frequencies of HEGs corrected for the “background” non-HEGs. Codon i is considered selected for if Si > 1 and against if Si < 1. Thus, codon GCA is considered selected for because, according to Eq. (9.7), SGCA > 1. This insight would be obscured if we used codon frequency data from E. coli HEGs only, which would have suggested that codon GCA is selected against. The Si values for the four sense codons in E. coli are listed in Table 9.9.
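The GCA/GCG comparison can be reproduced from the four counts quoted earlier. The sketch below assumes that Si is the simple ratio Pi.HEG / Pi.non-HEG, which is consistent with the threshold of 1 used in the text; the function names are illustrative and are not taken from DAMBE.

```python
# Codon counts quoted in the text for the E. coli Ala GCR subfamily.
heg = {"GCA": 1973, "GCG": 2654}
non_heg = {"GCA": 25511, "GCG": 43261}


def proportions(counts):
    """Proportion of each codon within its NNR or NNY subfamily."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def selection_ratios(heg_counts, non_heg_counts):
    """S_i = P_i.HEG / P_i.non-HEG (the ratio form is an assumption)."""
    p_heg = proportions(heg_counts)
    p_bg = proportions(non_heg_counts)
    return {codon: p_heg[codon] / p_bg[codon] for codon in heg_counts}


S = selection_ratios(heg, non_heg)
for codon, s in S.items():
    print(codon, round(s, 4))
```

Under this assumption SGCA is about 1.15 and SGCG about 0.91, so GCA appears favored once the background is taken into account, even though GCG is the more frequent codon in HEGs.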
We now compute wi as follows:
The index of translation elongation (ITE) is then calculated in the same way as CAI except that, in this particular codon family classification, the computation is applied to NNR and NNY codon subfamilies:
where Fi is the frequency of codon i and Ns is the number of sense codons (excluding those in single-codon families). For example, AUG for methionine, AUA for isoleucine, and UGG for tryptophan in the standard genetic code are excluded from computing ITE. Just like CAI, tAI, and Nc, ITE is a gene-specific index of codon usage bias.
One may note that CAI is a special case of ITE when there is absolutely no codon usage bias in non-HEGs in any codon subfamily, that is, when NGCA.non-HEG = NGCG.non-HEG, NGCC.non-HEG = NGCU.non-HEG, and so on. The range of ITE is the same as that of CAI, i.e., between 0 and 1.
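The steps described above can be sketched in a few lines. This is not DAMBE's implementation. It assumes that wi is Si divided by the largest S in the same NNR or NNY subfamily (by analogy with CAI) and that ITE is the geometric mean of wi weighted by the codon frequencies Fi of the gene. The gene counts are hypothetical, and reference sets with zero counts would need a pseudocount that is not handled here.

```python
import math


def subfamily(codon):
    """Key a codon by its first two bases plus R (A/G) or Y (C/U) at the third site."""
    return codon[:2] + ("R" if codon[2] in "AG" else "Y")


def relative_weights(heg_counts, non_heg_counts):
    """w_i = S_i / max(S) within each subfamily (assumed normalization)."""
    groups = {}
    for codon in heg_counts:
        groups.setdefault(subfamily(codon), []).append(codon)
    w = {}
    for codons in groups.values():
        if len(codons) < 2:  # single-codon subfamilies are excluded
            continue
        heg_total = sum(heg_counts[c] for c in codons)
        bg_total = sum(non_heg_counts[c] for c in codons)
        s = {c: (heg_counts[c] / heg_total) / (non_heg_counts[c] / bg_total)
             for c in codons}
        s_max = max(s.values())
        for c in codons:
            w[c] = s[c] / s_max
    return w


def ite(gene_codon_counts, w):
    """Geometric mean of w_i weighted by the gene's codon frequencies F_i."""
    used = {c: f for c, f in gene_codon_counts.items() if c in w and f > 0}
    log_sum = sum(f * math.log(w[c]) for c, f in used.items())
    return math.exp(log_sum / sum(used.values()))


heg = {"GCA": 1973, "GCG": 2654}          # counts quoted in the text
non_heg = {"GCA": 25511, "GCG": 43261}    # counts quoted in the text
gene = {"GCA": 6, "GCG": 4}               # hypothetical gene

w = relative_weights(heg, non_heg)
gene_ite = ite(gene, w)

# With a flat (unbiased) background the weights depend on HEG counts only.
flat_background = {"GCA": 100, "GCG": 100}
w_flat = relative_weights(heg, flat_background)
```

With the flat background, the weights collapse to each HEG count divided by the largest HEG count in the subfamily, which is how CAI weights are defined.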
Readers may demand a justification for the extreme classification of all sense codons into NNR and NNY codon families. The main reason is that, for genes encoded by the nuclear genome, the R-ending codons are typically decoded by two types of tRNA species (one with a wobble C and the other with a wobble U), whereas the Y-ending codons are decoded typically by a single type of tRNA species with either a wobble G or a wobble A modified to inosine, but never by both (Grosjean et al. 2007; Marck and Grosjean 2002). For this reason, the R-ending and Y-ending codons, even within a single fourfold codon family, are subject to different tRNA-mediated selection and therefore should be treated separately. Such implementation is also relevant for certain experimental settings that induce mutation almost exclusively in NNY codons, which is the case in Kudla et al. (2009). However, for comparative purposes, I have included two alternative ITE implementations in DAMBE (Xia 2013, 2017d): (1) with compound sixfold and eightfold codon families broken into twofold and fourfold codon families and (2) lumping all synonymous codons into one codon family. One may access the function by clicking “Seq.Analysis|Codon usage|Index of translation elongation” and then choosing the desired implementation.
2.3.2 A Major Controversy Resolved by the Application of ITE
Highly expressed genes in bacteria and unicellular eukaryotes overuse codons that match the anticodon of the most abundant tRNA (Ikemura 1981a, b, 1982, 1992). When such codons are replaced by rarely used codons, protein production is reduced (Robinson et al. 1984; Sorensen et al. 1989). Similarly, when codon usage is optimized, protein production is increased (Haas et al. 1996; Kaishima et al. 2016; Ngumbela et al. 2008). However, the degree to which translation elongation is rate-limiting has been controversial. Early theoretical considerations (Andersson and Kurland 1983; Bulmer 1990, 1991; Liljenstrom and von Heijne 1987) tended to favor the argument that translation elongation is not rate-limiting in protein production, but translation initiation is. This hypothesis states that codon-anticodon adaptation and increased elongation efficiency are not related to protein production. Instead, the benefit of codon adaptation and increased elongation efficiency is to increase ribosomal availability for global translation and timely response to environmental perturbations.
To test these two alternative hypotheses, Kudla et al. (2009) engineered a synthetic library of 154 genes, all encoding the same green fluorescent protein in Escherichia coli, but differing in synonymous sites (and consequently in the degree of codon adaptation, as measured by the codon adaptation index, CAI). All sequences share an identical 5′ UTR that is 144 nt long, so there is no variation in the Shine-Dalgarno sequence. Because the engineered genes all encode the same protein, it is justifiable to use protein abundance as a proxy for protein production (assuming that protein molecules sharing the same amino acid sequence have the same degradation rate).
Kudla et al. (2009) used minimum folding energy (MFE), computed from sites −4 to +37 (where ribosomes position themselves at the initiation codon), as a proxy for initiation efficiency. The rationale for using MFE as a measure of translation initiation is that an initiation codon would be inaccessible if it is embedded in a strong secondary structure and that accessibility of the initiation codon is a key determinant of translation initiation efficiency (Nakamoto 2006). Stable secondary structure in sequences positioned at or before the start codon has been experimentally shown to inhibit translation initiation (Osterman et al. 2013), presumably because it embeds the SD sequence and start codon in a structural stem and consequently hides these signals from ribosomes. The previous chapter on translation initiation has already highlighted the point that mRNAs in bacteria and unicellular eukaryotes tend to have much weaker secondary structure near the start codon than elsewhere, especially those from highly expressed genes.
Kudla et al. interpreted CAI as a proxy of translation elongation. If both translation initiation and elongation contribute to translation efficiency, then protein production is expected to depend on both MFE and CAI. If only translation initiation is important, then protein production will depend on MFE only. They found that MFE accounts for 44% of the variation in protein production but CAI is essentially unrelated to protein production. They concluded consequently that “translation initiation, not elongation, is rate-limiting for gene expression.”
The conclusion by Kudla et al. (2009), however, is based on two critical assumptions. First, MFE and CAI are good proxies of translation initiation and elongation efficiencies, respectively. Second, the effect of translation elongation is independent of translation initiation. The problem with the second assumption has been pointed out by Supek and Smuc (2010) and Tuller et al. (2010), who reanalyzed the data and provided an overwhelming amount of additional empirical evidence to demonstrate the joint effect of translation initiation and elongation on protein production. In short, protein production rate is expected to increase with elongation efficiency only when translation initiation is efficient. If translation initiation is slow, then increasing elongation rate is not expected to increase protein production. Kudla et al. (2009) ignored the dependence of the elongation effect on translation initiation.
Xia (2015) reanalyzed the experimental data in Kudla et al. (2009) with two improvements: replacing CAI with ITE and incorporating translation initiation and elongation into one model. Three points are worth highlighting in Fig. 9.8a. First, in contrast to the nonsignificant relationship between protein abundance and CAI, protein abundance and ITE are highly significantly correlated (p = 0.0001, Fig. 9.8a). Second, when ITE is small (e.g., ITE < 0), protein abundance is generally low, suggesting that translation elongation is limiting. Third, a large ITE (efficient translation elongation) does not imply high protein production, e.g., when translation initiation is very slow. One expects a large ITE to be associated with increased protein production only when translation initiation is efficient.
Xia (2015) binned MFE into four categories, from strong secondary structure to weak secondary structure: (−15.3, −11), (−10.9, −9), (−8.7, −6.2), and (−6, −3.5). These represent translation initiation efficiency from the lowest to the highest and are designated MFE1-MFE4 (Fig. 9.8b). The intervals are chosen in such a way that all MFE values fall into four roughly equal-sized groups with within-group variation in MFE being as small as possible. The benefit of binning is that one can exclude the MFE variable so that the effect of ITE can be modeled more explicitly. It is for the same reason that Tuller et al. (2010) also used binned analysis for this data set.
In the MFE1 group, translation initiation efficiency is the lowest, and we should expect little increase of protein production with translation elongation efficiency (ITE). This is consistent with the empirical result (Fig. 9.8b), where the relationship between ITE and protein abundance is not statistically significant in the MFE1 group (b = 67.545, p = 0.4213, Fig. 9.8b), with ITE accounting for only 2% of the total variation in ranked protein abundance (rProt). In contrast, when translation initiation is more efficient in groups MFE2-MFE4, rProt increases significantly with ITE, with the simple linear model consistently accounting for about 17% of the total variation in rProt (Fig. 9.8b, with b varying from 216.60 to 263.87). Thus, the contribution of translation elongation (ITE) to protein production is much greater than previously documented for this data set, i.e., absent (Kudla et al. 2009) or less than 3% of the total variation in protein production (Tuller et al. 2010). Readers may consult Xia (2015) for more explicit modeling of protein abundance on translation initiation and elongation.
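The binned analysis can be sketched as follows. The bin boundaries are those quoted above; everything else (function names, the record layout, and the toy records) is hypothetical and only illustrates fitting a separate simple linear regression of rProt on ITE within each MFE group. It does not reproduce the published slopes.

```python
# MFE bins quoted in the text; MFE1 = strongest structure, MFE4 = weakest.
MFE_BINS = [(-15.3, -11.0), (-10.9, -9.0), (-8.7, -6.2), (-6.0, -3.5)]


def mfe_group(mfe):
    """Return 1..4 for the bin containing mfe, or None if it falls in no bin."""
    for k, (low, high) in enumerate(MFE_BINS, start=1):
        if low <= mfe <= high:
            return k
    return None


def ols(x, y):
    """Least-squares slope and R^2 for a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, r2


def slopes_by_group(records):
    """records: iterable of (mfe, ite, rprot); returns {group: (slope, r2)}."""
    grouped = {}
    for mfe, ite_value, rprot in records:
        g = mfe_group(mfe)
        if g is not None:
            grouped.setdefault(g, []).append((ite_value, rprot))
    return {g: ols([p[0] for p in pts], [p[1] for p in pts])
            for g, pts in sorted(grouped.items()) if len(pts) >= 3}


# Hypothetical toy records: flat response in MFE1, rising response in MFE4.
toy = [(-12.0, 0.40, 10.0), (-13.0, 0.50, 10.0), (-14.0, 0.60, 10.0),
       (-5.0, 0.40, 40.0), (-4.0, 0.50, 60.0), (-5.5, 0.60, 80.0)]
result = slopes_by_group(toy)
```

Fitting each group separately lets the slope of rProt on ITE differ between groups, which is the interaction between initiation and elongation that a single pooled regression would hide.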
One might wonder why previous studies, although not taking translation initiation into consideration, almost always show a positive relationship between translation efficiency and codon adaptation. There are two explanations. First, previous experimental studies were typically carried out on highly expressed genes with efficient translation initiation. Such studies are equivalent to excluding the MFE1 group in Fig. 9.8b. Second, for correlational studies, nature generally does not generate bacterial genes with high translation initiation efficiency but poor codon adaptation, or with low translation initiation efficiency but high codon adaptation. However, the experiment by Kudla et al. (2009) generated both of these unnatural associations, leading to a lack of positive association between protein production and codon adaptation. This example highlights the point that a well-intended and well-done experiment can mislead us. It represents another illustration of Simpson’s Paradox, in which a wrong conclusion is reached when one omits a contributing variable.
3 Translation Elongation Efficiency and Accuracy
Given a fixed translation initiation efficiency, our conceptual model for the relationship between codon adaptation (CA) and tRNA-mediated selection, in its simplest form, is
where CA is tRNA-mediated codon adaptation, often measured by CAI or ITE (Xia 2015), and SE is selection for translation efficiency (in units of protein produced per mRNA molecule). The slope b is typically positive, i.e., stronger selection for translation efficiency leads to better codon adaptation. Many studies have demonstrated a strong relationship between codon adaptation and gene expression (Coghlan and Wolfe 2000; Duret and Mouchiroud 1999; Gouy and Gautier 1982).
One key deficiency in Eq. (9.10) is that it does not distinguish between selection for translation efficiency and selection for translation accuracy (Akashi 1994). Take Asn codons AAC and AAU in E. coli, for example. AAC is a major codon (heavily used by highly expressed genes and decoded by the most abundant isoacceptor tRNA), whereas AAU is a rarely used minor codon. A major codon is typically translated faster than a minor codon, and highly expressed E. coli genes use AAC almost exclusively to code for Asn, so one could argue that the overuse of AAC is driven by SE. However, AAC and AAU also differ in misreading rate, in particular by tRNALys, which ideally should decode only AAA and AAG codons but does misread AAC and AAU, leading to Asn being replaced by Lys. This misreading error rate is six times greater for AAU than for AAC, with the error ratio maintained in both Asn-starved and Asn-non-starved conditions (Johnston et al. 1984) or with streptomycin used to inhibit translation (Johnston and Parker 1985). Thus, the overuse of AAC could be driven by selection for increased translation efficiency, for increased translation accuracy, or for both. Designating SA as selection for translation accuracy, we have three alternative hypotheses expressed, in the simplest form, as
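One plausible way to write these models, assuming simple linear forms, is given below. Only the slope b and the symbols CA, SE, and SA come from the text; the intercept a, the second slope c, and the labels H1 to H3 are notational assumptions made for this sketch.

```latex
\begin{align}
\text{Eq. (9.10), assumed form:}\quad & CA = a + b\,S_E \\
H_1 \text{ (efficiency only):}\quad & CA = a + b\,S_E \\
H_2 \text{ (accuracy only):}\quad & CA = a + c\,S_A \\
H_3 \text{ (both):}\quad & CA = a + b\,S_E + c\,S_A
\end{align}
```

Under this reading, the hypotheses are distinguished by whether b, c, or both differ from zero.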
Akashi (1994) classified amino acid sites into conserved sites (assumed to be functionally important with high SA) and variable sites (assumed to experience low SA). He reasoned that, if codon adaptation is due to selection for translation efficiency, then all codons in the gene should be subject to similar selection regardless of whether the codon is in a functionally important or unimportant site. In contrast, if codon adaptation is driven by selection for translation accuracy, then the selection is stronger in functionally important sites than in functionally unimportant sites. So we should observe greater codon usage bias in functionally important codon sites than in functionally unimportant codon sites. He found greater codon adaptation in conserved amino acid sites than in variable amino acid sites and concluded that this difference between the conserved and variable sites resulted from selection for accuracy.
There is a problem with the conclusion. Take lysine codons (AAA and AAG) and glutamate codons (GAA and GAG), for example. Suppose that the AAA codon is favored by selection in the lysine codon family and GAG is favored in the glutamate codon family. Also suppose that an ancestral gene has good codon adaptation, with lysine coded by AAA and glutamate coded by GAG. Now some lysine sites experience nonsynonymous substitutions from AAA to GAA. These sites are now designated as variable sites and are occupied by a minor codon GAA. This would result in an association between “poor codon adaptation” and variable sites that has little to do with translation accuracy. Akashi (1994) was aware of this problem but did not provide a definitive solution.
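The comparison between conserved and variable sites can be framed as a 2 x 2 table of preferred versus non-preferred codons. The sketch below computes an odds ratio for one table and a Mantel-Haenszel pooled odds ratio across several tables. The pooling formula is a standard statistical technique used here for illustration, all counts are hypothetical, and nothing here is taken from the original analysis.

```python
def odds_ratio(table):
    """table = (a, b, c, d): preferred/other codons at conserved sites (a, b)
    and preferred/other codons at variable sites (c, d)."""
    a, b, c, d = table
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(tables):
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over the 2 x 2 tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical counts, e.g. one table per amino acid within a gene.
tables = [(80, 20, 60, 40), (45, 15, 30, 30)]
single = odds_ratio(tables[0])
pooled = mantel_haenszel_odds_ratio(tables)
print(round(single, 2), round(pooled, 2))
```

An odds ratio above 1 indicates that preferred codons are relatively more common at conserved sites, which is the pattern the accuracy hypothesis predicts.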
4 Amino Acid Usage and Translation Elongation Efficiency
There are at least four factors contributing to amino acid usage. The first two are related to selection for translation elongation efficiency, the third related to number of synonymous codons, and the fourth related to genomic mutation bias.
4.1 Factors Related to Selection for Translation Elongation Efficiency
Some amino acids are abundant and energetically cheap to make, i.e., consuming few ATPs in their production, whereas others are rare and energetically expensive, so mass-produced proteins should maximize the use of abundant and cheap amino acids (Akashi and Gojobori 2002). However, such a hypothesis, without considering other factors, often does not produce easily testable predictions. For example, we expect highly expressed proteins to maximize the use of energetically cheap amino acids and avoid the use of the expensive ones. However, many ribosomal proteins are highly expressed, yet the need for many of them to bind to the negatively charged mRNA demands the usage of positively charged amino acids such as Lys and Arg that are typically energetically expensive to make in the cell. This would lead to an association between high expression and energetically expensive amino acids, thus confounding the prediction that highly expressed genes should maximize the use of cheap amino acids. Furthermore, amino acid availability changes with environment, and the same amino acid may be manufactured differently with different energy consumption in different organisms. So it is not easy to measure the energetic cost of amino acids in different organisms. One could, however, turn the question around and ask how one can characterize energetic costs of amino acids by bioinformatic means. For example, in the ideal situation when all other factors affecting amino acid usage have been controlled for, we may infer that the avoided amino acid is perhaps rare or energetically expensive to make. This type of inference is of course not very satisfactory and is often derogatively termed the backdoor smuggling approach because one does not present direct evidence for energetic cost.
The other factor related to translation elongation is tRNA abundance, and one expects mass-produced proteins to use amino acids with many tRNAs to carry them. Designating the proportion of tRNAs carrying amino acid i as Pi, and the frequency of amino acid i in highly expressed genes as Ni, Xia (1998a) analytically derived an equation with Pi linearly increasing with the square root of Ni. The relationship was well substantiated with data from E. coli, Salmonella typhimurium, and Saccharomyces cerevisiae (Xia 1998a).
Single-stranded DNA (ssDNA) bacteriophages do not carry their own tRNA and depend entirely on the host tRNA pool for decoding their codons. So one would predict that amino acid usage in these phages should be correlated with the abundance of tRNAs in the host cell. This prediction is tested in a study (Chithambaram et al. 2014b) of phages infecting E. coli, by using tRNA gene copy number in E. coli as a proxy of tRNA abundance (Fig. 9.9). An amino acid carried by more tRNAs is used more frequently than another carried by few tRNAs.
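One way to see why a square-root relationship could arise is a toy cost model, which is an assumption of this sketch and not necessarily the derivation in Xia (1998a): suppose the time spent waiting for a tRNA carrying amino acid i is inversely proportional to Pi, so the total cost is the sum of Ni / Pi subject to the Pi summing to 1. Minimizing that cost gives Pi proportional to the square root of Ni. The usage counts below are hypothetical.

```python
import math


def sqrt_allocation(aa_usage):
    """tRNA proportions P_i proportional to sqrt(N_i), normalized to sum to 1."""
    roots = {aa: math.sqrt(n) for aa, n in aa_usage.items()}
    total = sum(roots.values())
    return {aa: r / total for aa, r in roots.items()}


def proportional_allocation(aa_usage):
    """tRNA proportions P_i directly proportional to N_i, for comparison."""
    total = sum(aa_usage.values())
    return {aa: n / total for aa, n in aa_usage.items()}


def waiting_cost(aa_usage, p):
    """Sum of N_i / P_i: toy stand-in for total time spent waiting for tRNAs."""
    return sum(n / p[aa] for aa, n in aa_usage.items())


usage = {"Ala": 900, "Lys": 400, "Trp": 100}  # hypothetical amino acid counts
cost_sqrt = waiting_cost(usage, sqrt_allocation(usage))
cost_prop = waiting_cost(usage, proportional_allocation(usage))
print(cost_sqrt, cost_prop)
```

In this toy model the square-root allocation gives a lower total cost than allocating tRNAs in direct proportion to amino acid usage.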
4.2 Number of Synonymous Codons
In the absence of any selection, we would expect amino acid usage to increase with the number of synonymous codons (Fig. 9.10). However, this relationship is confounded with the number of tRNAs carrying each amino acid in the cell. If we designate the number of tRNAs carrying amino acid i as Ni.tRNA and the number of synonymous codons for amino acid i as Ni.syn codon, then amino acid usage depends on both. Ni.tRNA and Ni.syn codon are also positively correlated.
4.3 Genomic Mutation Bias
E. coli genomes have roughly equal nucleotide frequencies. A more AT-rich or GC-rich genome would tend to have more AT-rich or GC-rich codons and more of their encoded amino acids. For example, AT-rich genomes in bacterial pathogens tend to have many more lysine residues (encoded by AAA and AAG) than less AT-rich genomes (Xia and Palidwor 2005). This is highly visible even with a mild difference in genomic AT content. For example, yeast (Saccharomyces cerevisiae) is only mildly AT-rich (0.3090, 0.1917, 0.1913, and 0.3080 for A, C, G, and T, respectively), but it clearly uses more amino acids encoded by AT-rich codons and fewer amino acids encoded by GC-rich codons (Table 9.10).
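The joint effect of synonymous codon number and nucleotide composition can be illustrated with a null model. The sketch below assumes that the three codon positions are independent draws from the genomic nucleotide frequencies and that stop codons are excluded. This is a deliberately crude expectation, not the analysis behind Table 9.10. Codons are written with T rather than U.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expected_aa_usage(base_freq):
    """Expected amino acid frequencies if codon positions were independent draws."""
    usage = {}
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # stop codons do not contribute to amino acid usage
        p = base_freq[codon[0]] * base_freq[codon[1]] * base_freq[codon[2]]
        usage[aa] = usage.get(aa, 0.0) + p
    total = sum(usage.values())
    return {aa: p / total for aa, p in usage.items()}


# Equal nucleotide frequencies: usage depends only on the number of synonymous codons.
uniform = expected_aa_usage({b: 0.25 for b in BASES})
# Yeast nucleotide frequencies quoted in the text.
yeast = expected_aa_usage({"A": 0.3090, "C": 0.1917, "G": 0.1913, "T": 0.3080})

for aa in "KG":  # Lys (AAA, AAG) and Gly (GGN)
    print(aa, round(uniform[aa], 4), round(yeast[aa], 4))
```

With equal nucleotide frequencies the expectation for an amino acid is simply its number of codons divided by 61. With the yeast frequencies the expectation for Lys rises and that for Gly falls, in line with the direction described above.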
In summary, amino acid usage (U) is a function of four factors:
$$U = f(E, N_{\mathrm{tRNA}}, N_{\mathrm{syncodon}}, \mathrm{GC\%})$$
where E is energetic cost, NtRNA and Nsyncodon have been defined before, and GC% is genomic GC% reflecting mutation bias. One needs to include all these factors in a model in order to reach a reasonable understanding of the determinants of amino acid usage.
References
Abdel-Hameed EA, Ji H, Shata MT (2016) HIV-induced epigenetic alterations in host cells. Adv Exp Med Biol 879:27–38
Abolbaghaei A, Silke JR, Xia X (2017) How changes in anti-SD sequences would affect SD sequences in Escherichia coli and Bacillus subtilis. G3 (Bethesda, Md) 7(5):1607–1615
Abraham EP, Chain E (1940) An enzyme from bacteria able to destroy penicillin. Rev Infect Dis 10(4):677–678
Abraham EP, Chain E, Fletcher CM, Florey HW, Gardner AD, Heatley NG, Jennings MA (1941) Further observations on penicillin. Lancet 238(6155):177–189
Abraham JM, Feagin JE, Stuart K (1988) Characterization of cytochrome c oxidase III transcripts that are edited only in the 3′ region. Cell 55(2):267–272
Adamski FM, McCaughan KK, Jorgensen F, Kurland CG, Tate WP (1994) The concentration of polypeptide chain release factors 1 and 2 at different growth rates of Escherichia coli. J Mol Biol 238(3):302–308
Aerts S, Van Loo P, Thijs G, Mayer H, de Martin R, Moreau Y, De Moor B (2005) TOUCAN 2: the all-inclusive open source workbench for regulatory sequence analysis. Nucleic Acids Res 33(Web Server):W393–W396
Aerts S, van Helden J, Sand O, Hassan BA (2007) Fine-tuning enhancer models to predict transcriptional targets across multiple genomes. PLoS One 2(11):e1115
Ahn BY, Jones EV, Moss B (1990) Identification of the vaccinia virus gene encoding an 18-kilodalton subunit of RNA polymerase and demonstration of a 5′ poly(A) leader on its early transcript. J Virol 64(6):3019–3024
Aird WC, Parvin JD, Sharp PA, Rosenberg RD (1994) The interaction of GATA-binding proteins and basal transcription factors with GATA box-containing core promoters. A model of tissue-specific gene expression. J Biol Chem 269(2):883–889
Akaike H (1973) Information theory and an extension of maximum likelihood principle. In: Petrov BN, Csaki F (eds) Second international symposium on information theory. Akademiai Kiado, Budapest, pp 267–281
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19:716–723
Akashi H (1994) Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136(3):927–935
Akashi H, Gojobori T (2002) Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc Natl Acad Sci USA 99(6):3695–3700
Alatortsev VS, Cruz-Reyes J, Zhelonkina AG, Sollner-Webb B (2008) Trypanosoma brucei RNA editing: coupled cycles of U deletion reveal processive activity of the editing complex. Mol Cell Biol 28(7):2437–2445
Alderwick LJ, Seidel M, Sahm H, Besra GS, Eggeling L (2006) Identification of a novel arabinofuranosyltransferase (AftA) involved in cell wall arabinan biosynthesis in Mycobacterium tuberculosis. J Biol Chem 281(23):15653–15661
Allen A, Flemstrom G, Garner A, Kivilaakso E (1993) Gastroduodenal mucosal protection. Physiol Rev 73(4):823–857
Alm RA, Trust TJ (1999) Analysis of the genetic diversity of Helicobacter pylori: the tale of two genomes. J Mol Med 77(12):834–846
Alm RA, Ling LS, Moir DT, King BL, Brown ED, Doig PC, Smith DR, Noonan B, Guild BC, deJonge BL et al (1999) Genomic-sequence comparison of two unrelated isolates of the human gastric pathogen Helicobacter pylori. Nature 397(6715):176–180
Alm RA, Bina J, Andrews BM, Doig P, Hancock RE, Trust TJ (2000) Comparative genomics of Helicobacter pylori: analysis of the outer membrane protein families. Infect Immun 68(7):4155–4168
Althaus E, Caprara A, Lenhof HP, Reinert K (2002) Multiple sequence alignment with arbitrary gap costs: computing an optimal solution using polyhedral combinatorics. Bioinformatics 18(Suppl 2):S4–S16
Altschul SF (1996) Local alignment statistics. Meth Enzymol 274:460–480
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
Anderson KP, Crable SC, Lingrel JB (1998) Multiple proteins binding to a GATA-E box-GATA motif regulate the erythroid Kruppel-like factor (EKLF) gene. J Biol Chem 273(23):14347–14354
Andersson DI, Kurland CG (1983) Ram ribosomes are defective proofreaders. Mol Gen Genet 191(3):378–381
Arava Y, Wang Y, Storey JD, Liu CL, Brown PO, Herschlag D (2003) Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae. Proc Natl Acad Sci USA 100(7):3889–3894
Arbibe L, Sansonetti PJ (2007) Epigenetic regulation of host response to LPS: causing tolerance while avoiding toll errancy. Cell Host Microbe 1(4):244–246
Arnqvist G (2006) Sensory exploitation and sexual conflict. Philos Trans R Soc Lond Ser B Biol Sci 361(1466):375–386
Arvaniti E, Moulos P, Vakrakou A, Chatziantoniou C, Chadjichristos C, Kavvadas P, Charonis A, Politis PK (2016) Whole-transcriptome analysis of UUO mouse model of renal fibrosis reveals new molecular players in kidney diseases. Sci Rep 6:26235
Ast G (2004) How did alternative splicing evolve? Nat Rev Genet 5(10):773–782
Auch AF, Henz SR, Holland BR, Goker M (2006) Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences. BMC Bioinform 7:350
Awan AR, Manfredo A, Pleiss JA (2013) Lariat sequencing in a unicellular yeast identifies regulated alternative splicing of exons that are evolutionarily conserved with humans. Proc Natl Acad Sci USA 110(31):12762–12767
Axon AT (1999) Are all helicobacters equal? Mechanisms of gastroduodenal pathology and their clinical implications. Gut 45(Suppl 1):I1–I4
Bablanian R, Banerjee AK (1986) Poly(riboadenylic acid) preferentially inhibits in vitro translation of cellular mRNAs compared with vaccinia virus mRNAs: possible role in vaccinia virus cytopathology. Proc Natl Acad Sci USA 83(5):1290–1294
Bablanian R, Coppola G, Masters PS, Banerjee AK (1986) Characterization of vaccinia virus transcripts involved in selective inhibition of host protein synthesis. Virology 148(2):375–380
Bablanian R, Goswami SK, Esteban M, Banerjee AK (1987) Selective inhibition of protein synthesis by synthetic and vaccinia virus-core synthesized poly(riboadenylic acids). Virology 161(2):366–373
Bablanian R, Scribani S, Esteban M (1993) Amplification of polyadenylated nontranslated small RNA sequences (POLADS) during superinfection correlates with the inhibition of viral and cellular protein synthesis. Cell Mol Biol Res 39(3):243–255
Bag J (2001) Feedback inhibition of poly(A)-binding protein mRNA translation. A possible mechanism of translation arrest by stalled 40 S ribosomal subunits. J Biol Chem 276(50):47352–47360
Bag J, Bhattacharjee RB (2010) Multiple levels of post-transcriptional control of expression of the poly(A)-binding protein. RNA Biol 7(1):5–12
Baik SC, Kim KM, Song SM, Kim DS, Jun JS, Lee SG, Song JY, Park JU, Kang HL, Lee WK et al (2004) Proteomic analysis of the sarcosine-insoluble outer membrane fraction of Helicobacter pylori strain 26695. J Bacteriol 186(4):949–955
Bailey TL, Williams N, Misleh C, Li WW (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34(Web Server issue):W369–W373
Baird SD, Turcotte M, Korneluk RG, Holcik M (2006) Searching for IRES. RNA 12(10):1755–1785
Baird SD, Lewis SM, Turcotte M, Holcik M (2007) A search for structurally similar cellular internal ribosome entry sites. Nucleic Acids Res 35(14):4664–4677
Baldi P, Brunak S (2001) Bioinformatics: the machine learning approach. The MIT Press, Cambridge, MA
Bamford DH, Caldentey J, Bamford JK (1995) Bacteriophage PRD1: a broad host range dsDNA tectivirus with an internal membrane. Adv Virus Res 45:281–319
Bao J, Bedford MT (2016) Epigenetic regulation of the histone-to-protamine transition during spermiogenesis. Reproduction 151(5):R55–R70
Baron D, Cocquet J, Xia X, Fellous M, Guiguen Y, Veitia RA (2004) An evolutionary and functional analysis of FoxL2 in rainbow trout gonad differentiation. J Mol Endocrinol 33:705–715
Bastianelli G, Bouillon A, Nguyen C, Crublet E, Petres S, Gorgette O, Le-Nguyen D, Barale JC, Nilges M (2011) Computational reverse-engineering of a spider-venom derived peptide active against Plasmodium falciparum SUB1. PLoS One 6(7):e21812
Bauerfeind P, Garner R, Dunn BE, Mobley HL (1997) Synthesis and activity of Helicobacter pylori urease and catalase at low pH. Gut 40(1):25–30
Baumgartner HK, Montrose MH (2004) Regulated alkali secretion acts in tandem with unstirred layers to regulate mouse gastric surface pH. Gastroenterology 126(3):774–783
Beier H, Grimm M (2001) Misreading of termination codons in eukaryotes by natural nonsense suppressor tRNAs. Nucleic Acids Res 29(23):4767–4782
Bell D, Bell AH, Bondaruk J, Hanna EY, Weber RS (2016) In-depth characterization of the salivary adenoid cystic carcinoma transcriptome with emphasis on dominant cell type. Cancer 122(10):1513–1522
Ben-Gal I, Shani A, Gohr A, Grau J, Arviv S, Shmilovici A, Posch S, Grosse I (2005) Identification of transcription factor binding sites with variable-order Bayesian networks. Bioinformatics 21(11):2657–2666
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 57(1):289–300
Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple hypothesis testing under dependency. Ann Stat 29:1165–1188
Bennetzen JL, Hall BD (1982) Codon selection in yeast. J Biol Chem 257(6):3026–3031
Benoit G, Lemaitre C, Lavenier D, Drezen E, Dayris T, Uricaru R, Rizk G (2015) Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. BMC Bioinform 16:288
Benzer S, Champe SP (1962) A change from nonsense to sense in the genetic code. Proc Natl Acad Sci USA 48:1114–1121
Berg JM, Tymoczko JL, Stryer L (2002) Biochemistry. W. H. Freeman and Co, New York
Berger MF, Levin JZ, Vijayendran K, Sivachenko A, Adiconis X, Maguire J, Johnson LA, Robinson J, Verhaak RG, Sougnez C et al (2010) Integrative analysis of the melanoma transcriptome. Genome Res 20(4):413–427
Bergsten E, Uutela M, Li X, Pietras K, Ostman A, Heldin CH, Alitalo K, Eriksson U (2001) PDGF-D is a specific, protease-activated ligand for the PDGF beta-receptor. Nat Cell Biol 3(5):512–516
Bertholet C, Van Meir E, ten Heggeler-Bordier B, Wittek R (1987) Vaccinia virus produces late mRNAs by discontinuous synthesis. Cell 50(2):153–162
Besemer J, Borodovsky M (2005) GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res 33(Web Server issue):W451–W454
Bestor TH, Coxon A (1993) The pros and cons of DNA methylation. Curr Biol 6:384–386
Betney R, de Silva E, Krishnan J, Stansfield I (2010) Autoregulatory systems controlling translation factor expression: thermostat-like control of translational accuracy. RNA 16(4):655–663
Beznoskova P, Gunisova S, Valasek LS (2016) Rules of UGA-N decoding by near-cognate tRNAs and analysis of readthrough on short uORFs in yeast. RNA 22(3):456–466
Bhagwat M, Aravind L (2007) PSI-BLAST tutorial. Methods Mol Biol 395:177–186
Bhatia B, Ponia SS, Solanki AK, Dixit A, Garg LC (2014) Identification of glutamate ABC-transporter component in Clostridium perfringens as a putative drug target. Bioinformation 10(7):401–405
Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, Delano D, Zhang L, Schroth GP, Gunderson KL et al (2011) High density DNA methylation array with single CpG site resolution. Genomics 98(4):288–295
Bickel DR (2003) Robust cluster analysis of microarray gene expression data with the number of clusters determined biologically. Bioinformatics 19(7):818–824
Bierne H, Hamon M, Cossart P (2012) Epigenetics and bacterial infections. Cold Spring Harb Perspect Med 2(12):a010272
Bigaud E, Corrales FJ (2016) Methylthioadenosine (MTA) regulates liver cells proteome and methylproteome: implications in liver biology and disease. Mol Cell Proteomics 15(5):1498–1510
Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE et al (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447(7146):799–816
Bjorkholm B, Lundin A, Sillen A, Guillemin K, Salama N, Rubio C, Gordon JI, Falk P, Engstrand L (2001) Comparison of genetic divergence and fitness between two subclones of Helicobacter pylori. Infect Immun 69(12):7832–7838
Bjornsson A, Isaksson LA (1996) Accumulation of a mRNA decay intermediate by ribosomal pausing at a stop codon. Nucleic Acids Res 24(9):1753–1757
Blackburne BP, Whelan S (2013) Class of multiple sequence alignment algorithm affects genomic analysis. Mol Biol Evol 30(3):642–653
Blakqori G, van Knippenberg I, Elliott RM (2009) Bunyamwera orthobunyavirus S-segment untranslated regions mediate poly(A) tail-independent translation. J Virol 83(8):3637–3646
Blanchet S, Cornu D, Argentini M, Namy O (2014) New insights into the incorporation of natural suppressor tRNAs at stop codons in Saccharomyces cerevisiae. Nucleic Acids Res 42(15):10061–10072
Blanchette M, Tompa M (2002) Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res 12(5):739–748
Blanchette M, Bataille AR, Chen X, Poitras C, Laganiere J, Lefebvre C, Deblois G, Giguere V, Ferretti V, Bergeron D et al (2006) Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res 16(5):656–668
Boehringer D, Thermann R, Ostareck-Lederer A, Lewis JD, Stark H (2005) Structure of the hepatitis C virus IRES bound to the human 80S ribosome: remodeling of the HCV IRES. Structure 13(11):1695
Bogenhagen DF, Clayton DA (2003) The mitochondrial DNA replication bubble has not burst. Trends Biochem Sci 28(7):357–360
Bolden JE, Peart MJ, Johnstone RW (2006) Anticancer activities of histone deacetylase inhibitors. Nat Rev Drug Discov 5(9):769–784
Borodovsky M, McIninch J (1993) GENMARK: parallel gene recognition for both DNA strands. Comput Chem 17:123–133
Bossi L (1983) Context effects: translation of UAG codon by suppressor tRNA is affected by the sequence following UAG in the message. J Mol Biol 164(1):73–87
Bossi L, Ruth JR (1980) The influence of codon context on genetic code translation. Nature 286(5769):123–127
Brauch H, Weirich G, Brieger J, Glavac D, Rodl H, Eichinger M, Feurer M, Weidt E, Puranakanitstha C, Neuhaus C et al (2000) VHL alterations in human clear cell renal cell carcinoma: association with advanced tumor stage and a novel hot spot mutation. Cancer Res 60(7):1942–1948
Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC et al (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29(4):365–371
Britten RJ (1986) Rates of DNA sequence evolution differ between taxonomic groups. Science 231:1393–1398
Brooks DR, McLennan DA (1991) Phylogeny, ecology and behavior: a research program in comparative biology. University of Chicago Press, Chicago
Brown CM, Stockwell PA, Trotman CN, Tate WP (1990) Sequence analysis suggests that tetra-nucleotides signal the termination of protein synthesis in eukaryotes. Nucleic Acids Res 18(21):6339–6345
Brown M, Hughey R, Krogh A, Mian IS, Sjolander K, Haussler D (1993) Using Dirichlet mixture priors to derive hidden Markov models for protein families. Proc Int Conf Intell Syst Mol Biol 1:47–55
Brown TA, Cecconi C, Tkachuk AN, Bustamante C, Clayton DA (2005) Replication of mitochondrial DNA occurs by strand displacement with alternative light-strand origins, not via a strand-coupled mechanism. Genes Dev 19(20):2466–2476
Brumme ZL, Dong WW, Yip B, Wynhoven B, Hoffman NG, Swanstrom R, Jensen MA, Mullins JI, Hogg RS, Montaner JS et al (2004) Clinical and immunological impact of HIV envelope V3 sequence variation after starting initial triple antiretroviral therapy. AIDS 18(4):F1–F9
Bucklew JA (1990) Large deviation techniques in decision, simulation, and estimation. Wiley, New York
Bulmer M (1990) The effect of context on synonymous codon usage in genes with low codon usage bias. Nucleic Acids Res 18(10):2869–2873
Bulmer M (1991) The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897–907
Bumann D, Aksu S, Wendland M, Janek K, Zimny-Arndt U, Sabarth N, Meyer TF, Jungblut PR (2002) Proteome analysis of secreted proteins of the gastric pathogen Helicobacter pylori. Infect Immun 70(7):3396–3403
Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94
Burge CB, Karlin S (1998) Finding the genes in genomic DNA. Curr Opin Struct Biol 8(3):346–354
Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York
Bury-Mone S, Skouloubris S, Labigne A, De Reuse H (2001) The Helicobacter pylori UreI protein: role in adaptation to acidity and identification of residues essential for its activity and for acid activation. Mol Microbiol 42(4):1021–1034
Calderone TL, Stevens RD, Oas TG (1996) High-level misincorporation of lysine for arginine at AGA codons in a fusion protein expressed in Escherichia coli. J Mol Biol 262(4):407–412
Cao Y, Janke A, Waddell PJ, Westerman M, Takenaka O, Murata S, Okada N, Paabo S, Hasegawa M (1998) Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders. J Mol Evol 47(3):307–322
Capecchi MR (1967) Polypeptide chain termination in vitro: isolation of a release factor. Proc Natl Acad Sci USA 58(3):1144–1151
Capuano F, Mulleder M, Kok R, Blom HJ, Ralser M (2014) Cytosine DNA methylation is found in Drosophila melanogaster but absent in Saccharomyces cerevisiae, Schizosaccharomyces pombe, and other yeast species. Anal Chem 86(8):3697–3702
Cardon LR, Burge C, Clayton DA, Karlin S (1994) Pervasive CpG suppression in animal mitochondrial genomes. Proc Natl Acad Sci USA 91:3799–3803
Carlini DB (2005) Context-dependent codon bias and messenger RNA longevity in the yeast transcriptome. Mol Biol Evol 22(6):1403–1411
Carroll J, Fearnley IM, Shannon RJ, Hirst J, Walker JE (2003) Analysis of the subunit composition of complex I from bovine heart mitochondria. Mol Cell Proteomics 2(2):117–126
Carullo M, Xia X (2008) An extensive study of mutation and selection on the wobble nucleotide in tRNA anticodons in fungal mitochondrial genomes. J Mol Evol 66(5):484–493
Censini S, Lange C, Xiang Z, Crabtree JE, Ghiara P, Borodovsky M, Rappuoli R, Covacci A (1996) Cag, a pathogenicity island of Helicobacter pylori, encodes type I-specific and disease-associated virulence factors. Proc Natl Acad Sci USA 93(25):14648–14653
Cesar Sanchez J, Padron G, Santana H, Herrera L (1998) Elimination of an HuIFN alpha 2b readthrough species, produced in Escherichia coli, by replacing its natural translational stop signal. J Biotechnol 63(3):179–186
Chakrabarti S, Lanczycki CJ (2007) Analysis and prediction of functionally important sites in proteins. Protein Sci 16(1):4–13
Chakraborty R (1977) Estimation of time of divergence from phylogenetic studies. Can J Genet Cytol 19:217–223
Chambaud I, Heilig R, Ferris S, Barbe V, Samson D, Galisson F, Moszer I, Dybvig K, Wroblewski H, Viari A et al (2001) The complete genome sequence of the murine respiratory pathogen Mycoplasma pulmonis. Nucleic Acids Res 29(10):2145–2153
Chan SW, Egan P (2009) Effects of hepatitis C virus envelope glycoprotein unfolded protein response activation on translation and transcription. Arch Virol 154(10):1631–1640
Chan PP, Lowe TM (2009) GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res 37(Database issue):D93–D97
Chang SY, McGary EC, Chang S (1989) Methionine aminopeptidase gene of Escherichia coli is essential for cell growth. J Bacteriol 171(7):4071–4072
Charig CR, Webb DR, Payne SR, Wickham JE (1986) Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy. Br Med J (Clin Res Ed) 292(6524):879–882
Chen JJ, Peck K, Hong TM, Yang SC, Sher YP, Shih JY, Wu R, Cheng JL, Roffler SR, Wu CW et al (2001) Global analysis of gene expression in invasion by a lung cancer model. Cancer Res 61(13):5223–5230
Chen Q, Yan M, Cao Z, Li X, Zhang Y, Shi J, Feng GH, Peng H, Zhang X, Qian J et al (2016) Sperm tsRNAs contribute to intergenerational inheritance of an acquired metabolic disorder. Science 351(6271):397–400
Chilingaryan A, Gevorgyan N, Vardanyan A, Jones D, Szabo A (2002) Multivariate approach for selecting sets of differentially expressed genes. Math Biosci 176(1):59–69
Chithambaram S, Prabhakaran R, Xia X (2014a) Differential codon adaptation between dsDNA and ssDNA phages in Escherichia coli. Mol Biol Evol 31(6):1606–1617
Chithambaram S, Prabhakaran R, Xia X (2014b) The effect of mutation and selection on codon adaptation in Escherichia coli bacteriophage. Genetics 197(1):301–315
Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ et al (1998) A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2(1):65–73
Chou PY, Fasman GD (1978a) Empirical predictions of protein conformation. Annu Rev Biochem 47:251–276
Chou PY, Fasman GD (1978b) Prediction of the secondary structure of proteins from their amino acid sequence. Adv Enzymol Relat Areas Mol Biol 47:45–148
Chu C, Qu K, Zhong FL, Artandi SE, Chang HY (2011) Genomic maps of long noncoding RNA occupancy reveal principles of RNA-chromatin interactions. Mol Cell 44(4):667–678
Chu C, Quinn J, Chang HY (2012) Chromatin isolation by RNA purification (ChIRP). J Vis Exp 61:e3912
Chuang SE, Daniels DL, Blattner FR (1993) Global regulation of gene expression in Escherichia coli. J Bacteriol 175(7):2026–2036
Clark AT (2015) DNA methylation remodeling in vitro and in vivo. Curr Opin Genet Dev 34:82–87
Claverie JM (1994) Some useful statistical properties of position-weight matrices. Comput Chem 18(3):287–294
Claverie JM, Audic S (1996) The statistical significance of nucleotide position-weight matrix matches. Comput Appl Biosci 12(5):431–439
Clayton DA (1982) Replication of animal mitochondrial DNA. Cell 28(4):693–705
Clayton DA (2000) Transcription and replication of mitochondrial DNA. Hum Reprod 15(Suppl 2):11–17
Cocquet J, De Baere E, Gareil M, Pannetier M, Xia X, Fellous M, Veitia RA (2003) Structure, evolution and expression of the FOXL2 transcription unit. Cytogenet Genome Res 101:206–211
Coessens B, Thijs G, Aerts S, Marchal K, De Smet F, Engelen K, Glenisson P, Moreau Y, Mathys J, De Moor B (2003) INCLUSive: a web portal and service registry for microarray and regulatory sequence analysis. Nucleic Acids Res 31(13):3468–3470
Coghlan A, Wolfe KH (2000) Relationship of codon bias to mRNA concentration and protein length in Saccharomyces cerevisiae. Yeast 16(12):1131–1145
Comeron JM, Aguade M (1998) An evaluation of measures of synonymous codon usage bias. J Mol Evol 47(3):268–274
Correa P (1997) Helicobacter pylori as a pathogen and carcinogen. J Physiol Pharmacol 48(Suppl 4):19–24
Cottrell JS (1994) Protein identification by peptide mass fingerprinting. Pept Res 7(3):115–124
Cottrell JS, Sutton CW (1996) The identification of electrophoretically separated proteins by peptide mass fingerprinting. Methods Mol Biol 61:67–82
Covacci A, Falkow S, Berg DE, Rappuoli R (1997) Did the inheritance of a pathogenicity island modify the virulence of Helicobacter pylori? Trends Microbiol 5(5):205–208
Covell DG, Wallqvist A, Rabow AA, Thanki N (2003) Molecular classification of cancer: unsupervised self-organizing map analysis of gene expression microarray data. Mol Cancer Ther 2(3):317–332
Cox SS, van der Giezen M, Tarr SJ, Crompton MR, Tovar J (2006) Evidence from bioinformatics, expression and inhibition studies of phosphoinositide-3 kinase signalling in Giardia intestinalis. BMC Microbiol 6:45
Craigen WJ, Caskey CT (1986) Expression of peptide chain release factor 2 requires high-efficiency frameshift. Nature 322(6076):273–275
Craigen WJ, Caskey CT (1987) The function, structure and regulation of E. coli peptide chain release factors. Biochimie 69(10):1031–1041
Craigen WJ, Cook RG, Tate WP, Caskey CT (1985) Bacterial peptide chain release factors: conserved primary structure and possible frameshift regulation of release factor 2. Proc Natl Acad Sci USA 82(11):3616–3620
Craigen WJ, Lee CC, Caskey CT (1990) Recent advances in peptide chain termination. Mol Microbiol 4(6):861–865
Crick FH (1966) Codon-anticodon pairing: the wobble hypothesis. J Mol Biol 19(2):548–555
Curran JF, Yarus M (1988) Use of tRNA suppressors to probe regulation of Escherichia coli release factor 2. J Mol Biol 203(1):75–83
Czerwoniec A, Dunin-Horkawicz S, Purta E, Kaminska KH, Kasprzak JM, Bujnicki JM, Grosjean H, Rother K (2009) MODOMICS: a database of RNA modification pathways. 2008 update. Nucleic Acids Res 37(Database issue):D118–D121
Danchin A (2002) The Delphic boat: what genomes tell us. Harvard University Press, Cambridge, MA
David E, Tramontin T, Zemmel R (2009) Pharmaceutical R&D: the road to positive returns. Nat Rev Drug Discov 8(8):609–610
Davies J, Jones DS, Khorana HG (1966) A further study of misreading of codons induced by streptomycin and neomycin using ribopolynucleotides containing two nucleotides in alternating sequence as templates. J Mol Biol 18(1):48–57
Dayhoff MO, Schwartz RM, Orcutt BC (1978) A model of evolutionary change in proteins. In: Dayhoff MO (ed) Atlas of protein sequence and structure. National Biomedical Research Foundation, Washington, DC, pp 345–352
Delorenzi M, Speed T (2002) An HMM model for coiled-coil domains and a comparison with PSSM-based predictions. Bioinformatics 18(4):617–625
Deng R, Huang M, Wang J, Huang Y, Yang J, Feng J, Wang X (2006) PTreeRec: phylogenetic tree reconstruction based on genome BLAST distance. Comput Biol Chem 30(4):300–302
Deng W, Lee J, Wang H, Miller J, Reik A, Gregory PD, Dean A, Blobel GA (2012) Controlling long-range genomic interactions at a native locus by targeted tethering of a looping factor. Cell 149(6):1233–1244
Deng Q, Ramskold D, Reinius B, Sandberg R (2014a) Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343(6167):193–196
Deng W, Rupon JW, Krivega I, Breda L, Motta I, Jahn KS, Reik A, Gregory PD, Rivella S, Dean A et al (2014b) Reactivation of developmentally silenced globin genes by forced chromatin looping. Cell 158(4):849–860
Desper R, Gascuel O (2002) Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. J Comput Biol 9(5):687–705
Dewey CN, Rogozin IB, Koonin EV (2006) Compensatory relationship between splice sites and exonic splicing signals depending on the length of vertebrate introns. BMC Genomics 7:311
Diehn M, Eisen MB, Botstein D, Brown PO (2000) Large-scale identification of secreted and membrane-associated gene products using DNA microarrays. Nat Genet 25(1):58–62
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15–21
Dobzhansky T (1973) Nothing in biology makes sense except in the light of evolution. Am Biol Teach 35:125–129
Donly BC, Edgar CD, Adamski FM, Tate WP (1990) Frameshift autoregulation in the gene for Escherichia coli release factor 2: partly functional mutants result in frameshift enhancement. Nucleic Acids Res 18(22):6517–6522
Doolittle RF, Hunkapiller MW, Hood LE, Devare SG, Robbins KC, Aaronson SA, Antoniades HN (1983) Simian sarcoma virus onc gene, v-sis, is derived from the gene (or genes) encoding a platelet-derived growth factor. Science 221(4607):275–277
Dorokhov YL, Skulachev MV, Ivanov PA, Zvereva SD, Tjulkina LG, Merits A, Gleba YY, Hohn T, Atabekov JG (2002) Polypurine (A)-rich sequences promote cross-kingdom conservation of internal ribosome entry. Proc Natl Acad Sci USA 99(8):5301–5306
dos Reis M, Savva R, Wernisch L (2004) Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res 32(17):5036–5044
Doudna JA, Sarnow P (2007) Translation initiation by viral internal ribosome entry sites. In: Mathews MB, Sonenberg N, Hershey J (eds) Translational control in biology and medicine. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, pp 129–154
Drews J, Ryser S (1997) The role of innovation in drug development. Nat Biotechnol 15(13):1318–1319
Drouin G, Daoud H, Xia J (2008) Relative rates of synonymous substitutions in the mitochondrial, chloroplast and nuclear genomes of seed plants. Mol Phylogenet Evol 49(3):827–831
Drummond A, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7(1):214
Drummond A, Rodrigo AG (2000) Reconstructing genealogies of serial samples under the assumption of a molecular clock using serial-sample UPGMA. Mol Biol Evol 17(12):1807–1815
Drummond A, Forsberg R, Rodrigo AG (2001) The inference of stepwise changes in substitution rates using serial sequence samples. Mol Biol Evol 18(7):1365–1371
Drummond AJ, Pybus OG, Rambaut A, Forsberg R, Rodrigo AG (2003a) Measurably evolving populations. Trends Ecol Evol 18(9):481–488
Drummond A, Pybus OG, Rambaut A (2003b) Inference of viral evolutionary rates from molecular sequences. Adv Parasitol 54:331–358
Durbin R (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge
Duret L, Mouchiroud D (1999) Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc Natl Acad Sci USA 96(8):4482–4487
DuRose JB, Scheuner D, Kaufman RJ, Rothblum LI, Niwa M (2009) Phosphorylation of eukaryotic translation initiation factor 2alpha coordinates rRNA transcription and translation inhibition during endoplasmic reticulum stress. Mol Cell Biol 29(15):4295–4307
Duval M, Korepanov A, Fuchsbauer O, Fechter P, Haller A, Fabbretti A, Choulier L, Micura R, Klaholz BP, Romby P et al (2013) Escherichia coli ribosomal protein S1 unfolds structured mRNAs onto the ribosome for active translation initiation. PLoS Biol 11(12):e1001731
Eckhardt F, Lewin J, Cortese R, Rakyan VK, Attwood J, Burger M, Burton J, Cox TV, Davies R, Down TA et al (2006) DNA methylation profiling of human chromosomes 6, 20 and 22. Nat Genet 38(12):1378–1385
Eddy SR (1996) Hidden Markov models. Curr Opin Struct Biol 6(3):361–365
Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14(9):755–763
Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792–1797
Edgar RC, Batzoglou S (2006) Multiple sequence alignment. Curr Opin Struct Biol 16(3):368–373
Efron B (1982) The jackknife, the bootstrap and other resampling plans. Society for Industrial and Applied Mathematics, Philadelphia
Ehnman M, Missiaglia E, Folestad E, Selfe J, Strell C, Thway K, Brodin B, Pietras K, Shipley J, Ostman A et al (2013) Distinct effects of ligand-induced PDGFRalpha and PDGFRbeta signaling in the human rhabdomyosarcoma tumor cell and stroma cell compartments. Cancer Res 73(7):2139–2149
Ehrenberg M, Tenson T (2002) A new beginning of the end of translation. Nat Struct Biol 9(2):85–87
Einstein A, Russell B, Dewey J, Millikan RA, Dreiser T, Wells HG, Nansen F, Jeans SJ, Babbitt I, Keith SA et al (1931) Living philosophies. Simon and Schuster, New York
Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95(25):14863–14868
Elf J, Nilsson D, Tenson T, Ehrenberg M (2003) Selective charging of tRNA isoacceptors explains patterns of codon usage. Science 300(5626):1718–1722
Elroy-Stein O, Merrick W (2007) Translation initiation via cellular internal ribosome entry sites. In: Mathews MB, Sonenberg N, Hershey J (eds) Translational control in biology and medicine. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, pp 155–172
Engel E, Peskoff A, Kauffman GL Jr, Grossman MI (1984) Analysis of hydrogen ion concentration in the gastric gel mucus layer. Am J Physiol 247(4 Pt 1):G321–G338
Engelberg-Kulka H (1981) UGA suppression by normal tRNA Trp in Escherichia coli: codon context effects. Nucleic Acids Res 9(4):983–991
Epstein CB, Butow RA (2000) Microarray technology – enhanced versatility, persistent challenge. Curr Opin Biotechnol 11(1):36–41
Eswarappa SM, Potdar AA, Koch WJ, Fan Y, Vasu K, Lindner D, Willard B, Graham LM, DiCorleto PE, Fox PL (2014) Programmed translational readthrough generates antiangiogenic VEGF-Ax. Cell 157(7):1605–1618
Evans T, Felsenfeld G, Reitman M (1990) Control of globin gene transcription. Annu Rev Cell Biol 6:95–124
Eyre-Walker A (1996) The close proximity of Escherichia coli genes: consequences for stop codon and synonymous codon use. J Mol Evol 42(2):73–78
Eyre-Walker A, Bulmer M (1993) Reduced synonymous substitution rate at the start of enterobacterial genes. Nucleic Acids Res 21:4599–4603
Ezzell C (2002) Proteins rule. Sci Am 286(4):40–47
Farazi TA, Waksman G, Gordon JI (2001) The biology and enzymology of protein N-myristoylation. J Biol Chem 276(43):39501–39504
Farnham PJ, Platt T (1981) Rho-independent termination: dyad symmetry in DNA causes RNA polymerase to pause during transcription in vitro. Nucleic Acids Res 9(3):563–577
Fasman GD, Chou PY (1974) Prediction of protein conformation: consequences and aspirations. In: Blout ER, Bovey FA, Goodman M, Lotan N (eds) Peptides, polypeptides and proteins. Wiley, New York, pp 114–125
Fatemi M, Hermann A, Pradhan S, Jeltsch A (2001) The activity of the murine DNA methyltransferase Dnmt1 is controlled by interaction of the catalytic domain with the N-terminal part of the enzyme leading to an allosteric activation of the enzyme after binding to methylated DNA. J Mol Biol 309(5):1189–1199
Felsenstein J (1973) Maximum-likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Syst Zool 22:240–249
Felsenstein J (1978a) Cases in which parsimony and compatibility methods will be positively misleading. Syst Zool 27:401–410
Felsenstein J (1978b) The number of evolutionary trees. Syst Zool 27:27–33
Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376
Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791
Felsenstein J (2004) Inferring phylogenies. Sinauer, Sunderland
Felsenstein J, Churchill GA (1996) A Hidden Markov Model approach to variation among sites in rate of evolution. Mol Biol Evol 13(1):93–104
Feng DF, Doolittle RF (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol 25(4):351–360
Feng DF, Doolittle RF (1990) Progressive alignment and phylogenetic tree construction of protein sequences. Methods Enzymol 183:375–387
Fernandez-Pinar R, Lo Sciuto A, Rossi A, Ranucci S, Bragonzi A, Imperi F (2015) In vitro and in vivo screening for novel essential cell-envelope proteins in Pseudomonas aeruginosa. Sci Rep 5:17593
Fickett JW (1996) Quantitative discrimination of MEF2 sites. Mol Cell Biol 16(1):437–441
Figeys D (2002) Adapting arrays and lab-on-a-chip technology for proteomics. Proteomics 2(4):373–382
Figeys D (2003a) Novel approaches to map protein interactions. Curr Opin Biotechnol 14(1):119–125
Figeys D (2003b) Proteomics in 2002: a year of technical development and wide-ranging applications. Anal Chem 75(12):2891–2905
Fisher RA (1926) The arrangement of field experiments. J Minist Agric 33:503–513
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugenics 7:179–188
Fitch WM (1971) Toward defining the course of evolution: minimum change for a specific tree topology. Syst Zool 20:406–416
Fitch WM, Margoliash E (1967) Construction of phylogenetic trees. Science 155:279–284
Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM et al (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269(5223):496–512
Fong TC, Emerson BM (1992) The erythroid-specific protein cGATA-1 mediates distal enhancer activity through a specialized beta-globin TATA box. Genes Dev 6(4):521–532
Forde CE, McCutchen-Maloney SL (2002) Characterization of transcription factors by mass spectrometry and the role of SELDI-MS. Mass Spectrom Rev 21(6):419–439
Forrester WC, Epner E, Driscoll MC, Enver T, Brice M, Papayannopoulou T, Groudine M (1990) A deletion of the human beta-globin locus activation region causes a major alteration in chromatin structure and replication across the entire beta-globin locus. Genes Dev 4(10):1637–1649
Frank C, Makkonen H, Dunlop TW, Matilainen M, Vaisanen S, Carlberg C (2005) Identification of pregnane X receptor binding sites in the regulatory regions of genes involved in bile acid homeostasis. J Mol Biol 346(2):505–519
Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, Bult CJ, Kerlavage AR, Sutton G, Kelley JM et al (1995) The minimal gene complement of Mycoplasma genitalium. Science 270(5235):397–403
Frederico LA, Kunkel TA, Shaw BR (1990) A sensitive genetic assay for the detection of cytosine deamination: determination of rate constants and the activation energy. Biochemistry 29(10):2532–2537
Frishman D, Mironov A, Mewes HW, Gelfand M (1998) Combining diverse evidence for gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res 26(12):2941–2947
Frolova LY, Tsivkovskii RY, Sivolobova GF, Oparina NY, Serpinsky OI, Blinov VM, Tatkov SI, Kisselev LL (1999) Mutations in the highly conserved GGQ motif of class 1 polypeptide release factors abolish ability of human eRF1 to trigger peptidyl-tRNA hydrolysis. RNA 5(8):1014–1020
Frottin F, Martinez A, Peynot P, Mitra S, Holz RC, Giglione C, Meinnel T (2006) The proteomics of N-terminal methionine cleavage. Mol Cell Proteomics 5(12):2336–2349
Furukawa R, Hachiya T, Ohmomo H, Shiwa Y, Ono K, Suzuki S, Satoh M, Hitomi J, Sobue K, Shimizu A (2016) Intraindividual dynamics of transcriptome and genome-wide stability of DNA methylation. Sci Rep 6:26424
Futcher B, Latter GI, Monardo P, McLaughlin CS, Garrels JI (1999) A sampling of the yeast proteome. Mol Cell Biol 19(11):7357–7368
Gaasterland T, Bekiranov S (2000) Making the most of microarray data [news]. Nat Genet 24(3):204–206
Gallie DR, Tanguay R (1994) Poly(A) binds to initiation factors and increases cap-dependent translation in vitro. J Biol Chem 269(25):17166–17173
Gal-Mor O, Finlay BB (2006) Pathogenicity islands: a molecular toolbox for bacterial virulence. Cell Microbiol 8(11):1707–1719
Galtier N, Lobry JR (1997) Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J Mol Evol 44(6):632–636
Gao L, Qi J (2007) Whole genome molecular phylogeny of large dsDNA viruses using composition vector method. BMC Evol Biol 7:41
Gapp K, Jawaid A, Sarkies P, Bohacek J, Pelczar P, Prados J, Farinelli L, Miska E, Mansuy IM (2014) Implication of sperm RNAs in transgenerational inheritance of the effects of early trauma in mice. Nat Neurosci 17(5):667–669
Gascuel O, Steel M (2006) Neighbor-joining revealed. Mol Biol Evol 23(11):1997–2000
Ge Y, Sealfon SC, Speed TP (2008) Some step-down procedures controlling the false discovery rate under dependence. Stat Sin 18(3):881–904
Geller AI, Rich A (1980) A UGA termination suppression tRNATrp active in rabbit reticulocytes. Nature 283(5742):41–46
Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6:721–741
Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A, Dephoure N, O’Shea EK, Weissman JS (2003) Global analysis of protein expression in yeast. Nature 425(6959):737–741
Gibbs JB (2000) Mechanism-based target identification and drug discovery in cancer research. Science 287(5460):1969–1973
Giglione C, Vallon O, Meinnel T (2003) Control of protein life-span by N-terminal methionine excision. EMBO J 22(1):13–23
Giglione C, Boularot A, Meinnel T (2004) Protein N-terminal methionine excision. Cell Mol Life Sci 61(12):1455–1474
Gilbert WV (2010) Alternative ways to think about cellular internal ribosome entry. J Biol Chem 285(38):29033–29038
Gilbert WV, Zhou K, Butler TK, Doudna JA (2007) Cap-independent translation is required for starvation-induced differentiation in yeast. Science 317(5842):1224–1227
Gillespie JH (1991) The causes of molecular evolution. Oxford University Press, Oxford
Gojobori T, Li WH, Graur D (1982) Patterns of nucleotide substitution in pseudogenes and functional genes. J Mol Evol 18(5):360–369
Gonzalez B, Ceciliani F, Galizzi A (2003) Growth at low temperature suppresses readthrough of the UGA stop codon during the expression of Bacillus subtilis flgM gene in Escherichia coli. J Biotechnol 101(2):173–180
Gorodkin J, Heyer LJ, Brunak S, Stormo GD (1997) Displaying the information contents of structural RNA alignments: the structure logos. Comput Appl Biosci 13(6):583–586
Goto M, Washio T, Tomita M (2000) Causal analysis of CpG suppression in the Mycoplasma genome. Microb Comp Genomics 5(1):51–58
Gotoh O (1982) An improved algorithm for matching biological sequences. J Mol Biol 162(3):705–708
Gould SJ, Vrba ES (1982) Exaptation – a missing term in the science of form. Paleobiology 8:4–15
Gouy M (1987) Codon contexts in enterobacterial and coliphage genes. Mol Biol Evol 4(4):426–444
Gouy M, Gautier C (1982) Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res 10:7055–7064
Gowri-Shankar V, Rattray M (2007) A reversible jump method for Bayesian phylogenetic inference with a nonhomogeneous substitution model. Mol Biol Evol 24(6):1286–1299
Grahn AM, Butcher SJ, Bamford JKH, Bamford DH (2006) PRD1: dissecting the genome, structure and entry. In: Calendar R (ed) The bacteriophages. Oxford University Press, Oxford, pp 176–185
Gramm J, Niedermeier R (2002) Breakpoint medians and breakpoint phylogenies: a fixed-parameter approach. Bioinformatics 18(Suppl 2):S128–S139
Grantham R (1974) Amino acid difference formula to help explain protein evolution. Science 185:862–864
Graveley BR (2005) Mutually exclusive splicing of the insect Dscam pre-mRNA directed by competing intronic RNA secondary structures. Cell 123(1):65–73
Grech B, Maetschke S, Mathews S, Timms P (2007) Genome-wide analysis of chlamydiae for promoters that phylogenetically footprint. Res Microbiol 158(8–9):685–693
Grigg GW (1996) Sequencing 5-methylcytosine residues by the bisulphite method. DNA Seq 6(4):189–198
Grigg G, Clark S (1994) Sequencing 5-methylcytosine residues in genomic DNA. BioEssays 16(6):431–436
Grosjean H, Marck C, de Crecy-Lagard V (2007) The various strategies of codon decoding in organisms of the three domains of life: evolutionary implications. Nucleic Acids Symp Ser (Oxf) 51:15–16
Grosjean H, de Crecy-Lagard V, Marck C (2010) Deciphering synonymous codons in the three domains of life: co-evolution with specific tRNA modification enzymes. FEBS Lett 584(2):252–264
Grossi de Sa MF, Standart N, Martins de Sa C, Akhayat O, Huesca M, Scherrer K (1988) The poly(A)-binding protein facilitates in vitro translation of poly(A)-rich mRNA. Eur J Biochem 176(3):521–526
Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59(3):307–321
Gumbel EJ (1958) Statistics of extremes. Columbia University Press, New York
Gupta SK, Kececioglu JD, Schaffer AA (1995) Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment. J Comput Biol 2(3):459–472
Gusfield D (1997) Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge
Gygi SP, Rochon Y, Franza BR, Aebersold R (1999) Correlation between protein and mRNA abundance in yeast. Mol Cell Biol 19(3):1720–1730
Haas J, Park E-C, Seed B (1996) Codon usage limitation in the expression of HIV-1 envelope glycoprotein. Curr Biol 6(3):315–324
Hacker J, Kaper JB (2000) Pathogenicity islands and the evolution of microbes. Annu Rev Microbiol 54:641–679
Hacker J, Blum-Oehler G, Muhldorfer I, Tschape H (1997) Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Mol Microbiol 23(6):1089–1097
Hamajima N, Goto Y, Nishio K, Tanaka D, Kawai S, Sakakibara H, Kondo T (2004) Helicobacter pylori eradication as a preventive tool against gastric cancer. Asian Pac J Cancer Prev 5(3):246–252
Hanada K, Suzuki Y, Gojobori T (2004) A large variation in the rates of synonymous substitution for RNA viruses and its relationship to a diversity of viral infection and transmission modes. Mol Biol Evol 21(6):1074–1080
Hartigan JA (1975) Clustering algorithms. Wiley, New York
Hasegawa M, Kishino H (1989) Heterogeneity of tempo and mode of mitochondrial DNA evolution among mammalian orders. Jpn J Genet 64(4):243–258
Hasegawa M, Kishino H, Yano T (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22(2):160–174
Haustead DJ, Stevenson A, Saxena V, Marriage F, Firth M, Silla R, Martin L, Adcroft KF, Rea S, Day PJ et al (2016) Transcriptome analysis of human ageing in male skin shows mid-life period of variability and central role of NF-kappaB. Sci Rep 6:26846
Hayes WS, Borodovsky M (1998) How to interpret an anonymous bacterial genome: machine learning approach to gene identification. Genome Res 8(11):1154–1171
Heath JR, Ribas A, Mischel PS (2016) Single-cell analysis tools for drug discovery and development. Nat Rev Drug Discov 15(3):204–216
Hein J (1990) A unified approach to phylogenies and alignments. Methods Enzymol 183:625–644
Hein J (1994) TreeAlign. Methods Mol Biol 25:349–364
Hendy MD, Penny D (1982) Branch and bound algorithms to determine minimal evolutionary trees. Math Biosci 60:133–142
Hendy MD, Penny D (1989) A framework for the quantitative study of evolutionary trees. Syst Zool 38:297–309
Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A 89:10915–10919
Henz SR, Huson DH, Auch AF, Nieselt-Struwe K, Schuster SC (2005) Whole-genome prokaryotic phylogeny. Bioinformatics 21(10):2329–2335
Herman JL, Challis CJ, Novak A, Hein J, Schmidler SC (2014) Simultaneous Bayesian estimation of alignment and phylogeny under a joint model of protein sequence and structure. Mol Biol Evol 31(9):2251–2266
Hernández G (2008) Was the initiation of translation in early eukaryotes IRES-driven? Trends Biochem Sci 33(2):58
Hernandez G, Vazquez-Pianzola P, Sierra JM, Rivera-Pomar R (2004) Internal ribosome entry site drives cap-independent translation of reaper and heat shock protein 70 mRNAs in Drosophila embryos. RNA 10(11):1783–1797
Herniou EA, Luque T, Chen X, Vlak JM, Winstanley D, Cory JS, O’Reilly DR (2001) Use of whole genome sequence data to infer baculovirus phylogeny. J Virol 75(17):8117–8126
Hertz GZ, Stormo GD (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15(7–8):563–577
Hertz GZ, Hartzell GW 3rd, Stormo GD (1990) Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Comput Appl Biosci 6(2):81–92
Hertzberg L, Izraeli S, Domany E (2007) STOP: searching for transcription factor motifs using gene expression. Bioinformatics 23(14):1737–1743
Hiard S, Maree R, Colson S, Hoskisson PA, Titgemeyer F, van Wezel GP, Joris B, Wehenkel L, Rigali S (2007) PREDetector: a new tool to identify regulatory elements in bacterial genomes. Biochem Biophys Res Commun 357(4):861–864
Hickson RE, Simon C, Perrey SW (2000) The performance of several multiple-sequence alignment programs in relation to secondary-structure features for an rRNA sequence. Mol Biol Evol 17(4):530–539
Higashi K, Kashiwagi K, Taniguchi S, Terui Y, Yamamoto K, Ishihama A, Igarashi K (2006) Enhancement of +1 frameshift by polyamines during translation of polypeptide release factor 2 in Escherichia coli. J Biol Chem 281(14):9527–9537
Higgins DG (1994) CLUSTAL V: multiple alignment of DNA and protein sequences. Methods Mol Biol 25:307–318
Higgs PG, Attwood TK (2005) Bioinformatics and molecular evolution. Blackwell, Malden
Higgs PG, Ran W (2008) Coevolution of codon usage and tRNA genes leads to alternative stable states of biased codon usage. Mol Biol Evol 25(11):2279–2291
Hiller K, Grote A, Scheer M, Munch R, Jahn D (2004) PrediSi: prediction of signal peptides and their cleavage positions. Nucleic Acids Res 32(Web Server issue):W375–W379
Hirao I, Kimoto M (2010) Expansion of the genetic alphabet in nucleic acids by creating new base pairs. In: Mayer G (ed) The chemical biology of nucleic acids. Wiley, Chichester, pp 39–62
Hirsh D, Gold L (1971) Translation of the UGA triplet in vitro by tryptophan transfer RNA’s. J Mol Biol 58(2):459–468
Hirst JD, Sternberg MJ (1991) Prediction of ATP/GTP-binding motif: a comparison of a perceptron type neural network and a consensus sequence method [corrected]. Protein Eng 4(6):615–623
Hoagland MB, Stephenson ML, Scott JF, Hecht LI, Zamecnik PC (1958) A soluble ribonucleic acid intermediate in protein synthesis. J Biol Chem 231(1):241–257
Hobolth A, Christensen OF, Mailund T, Schierup MH (2007) Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model. PLoS Genet 3(2):e7
Hofacker IL (2003) Vienna RNA secondary structure server. Nucleic Acids Res 31(13):3429–3431
Hofacker IL, Fekete M, Stadler PF (2002) Secondary structure prediction for aligned RNA sequences. J Mol Biol 319(5):1059–1066
Hofer A, Steverding D, Chabes A, Brun R, Thelander L (2001) Trypanosoma brucei CTP synthetase: a target for the treatment of African sleeping sickness. Proc Natl Acad Sci U S A 98(11):6412–6416
Hogeweg P, Hesper B (1984) The alignment of sets of sequences and the construction of phylogenetic trees: an integrated method. J Mol Evol 20:175–186
Holmes I, Bruno WJ (2001) Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics 17(9):803–820
Holstege FC, Jennings EG, Wyrick JJ, Lee TI, Hengartner CJ, Green MR, Golub TR, Lander ES, Young RA (1998) Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95(5):717–728. Transcriptomic data at http://web.wi.mit.edu/young/pub/data/orf_transcriptome.txt
Hou C, Zhao H, Tanimoto K, Dean A (2008) CTCF-dependent enhancer-blocking by alternative chromatin loop formation. Proc Natl Acad Sci U S A 105(51):20398–20403
Hua S, Sun Z (2001) Support vector machine approach for protein subcellular localization prediction. Bioinformatics 17(8):721–728
Hudson RR (1992) Gene trees, species trees and the segregation of ancestral alleles. Genetics 131(2):509–513
Huelsenbeck JP, Larget B, Alfaro ME (2004) Bayesian phylogenetic model selection using reversible jump Markov chain Monte Carlo. Mol Biol Evol 21(6):1123–1133
Hughes D (1987) Mutant forms of tufA and tufB independently suppress nonsense mutations. J Mol Biol 197(4):611–615
Hui A, de Boer HA (1987) Specialized ribosome system: preferential translation of a single mRNA species by a subpopulation of mutated ribosomes in Escherichia coli. Proc Natl Acad Sci U S A 84(14):4762–4766
Hunt RH (2004) Will eradication of Helicobacter pylori infection influence the risk of gastric cancer? Am J Med 117(Suppl 5A):86S–91S
Hurst LD, Merchant AR (2001) High guanine-cytosine content is not an adaptation to high temperature: a comparative analysis amongst prokaryotes. Proc R Soc Lond B 268:493–497
Huynen M, Dandekar T, Bork P (1998) Differential genome analysis applied to the species-specific features of Helicobacter pylori. FEBS Lett 426(1):1–5
Hwang S, Gou Z, Kuznetsov IB (2007) DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins. Bioinformatics 23(5):634–636
Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinform 11:119
Igarashi K, Kashiwagi K (2006) Polyamine modulon in Escherichia coli: genes involved in the stimulation of cell growth by polyamines. J Biochem 139(1):11–16
Ikemura T (1981a) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J Mol Biol 146:1–21
Ikemura T (1981b) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E coli translational system. J Mol Biol 151:389–409
Ikemura T (1982) Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J Mol Biol 158(4):573–597
Ikemura T (1985) Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 2:13–34
Ikemura T (1992) Correlation between codon usage and tRNA content in microorganisms. In: Hatfield DL, Lee BJ, Pirtle RM (eds) Transfer RNA in protein synthesis. CRC Press, Boca Raton, pp 87–111
Ilkow CS, Mancinelli V, Beatch MD, Hobman TC (2008) Rubella virus capsid protein interacts with poly(A)-binding protein and inhibits translation. J Virol 82(9):4284–4294
Ingolia NT (2010) Genome-wide translational profiling by ribosome footprinting. Methods Enzymol 470:119–142
Ingolia NT (2014) Ribosome profiling: new views of translation, from single codons to genome scale. Nat Rev Genet 15(3):205–213
Ingolia NT (2016) Ribosome footprint profiling of translation throughout the genome. Cell 165(1):22–33
Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS (2009) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324(5924):218–223
Ingolia NT, Lareau LF, Weissman JS (2011) Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147(4):789–802
Ingolia NT, Brar GA, Stern-Ginossar N, Harris MS, Talhouarne GJ, Jackson SE, Wills MR, Weissman JS (2014) Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep 8(5):1365–1379
Ingram VM (1956) A specific chemical difference between the globins of normal human and sickle-cell anaemia haemoglobin. Nature 178(4537):792–794
Ingram VM (1957) Gene mutations in human haemoglobin: the chemical difference between normal and sickle cell haemoglobin. Nature 180(4581):326–328
Ingrosso D, Perna AF (2009) Epigenetics in hyperhomocysteinemic states. A special focus on uremia. Biochim Biophys Acta 1790(9):892–899
Ingrosso D, Cimmino A, Perna AF, Masella L, De Santo NG, De Bonis ML, Vacca M, D’Esposito M, D’Urso M, Galletti P et al (2003) Folate treatment and unbalanced methylation and changes of allelic expression induced by hyperhomocysteinaemia in patients with uraemia. Lancet 361(9370):1693–1699
Ink BS, Pickup DJ (1990) Vaccinia virus directs the synthesis of early mRNAs containing 5′ poly(A) sequences. Proc Natl Acad Sci U S A 87(4):1536–1540
Insinga A, Minucci S, Pelicci PG (2005a) Mechanisms of selective anticancer action of histone deacetylase inhibitors. Cell Cycle 4(6):741–743
Insinga A, Monestiroli S, Ronzoni S, Gelmetti V, Marchesi F, Viale A, Altucci L, Nervi C, Minucci S, Pelicci PG (2005b) Inhibitors of histone deacetylases induce tumor-selective apoptosis through activation of the death receptor pathway. Nat Med 11(1):71–76
Ito T, Bulger M, Pazin MJ, Kobayashi R, Kadonaga JT (1997) ACF, an ISWI-containing and ATP-utilizing chromatin assembly and remodeling factor. Cell 90(1):145–155
Ito K, Uno M, Nakamura Y (2000) A tripeptide ‘anticodon’ deciphers stop codons in messenger RNA. Nature 403(6770):680–684
Jackson RJ, Hellen CU, Pestova TV (2010) The mechanism of eukaryotic translation initiation and principles of its regulation. Nat Rev Mol Cell Biol 11(2):113–127
Jacob F (1982) The possible and the actual. University of Washington Press, Seattle, p 70
Jacob F (1988) The statue within: an autobiography. Basic Books, Inc., New York
Jacob F, Monod J (1961) Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 3:318–356
Jacobson A, Favreau M (1983) Possible involvement of poly(A) in protein synthesis. Nucleic Acids Res 11(18):6353–6368
James P, Quadroni M, Carafoli E, Gonnet G (1994) Protein identification in DNA databases by peptide mass fingerprinting. Protein Sci 3(8):1347–1350
Jan E, Sarnow P (2002) Factorless ribosome assembly on the internal ribosome entry site of cricket paralysis virus. J Mol Biol 324(5):889–902
Jan E, Thompson SR, Wilson JE, Pestova TV, Hellen CU, Sarnow P (2001) Initiator Met-tRNA-independent translation mediated by an internal ribosome entry site element in cricket paralysis virus-like insect viruses. Cold Spring Harb Symp Quant Biol 66:285–292
Janin L, Schulz-Trieglaff O, Cox AJ (2014) BEETL-fastq: a searchable compressed archive for DNA reads. Bioinformatics 30(19):2796–2801
Jank P, Shindo-Okada N, Nishimura S, Gross HJ (1977) Rabbit liver tRNA1Val: I. Primary structure and unusual codon recognition. Nucleic Acids Res 4(6):1999–2008
Jayaswal V, Jermiin LS, Robinson J (2005) Estimation of phylogeny using a general Markov model. Evol Bioinform Online 1:62–80
Jenkins GM, Holmes EC (2003) The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Res 92(1):1–7
Jensen JL, Hein J (2005) Gibbs sampler for statistical multiple alignment. Stat Sin 15:889–907
Jia W, Higgs PG (2008) Codon usage in mitochondrial genomes: distinguishing context-dependent mutation from translational selection. Mol Biol Evol 25(2):339–351
Jin P, Alisch RS, Warren ST (2004a) RNA and microRNAs in fragile X mental retardation. Nat Cell Biol 6(11):1048–1053
Jin VX, Leu YW, Liyanarachchi S, Sun H, Fan M, Nephew KP, Huang TH, Davuluri RV (2004b) Identifying estrogen receptor alpha target genes using integrated computational genomics and chromatin immunoprecipitation microarray. Nucleic Acids Res 32(22):6627–6635
Jin VX, O’Geen H, Iyengar S, Green R, Farnham PJ (2007) Identification of an OCT4 and SRY regulatory module using integrated computational and experimental genomics approaches. Genome Res 17(6):807–817
Johnston TC, Parker J (1985) Streptomycin-induced, third-position misreading of the genetic code. J Mol Biol 181(2):313–315
Johnston TC, Borgia PT, Parker J (1984) Codon specificity of starvation induced misreading. Mol Gen Genet MGG 195(3):459–465
Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8:275–282
Jorgensen F, Adamski FM, Tate WP, Kurland CG (1993) Release factor-dependent false stops are infrequent in Escherichia coli. J Mol Biol 230(1):41–50
Josse J, Kaiser AD, Kornberg A (1961) Enzymatic synthesis of deoxyribonucleic acid VII. Frequencies of nearest neighbor base-sequences in deoxyribonucleic acid. J Biol Chem 236:864–875
Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro HN (ed) Mammalian protein metabolism. Academic, New York, pp 21–123
Kaishima M, Ishii J, Matsuno T, Fukuda N, Kondo A (2016) Expression of varied GFPs in Saccharomyces cerevisiae: codon optimization yields stronger than expected expression and fluorescence intensity. Sci Rep 6:35932
Kamalakaran S, Radhakrishnan SK, Beck WT (2005) Identification of estrogen-responsive genes using a genome-wide analysis of promoter elements for transcription factor binding sites. J Biol Chem 280(22):21491–21497
Kanehisa M (2013) Molecular network analysis of diseases and drugs in KEGG. Methods Mol Biol 939:263–275
Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M (2016) KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res 44(D1):D457–D462
Kaneko T, Tanaka A, Sato S, Kotani H, Sazuka T, Miyajima N, Sugiura M, Tabata S (1995) Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. I. Sequence features in the 1 Mb region from map positions 64% to 92% of the genome. DNA Res 2(4):153–166, 191–198
Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S et al (1996) Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res 3(3):109–136
Karlin S, Burge C (1995) Dinucleotide relative abundance extremes: a genomic signature. TIG 11(7):283–290
Katsafanas GC, Moss B (2007a) Colocalization of transcription and translation within cytoplasmic poxvirus factories coordinates viral expression and subjugates host functions. Cell Host Microbe 2(4):221
Karlin S, Mrazek J (1996) What drives codon choices in human genes. J Mol Biol 262:459–472
Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90(430):773–795
Katoh K, Toh H (2008) Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform 9(4):286–298
Katoh K, Toh H (2010) Parallelization of the MAFFT multiple sequence alignment program. Bioinformatics 26(15):1899–1900
Katoh K, Kuma K, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33(2):511–518
Katoh K, Asimenos G, Toh H (2009) Multiple alignment of DNA sequences with MAFFT. Methods Mol Biol 537:39–64
Katsafanas GC, Moss B (2007b) Colocalization of transcription and translation within cytoplasmic poxvirus factories coordinates viral expression and subjugates host functions. Cell Host Microbe 2(4):221
Kawashima T, Douglass S, Gabunilas J, Pellegrini M, Chanfreau GF (2014) Widespread use of non-productive alternative splice sites in Saccharomyces cerevisiae. PLoS Genet 10(4):e1004249
Kazan K (2003) Alternative splicing and proteome diversity in plants: the tip of the iceberg has just emerged. Trends Plant Sci 8(10):468–471
Keeling PJ, Doolittle WF (1996) A non-canonical genetic code in an early diverging eukaryotic lineage. EMBO J 15(9):2285–2290
Kersulyte D, Chalkauskas H, Berg DE (1999) Emergence of recombinant strains of Helicobacter pylori during human infection. Mol Microbiol 31(1):31–43
Kim H, Park H (2004) Prediction of protein relative solvent accessibility with support vector machines and long-range interaction 3D local descriptor. Proteins 54(3):557–562
Kim DW, Lee KH, Lee D (2005) Detecting clusters of different geometrical shapes in microarray gene expression data. Bioinformatics 21(9):1927–1934
Kimura M (1968) Evolutionary rate at the molecular level. Nature 217:624–626
Kimura M (1977) Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature 267:275–276
Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111–120
Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press, Cambridge
Kimura M, Ohta T (1972) On the stochastic model for estimation of mutational distance between homologous proteins. J Mol Evol 2:87–90
King MC, Jukes TH (1969) Non-Darwinian evolution. Science 164:788–798
Kingsford C, Patro R (2015) Reference-based compression of short-read sequences using path encoding. Bioinformatics 31(12):1920–1928
Kioussis D, Vanin E, deLange T, Flavell RA, Grosveld FG (1983) Beta-globin gene inactivation by DNA translocation in gamma beta-thalassaemia. Nature 306(5944):662–666
Kishino H, Hasegawa M (1989) Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol 29:170–179
Kishino H, Hasegawa M (1990) Converting distance to time: application to human evolution. Methods Enzymol 183:550–570
Kjer KM (1995) Use of ribosomal-RNA secondary structure in phylogenetic studies to identify homologous positions – an example of alignment and data presentation from the frogs. Mol Phylogenet Evol 4(3):314–330
Kliman RM, Bernal CA (2005) Unusual usage of AGG and TTG codons in humans and their viruses. Gene 352:92
Kobayashi H, Akitomi J, Fujii N, Kobayashi K, Altaf-Ul-Amin M, Kurokawa K, Ogasawara N, Kanaya S (2007) The entire organization of transcription units on the Bacillus subtilis genome. BMC Genomics 8:197
Kodama Y, Shumway M, Leinonen R (2012) The sequence read archive: explosive growth of sequencing data. Nucleic Acids Res 40(Database issue):D54–D56
Kohonen T (2001) Self-organizing maps. Springer, Berlin
Komar AA, Hatzoglou M (2005) Internal ribosome entry sites in cellular mRNAs: mystery of their existence. J Biol Chem 280(25):23425–23428
Korenke GC, Fuchs S, Krasemann E, Doerr HG, Wilichowski E, Hunneman DH, Hanefeld F (1996) Cerebral adrenoleukodystrophy (ALD) in only one of monozygotic twins with an identical ALD genotype. Ann Neurol 40(2):254–257
Korkmaz G, Holm M, Wiens T, Sanyal S (2014) Comprehensive analysis of stop codon usage in bacteria and its correlation with release factor abundance. J Biol Chem 289(44):30334–30342
Kornblihtt AR (2005) Promoter usage and alternative splicing. Curr Opin Cell Biol 17(3):262–268
Kozak M (1978) How do eucaryotic ribosomes select initiation regions in messenger RNA? Cell 15(4):1109–1123
Kozak M (1980a) Evaluation of the “scanning model” for initiation of protein synthesis in eucaryotes. Cell 22(1 Pt 1):7–8
Kozak M (1980b) Influence of mRNA secondary structure on binding and migration of 40S ribosomal subunits. Cell 19(1):79–90
Kozak M (1981) Possible role of flanking nucleotides in recognition of the AUG initiator codon by eukaryotic ribosomes. Nucleic Acids Res 9(20):5233–5252
Kozak M (1986) Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44(2):283–292
Kozak M (1991) Effects of long 5′ leader sequences on initiation by eukaryotic ribosomes in vitro. Gene Expr 1(2):117–125
Kozak M (1997) Recognition of AUG and alternative initiator codons is augmented by G in position +4 but is not generally affected by the nucleotides in positions +5 and +6. EMBO J 16(9):2482–2492
Kozak M (1999) Initiation of translation in prokaryotes and eukaryotes. Gene 234(2):187–208
Kozak M (2005) A second look at cellular mRNA sequences said to function as internal ribosome entry sites. Nucleic Acids Res 33(20):6593–6602
Kozak M (2007) Some thoughts about translational regulation: forward and backward glances. J Cell Biochem 102(2):280–290
Krasemann EW, Meier V, Korenke GC, Hunneman DH, Hanefeld F (1996) Identification of mutations in the ALD-gene of 20 families with adrenoleukodystrophy/adrenomyeloneuropathy. Hum Genet 97(2):194–197
Kreutzer DA, Essigmann JM (1998) Oxidized, deaminated cytosines are a source of C → T transitions in vivo. Proc Natl Acad Sci U S A 95(7):3578–3582
Krogh A, Mian IS, Haussler D (1994) A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res 22(22):4768–4778
Kudla G, Murray AW, Tollervey D, Plotkin JB (2009) Coding-sequence determinants of gene expression in Escherichia coli. Science 324(5924):255–258
Kullback S (1959) Information theory and statistics. Wiley, New York
Kullback S (1987) The Kullback-Leibler distance. Am Stat 41:340–341
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
Kumar S, Filipski A (2007) Multiple sequence alignment: in pursuit of homologous DNA positions. Genome Res 17(2):127–135
Kumar KK, Shelokar PS (2008) An SVM method using evolutionary information for the identification of allergenic proteins. Bioinformation 2(6):253–256
Kumar S, Stecher G, Tamura K (2016) MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol 33(7):1870–1874
Kungulovski G, Jeltsch A (2016) Epigenome editing: state of the art, concepts, and perspectives. Trends Genet 32(2):101–113
Kurland CG (1987) Strategies for efficiency and accuracy in gene expression. Trends Biochem Sci 12:126
Kutlar A (2007) Sickle cell disease: a multigenic perspective of a single gene disorder. Hemoglobin 31(2):209–224
Kuznetsov IB, Gou Z, Li R, Hwang S (2006) Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins. Proteins 64(1):19–27
Kypr J, Mrazek J (1987) Unusual codon usage of HIV. Nature 327(6117):20
Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157:105–132
Lacerda R, Menezes J, Romao L (2016) More than just scanning: the importance of cap-independent mRNA translation initiation for cellular stress response and cancer. Cell Mol Life Sci 74(9):1659–1680
Laemmli UK (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227:680–685
Lake JA (1994) Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc Natl Acad Sci U S A 91:1455–1459
Lamendola DE, Duan Z, Yusuf RZ, Seiden MV (2003) Molecular description of evolving paclitaxel resistance in the SKOV-3 human ovarian carcinoma cell line. Cancer Res 63(9):2200–2205
Lamond AI (1988) RNA editing and the mysterious undercover genes of trypanosomatid mitochondria. Trends Biochem Sci 13(8):283–284
Lanave C, Preparata G, Saccone C, Serio G (1984) A new method for calculating evolutionary substitution rates. J Mol Evol 20(1):86–93
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W et al (2001) Initial sequencing and analysis of the human genome. Nature 409(6822):860–921
Lang BF, Burger G, O’Kelly CJ, Cedergren R, Golding GB, Lemieux C, Sankoff D, Turmel M, Gray MW (1997) An ancestral mitochondrial DNA resembling a eubacterial genome in miniature. Nature 387(6632):493–497
Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9(4):357–359
Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL (2009a) Searching for SNPs with cloud computing. Genome Biol 10(11):R134
Langmead B, Trapnell C, Pop M, Salzberg SL (2009b) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25
Langmead B, Hansen KD, Leek JT (2010) Cloud-scale RNA-sequencing differential expression analysis with Myrna. Genome Biol 11(8):R83
Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262(5131):208–214
Lee C, Wang Q (2005) Bioinformatics analysis of alternative splicing. Brief Bioinform 6(1):23–33
Leinonen R, Sugawara H, Shumway M (2011) The sequence read archive. Nucleic Acids Res 39(Database issue):D19–D21
Lemay DG, Hwang DH (2006) Genome-wide identification of peroxisome proliferator response elements using integrated computational genomics. J Lipid Res 47(7):1583–1587
Lesk AM (2004) Introduction to protein science: architecture, function and genomics. Oxford University Press, New York
Li CC (1976) First course in population genetics. The Boxwood Press, Pacific Grove
Li W-H (1983) Evolution of duplicate genes and pseudogenes. Sinauer, Sunderland
Li W-H (1997) Molecular evolution. Sinauer, Sunderland
Li X, Chang YH (1995) Amino-terminal protein processing in Saccharomyces cerevisiae is an essential function that requires two distinct methionine aminopeptidases. Proc Natl Acad Sci U S A 92(26):12357–12361
Li GL, Leong TY (2005) Feature selection for the prediction of translation initiation sites. Genomics Proteomics Bioinformatics 3(2):73–83
Li W-H, Tanimura M (1987) The molecular clock runs more slowly in man than in apes and monkeys. Nature 326:93–96
Li WH, Wu CI (1987) Rates of nucleotide substitution are evidently higher in rodents than in man. Mol Biol Evol 4(1):74–82
Li WH, Gojobori T, Nei M (1981) Pseudogenes as a paradigm of neutral evolution. Nature 292(5820):237–239
Li W-H, Wolfe KH, Sourdis J, Sharp PM (1987) Reconstruction of phylogenetic trees and estimation of divergence times under nonconstant rates of evolution. Cold Spring Harb Symp Quant Biol 52:847–856
Li F, Ge P, Hui WH, Atanasov I, Rogers K, Guo Q, Osato D, Falick AM, Zhou ZH, Simpson L (2009) Structure of the core editing complex (L-complex) involved in uridine insertion/deletion RNA editing in trypanosomatid mitochondria. Proc Natl Acad Sci U S A 106(30):12306–12310
Liang KC, Wang X, Anastassiou D (2008) A profile-based deterministic sequential Monte Carlo algorithm for motif discovery. Bioinformatics 24(1):46–55
Liberman N, Gandin V, Svitkin YV, David M, Virgili G, Jaramillo M, Holcik M, Nagar B, Kimchi A, Sonenberg N (2015) DAP5 associates with eIF2beta and eIF4AI to promote Internal Ribosome Entry Site driven translation. Nucleic Acids Res 43(7):3764–3775
Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO et al (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326(5950):289–293
Liebler DC (2002) Introduction to proteomics: tools for the new biology. Humana Press, Totowa
Liljenstrom H, von Heijne G (1987) Translation rate modification by preferential codon usage: intragenic position effects. J Theor Biol 124(1):43–55
Lim VI (1994) Analysis of action of wobble nucleoside modifications on codon-anticodon pairing within the ribosome. J Mol Biol 240(1):8–19
Lin JP, Aker M, Sitney KC, Mortimer RK (1986) First position wobble in codon-anticodon pairing: amber suppression by a yeast glutamine tRNA. Gene 49(3):383–388
Lin HC, Tsai K, Chang BL, Liu J, Young M, Hsu W, Louie S, Nicholas HB Jr, Rosenquist GL (2003) Prediction of tyrosine sulfation sites in animal viruses. Biochem Biophys Res Commun 312(4):1154–1158
Lin GN, Cai Z, Lin G, Chakraborty S, Xu D (2009) ComPhy: prokaryotic composite distance phylogenies inferred from whole-genome gene sets. BMC Bioinform 10(Suppl 1):S5
Lindahl T (1993) Instability and decay of the primary structure of DNA. Nature 362:709–715
Lipman DJ, Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227(4693):1435–1441
Lipman DJ, Altschul SF, Kececioglu JD (1989) A tool for multiple sequence alignment. Proc Natl Acad Sci U S A 86(12):4412–4415
Lipscombe D (2005) Neuronal proteins custom designed by alternative splicing. Curr Opin Neurobiol 15(3):358–363
Lithwick G, Margalit H (2005) Relative predicted protein levels of functionally associated proteins are conserved across organisms. Nucleic Acids Res 33(3):1051–1057
Liu J, Louie S, Hsu W, Yu KM, Nicholas HB Jr, Rosenquist GL (2008) Tyrosine sulfation is prevalent in human chemokine receptors important in lung disease. Am J Respir Cell Mol Biol 38(6):738–743
Liu X, Jiang H, Gu Z, Roberts JW (2013) High-resolution view of bacteriophage lambda gene expression by ribosome profiling. Proc Natl Acad Sci U S A 110(29):11928–11933
Livesey R (2002) Have microarrays failed to deliver for developmental biology? Genome Biol 3(9):comment2009
Lobry JR (1996) Asymmetric substitution patterns in the two DNA strands of bacteria. Mol Biol Evol 13(5):660–665
Lockhart PJ, Steel MA, Hendy MD, Penny D (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Mol Biol Evol 11:605–612
Lodish HF, Nathan DG (1972) Regulation of hemoglobin synthesis. Preferential inhibition of α and β globin synthesis. J Biol Chem 247(23):7822–7829
Lopez P, Philippe H, Myllykallio H, Forterre P (1999) Identification of putative chromosomal origins of replication in Archaea. Mol Microbiol 32(4):883–886
Lowry JA, Atchley WR (2000) Molecular evolution of the GATA family of transcription factors: conservation within the DNA-binding domain. J Mol Evol 50(2):103–115
Lu C, Bablanian R (1996) Characterization of small nontranslated polyadenylylated RNAs in vaccinia virus-infected cells. Proc Natl Acad Sci U S A 93(5):2037–2042
Lunter G, Rocco A, Mimouni N, Heger A, Caldeira A, Hein J (2008) Uncertainty in homology inferences: assessing and improving genomic sequence alignment. Genome Res 18(2):298–309
Lustig F, Boren T, Guindy YS, Elias P, Samuelsson T, Gehrke CW, Kuo KC, Lagerkvist U (1989) Codon discrimination and anticodon structural context. Proc Natl Acad Sci U S A 86(18):6873–6877
Ma B, Nussinov R (2004) Release factors eRF1 and RF2: a universal mechanism controls the large conformational changes. J Biol Chem 279(51):53875–53885
Ma P, Xia X (2011) Factors affecting splicing strength of yeast genes. Comp Funct Genomics 2011:212146
Ma S, Musa T, Bag J (2006) Reduced stability of mitogen-activated protein kinase kinase-2 mRNA and phosphorylation of poly(A)-binding protein (PABP) in cells overexpressing PABP. J Biol Chem 281(6):3145–3156
MacKay VL, Li X, Flory MR, Turcott E, Law GL, Serikawa KA, Xu XL, Lee H, Goodlett DR, Aebersold R et al (2004) Gene expression analyzed by high-resolution state array analysis and quantitative proteomics: response of yeast to mating pheromone. Mol Cell Proteomics 3(5):478–489
Madden SL, Galella EA, Zhu J, Bertelsen AH, Beaudry GA (1997) SAGE transcript profiles for p53-dependent growth regulation. Oncogene 15(9):1079–1085
Maher CA, Kumar-Sinha C, Cao X, Kalyana-Sundaram S, Han B, Jing X, Sam L, Barrette T, Palanisamy N, Chinnaiyan AM (2009) Transcriptome sequencing to detect gene fusions in cancer. Nature 458(7234):97–101
Mannella CA, Neuwald AF, Lawrence CE (1996) Detection of likely transmembrane beta strand regions in sequences of mitochondrial pore proteins using the Gibbs sampler. J Bioenerg Biomembr 28(2):163–169
Marck C, Grosjean H (2002) tRNomics: analysis of tRNA genes from 50 genomes of Eukarya, Archaea, and Bacteria reveals anticodon-sparing strategies and domain-specific features. RNA 8(10):1189–1232
Marin A, Xia X (2008) GC skew in protein-coding genes between the leading and lagging strands in bacterial genomes: new substitution models incorporating strand bias. J Theor Biol 253(3):508–513
Martinez MA, Vartanian J-P, Wain-Hobson S (1994) Hypermutagenesis of RNA using human immunodeficiency virus type 1 reverse transcriptase and biased dNTP concentrations. Proc Natl Acad Sci U S A 91(25):11787–11791
Matin A, Zychlinsky E, Keyhan M, Sachs G (1996) Capacity of Helicobacter pylori to generate ionic gradients at low pH is similar to that of bacteria which grow under strongly acidic conditions. Infect Immun 64(4):1434–1436
McNulty DE, Claffee BA, Huddleston MJ, Porter ML, Cavnar KM, Kane JF (2003) Mistranslational errors associated with the rare arginine codon CGG in Escherichia coli. Protein Expr Purif 27(2):365–374
McPherson DT (1988) Codon preference reflects mistranslational constraints: a proposal. Nucleic Acids Res 16(9):4111–4120
Medawar PB, Medawar JS (1983) Aristotle to zoos: a philosophical dictionary of biology. Harvard University Press, Cambridge, MA
Meinnel T, Mechulam Y, Blanquet S (1993) Methionine as translation start signal: a review of the enzymes of the pathway in Escherichia coli. Biochimie 75(12):1061–1075
Melo EO, de Melo Neto OP, Martins de Sa C (2003a) Adenosine-rich elements present in the 5′-untranslated region of PABP mRNA can selectively reduce the abundance and translation of CAT mRNAs in vivo. FEBS Lett 546(2–3):329–334
Melo EO, Dhalia R, Martins de Sa C, Standart N, de Melo Neto OP (2003b) Identification of a C-terminal poly(A)-binding protein (PABP)-PABP interaction domain: role in cooperative binding to poly(A) and efficient cap distal translational repression. J Biol Chem 278(47):46357–46368
Menaker RJ, Sharaf AA, Jones NL (2004) Helicobacter pylori infection and gastric cancer: host, bug, environment, or all three? Curr Gastroenterol Rep 6(6):429–435
Mendz GL, Hazell SL (1996) The urea cycle of Helicobacter pylori. Microbiology 142(Pt 10):2959–2967
Meng SY, Hui JO, Haniu M, Tsai LB (1995) Analysis of translational termination of recombinant human methionyl-neurotrophin 3 in Escherichia coli. Biochem Biophys Res Commun 211(1):40–48
Metropolis N (1987) The beginning of the Monte Carlo method. Los Alamos Sci 15(Special issue):125–130
Meyer IM, Durbin R (2004) Gene structure conservation aids similarity based gene prediction. Nucleic Acids Res 32(2):776–783
Miller JH, Albertini AM (1983) Effects of surrounding sequence on the suppression of nonsense codons. J Mol Biol 164(1):59–71
Miller CG, Kukral AM, Miller JL, Movva NR (1989) pepM is an essential gene in Salmonella typhimurium. J Bacteriol 171(9):5215–5217
Milman G, Goldstein J, Scolnick E, Caskey T (1969) Peptide chain termination. 3. Stimulation of in vitro termination. Proc Natl Acad Sci U S A 63(1):183–190
Min Jou W, Haegeman G, Ysebaert M, Fiers W (1972) Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein. Nature 237(5350):82–88
Minakshi R, Padhan K, Rani M, Khan N, Ahmad F, Jameel S (2009) The SARS coronavirus 3a protein causes endoplasmic reticulum stress and induces ligand-independent downregulation of the type 1 interferon receptor. PLoS One 4(12):e8342
Mine T, Muraoka H, Saika T, Kobayashi I (2005) Characteristics of a clinical isolate of urease-negative Helicobacter pylori and its ability to induce gastric ulcers in Mongolian gerbils. Helicobacter 10(2):125–131
Mitra SK, Lustig F, Akesson B, Lagerkvist U (1977) Codon-anticodon recognition in the valine codon family. J Biol Chem 252(2):471–478
Miura F, Kawaguchi N, Sese J, Toyoda A, Hattori M, Morishita S, Ito T (2006) A large-scale full-length cDNA analysis to explore the budding yeast transcriptome. Proc Natl Acad Sci U S A 103(47):17846–17851
Miyata T, Yasunaga T (1980) Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences and its application. J Mol Evol 16(1):23–36
Miyata T, Miyazawa S, Yasunaga T (1979) Two types of amino acid substitutions in protein evolution. J Mol Evol 12(3):219–236
Mlera L, Lam J, Offerdahl DK, Martens C, Sturdevant D, Turner CV, Porcella SF, Bloom ME (2016) Transcriptome analysis reveals a signature profile for tick-borne Flavivirus persistence in HEK 293T cells. mBio 7(3):e00314-16
Mobley HL, Hu LT, Foxall PA (1991) Helicobacter pylori urease: properties and role in pathogenesis. Scand J Gastroenterol 187(Supplement):39–46
Moerschell RP, Hosokawa Y, Tsunasawa S, Sherman F (1990) The specificities of yeast methionine aminopeptidase and acetylation of amino-terminal methionine in vivo. Processing of altered iso-1-cytochromes c created by oligonucleotide transformation. J Biol Chem 265(32):19638–19643
Moffat JG, Rudolph J, Bailey D (2014) Phenotypic screening in cancer drug discovery – past, present and future. Nat Rev Drug Discov 13(8):588–602
Moi P, Loudianos G, Lavinha J, Murru S, Cossu P, Casu R, Oggiano L, Longinotti M, Cao A, Pirastu M (1992) Delta-thalassemia due to a mutation in an erythroid-specific binding protein sequence 3′ to the delta-globin gene. Blood 79(2):512–516
Monteiro PT, Mendes ND, Teixeira MC, d’Orey S, Tenreiro S, Mira NP, Pais H, Francisco AP, Carvalho AM, Lourenco AB et al (2008) YEASTRACT-DISCOVERER: new tools to improve the analysis of transcriptional regulatory associations in Saccharomyces cerevisiae. Nucleic Acids Res 36(Database issue):D132–D136
Mora L, Heurgue-Hamard V, Champ S, Ehrenberg M, Kisselev LL, Buckingham RH (2003) The essential role of the invariant GGQ motif in the function and stability in vivo of bacterial release factors RF1 and RF2. Mol Microbiol 47(1):267–275
Mora L, Heurgue-Hamard V, de Zamaroczy M, Kervestin S, Buckingham RH (2007) Methylation of bacterial release factors RF1 and RF2 is required for normal translation termination in vivo. J Biol Chem 282(49):35638–35645
Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T, McDonald H, Varhol R, Jones S, Marra M (2008a) Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. BioTechniques 45(1):81–94
Morin RD, O’Connor MD, Griffith M, Kuchenbauer F, Delaney A, Prabhu AL, Zhao Y, McDonald H, Zeng T, Hirst M et al (2008b) Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res 18(4):610–621
Morita M, Shimozawa N, Kashiwayama Y, Suzuki Y, Imanaka T (2011) ABC subfamily D proteins and very long chain fatty acid metabolism as novel targets in adrenoleukodystrophy. Curr Drug Targets 12(5):694–706
Moriyama EN, Powell JR (1997) Codon usage bias and tRNA abundance in Drosophila. J Mol Evol 45(5):514–523
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5(7):621–628
Mottagui-Tabar S, Isaksson LA (1997) Only the last amino acids in the nascent peptide influence translation termination in Escherichia coli genes. FEBS Lett 414(1):165–170
Moult J, Hubbard T, Fidelis K, Pedersen JT (1999) Critical assessment of methods of protein structure prediction (CASP): round III. Proteins 37(Suppl 3):2–6
Muller HJ, Altenburg E (1930) The frequency of translocations produced by X-rays in Drosophila. Genetics 15(4):283–311
Murphy J, Mahony J, Ainsworth S, Nauta A, van Sinderen D (2013) Bacteriophage orphan DNA methyltransferases: insights from their bacterial origin, function, and occurrence. Appl Environ Microbiol 79(24):7547–7555
Murtagh F (1984) Complexities of hierarchic clustering algorithms: state of the art. Comput Stat Q 1:101–113
Muto A, Osawa S (1987) The guanine and cytosine content of genomic DNA and bacterial evolution. Proc Natl Acad Sci U S A 84:166–169
Nachman MW, Crowell SL (2000) Estimate of the mutation rate per nucleotide in humans. Genetics 156(1):297–304
Nakai K, Horton P (1999) PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem Sci 24(1):34–36
Nakamoto T (2006) A unified view of the initiation of protein synthesis. Biochem Biophys Res Commun 341(3):675–678
Nakamura Y, Ito K, Matsumura K, Kawazu Y, Ebihara K (1995) Regulation of translation termination: conserved structural motifs in bacterial and eukaryotic polypeptide release factors. Biochem Cell Biol 73(11–12):1113–1122
Nakamura Y, Ito K, Isaksson LA (1996) Emerging understanding of translation termination. Cell 87(2):147–150
Nakamura Y, Gojobori T, Ikemura T (2000) Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res 28(1):292
Nakashima H, Fukuchi S, Nishikawa K (2003) Compositional changes in RNA, DNA and proteins for bacterial adaptation to higher and lower temperatures. J Biochem (Tokyo) 133(4):507–513
Nasvall SJ, Chen P, Bjork GR (2007) The wobble hypothesis revisited: uridine-5-oxyacetic acid is critical for reading of G-ending codons. RNA 13(12):2151–2164
Needleman SB, Wunsch CD (1970) A general method applicable to the search of similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453
Nei M (1996) Phylogenetic analysis in molecular evolutionary genetics. Annu Rev Genet 30:371–403
Nei M, Kumar S (2000) Molecular evolution and phylogenetics. Oxford University Press, New York
Neuwald AF, Liu JS, Lawrence CE (1995) Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Sci 4(8):1618–1632
Ngumbela KC, Ryan KP, Sivamurthy R, Brockman MA, Gandhi RT, Bhardwaj N, Kavanagh DG (2008) Quantitative effect of suboptimal codon usage on translational efficiency of mRNA encoding HIV-1 gag in intact T cells. PLoS One 3(6):e2356
Nicholas HB Jr, Chan SS, Rosenquist GL (1999) Reevaluation of the determinants of tyrosine sulfation. Endocrine 11(3):285–292
Nichols T, Hayasaka S (2003) Controlling the familywise error rate in functional neuroimaging: a comparative review. Stat Methods Med Res 12(5):419–446
Nicolae M, Pathak S, Rajasekaran S (2015) LFQC: a lossless compression algorithm for FASTQ files. Bioinformatics 31(20):3276–3281
Nishimura S, Takahashi S, Kuroha T, Suwabe N, Nagasawa T, Trainor C, Yamamoto M (2000) A GATA box in the GATA-1 gene hematopoietic enhancer is a critical element in the network of GATA factors and sites that regulate this gene. Mol Cell Biol 20(2):713–723
Nissen P, Kjeldgaard M, Thirup S, Polekhina G, Reshetnikova L, Clark BF, Nyborg J (1995) Crystal structure of the ternary complex of Phe-tRNAPhe, EF-Tu, and a GTP analog. Science 270(5241):1464–1472
Noedl H, Se Y, Schaecher K, Smith BL, Socheat D, Fukuda MM (2008) Evidence of artemisinin-resistant malaria in western Cambodia. N Engl J Med 359(24):2619–2620
Noedl H, Socheat D, Satimai W (2009) Artemisinin-resistant malaria in Asia. N Engl J Med 361(5):540–541
Noedl H, Se Y, Sriwichai S, Schaecher K, Teja-Isavadharm P, Smith B, Rutvisuttinunt W, Bethell D, Surasri S, Fukuda MM et al (2010) Artemisinin resistance in Cambodia: a clinical trial designed to address an emerging problem in Southeast Asia. Clin Infect Dis 51(11):e82–e89
Nomenclature Committee of the International Union of Biochemistry (1985) Nomenclature for incompletely specified bases in nucleic acid sequences. Recommendations 1984. Eur J Biochem 150:1–5
Notredame C, O’Brien EA, Higgins DG (1997) RAGA: RNA sequence alignment by genetic algorithm. Nucleic Acids Res 25(22):4570–4580
Numanagic I, Bonfield JK, Hach F, Voges J, Ostermann J, Alberti C, Mattavelli M, Sahinalp SC (2016) Comparison of high-throughput sequencing data compression tools. Nat Methods 13(12):1005–1008
Nur I, Szyf M, Razin A, Glaser G, Rottem S, Razin S (1985) Procaryotic and eucaryotic traits of DNA methylation in spiroplasmas (mycoplasmas). J Bacteriol 164(1):19–24
Nussinov R (1984) Doublet frequencies in evolutionary distinct groups. Nucleic Acids Res 12(3):1749–1763
O’Brien JD, She ZS, Suchard MA (2008) Dating the time of viral subtype divergence. BMC Evol Biol 8:172
Obenauer JC, Cantley LC, Yaffe MB (2003) Scansite 2.0: proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res 31(13):3635–3641
Ohta T, Gray TA, Rogan PK, Buiting K, Gabriel JM, Saitoh S, Muralidhar B, Bilienska B, Krajewska-Walasek M, Driscoll DJ et al (1999) Imprinting-mutation mechanisms in Prader-Willi syndrome. Am J Hum Genet 64(2):397–413
Ordway JM, Fenster SD, Ruan H, Curran T (2005) A transcriptome map of cellular transformation by the fos oncogene. Mol Cancer 4(1):19
Orkin SH (1990) Globin gene regulation and switching: circa 1990. Cell 63(4):665–672
Orkin SH (1992) GATA-binding transcription factors in hematopoietic cells. Blood 80(3):575–581
Osawa S, Jukes TH, Muto A, Yamao F, Ohama T, Andachi Y (1987) Role of directional mutation pressure in the evolution of the eubacterial genetic code. Cold Spring Harb Symp Quant Biol 52:777–789
Osterman IA, Evfratov SA, Sergiev PV, Dontsova OA (2013) Comparison of mRNA features affecting translation initiation and reinitiation. Nucleic Acids Res 41(1):474–486
Ostrin EJ, Li Y, Hoffman K, Liu J, Wang K, Zhang L, Mardon G, Chen R (2006) Genome-wide identification of direct targets of the Drosophila retinal determination protein Eyeless. Genome Res 16(4):466–476
Ota S, Li WH (2000) NJML: a hybrid algorithm for the neighbor-joining and maximum-likelihood methods. Mol Biol Evol 17(9):1401–1409
Ota S, Li WH (2001) NJML+: an extension of the NJML method to handle protein sequence data and computer software implementation. Mol Biol Evol 18(11):1983–1992
Otu HH, Sayood K (2003) A new sequence distance measure for phylogenetic tree construction. Bioinformatics 19(16):2122–2130
Palidwor GA, Perkins TJ, Xia X (2010) A general model of codon bias due to GC mutational bias. PLoS One 5(10):e13431
Palstra RJ, Tolhuis B, Splinter E, Nijmeijer R, Grosveld F, de Laat W (2003) The beta-globin nuclear compartment in development and erythroid differentiation. Nat Genet 35(2):190–194
Pandey RR, Mondal T, Mohammad F, Enroth S, Redrup L, Komorowski J, Nagano T, Mancini-Dinardo D, Kanduri C (2008) Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Mol Cell 32(2):232–246
Pappin DJ, Hojrup P, Bleasby AJ (1993) Rapid identification of proteins by peptide-mass fingerprinting. Curr Biol 3(6):327–332
Park SY, Cromie MJ, Lee EJ, Groisman EA (2010) A bacterial mRNA leader that employs different mechanisms to sense disparate intracellular signals. Cell 142(5):737–748
Parker J (1989) Errors and alternatives in reading the universal genetic code. Microbiol Rev 53(3):273–298
Patel GP, Bag J (2006) IMP1 interacts with poly(A)-binding protein (PABP) and the autoregulatory translational control element of PABP-mRNA through the KH III-IV domain. FEBS J 273(24):5678–5690
Patel GP, Ma S, Bag J (2005) The autoregulatory translational control element of poly(A)-binding protein mRNA forms a heteromeric ribonucleoprotein complex. Nucleic Acids Res 33(22):7074–7089
Pauling L, Itano HA, Singer SJ, Wells IC (1949) Sickle cell anemia, a molecular disease. Science 110(2865):543–548
Pazin MJ, Kamakaka RT, Kadonaga JT (1994) ATP-dependent nucleosome reconfiguration and transcriptional activation from preassembled chromatin templates. Science 266(5193):2007–2011
Pazin MJ, Sheridan PL, Cannon K, Cao Z, Keck JG, Kadonaga JT, Jones KA (1996) NF-kappa B-mediated chromatin reconfiguration and transcriptional activation of the HIV-1 enhancer in vitro. Genes Dev 10(1):37–49
Pazin MJ, Hermann JW, Kadonaga JT (1998) Promoter structure and transcriptional activation with chromatin templates assembled in vitro. A single Gal4-VP16 dimer binds to chromatin or to DNA with comparable affinity. J Biol Chem 273(51):34653–34660
Peabody MA, Laird MR, Vlasschaert C, Lo R, Brinkman FS (2016) PSORTdb: expanding the bacteria and archaea protein subcellular localization database to better reflect diversity in cell envelope structures. Nucleic Acids Res 44(D1):D663–D668
Pearson WR (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol 183:63–98
Pearson WR (1994) Using the FASTA program to search protein and DNA sequence databases. Methods Mol Biol 24:307–331
Pearson WR (1998) Empirical statistical estimates for sequence similarity searches. J Mol Biol 276(1):71–84
Pearson WR, Lipman DJ (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 85:2444–2448
Pei J, Kim BH, Grishin NV (2008) PROMALS3D: a tool for multiple protein sequence and structure alignments. Nucleic Acids Res 36(7):2295–2300
Percudani R, Pavesi A, Ottonello S (1997) Transfer RNA gene redundancy and translational selection in Saccharomyces cerevisiae. J Mol Biol 268(2):322–330
Pereira SL, Baker AJ (2006) A mitogenomic timescale for birds detects variable phylogenetic rates of molecular evolution and refutes the standard molecular clock. Mol Biol Evol 23(9):1731–1740
Pestova TV, Shatsky IN, Fletcher SP, Jackson RJ, Hellen CU (1998) A prokaryotic-like mode of cytoplasmic eukaryotic ribosome binding to the initiation codon during internal translation initiation of hepatitis C and classical swine fever virus RNAs. Genes Dev 12(1):67–83
Pestova TV, Lomakin IB, Hellen CU (2004) Position of the CrPV IRES on the 40S subunit and factor dependence of IRES/80S ribosome assembly. EMBO Rep 5(9):906–913
Petronis A (2004) The origin of schizophrenia: genetic thesis, epigenetic antithesis, and resolving synthesis. Biol Psychiatry 55(10):965–970
Petronis A (2006) Epigenetics and twins: three variations on the theme. Trends Genet 22(7):347–350
Petronis A, Gottesman II, Kan P, Kennedy JL, Basile VS, Paterson AD, Popendikyte V (2003) Monozygotic twins exhibit numerous epigenetic differences: clues to twin discordance? Schizophr Bull 29(1):169–178
Petrullo LA, Gallagher PJ, Elseviers D (1983) The role of 2-methylthio-N6-isopentenyladenosine in readthrough and suppression of nonsense codons in Escherichia coli. Mol Gen Genet 190(2):289–294
Petry S, Brodersen DE, Murphy FV IV, Dunham CM, Selmer M, Tarry MJ, Kelley AC, Ramakrishnan V (2005) Crystal structures of the ribosome in complex with release factors RF1 and RF2 bound to a cognate stop codon. Cell 123(7):1255–1266
Pevzner PA (2000) Computational molecular biology: an algorithmic approach. The MIT Press, Cambridge, MA
Pielou EC (1984) The interpretation of ecological data: a primer on classification and ordination. Wiley, New York
Pietras K, Sjoblom T, Rubin K, Heldin CH, Ostman A (2003) PDGF receptors as cancer drug targets. Cancer Cell 3(5):439–443
Pinheiro JC, Bates DM (2000) Mixed-effects models in S and S-PLUS. Springer, Berlin/Heidelberg
Pleiss JA, Whitworth GB, Bergkessel M, Guthrie C (2007) Rapid, transcript-specific changes in splicing in response to environmental stress. Mol Cell 27(6):928–937
Pobre V, Arraiano CM (2015) Next generation sequencing analysis reveals that the ribonucleases RNase II, RNase R and PNPase affect bacterial motility and biofilm formation in E. coli. BMC Genomics 16:72
Poole ES, Brown CM, Tate WP (1995) The identity of the base following the stop codon determines the efficiency of in vivo translational termination in Escherichia coli. EMBO J 14(1):151–158
Poole ES, Major LL, Mannering SA, Tate WP (1998) Translational termination in Escherichia coli: three bases following the stop codon crosslink to release factor 2 and affect the decoding efficiency of UGA-containing signals. Nucleic Acids Res 26(4):954–960
Popa A, Lebrigand K, Barbry P, Waldmann R (2016) Pateamine A-sensitive ribosome profiling reveals the scope of translation in mouse embryonic stem cells. BMC Genomics 17:52
Poulos MG, Batra R, Charizanis K, Swanson MS (2011) Developments in RNA splicing and disease. Cold Spring Harb Perspect Biol 3(1):a000778
Povolotskaya IS, Kondrashov FA, Ledda A, Vlasov PK (2012) Stop codons in bacteria are not selectively equivalent. Biol Direct 7:30
Prabhakaran R, Chithambaram S, Xia X (2015) Escherichia coli and Staphylococcus phages: effect of translation initiation efficiency on differential codon adaptation mediated by virulent and temperate lifestyles. J Gen Virol 96(Pt 5):1169–1179
Prensner JR, Iyer MK, Balbin OA, Dhanasekaran SM, Cao Q, Brenner JC, Laxman B, Asangani IA, Grasso CS, Kominsky HD et al (2011) Transcriptome sequencing across a prostate cancer cohort identifies PCAT-1, an unannotated lincRNA implicated in disease progression. Nat Biotechnol 29(8):742–749
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical recipes in C: the art of scientific computing. Cambridge University Press, Cambridge
Prival MJ (1996) Isolation of glutamate-inserting ochre suppressor mutants of Salmonella typhimurium and Escherichia coli. J Bacteriol 178(10):2989–2990
Ptashne M (1986) A genetic switch: gene control and phage lambda. Cell Press and Blackwell Scientific, Cambridge, MA
Pure GA, Robinson GW, Naumovski L, Friedberg EC (1985) Partial suppression of an ochre mutation in Saccharomyces cerevisiae by multicopy plasmids containing a normal yeast tRNAGln gene. J Mol Biol 183(1):31–42
Pyronnet S, Pradayrol L, Sonenberg N (2000) A cell cycle-dependent internal ribosome entry site. Mol Cell 5(4):607–616
Qin ZS, McCue LA, Thompson W, Mayerhofer L, Lawrence CE, Liu JS (2003) Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites. Nat Biotechnol 21(4):435–439
Qu K, McCue LA, Lawrence CE (1998) Bayesian protein family classifier. Proc Int Conf Intell Syst Mol Biol 6:131–139
Raaum RL, Sterner KN, Noviello CM, Stewart C-B, Disotell TR (2005) Catarrhine primate divergence dates estimated from complete mitochondrial genomes: concordance with fossil and nuclear DNA evidence. J Hum Evol 48(3):237
Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
Rahi SJ, Pecani K, Ondracka A, Oikonomou C, Cross FR (2016) The CDK-APC/C oscillator predominantly entrains periodic cell-cycle transcription. Cell 165(2):475–487
Rambaut A, Bromham L (1998) Estimating divergence dates from molecular sequences. Mol Biol Evol 15(4):442–448
Ran W, Higgs PG (2012) Contributions of speed and accuracy to translational selection in bacteria. PLoS One 7(12):e51652
Rannala B, Yang Z (2007) Inferring speciation times under an episodic molecular clock. Syst Biol 56(3):453–466
Rashid M, Saha S, Raghava GP (2007) Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs. BMC Bioinform 8:337
Razin A, Razin S (1980) Methylated bases in mycoplasmal DNA. Nucleic Acids Res 8(6):1383–1390
Regier JC, Shultz JW, Zwick A, Hussey A, Ball B, Wetzer R, Martin JW, Cunningham CW (2010) Arthropod relationships revealed by phylogenomic analysis of nuclear protein-coding sequences. Nature 463(7284):1079–1083
Reinert K, Stoye J, Will T (2000) An iterative method for faster sum-of-pairs multiple sequence alignment. Bioinformatics 16(9):808–814
Rektorschek M, Buhmann A, Weeks D, Schwan D, Bensch KW, Eskandari S, Scott D, Sachs G, Melchers K (2000) Acid resistance of Helicobacter pylori depends on the UreI membrane protein and an inner membrane proton barrier. Mol Microbiol 36(1):141–152
Rice P, Longden I, Bleasby A (2000) EMBOSS: the European molecular biology open software suite. Trends Genet 16(6):276–277
Rideout WM III, Coetzee GA, Olumi AF, Jones PA (1990) 5-Methylcytosine as an endogenous mutagen in the human LDL receptor and p53 genes. Science 249:1288–1290
Rimsky L, Hauber J, Dukovich M, Malim MH, Langlois A, Cullen BR, Greene WC (1988) Functional replacement of the HIV-1 rev protein by the HTLV-1 rex protein. Nature 335(6192):738–740
Rinn JL, Kertesz M, Wang JK, Squazzo SL, Xu X, Brugmann SA, Goodnough LH, Helms JA, Farnham PJ, Segal E et al (2007) Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129(7):1311–1323
Ritland K, Clegg M (1990) Optimal DNA sequence divergence for testing phylogenetic hypotheses. In: Molecular evolution. Alan R. Liss, New York, pp 289–296
Roberts A, Pachter L (2013) Streaming fragment assignment for real-time analysis of sequencing experiments. Nat Methods 10(1):71–73
Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L (2011) Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biol 12(3):R22
Roberts A, Feng H, Pachter L (2013a) Fragment assignment in the cloud with eXpress-D. BMC Bioinform 14:358
Roberts A, Schaeffer L, Pachter L (2013b) Updating RNA-Seq analyses after re-annotation. Bioinformatics 29(13):1631–1637
Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A et al (2007) Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods 4(8):651–657
Robinson M, Lilley R, Little S, Emtage JS, Yarranton G, Stephens P, Millican A, Eaton M, Humphreys G (1984) Codon usage can affect efficiency of translation of genes in Escherichia coli. Nucleic Acids Res 12(17):6663–6671
Rodgers AB, Morgan CP, Leu NA, Bale TL (2015) Transgenerational epigenetic programming via sperm microRNA recapitulates effects of paternal stress. Proc Natl Acad Sci U S A 112(44):13699–13704
Rogers MF, Thomas J, Reddy AS, Ben-Hur A (2012) SpliceGrapher: detecting patterns of alternative splicing from RNA-Seq data in the context of gene models and EST data. Genome Biol 13(1):R4
Rogozin IB, Managadze D, Shabalina SA, Koonin EV (2014) Gene family level comparative analysis of gene expression in mammals validates the ortholog conjecture. Genome Biol Evol 6(4):754–762
Rosenberg MS, Kumar S (2003) Heterogeneity of nucleotide frequencies among evolutionary lineages and phylogenetic inference. Mol Biol Evol 20(4):610–621
Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65(6):386–408
Ross S, Giglione C, Pierre M, Espagne C, Meinnel T (2005) Functional and developmental impact of cytosolic protein N-terminal methionine excision in Arabidopsis. Plant Physiol 137(2):623–637
Roth JR (1970) UGA nonsense mutations in Salmonella typhimurium. J Bacteriol 102(2):467–475
Rouchka EC (1997) A brief overview of Gibbs Sampling. IBC Statistics Study Group, Washington University, Institute for Biomedical Computing
Ruiz LM, Armengol G, Habeych E, Orduz S (2006) A theoretical analysis of codon adaptation index of the Boophilus microplus bm86 gene directed to the optimization of a DNA vaccine. J Theor Biol 239(4):445–449
Ryan MJ, Fox JH, Wilczynski W, Rand AS (1990) Sexual selection for sensory exploitation in the frog Physalaemus pustulosus. Nature 343:66–67
Ryden SM, Isaksson LA (1984) A temperature-sensitive mutant of Escherichia coli that shows enhanced misreading of UAG/A and increased efficiency for some tRNA nonsense suppressors. Mol Gen Genet 193(1):38–45
Rzhetsky A, Nei M (1994a) Unbiased estimates of the number of nucleotide substitutions when substitution rate varies among different sites. J Mol Evol 38(3):295–299
Rzhetsky A, Nei M (1994b) Unbiased estimates of the number of nucleotide substitutions when substitution rate varies among different sites. J Mol Evol 38(3):295–299
Rzhetsky A, Nei M (1995) Tests of applicability of several substitution models for DNA sequence data. Mol Biol Evol 12(1):131–151
Saadatpour A, Lai S, Guo G, Yuan GC (2015) Single-cell analysis in cancer genomics. Trends Genet 31(10):576–586
Sachs AB, Davis RW, Kornberg RD (1987) A single domain of yeast poly(A)-binding protein is necessary and sufficient for RNA binding and cell viability. Mol Cell Biol 7(9):3268–3276
Sachs G, Meyer-Rosberg K, Scott DR, Melchers K (1996) Acid, protons and Helicobacter pylori. Yale J Biol Med 69(3):301–316
Sachs G, Weeks DL, Melchers K, Scott DR (2003) The gastric biology of Helicobacter pylori. Annu Rev Physiol 65(1):349–369
Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE (2002) Using the transcriptome to annotate the genome. Nat Biotechnol 20(5):508–512
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425
Sakaluk SK (2000) Sensory exploitation as an evolutionary origin to nuptial food gifts in insects. Proc Biol Sci 267(1441):339–343
Salzberg SL, Delcher AL, Kasif S, White O (1998) Microbial gene identification using interpolated Markov models. Nucleic Acids Res 26(2):544–548
Sambrook JF, Fan DP, Brenner S (1967) A strong suppressor specific for UGA. Nature 214(5087):452–453
Samso M, Palumbo MJ, Radermacher M, Liu JS, Lawrence CE (2002) A Bayesian method for classification of images from electron micrographs. J Struct Biol 138(3):157–170
Sancar A, Sancar GB (1988) DNA repair enzymes. Annu Rev Biochem 57:29–67
Sanderson MJ (1997) A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol Biol Evol 14:1218–1232
Sankoff D (1975) Minimal mutation trees of sequences. SIAM J Appl Math 28:35–42
Sankoff D, Morel C, Cedergren RJ (1973) Evolution of 5S RNA and the non-randomness of base replacement. Nat New Biol 245(147):232–234
Sankoff D, Cedergren RJ, Lapalme G (1976) Frequency of insertion-deletion, transversion, and transition in the evolution of 5S ribosomal RNA. J Mol Evol 7(2):133–149
Sawa T, Ohno-Machado L (2003) A neural network-based similarity index for clustering DNA microarray data. Comput Biol Med 33(1):1–15
Schena M (1996) Genome analysis with gene expression microarrays. BioEssays 18(5):427–431
Schena M (2003) Microarray analysis. Wiley-Liss, New York
Schena M, Shalon D, Davis RW, Brown PO (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270(5235):467–470
Schena M, Heller RA, Theriault TP, Konrad K, Lachenmeier E, Davis RW (1998) Microarrays: biotechnology’s discovery platform for functional genomics. Trends Biotechnol 16(7):301–306
Schmucker D, Clemens JC, Shu H, Worby CA, Xiao J, Muda M, Dixon JE, Zipursky SL (2000) Drosophila Dscam is an axon guidance receptor exhibiting extraordinary molecular diversity. Cell 101(6):671–684
Schneider TD, Stephens RM (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res 18(20):6097–6100
Schuler M, Connell SR, Lescoute A, Giesebrecht J, Dabrowski M, Schroeer B, Mielke T, Penczek PA, Westhof E, Spahn CM (2006) Structure of the ribosome-bound cricket paralysis virus IRES RNA. Nat Struct Mol Biol 13(12):1092–1096
Schwartz S, Silva J, Burstein D, Pupko T, Eyras E, Ast G (2008) Large-scale comparative analysis of splicing signals and their corresponding splicing factors in eukaryotes. Genome Res 18(1):88–103
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Schwer B, Stunnenberg HG (1988) Vaccinia virus late transcripts generated in vitro have a poly(A) head. EMBO J 7(4):1183–1190
Schwer B, Visca P, Vos JC, Stunnenberg HG (1987) Discontinuous transcription or RNA processing of vaccinia virus late messengers results in a 5′ poly(A) leader. Cell 50(2):163–169
Scolnick EM, Caskey CT (1969) Peptide chain termination. V. The role of release factors in mRNA terminator codon recognition. Proc Natl Acad Sci U S A 64(4):1235–1241
Scolnick E, Tompkins R, Caskey T, Nirenberg M (1968) Release factors differing in specificity for terminator codons. Proc Natl Acad Sci U S A 61(2):768–774
Scott D, Weeks D, Melchers K, Sachs G (1998) The life and death of Helicobacter pylori. Gut 43(Suppl 1):S56–S60
Scott DR, Marcus EA, Weeks DL, Sachs G (2002) Mechanisms of acid resistance due to the urease system of Helicobacter pylori. Gastroenterology 123(1):187–195
Seetharam R, Heeren RA, Wong EY, Braford SR, Klein BK, Aykent S, Kotts CE, Mathis KJ, Bishop BF, Jennings MJ et al (1988) Mistranslation in IGF-1 during over-expression of the protein in Escherichia coli using a synthetic gene containing low frequency codons. Biochem Biophys Res Commun 155(1):518–523
Segurel L, Bon C (2017) On the evolution of lactase persistence in humans. Annu Rev Genomics Hum Genet 18:297–319
Sendler E, Johnson GD, Mao S, Goodrich RJ, Diamond MP, Hauser R, Krawetz SA (2013) Stability, delivery and functions of human sperm RNAs at fertilization. Nucleic Acids Res 41(7):4104–4117
Seo EY, Namkung JH, Lee KM, Lee WH, Im M, Kee SH, Tae Park G, Yang JM, Seo YJ, Park JK et al (2005) Analysis of calcium-inducible genes in keratinocytes using suppression subtractive hybridization and cDNA microarray. Genomics 86(5):528–538
Serero A, Giglione C, Sardini A, Martinez-Sanz J, Meinnel T (2003) An unusual peptide deformylase features in the human mitochondrial N-terminal methionine excision pathway. J Biol Chem 278(52):52953–52963
Shadel GS, Clayton DA (1997) Mitochondrial DNA maintenance in vertebrates. Annu Rev Biochem 66:409–435
Sharma U, Conine CC, Shea JM, Boskovic A, Derr AG, Bing XY, Belleannee C, Kucukural A, Serra RW, Sun F et al (2016) Biogenesis and function of tRNA fragments during sperm maturation and fertilization in mammals. Science 351(6271):391–396
Sharp PM (1986) What can AIDS virus codon usage tell us? Nature 324(6093):114
Sharp PM, Bulmer M (1988) Selective differences among translation termination codons. Gene 63(1):141–145
Sharp PM, Li WH (1987) The codon adaptation index – a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15(3):1281–1295
Sharp PM, Tuohy TM, Mosurski KR (1986) Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res 14(13):5125–5143
Sheppard K, Yuan J, Hohn MJ, Jester B, Devine KM, Soll D (2008) From one amino acid to another: tRNA-dependent amino acid biosynthesis. Nucleic Acids Res 36(6):1813–1825
Sheridan PL, Sheline CT, Cannon K, Voz ML, Pazin MJ, Kadonaga JT, Jones KA (1995) Activation of the HIV-1 enhancer by the LEF-1 HMG protein on nucleosome-assembled DNA in vitro. Genes Dev 9(17):2090–2104
Sheth N, Roca X, Hastings ML, Roeder T, Krainer AR, Sachidanandam R (2006) Comprehensive splice-site analysis using comparative genomics. Nucleic Acids Res 34(14):3955–3967
Shimodaira H, Hasegawa M (1999) Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol 16(8):1114–1116
Shine J, Dalgarno L (1974a) The 3′-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc Natl Acad Sci U S A 71(4):1342–1346
Shine J, Dalgarno L (1974b) Identical 3′-terminal octanucleotide sequence in 18S ribosomal ribonucleic acid from different eukaryotes. A proposed role for this sequence in the recognition of terminator codons. Biochem J 141(3):609–615
Shine J, Dalgarno L (1975) Determinant of cistron specificity in bacterial ribosomes. Nature 254(5495):34–38
Shirokikh NE, Spirin AS (2008) Poly(A) leader of eukaryotic mRNA bypasses the dependence of translation on initiation factors. Proc Natl Acad Sci U S A 105(31):10738–10743
Shoemaker RH (2006) The NCI60 human tumour cell line anticancer drug screen. Nat Rev Cancer 6(10):813–823
Shoemaker DD, Schadt EE, Armour CD, He YD, Garrett-Engele P, McDonagh PD, Loerch PM, Leonardson A, Lum PY, Cavet G et al (2001) Experimental annotation of the human genome using microarray technology. Nature 409(6822):922–927
Shoemaker R, Deng J, Wang W, Zhang K (2010) Allele-specific methylation is prevalent and is contributed by CpG-SNPs in the human genome. Genome Res 20(7):883–889
Shpaer EG (1986) Constraints on codon context in Escherichia coli genes. Their possible role in modulating the efficiency of translation. J Mol Biol 188(4):555–564
Siavoshi F, Malekzadeh R, Daneshmand M, Smoot DT, Ashktorab H (2004) Association between Helicobacter pylori infection in gastric cancer, ulcers and gastritis in Iranian patients. Helicobacter 9(5):470
Siepel A, Haussler D (2004a) Combining phylogenetic and hidden Markov models in biosequence analysis. J Comput Biol 11(2–3):413–428
Siepel A, Haussler D (2004b) Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol Biol Evol 21(3):468–488
Siepel A, Haussler D (2005) Phylogenetic hidden Markov models. In: Nielsen R (ed) Statistical methods in molecular evolution. Springer, New York, pp 325–351
Sim J, Kim SY, Lee J (2005) PPRODO: prediction of protein domain boundaries using neural networks. Proteins 59(3):627–632
Simpson RM, Bruno AE, Bard JE, Buck MJ, Read LK (2016) High-throughput sequencing of partially edited trypanosome mRNAs reveals barriers to editing progression and evidence for alternative editing. RNA 22(5):677–695
Sloane AJ, Duff JL, Wilson NL, Gandhi PS, Hill CJ, Hopwood FG, Smith PE, Thomas ML, Cole RA, Packer NH et al (2002) High throughput peptide mass fingerprinting and protein macroarray analysis using chemical printing strategies. Mol Cell Proteomics 1(7):490–499
Smircich P, Eastman G, Bispo S, Duhagon MA, Guerra-Slompo EP, Garat B, Goldenberg S, Munroe DJ, Dallagiovanna B, Holetz F et al (2015) Ribosome profiling reveals translation control as a key mechanism generating differential gene expression in Trypanosoma cruzi. BMC Genomics 16:443
Smit AF (1999) Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev 9(6):657–663
Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147(1):195–197
Smith AB, Pisani D, Mackenzie-Dodds JA, Stockley B, Webster BL, Littlewood DT (2006) Testing the molecular clock: molecular and paleontological estimates of divergence times in the Echinoidea (Echinodermata). Mol Biol Evol 23(10):1832–1851
Smyth RP, Davenport MP, Mak J (2012) The origin of genetic diversity in HIV-1. Virus Res 169(2):415–429
Smyth RP, Schlub TE, Grimm AJ, Waugh C, Ellenberg P, Chopra A, Mallal S, Cromer D, Mak J, Davenport MP (2014) Identifying recombination hot spots in the HIV-1 genome. J Virol 88(5):2891–2902
Sneath PHA (1962) The construction of taxonomic groups. In: Ainsworth GC, Sneath PHA (eds) Microbial classification. Cambridge University Press, Cambridge, pp 289–332
Sokal RR, Michener CD (1958) A statistical method for evaluating systematic relationships. Univ Kans Sci Bull 38:1409–1438
Solnick JV, Hansen LM, Salama NR, Boonjakuakul JK, Syvanen M (2004) Modification of Helicobacter pylori outer membrane protein expression during experimental infection of rhesus macaques. Proc Natl Acad Sci U S A 101(7):2106–2111
Sommerer N, Centeno D, Rossignol M (2006) Peptide mass fingerprinting: identification of proteins by maldi-tof. Methods Mol Biol 355:219–234
Sonenberg N, Meerovitch K (1990) Translation of poliovirus mRNA. Enzyme 44(1–4):278–291
Sorensen MA, Kurland CG, Pedersen S (1989) Codon usage determines translation rate in Escherichia coli. J Mol Biol 207:365–377
Staden R (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res 12(1 Pt 2):505–519
Stamm S, Ben-Ari S, Rafalska I, Tang Y, Zhang Z, Toiber D, Thanaraj TA, Soreq H (2005) Function of alternative splicing. Gene 344:1–20
Steinberg MH, Rodgers GP (2001) Pathophysiology of sickle cell disease: role of cellular and genetic modifiers. Semin Hematol 38(4):299–306
Steitz JA, Jakes K (1975) How ribosomes select initiator regions in mRNA: base pair formation between the 3′ terminus of 16S rRNA and the mRNA during initiation of protein synthesis in Escherichia coli. Proc Natl Acad Sci U S A 72(12):4734–4738
Stepankiw N, Raghavan M, Fogarty EA, Grimson A, Pleiss JA (2015) Widespread alternative and aberrant splicing revealed by lariat sequencing. Nucleic Acids Res 43(17):8488–8501
Stingl K, Uhlemann EM, Deckers-Hebestreit G, Schmid R, Bakker EP, Altendorf K (2001) Prolonged survival and cytoplasmic pH homeostasis of Helicobacter pylori at pH 1. Infect Immun 69(2):1178–1180
Stingl K, Altendorf K, Bakker EP (2002a) Acid survival of Helicobacter pylori: how does urease activity trigger cytoplasmic pH homeostasis? Trends Microbiol 10(2):70–74
Stingl K, Uhlemann E-M, Schmid R, Altendorf K, Bakker EP (2002b) Energetics of Helicobacter pylori and its implications for the mechanism of urease-dependent acid tolerance at pH 1. J Bacteriol 184(11):3053–3060
Stormo GD, Schneider TD, Gold L, Ehrenfeucht A (1982a) Use of the ‘Perceptron’ algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Res 10(9):2997–3011
Stormo GD, Schneider TD, Gold LM (1982b) Characterization of translational initiation sites in E. coli. Nucleic Acids Res 10(9):2971–2996
Stormo GD, Schneider TD, Gold L (1986) Quantitative analysis of the relationship between nucleotide sequence and functional activity. Nucleic Acids Res 14(16):6661–6679
Stoye J, Moulton V, Dress AW (1997) DCA: an efficient implementation of the divide-and-conquer approach to simultaneous multiple sequence alignment. Comput Appl Biosci 13(6):625–626
Strebel K (2005) APOBEC3G & HTLV-1: inhibition without deamination. Retrovirology 2(1):37
Strigini P, Brickman E (1973) Analysis of specific misreading in Escherichia coli. J Mol Biol 75(4):659–672
Su HL, Liao CL, Lin YL (2002) Japanese encephalitis virus infection initiates endoplasmic reticulum stress and an unfolded protein response. J Virol 76(9):4162–4171
Sueoka N (1964) On the evolution of informational macromolecules. Academic, New York
Suerbaum S, Smith JM, Bapumia K, Morelli G, Smith NH, Kunstmann E, Dyrek I, Achtman M (1998) Free recombination within Helicobacter pylori. Proc Natl Acad Sci U S A 95(21):12619–12624
Suerbaum S, Josenhans C, Sterzenbach T, Drescher B, Brandt P, Bell M, Droge M, Fartmann B, Fischer HP, Ge Z et al (2003) The complete genome sequence of the carcinogenic bacterium Helicobacter hepaticus. Proc Natl Acad Sci U S A 100(13):7901–7906
Sun XY, Yang Q, Xia X (2013) An improved implementation of effective Number of Codons (Nc). Mol Biol Evol 30:191–196
Sund J, Ander M, Aqvist J (2010) Principles of stop-codon reading on the ribosome. Nature 465(7300):947–950
Supek F, Smuc T (2010) On relevance of codon usage to expression of synthetic and natural genes in Escherichia coli. Genetics 185(3):1129–1134
Sutton CW, Pemberton KS, Cottrell JS, Corbett JM, Wheeler CH, Dunn MJ, Pappin DJ (1995) Identification of myocardial proteins from two-dimensional gels by peptide mass fingerprinting. Electrophoresis 16(3):308–316
Sved J, Bird A (1990) The expected equilibrium of the CpG dinucleotide in vertebrate genomes under a mutation model. Proc Natl Acad Sci U S A 87:4692–4696
Svitkin YV, Imataka H, Khaleghpour K, Kahvejian A, Liebig HD, Sonenberg N (2001) Poly(A)-binding protein interaction with eIF4G stimulates picornavirus IRES-dependent translation. RNA 7(12):1743–1752
Swofford D (1993) Phylogenetic analysis using parsimony. Illinois Natural History Survey, Champaign
Tajima F (1993) Unbiased estimation of evolutionary distance between nucleotide sequences. Mol Biol Evol 10(3):677–688
Tajima F, Nei M (1984) Estimation of evolutionary distance between nucleotide sequences. Mol Biol Evol 1(3):269–285
Takezaki N, Nei M (1994) Inconsistency of the maximum parsimony method when the rate of nucleotide substitution is constant. J Mol Evol 39(2):210–218
Takezaki N, Rzhetsky A, Nei M (1995) Phylogenetic test of the molecular clock and linearized trees. Mol Biol Evol 12(5):823–833
Tamai I, Sai Y, Kobayashi H, Kamata M, Wakamiya T, Tsuji A (1997) Structure-internalization relationship for adsorptive-mediated endocytosis of basic peptides at the blood-brain barrier. J Pharmacol Exp Ther 280(1):410–415
Tamura K, Kumar S (2002) Evolutionary distance estimation under heterogeneous substitution pattern among lineages. Mol Biol Evol 19(10):1727–1736
Tamura K, Nei M (1993) Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 10:512–526
Tamura K, Nei M, Kumar S (2004) Prospects for inferring very large phylogenies by using the neighbor-joining method. Proc Natl Acad Sci U S A 101(30):11030–11035
Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol Biol Evol 24(8):1596–1599
Tanabe M, Kanehisa M (2012) Using the KEGG database resource. Curr Protoc Bioinformatics Chapter 1:Unit 1.12
Tanaka M, Ozawa T (1994) Strand asymmetry in human mitochondrial DNA mutations. Genomics 22(2):327–335
Tang N, Tornatore P, Weinberger SR (2004) Current developments in SELDI affinity technology. Mass Spectrom Rev 23(1):34–44
Tang Y, Gao XD, Wang Y, Yuan BF, Feng YQ (2012) Widespread existence of cytosine methylation in yeast DNA measured by gas chromatography/mass spectrometry. Anal Chem 84(16):7249–7255
Taniguchi T, Weissmann C (1978) Inhibition of Qbeta RNA 70S ribosome initiation complex formation by an oligonucleotide complementary to the 3′ terminal region of E. coli 16S ribosomal RNA. Nature 275(5682):770–772
Tao H, Bausch C, Richmond C, Blattner FR, Conway T (1999) Functional genomics: expression analysis of Escherichia coli growing on minimal and rich media. J Bacteriol 181(20):6425–6440
Taramelli R, Kioussis D, Vanin E, Bartram K, Groffen J, Hurst J, Grosveld FG (1986) Gamma delta beta-thalassaemias 1 and 2 are the result of a 100 kbp deletion in the human beta-globin cluster. Nucleic Acids Res 14(17):7017–7029
Tate WP, Brown CM (1992) Translational termination: “stop” for protein synthesis or “pause” for regulation of gene expression. Biochemistry 31(9):2443–2450
Tate WP, Mannering SA (1996) Three, four or more: the translational stop signal at length. Mol Microbiol 21(2):213–219
Tate WP, Mansell JB, Mannering SA, Irvine JH, Major LL, Wilson DN (1999) UGA: a dual signal for ‘stop’ and for recoding in protein synthesis. Biochemistry (Mosc) 64(12):1342–1353
Tavaré S (1986) Some probabilistic and statistical problems in the analysis of DNA sequences. In: Miura RM (ed) Some mathematical questions in biology – DNA sequence analysis. American Mathematical Society, Providence, pp 57–86
Team GE (2011) Closure of the NCBI SRA and implications for the long-term future of genomics data storage. Genome Biol 12(3):402
Tech M, Merkl R (2003) YACOP: enhanced gene prediction obtained by a combination of existing methods. In Silico Biol 3(4):441–451
Terasaki T, Deguchi Y, Sato H, Hirai K-i, Tsuji A (1991) In vivo transport of a Dynorphin-like analgesic peptide, E-2078, through the blood–brain barrier: an application of brain microdialysis. Pharm Res 8(7):815
Terenin IM, Dmitriev SE, Andreev DE, Royall E, Belsham GJ, Roberts LO, Shatsky IN (2005) A cross-kingdom internal ribosome entry site reveals a simplified mode of internal ribosome entry. Mol Cell Biol 25(17):7879–7888
Thijs G, Lescot M, Marchal K, Rombauts S, De Moor B, Rouze P, Moreau Y (2001) A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics 17(12):1113–1122
Thijs G, Marchal K, Lescot M, Rombauts S, De Moor B, Rouze P, Moreau Y (2002a) A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes. J Comput Biol 9(2):447–464
Thijs G, Moreau Y, De Smet F, Mathys J, Lescot M, Rombauts S, Rouze P, De Moor B, Marchal K (2002b) INCLUSive: integrated clustering, upstream sequence retrieval and motif sampling. Bioinformatics 18(2):331–332
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
Thompson W, Rouchka EC, Lawrence CE (2003) Gibbs recursive sampler: finding transcription factor binding sites. Nucleic Acids Res 31(13):3580–3585
Thompson W, Palumbo MJ, Wasserman WW, Liu JS, Lawrence CE (2004) Decoding human regulatory circuits. Genome Res 14(10A):1967–1974
Thorne JL, Kishino H (1992) Freeing phylogenies from artifacts of alignment. Mol Biol Evol 9(6):1148–1162
Thorne JL, Kishino H (2005) Estimation of divergence times from molecular sequence data. In: Nielsen R (ed) Statistical methods in molecular evolution. Springer, New York, pp 233–256
Tinn O, Oakley TH (2008) Erratic rates of molecular evolution and incongruence of fossil and molecular divergence time estimates in Ostracoda (Crustacea). Mol Phylogenet Evol 48(1):157–167
Tjaden B (2015) De novo assembly of bacterial transcriptomes from RNA-seq data. Genome Biol 16:1
Tolhuis B, Palstra RJ, Splinter E, Grosveld F, de Laat W (2002) Looping and interaction between hypersensitive sites in the active beta-globin locus. Mol Cell 10(6):1453–1465
Tomatsu S, Orii KO, Bi Y, Gutierrez MA, Nishioka T, Yamaguchi S, Kondo N, Orii T, Noguchi A, Sly WS (2004) General implications for CpG hot spot mutations: methylation patterns of the human iduronate-2-sulfatase gene locus. Hum Mutat 23(6):590–598
Tomb JF, White O, Kerlavage AR, Clayton RA, Sutton GG, Fleischmann RD, Ketchum KA, Klenk HP, Gill S, Dougherty BA et al (1997) The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388(6642):539–547
Toronen P, Kolehmainen M, Wong G, Castren E (1999) Analysis of gene expression data using self-organizing maps. FEBS Lett 451(2):142–146
Trapnell C (2015) Defining cell types and states with single-cell genomics. Genome Res 25(10):1491–1498
Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25(9):1105–1111
Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28(5):511–515
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7(3):562–578
Trapnell C, Hendrickson DG, Sauvageau M, Goff L, Rinn JL, Pachter L (2013) Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol 31(1):46–53
Trudel MV, Vincent AT, Attere SA, Labbe M, Derome N, Culley AI, Charette SJ (2016) Diversity of antibiotic-resistance genes in Canadian isolates of Aeromonas salmonicida subsp. salmonicida: dominance of pSN254b and discovery of pAsa8. Sci Rep 6:35617
Trutschl M, Dinkova TD, Rhoads RE (2005) Application of machine learning and visualization of heterogeneous datasets to uncover relationships between translation and developmental stage expression of C. elegans mRNAs. Physiol Genomics 21(2):264–273
Tuller T, Waldman YY, Kupiec M, Ruppin E (2010) Translation efficiency is determined by both codon bias and folding energy. Proc Natl Acad Sci U S A 107(8):3645–3650
Valenzuela M, Cerda O, Toledo H (2003) Overview on chemotaxis and acid resistance in Helicobacter pylori. Biol Res 36(3–4):429–436
Van de Peer Y, Neefs JM, De Rijk P, De Wachter R (1993) Reconstructing evolution from eukaryotic small-ribosomal-subunit RNA sequences: calibration of the molecular clock. J Mol Evol 37(2):221–232
Van Dooren S, Pybus OG, Salemi M, Liu HF, Goubau P, Remondegui C, Talarmin A, Gotuzzo E, Alcantara LC, Galvao-Castro B et al (2004) The low evolutionary rate of human T-cell lymphotropic virus type-1 confirmed by analysis of vertical transmission chains. Mol Biol Evol 21(3):603–611
Van Esch H, Devriendt K (2001) Transcription factor GATA3 and the human HDR syndrome. Cell Mol Life Sci 58(9):1296–1300
van Hemert FJ, Berkhout B (1995) The tendency of lentiviral open reading frames to become A-rich: constraints imposed by viral genome organization and cellular tRNA availability. J Mol Evol 41(2):132–140
van Weringh A, Ragonnet-Cronin M, Pranckeviciene E, Pavon-Eternod M, Kleiman L, Xia X (2011) HIV-1 modulates the tRNA pool to improve translation efficiency. Mol Biol Evol 28(6):1827–1834
Vartanian J-P, Henry M, Wain-Hobson S (2002) Sustained G->A hypermutation during reverse transcription of an entire human immunodeficiency virus type 1 strain Vau group O genome. J Gen Virol 83(4):801–805
Vasilescu J, Figeys D (2006) Mapping protein-protein interactions by mass spectrometry. Curr Opin Biotechnol 17(4):394–399
Vazquez-Pianzola P, Hernandez G, Suter B, Rivera-Pomar R (2007) Different modes of translation for hid, grim and sickle mRNAs in Drosophila. Cell Death Differ 14(2):286–295
Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (1995) Serial analysis of gene expression. Science 270(5235):484–487
Velculescu VE, Zhang L, Zhou W, Vogelstein J, Basrai MA, Bassett DE Jr, Hieter P, Vogelstein B, Kinzler KW (1997) Characterization of the yeast transcriptome. Cell 88(2):243–251
Velculescu VE, Madden SL, Zhang L, Lash AE, Yu J, Rago C, Lal A, Wang CJ, Beaudry GA, Ciriello KM et al (1999) Analysis of human transcriptomes. Nat Genet 23(4):387–388
Velculescu VE, Vogelstein B, Kinzler KW (2000) Analysing uncharted transcriptomes with SAGE. Trends Genet 16(10):423–425
Vellanoweth RL, Rabinowitz JC (1992) The influence of ribosome-binding-site elements on translational efficiency in Bacillus subtilis and Escherichia coli in vivo. Mol Microbiol 6(9):1105–1114
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA et al (2001) The sequence of the human genome. Science 291(5507):1304–1351
Vert JP (2002) Support vector machine prediction of signal peptide cleavage site using a new class of kernels for strings. Pac Symp Biocomput 7:649–660
Vestergaard B, Van LB, Andersen GR, Nyborg J, Buckingham RH, Kjeldgaard M (2001) Bacterial polypeptide release factor RF2 is structurally distinct from eukaryotic eRF1. Mol Cell 8(6):1375–1382
Vestergaard B, Sanyal S, Roessle M, Mora L, Buckingham RH, Kastrup JS, Gajhede M, Svergun DI, Ehrenberg M (2005) The SAXS solution structure of RF1 differs from its crystal structure and is similar to its ribosome bound cryo-EM structure. Mol Cell 20(6):929–938
Viterbi AJ (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans Inf Theory 13(2):260–269
Vlasschaert C, Xia X, Coulombe J, Gray DA (2015) Evolution of the highly networked deubiquitinating enzymes USP4, USP15, and USP11. BMC Evol Biol 15:230
Vlasschaert C, Xia X, Gray DA (2016) Selection preserves Ubiquitin Specific Protease 4 alternative exon skipping in therian mammals. Sci Rep 6:20039
Vlasschaert C, Cook D, Xia X, Gray DA (2017) The evolution and functional diversification of the deubiquitinating enzyme superfamily. Genome Biol Evol 9(3):558–573
Voelter-Mahlknecht S (2016) Epigenetic associations in relation to cardiovascular prevention and therapeutics. Clin Epigenetics 8:4
Waddell PJ, Steel MA (1997a) General time-reversible distances with unequal rates across sites: mixing gamma and inverse Gaussian distributions with invariant sites. Mol Phylogenet Evol 8(3):398–414
Waddell PJ, Steel MA (1997b) General time-reversible distances with unequal rates across sites: mixing gamma and inverse Gaussian distributions with invariant sites. Mol Phylogenet Evol 8(3):398–414
Wade PA, Wolffe AP (2001) ReCoGnizing methylated DNA. Nat Struct Biol 8(7):575–577
Walsh D, Arias C, Perez C, Halladin D, Escandon M, Ueda T, Watanabe-Fukunaga R, Fukunaga R, Mohr I (2008) Eukaryotic translation initiation factor 4F architectural alterations accompany translation initiation factor redistribution in poxvirus-infected cells. Mol Cell Biol 28(8):2648–2658
Wang HC, Hickey DA (2002) Evidence for strong selective constraint acting on the nucleotide composition of 16S ribosomal RNA genes. Nucleic Acids Res 30(11):2501–2507
Wang G, Humayun MZ, Taylor DE (1999) Mutation as an origin of genetic variability in Helicobacter pylori. Trends Microbiol 7(12):488–493
Wang J, Delabie J, Aasheim H, Smeland E, Myklebost O (2002) Clustering of the SOM easily reveals distinct gene expression patterns: results of a reanalysis of lymphoma study. BMC Bioinform 3:36
Wang HC, Xia X, Hickey DA (2006) Thermal adaptation of ribosomal RNA genes: a comparative study. J Mol Evol 63(1):120–126
Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10(1):57–63
Wang X, Kim Y, Ma Q, Hong SH, Pokusaeva K, Sturino JM, Wood TK (2010) Cryptic prophages help bacteria cope with adverse environments. Nat Commun 1:147
Wang M, Weiss M, Simonovic M, Haertinger G, Schrimpf SP, Hengartner MO, von Mering C (2012) PaxDb, a database of protein abundance averages across all three domains of life. Mol Cell Proteomics 11(8):492–500
Washburn MP, Wolters D, Yates JR 3rd (2001) Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol 19(3):242–247
Waterfield MD, Scrace GT, Whittle N, Stroobant P, Johnsson A, Wasteson A, Westermark B, Heldin CH, Huang JS, Deuel TF (1983) Platelet-derived growth factor is structurally related to the putative transforming protein p28sis of simian sarcoma virus. Nature 304(5921):35–39
Waterman MS, Vingron M (1994) Rapid and accurate estimates of statistical significance for sequence data base searches. Proc Natl Acad Sci U S A 91(11):4625–4628
Webster J, Oxley D (2005) Peptide mass fingerprinting: protein identification using MALDI-TOF mass spectrometry. Methods Mol Biol 310:227–240
Weeks DL, Eskandari S, Scott DR, Sachs G (2000) A H+-gated urea channel: the link between Helicobacter pylori urease and gastric colonization. Science 287(5452):482–485
Wei Y, Xia X (2017) The role of +4U as an extended translation termination signal in bacteria. Genetics 205(2):539–549
Wei Y, Wang J, Xia X (2016) Coevolution between stop codon usage and release factors in bacterial species. Mol Biol Evol 33(9):2357–2367
Wei Y, Silke JR, Xia X (2017) Elucidating the 16S rRNA 3′ boundaries and defining optimal SD/aSD pairing in Escherichia coli and Bacillus subtilis using RNA-Seq data. Sci Rep. https://doi.org/10.1038/s41598-017-17918-6
Weigert MG, Garen A (1965) Base composition of nonsense codons in E. coli. Evidence from amino-acid substitutions at a tryptophan site in alkaline phosphatase. Nature 206(988):992–994
Weiner AM, Weber K (1973) A single UGA codon functions as a natural termination signal in the coliphage Qbeta coat protein cistron. J Mol Biol 80(4):837–855
Weir BS (1990) Genetic data analysis. Sinauer Associates, Sunderland
Weiss RB, Dunn DM, Dahlberg AE, Atkins JF, Gesteland RF (1988) Reading frame switch caused by base-pair formation between the 3′ end of 16S rRNA and the mRNA during elongation of protein synthesis in Escherichia coli. EMBO J 7(5):1503–1507
Wen Y, Marcus EA, Matrubutham U, Gleeson MA, Scott DR, Sachs G (2003) Acid-adaptive genes of Helicobacter pylori. Infect Immun 71(10):5921–5939
Wenthzel AM, Stancek M, Isaksson LA (1998) Growth phase dependent stop codon readthrough and shift of translation reading frame in Escherichia coli. FEBS Lett 421(3):237–242
Wilks SS (1938) The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann Math Stat 9:60–62
Williams CL, Preston T, Hossack M, Slater C, McColl KE (1996) Helicobacter pylori utilises urea for amino acid synthesis. FEMS Immunol Med Microbiol 13(1):87–94
Williams KP, Sobral BW, Dickerman AW (2007) A robust species tree for the alphaproteobacteria. J Bacteriol 189(13):4578–4586
Wilson DS, Nock S (2002) Functional protein microarrays. Curr Opin Chem Biol 6(1):81–85
Wilson KS, von Hippel PH (1995) Transcription termination at intrinsic terminators: the role of the RNA hairpin. Proc Natl Acad Sci U S A 92(19):8793–8797
Winston F, Botstein D, Miller JH (1979) Characterization of amber and ochre suppressors in Salmonella typhimurium. J Bacteriol 137(1):433–439
Wolfe KH, Li WH, Sharp PM (1987) Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast and nuclear DNAs. Proc Natl Acad Sci U S A 84:9054–9058
Wong KM, Suchard MA, Huelsenbeck JP (2008) Alignment uncertainty and genomic analysis. Science 319(5862):473–476
Wright F (1990) The ‘effective number of codons’ used in a gene. Gene 87(1):23–29
Wright GL Jr (2002) SELDI proteinchip MS: a platform for biomarker discovery and cancer diagnosis. Expert Rev Mol Diagn 2(6):549–563
Wu J, Bag J (1998) Negative control of the poly(A)-binding protein mRNA translation is mediated by the adenine-rich region of its 5′-untranslated region. J Biol Chem 273(51):34535–34542
Wu CI, Li WH (1985) Evidence for higher rates of nucleotide substitution in rodents than in man. Proc Natl Acad Sci U S A 82(6):1741–1745
Wu J, Tzanakakis ES (2013) Deconstructing stem cell population heterogeneity: single-cell analysis and modeling approaches. Biotechnol Adv 31(7):1047–1062
Xia X (1996) Maximizing transcription efficiency causes codon usage bias. Genetics 144:1309–1320
Xia X (1998a) How optimized is the translational machinery in Escherichia coli, Salmonella typhimurium and Saccharomyces cerevisiae? Genetics 149(1):37–44
Xia X (1998b) The rate heterogeneity of nonsynonymous substitutions in mammalian mitochondrial genes. Mol Biol Evol 15:336–344
Xia X (2000) Phylogenetic relationship among horseshoe crab species: the effect of substitution models on phylogenetic analyses. Syst Biol 49:87–100
Xia X (2001) Data analysis in molecular biology and evolution. Kluwer Academic Publishers, Boston
Xia X (2003) DNA methylation and mycoplasma genomes. J Mol Evol 57:S21–S28
Xia X (2005) Mutation and selection on the anticodon of tRNA genes in vertebrate mitochondrial genomes. Gene 345(1):13–20
Xia X (2006) Topological bias in distance-based phylogenetic methods: problems with over- and underestimated genetic distances. Evol Bioinforma 2:375–387
Xia X (2007a) The +4G site in Kozak consensus is not related to the efficiency of translation initiation. PLoS One 2:e188
Xia X (2007b) Bioinformatics and the cell: modern computational approaches in genomics, proteomics and transcriptomics. Springer US, New York
Xia X (2007c) An improved implementation of codon adaptation index. Evol Bioinforma 3:53–58
Xia X (2008) The cost of wobble translation in fungal mitochondrial genomes: integration of two traditional hypotheses. BMC Evol Biol 8:211
Xia X (2009) Information-theoretic indices and an approximate significance test for testing the molecular clock hypothesis with genetic distances. Mol Phylogenet Evol 52:665–676
Xia X (2012a) DNA replication and strand asymmetry in prokaryotic and mitochondrial genomes. Curr Genomics 13(1):16–27
Xia X (2012b) Position weight matrix, Gibbs sampler, and the associated significance tests in motif characterization and prediction. Scientifica 2012: Article ID 917540, 15 pp
Xia X (2012c) Rapid evolution of animal mitochondria. In: Singh RS, Xu J, Kulathinal RJ (eds) Evolution in the fast lane: rapidly evolving genes and genetic systems. Oxford University Press, Oxford, pp 73–82
Xia X (2013) DAMBE5: a comprehensive software package for data analysis in molecular biology and evolution. Mol Biol Evol 30:1720–1728
Xia X (2014) Phylogenetic bias in the likelihood method caused by missing data coupled with among-site rate variation: an analytical approach. In: Basu M, Pan Y, Wang J (eds) Bioinformatics research and applications. Springer, New York, pp 12–23
Xia X (2015) A major controversy in codon-anticodon adaptation resolved by a new codon usage index. Genetics 199:573–579
Xia X (2016) PhyPA: phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences. Mol Phylogenet Evol 102:331–343
Xia X (2017a) ARSDA: a new approach for storing, transmitting and analyzing transcriptomic data. G3: Genes|Genomes|Genetics. https://doi.org/10.1101/114470
Xia X (2017b) Bioinformatics and drug discovery. Curr Top Med Chem 17(15):1709–1726
Xia X (2017c) DAMBE6: new tools for microbial genomics, phylogenetics and molecular evolution. J Hered 108(4):431–437. https://doi.org/10.1093/jhered/esx033
Xia X (2017d) Self-organizing map for characterizing heterogeneous nucleotide and amino acid sequence motifs. Computation 5(4):43
Xia X, Holcik M (2009) Strong eukaryotic IRESs have weak secondary structure. PLoS One 4(1):e4136
Xia X, Kumar S (2006) Codon-based detection of positive selection can be biased by heterogeneous distribution of polar amino acids along protein sequences. In: Markstein P, Xu Y (eds) Computational systems bioinformatics: proceedings of the conference CSB 2006. Imperial College Press, London, pp 335–340
Xia X, Lemey P (2009) Assessing substitution saturation with DAMBE. In: Lemey P, Salemi M, Vandamme AM (eds) The phylogenetic handbook, 2nd edn. Cambridge University Press, Cambridge, pp 615–630
Xia X, Li WH (1998) What amino acid properties affect protein evolution? J Mol Evol 47(5):557–564
Xia X, Palidwor G (2005) Genomic adaptation to acidic environment: evidence from Helicobacter pylori. Am Nat 166(6):776–784
Xia X, Xie Z (2001a) AMADA: analysis of microarray data. Bioinformatics 17:569–570
Xia X, Xie Z (2001b) DAMBE: software package for data analysis in molecular biology and evolution. J Hered 92(4):371–373
Xia X, Xie Z (2002) Protein structure, neighbor effect, and a new index of amino acid dissimilarities. Mol Biol Evol 19(1):58–67
Xia X, Yang Q (2011) A distance-based least-square method for dating speciation events. Mol Phylogenet Evol 59(2):342–353
Xia X, Yuen KY (2005) Differential selection and mutation between dsDNA and ssDNA phages shape the evolution of their genomic AT percentage. BMC Genet 6(1):20
Xia X, Hafner MS, Sudman PD (1996) On transition bias in mitochondrial genes of pocket gophers. J Mol Evol 43:32–40
Xia XH, Wei T, Xie Z, Danchin A (2002) Genomic changes in nucleotide and dinucleotide frequencies in Pasteurella multocida cultured under high temperature. Genetics 161(4):1385–1394
Xia X, Xie Z, Kjer KM (2003a) 18S ribosomal RNA and tetrapod phylogeny. Syst Biol 52(3):283–295
Xia X, Xie Z, Salemi M, Chen L, Wang Y (2003b) An index of substitution saturation and its application. Mol Phylogenet Evol 26(1):1–7
Xia X, Wang H, Xie Z, Carullo M, Huang H, Hickey D (2006) Cytosine usage modulates the correlation between CDS length and CG content in prokaryotic genomes. Mol Biol Evol 23(7):1450–1454
Xia X, Huang H, Carullo M, Betran E, Moriyama EN (2007) Conflict between translation initiation and elongation in vertebrate mitochondrial genomes. PLoS One 2:e227
Xia X, MacKay V, Yao X, Wu J, Miura F, Ito T, Morris DR (2011) Translation initiation: a regulatory role for poly(A) tracts in front of the AUG codon in Saccharomyces cerevisiae. Genetics 189(2):469–478
Xiao L, Wang K, Teng Y, Zhang J (2003) Component plane presentation integrated self-organizing map for microarray data analysis. FEBS Lett 538(1–3):117–124
Xu Z, Hao B (2009) CVTree update: a newly designed phylogenetic study platform using composition vectors and whole genomes. Nucleic Acids Res 37(Web Server):W174–W178
Yamaoka Y, Kita M, Kodama T, Imamura S, Ohno T, Sawai N, Ishimaru A, Imanishi J, Graham DY (2002) Helicobacter pylori infection in mice: role of outer membrane proteins in colonization and inflammation. Gastroenterology 123(6):1992–2004
Yang Z (1995) A space-time process model for the evolution of DNA sequences. Genetics 139:993–1005
Yang Z (2006) Computational molecular evolution. Oxford University Press, Oxford
Yang Z, Yoder AD (2003) Comparison of likelihood and Bayesian methods for estimating divergence times using multiple gene loci and calibration points, with application to a radiation of cute-looking mouse lemur species. Syst Biol 52(5):705–716
Yang Z, O’Brien JD, Zheng X, Zhu HQ, She ZS (2007) Tree and rate estimation by local evaluation of heterochronous nucleotide data. Bioinformatics 23(2):169–176
Yang Z, Bruno DP, Martens CA, Porcella SF, Moss B (2010) Simultaneous high-resolution analysis of vaccinia virus and host cell transcriptomes by deep RNA sequencing. Proc Natl Acad Sci U S A 107(25):11513–11518
Yates JR (2004a) Mass spectral analysis in proteomics. Annu Rev Biophys Biomol Struct 33:297–316
Yates JR (2004b) Mass spectrometry as an emerging tool for systems biology. BioTechniques 36(6):917–919
Yip TT, Lomas L (2002) SELDI ProteinChip array in oncoproteomic research. Technol Cancer Res Treat 1(4):273–280
Yoder AD, Yang Z (2000) Estimation of primate speciation dates using local molecular clocks. Mol Biol Evol 17(7):1081–1090
Yoon JH, De S, Srikantan S, Abdelmohsen K, Grammatikakis I, Kim J, Kim KM, Noh JH, White EJ, Martindale JL et al (2014) PAR-CLIP analysis uncovers AUF1 impact on target RNA fate and genome integrity. Nat Commun 5:5248
Yoshinaka Y, Katoh I, Copeland TD, Oroszlan S (1985) Murine leukemia virus protease is encoded by the gag-pol gene and is synthesized through suppression of an amber termination codon. Proc Natl Acad Sci U S A 82(6):1618–1622
You J, Cohen RE, Pickart CM (1999) Construct for high-level expression and low misincorporation of lysine for arginine during expression of pET-encoded eukaryotic proteins in Escherichia coli. BioTechniques 27(5):950–954
Young JA, Johnson JR, Benner C, Yan SF, Chen K, Le Roch KG, Zhou Y, Winzeler EA (2008) In silico discovery of transcription regulatory elements in Plasmodium falciparum. BMC Genomics 9:70
Yu KM, Liu J, Moy R, Lin HC, Nicholas HB Jr, Rosenquist GL (2002) Prediction of tyrosine sulfation in seven-transmembrane peptide receptors. Endocrine 19(3):333–338
Yu Q, Chen D, König R, Mariani R, Unutmaz D, Landau NR (2004) APOBEC3B and APOBEC3C are potent inhibitors of simian immunodeficiency virus replication. J Biol Chem 279(51):53379–53386
Yu Y, Sweeney TR, Kafasla P, Jackson RJ, Pestova TV, Hellen CU (2011) The mechanism of translation initiation on Aichivirus RNA mediated by a novel type of picornavirus IRES. EMBO J 30(21):4423–4436
Yuan ZC, Zaheer R, Morton R, Finan TM (2006) Genome prediction of PhoB regulated promoters in Sinorhizobium meliloti and twelve proteobacteria. Nucleic Acids Res 34(9):2686–2697
Yuan J, Sheppard K, Soll D (2008) Amino acid modifications on tRNA. Acta Biochim Biophys Sin Shanghai 40(7):539–553
Zhang S, Ryden-Aulin M, Isaksson LA (1996) Functional interaction between release factor one and P-site peptidyl-tRNA on the ribosome. J Mol Biol 261(2):98–107
Zhang L, Zhou W, Velculescu VE, Kern SE, Hruban RH, Hamilton SR, Vogelstein B, Kinzler KW (1997) Gene expression profiles in normal and cancer cells. Science 276(5316):1268–1272
Zhang HM, Ye X, Su Y, Yuan J, Liu Z, Stein DA, Yang D (2010) Coxsackievirus B3 infection activates the unfolded protein response and induces apoptosis through downregulation of p58IPK and activation of CHOP and SREBP1. J Virol 84(17):8446–8459
Zharkikh A (1994) Estimation of evolutionary distances between nucleotide sequences. J Mol Evol 39:315–329
Zheng CL, Fu XD, Gribskov M (2005) Characteristics and regulatory elements defining constitutive splicing and different modes of alternative splicing in human and mouse. RNA 11(12):1777–1787
Zhou J, Korostelev A, Lancaster L, Noller HF (2012) Crystal structures of 70S ribosomes bound to release factors RF1, RF2 and RF3. Curr Opin Struct Biol 22(6):733–742
Zhu C, Byrd RH, Lu P, Nocedal J (1997) Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans Math Softw 23(4):550–560
Zhu J, Liu JS, Lawrence CE (1998) Bayesian adaptive sequence alignment algorithms. Bioinformatics 14(1):25–39
Zhu Z, Li L, Zhang Y, Yang Y, Yang X (2015a) CompMap: a reference-based compression program to speed up read mapping to related reference sequences. Bioinformatics 31(3):426–428
Zhu Z, Zhang Y, Ji Z, He S, Yang X (2015b) High-throughput DNA sequence data compression. Brief Bioinform 16(1):1–15
Zid BM, Rogers AN, Katewa SD, Vargas MA, Kolipinski MC, Lu TA, Benzer S, Kapahi P (2009) 4E-BP extends lifespan upon dietary restriction by enhancing mitochondrial activity in Drosophila. Cell 139(1):149–160
Zien A, Ratsch G, Mika S, Scholkopf B, Lengauer T, Muller KR (2000) Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics 16(9):799–807
Zon LI, Gurish MF, Stevens RL, Mather C, Reynolds DS, Austen KF, Orkin SH (1991) GATA-binding transcription factors in mast cells regulate the promoter of the mast cell carboxypeptidase A gene. J Biol Chem 266(34):22948–22953
Zuckerkandl E, Pauling L (1965) Evolutionary divergence and convergence in proteins. In: Bryson V, Vogel HJ (eds) Evolving genes and proteins. Academic, New York, pp 97–166
Postscript
I usually share Simpson’s Paradox with students after lecturing on the joint effect of translation initiation and elongation on protein production. If we do not take translation initiation into consideration, we may arrive at the wrong conclusion that codon usage bias contributes little to the rate of protein synthesis, as Kudla et al. (2009) did. Simpson’s Paradox, illustrated with the data in Table 9.11, presents a similar case in which one reaches a wrong conclusion when one factor is ignored.
Charig et al. (1986) summarized their findings in the abstract on the basis of the last row of “Pooled” data, stating that “Success was achieved in 273 (78%) patients after open surgery, 289 (83%) after percutaneous nephrolithotomy.” A reader would conclude that open surgery (AOS, 78% success rate) is worse than percutaneous nephrolithotomy (PN, 83% success rate). However, taking kidney stone size into consideration immediately leads to the opposite (and correct) conclusion: AOS is better than PN for both small stones (93% vs. 87%) and large stones (73% vs. 69%). Note also that both AOS and PN have much higher success rates for small stones than for large stones, and that patients treated with PN had mostly small stones whereas patients treated with AOS had mostly large stones. It is this association between PN and small stones that produces the misleading conclusion that PN is better than AOS when kidney stone size is ignored.
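The reversal can be checked with a few lines of arithmetic. The sketch below is only an illustration: the pooled totals (273 and 289) are quoted above, but the per-stratum counts are the commonly quoted values from Charig et al. (1986) and are assumed here to match Table 9.11. The names `COUNTS`, `rate`, `pooled`, and `report` are hypothetical helpers introduced for this example.

```python
# (successes, patients) for each treatment and stone size.
# ASSUMPTION: the stratum counts are the commonly quoted Charig et al. (1986)
# values, taken here to match Table 9.11; only the pooled totals 273 and 289
# are quoted in the text.
COUNTS = {
    "AOS": {"small": (81, 87), "large": (192, 263)},
    "PN": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, patients):
    """Success rate as a fraction between 0 and 1."""
    return successes / patients


def pooled(treatment):
    """Sum successes and patients over both stone sizes for one treatment."""
    strata = COUNTS[treatment].values()
    return sum(s for s, _ in strata), sum(n for _, n in strata)


def report():
    """Print stratified and pooled success rates side by side."""
    for size in ("small", "large"):
        aos = rate(*COUNTS["AOS"][size])
        pn = rate(*COUNTS["PN"][size])
        print(f"{size:>6} stones: AOS {aos:.0%} vs. PN {pn:.0%}")
    aos_all = rate(*pooled("AOS"))
    pn_all = rate(*pooled("PN"))
    print(f"pooled data  : AOS {aos_all:.0%} vs. PN {pn_all:.0%}")


if __name__ == "__main__":
    report()
```

Under these assumed counts, AOS has the higher rate within each stone-size stratum, yet PN has the higher pooled rate, because most PN patients fall in the easier small-stone stratum. The same logic applies when translation initiation is ignored in assessing the effect of codon usage on protein production.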
Copyright information
© 2018 Springer Science+Business Media LLC
About this chapter
Cite this chapter
Xia, X. (2018). Bioinformatics and Translation Elongation. In: Bioinformatics and the Cell. Springer, Cham. https://doi.org/10.1007/978-3-319-90684-3_9
DOI: https://doi.org/10.1007/978-3-319-90684-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90682-9
Online ISBN: 978-3-319-90684-3
eBook Packages: Biomedical and Life Sciences (R0)