The universal distribution of the genetic code, the same essential features of genome replication and gene expression, basic anabolic reactions, and membrane-associated ATPase-mediated energy production suggests that they were already present in the last common ancestor (LCA), i.e., the cenancestor, of all living beings. It is of course unlikely that such traits were already present in the first forms of life, whose actual nature can only be surmised. Comparative genomic analysis provides important insights into evolutionary stages that may have existed prior to the LCA and the separation of the three major cell lineages. However, such information cannot be extrapolated to older evolutionary stages, including the events that may have taken place in the prebiotic soup, nor to the RNA world itself. At present, the applicability of molecular cladistics and comparative genomics cannot be extended beyond a threshold that corresponds to a period of cellular evolution in which protein biosynthesis was already in operation.

Studies of deep phylogenies provide important insights into the nature of the LCA itself. It could be argued that the most parsimonious characterization of the cenancestor would be obtained by summarizing the features of the oldest recognizable nodes of universal cladograms. The rooting of universal cladistic trees determines the directionality of evolutionary change and allows ancestral characters to be distinguished from derived ones, i.e., primitive characters should appear in older, more basal branches than their derived counterparts. Determining the rooting point of a tree normally imparts polarity to most or all characters. However, large-scale studies based on genomic data have revealed major discrepancies with the rRNA tree topology. These differences have often been interpreted as evidence of horizontal gene transfer events between different species and even domains, calling into question the feasibility of reconstructing and properly understanding early biological history. Moreover, it is important to distinguish between ancient and primitive organisms. Species located near the root of universal rRNA-based trees are cladistically ancient, but they are not endowed with a primitive molecular genetic apparatus, nor do they seem to be more primitive in their metabolic abilities than their aerobic counterparts.
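To make concrete how rooting imparts polarity, the following is a minimal Python sketch built on Fitch's small-parsimony algorithm. It is a standard method chosen here for illustration and is not necessarily the one used in the studies discussed. The sketch assumes a rooted, strictly binary tree and a single discrete character. The tree, the taxa, and the state labels `a` and `b` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node of a rooted, strictly binary tree (hypothetical example)."""
    left: Optional[Node] = None
    right: Optional[Node] = None
    state: Optional[str] = None  # observed character state, leaves only


def fitch(node: Node) -> tuple[set[str], int]:
    """Bottom-up pass of Fitch's small-parsimony algorithm.

    Returns the most-parsimonious state set at `node` and the minimum
    number of state changes needed within its subtree.
    """
    if node.left is None and node.right is None:
        return {node.state}, 0
    left_set, left_cost = fitch(node.left)
    right_set, right_cost = fitch(node.right)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def polarize(root: Node, observed: set[str]) -> dict[str, str]:
    """Label each observed state relative to the root's inferred state(s)."""
    root_states, _ = fitch(root)
    labels = {}
    for s in observed:
        if s not in root_states:
            labels[s] = "derived"
        elif len(root_states) == 1:
            labels[s] = "ancestral"
        else:
            labels[s] = "ambiguous"  # root state unresolved for this character
    return labels


def leaf(state: str) -> Node:
    return Node(state=state)


# Hypothetical tree ((A, B), (C, D)) scoring one binary character.
root = Node(Node(leaf("a"), leaf("a")), Node(leaf("a"), leaf("b")))
print(fitch(root))  # ({'a'}, 1)
print(polarize(root, {"a", "b"}))
```

Here `a` comes out as ancestral and `b` as derived. The same leaf states can be rooted instead on the branch leading to D, as in `Node(leaf("b"), Node(leaf("a"), Node(leaf("a"), leaf("a"))))`. With that rooting the root state becomes ambiguous. This shows that the polarity assigned to a character depends on where the root is placed.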

Reconstructions of the gene complements of distant ancestors are mere statistical approximations of the biological past, since their accuracy depends on manifold factors. These include horizontal gene transfer, polyphyletic gene losses, and significant variations in the substitution rates of different proteins, as well as methodological caveats such as possible biases in the construction of genome databases (Becerra et al. 1997). Medical and veterinary interests have shaped the nature of extant genome databases, from which many species are absent and which do not yet include representatives of all major biological groups. Nevertheless, there is significant overlap among the inventories of highly conserved sequences reported by different authors. Sequences involved in RNA metabolism, i.e., ORFs whose products synthesize, degrade, or interact with RNA, are among the most highly conserved sequences common to all known genomes. They provide insights into an early stage of cell evolution during which RNA played a much more conspicuous biological role. The conservation of this set of sequences is consistent with the proposal that the extant DNA/RNA/protein world was preceded by an RNA/protein world, an evolutionary stage in which ribonucleotide reduction and DNA genomes had not yet evolved (Becerra et al. 2007).

The available information suggests that the cenancestor was not an immediate descendant of the RNA world, a protocell, or any other pre-life progenitor system, but rather a complex organism, much like extant prokaryotes. However, it should be kept in mind that inventories of LCA gene content include sequences that may have undergone horizontal transfer events, as well as genes (or domains) that originated in different pre-cenancestral epochs. For instance, some invariant motifs exhibit a surprising degree of conservation, such as GHVDHGKT, DTPGHVDF, and GAGKSTL (Goto et al. 2002). These motifs, together with the RNA-binding domains found in highly conserved genes (Delaye and Lazcano 2000), may well be among the oldest recognizable polypeptides found in databases. They are very likely much more ancient than some of the proteins in which they are present.
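The invariant motifs mentioned above are short enough to be located by simple string matching. The following Python sketch is offered only as an illustration. It scans a protein sequence for exact occurrences of the three motifs. The query sequence is invented, and real analyses would rely on alignments or profile searches that tolerate substitutions rather than on exact matches.

```python
from __future__ import annotations

import re

# Invariant motifs quoted in the text (Goto et al. 2002), one-letter amino-acid code.
MOTIFS = ("GHVDHGKT", "DTPGHVDF", "GAGKSTL")


def find_motifs(seq: str, motifs: tuple[str, ...] = MOTIFS) -> dict[str, list[int]]:
    """Return the 0-based start positions of each motif in `seq`.

    Exact matching only; the lookahead pattern also reports overlapping hits.
    """
    seq = seq.upper()
    return {
        m: [hit.start() for hit in re.finditer(f"(?={re.escape(m)})", seq)]
        for m in motifs
    }


# A made-up fragment, not a real database entry.
query = "MSKGHVDHGKTTLTAA"
print(find_motifs(query))  # {'GHVDHGKT': [3], 'DTPGHVDF': [], 'GAGKSTL': []}
```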

From a cladistic viewpoint, the LCA is merely an inferred inventory of features shared among extant organisms, all of which are located at the tips of the branches of molecular phylogenies. However, the term “universal distribution” can be restricted to its most obvious sense, i.e., that of traits found in all completely sequenced genomes. In that case, quite surprisingly, the resulting repertoire consists of relatively few features and incompletely represented biochemical processes (Becerra et al. 2007). Analysis has shown that some of the most likely a priori candidates for strict universality, such as the molecular machinery involved in DNA replication, are of polyphyletic origin (Edgell and Doolittle 1997). It has been argued that polymerases and topoisomerases may have an ultimate viral origin (Forterre 2006). However, not all the components of multidomain enzymes are equally ancient. The catalytic palm subdomain of the Klenow fragment of DNA polymerase I, for instance, appears to be a vestige of the RNA/protein world replicase (Becerra et al. 2007).
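The strict sense of “universal distribution” amounts to a set intersection over gene inventories. The following minimal Python sketch assumes that each genome has already been reduced to a set of trait or gene-family labels. The genome names and labels are hypothetical. The sketch shows how the strict core is computed and why it can only shrink as more genomes are added.

```python
from __future__ import annotations


def universal_core(inventories: dict[str, set[str]]) -> set[str]:
    """Traits present in every genome: the intersection of all inventories."""
    if not inventories:
        return set()
    return set.intersection(*inventories.values())


def core_size_curve(inventories: dict[str, set[str]], order: list[str]) -> list[int]:
    """Size of the strict core as genomes are added one at a time in `order`."""
    sizes: list[int] = []
    core: set[str] | None = None
    for name in order:
        core = set(inventories[name]) if core is None else core & inventories[name]
        sizes.append(len(core))
    return sizes


# Hypothetical genomes and trait labels, for illustration only.
genomes = {
    "genome_A": {"T1", "T2", "T3", "T4"},
    "genome_B": {"T1", "T2", "T3"},
    "genome_C": {"T1", "T3", "T5"},
}
print(sorted(universal_core(genomes)))  # ['T1', 'T3']
print(core_size_curve(genomes, ["genome_A", "genome_B", "genome_C"]))  # [4, 3, 2]
```

The strict core is an intersection, so a single secondary loss in one genome is enough to remove an otherwise ancestral trait. This is one way in which polyphyletic gene losses can bias such reconstructions.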

Understanding the evolution of central metabolic pathways during pre-LCA epochs is hampered by the puzzling absence of one or more biosynthetic genes from the genomes of many of the free-living prokaryotes that have been sequenced. Addressing this issue will require not only the identification and proper annotation of highly conserved open reading frames found in all cell genomes, but also more complete tertiary structure databases. Some enzymes of archaic pathways may have survived in unusual organisms. This possibility suggests that considerable prudence should also be exercised when attempting to describe the physiology of ancestral organisms.

It is also possible that extant enzymes participated in metabolic pathways which no longer exist or remain to be discovered (Zubay 1993; Becerra and Lazcano 1998), a possibility that has begun to be explored by computer searches for alternative reaction pathways (Goto et al. 1996). The discovery that carbamate kinase, which participates in fermentative ATP production, catalyzes the formation of carbamoyl phosphate in the archaea Pyrococcus furiosus and P. abyssi (Alcántara et al. 2000) shows that considerable attention should be given to the possibility that significant variations of the basic biosynthetic pathways may have existed in the past.

Clues to the genetic organization and biochemical complexity of the primitive entities from which the LCA evolved may also be derived from the analysis of paralogous gene families. The sequences that underwent such duplications prior to the divergence of the three lineages include genes encoding a variety of enzymes. These enzymes participate in widely different processes, such as translation, DNA replication, biosynthetic pathways, and energy production. As noted elsewhere (Becerra et al. 2007), a survey of the available information shows that sequences resulting from early pre-cenancestral paralogous expansion may be classified into three major groups:

(a) sequences formed by two tandemly arranged homologous modules that underwent fusion events, such as (1) protein disulfide oxidoreductase (Ren et al. 1998), (2) the large subunit of carbamoyl phosphate synthetase (Alcántara et al. 2000), and (3) HisA, an isomerase that forms part of the histidine biosynthetic pathway (Alifano et al. 1996);

(b) gene families that have undergone a major expansion of sequences, such as ABC transporters and other enzymes involved in membrane transport (Clayton et al. 1997); and

(c) families formed by a relatively small number of paralogous sequences. These include, among others, the pair of homologous genes encoding the EF-Tu and EF-G elongation factors (Iwabe et al. 1989), as well as the duplicated sequences encoding the hydrophilic alpha and beta subunits of the F-type ATPase (Gogarten et al. 1989).

The identification of sequences formed by tandemly fused homologous modules provides direct evidence that smaller, functional genes existed during early Precambrian times. Moreover, the families of paralogous duplicates also imply that the LCA was preceded by simpler cells with smaller genomes in which only one copy of each of these genes existed. In such cells, for instance, protein synthesis involved only one elongation factor and ATPases had limited regulatory abilities. Paralogous families of metabolic genes also support the proposal that anabolic pathways were assembled by the recruitment of primitive enzymes that could react with a wide range of chemically related substrates, i.e., the so-called patchwork assembly of biosynthetic routes (Jensen 1976). Such relatively slow, unspecific enzymes may have represented a mechanism by which primitive cells with small genomes could have overcome their limited coding abilities. How early cells overcame the bottlenecks imposed by such limitations is still an open problem that can be addressed experimentally.