Evolutionary hallmarks of the human proteome: chasing the age and coregulation of protein-coding genes
The development of large-scale technologies for quantitative transcriptomics has enabled comprehensive analysis of the gene expression profiles in complete genomes. RNA-Seq allows the measurement of gene expression levels in a manner far more precise and global than previous methods. Studies using this technology are altering our view of the extent and complexity of eukaryotic transcriptomes. In this respect, multiple efforts have been made to determine and analyse the gene expression patterns of human cell types in different conditions, either in normal or pathological states. However, until recently, little has been reported about the evolutionary marks present in human protein-coding genes, particularly from the combined perspective of gene expression and protein evolution.
We present a combined analysis of human protein-coding gene expression profiling and time-scale ancestry mapping that places the genes in taxonomy clades and reveals eight major evolutionary steps (“hallmarks”), which include clusters of functionally coherent proteins. The human expressed genes are analysed using an RNA-Seq dataset of 116 samples from 32 tissues. The evolutionary analysis of the human proteins is performed by combining the information from: (i) a database of orthologous proteins (OMA), (ii) the taxonomy mapping of genes to lineage clades (from NCBI Taxonomy) and (iii) the evolution time-scale mapping provided by TimeTree (Timescale of Life). The human protein-coding genes are also placed in a relational context based on the construction of a robust gene coexpression network, which reveals tighter links between age-related protein-coding genes and finds functionally coherent gene modules.
Understanding the relational landscape of the human protein-coding genes is essential for interpreting the functional elements and modules of our active genome. Moreover, decoding the evolutionary history of the human genes can provide very valuable information to reveal or uncover their origin and function.
Keywords: Human protein evolution; Human gene evolution; Transcriptomics; RNA-seq; Tissue transcriptomics; Protein families; Gene coexpression; Gene house-keeping; Gene tissue-enriched
RNA-Seq allows the measurement of gene expression levels in a manner far more precise than previous methods. Studies using this approach have already altered our view of the extent and complexity of eukaryotic transcriptomes. Important analyses have been performed based on transcriptomic data obtained from multiple human tissues. The FANTOM project, for example, presents the gene expression profiling of 56 healthy human tissues associated with the functional annotation of the mammalian genomes. The Human Protein Atlas applies RNA-Seq to 32 human tissues to find the correlation between gene expression and protein presence in the characterization of several parts of the human proteome: membrane proteome, druggable proteome, cancer proteome, secretome and proteome involved in metabolic processes. The Genotype-Tissue Expression project (GTEx Consortium) provides an extensive resource of human transcriptomic data permitting the inference of different patterns across human tissues and individuals, and the construction of tissue-specific gene co-expression networks. Other studies are more focused on specific cells or tissue types, like the article entitled “A comprehensive analysis of the human placenta transcriptome”, which characterizes the transcriptome of placenta from 20 healthy women with uncomplicated pregnancies using RNA-Seq.
The assembly of comprehensive maps of the human transcriptome is essential for a clear identification of the functional elements of our genome and to reveal the molecular constituents of different cells and tissues. Despite the accomplishment of many transcriptomic studies, little has been reported until recently about the evolutionary determinants of human cell identity, particularly from a joint perspective of protein evolution and gene expression. The evolutionary history of a gene can be very informative about its function. Gene age (i.e. the question “How old is a gene?”) is an important piece of information that can be inferred in different ways and has been used in some genome-scale studies and in some studies on gene families. Phylostratigraphy is the usual methodology applied to find the origin and emergence of genes [10, 11]. Previous phylogenetic studies showed that the evolutionary history of different coding parts of the genome is related to diseases, codon usage, essentiality, interactions, stemness and self-renewal. Other studies have shown that older genes evolve more slowly, encode longer proteins, present higher expression levels, possess higher intron density and are subject to stronger purifying selection [17, 18]. Several of these studies approach the question of gene age in different ways, but most of them are not focused on human genes or do not apply phylostratigraphy using large-scale genomic data.
In this work, we address the key question of the age of the human genes using a combination of complementary genome-wide data and databases that are analysed and integrated to achieve a map of the human genes on the evolutionary time-scale. This integrative analysis uses a comprehensive human tissue RNA-Seq dataset that allows deep expression profiling of protein-coding genes; a database of orthologous proteins that allows finding the oldest relatives of each human protein across different species [19, 20]; the taxonomy mapping of genes to lineage clades from the NCBI Taxonomy database (www.ncbi.nlm.nih.gov/taxonomy); and the time-scale mapping provided by the TimeTree resource (www.timetree.org) [21, 22]. The generation of clusters of orthologous proteins built along multiple species is more accurate for phylogenetic studies than a simple sequence homology search, because it does not look for a single best homologue but requires conservation along the evolutionary tree, and it can provide a way to date the origin of protein modules involved in specific functions or in specific biological pathways [23, 24]. Finally, to complete the view of the protein-coding gene phylostratigraphy, we use the genome-wide expression data to produce a human gene network based on a coexpression analysis of the transcriptomic RNA-Seq profiles along multiple tissues and identify the genes that can be considered House-keeping (HKg) or Tissue-enriched (TEg). The allocation of these gene subsets (HKg and TEg) on the evolutionary time map shows a clear difference in gene age, indicating that house-keeping genes are older.
Gene expression data from human normal tissues
The genome-wide expression dataset used in this work corresponds to a series of RNA-Seq analyses performed with Illumina HiSeq 2000 paired end sequencing on cDNA libraries prepared from samples of 122 human individuals from 33 different tissues  (ArrayExpress DB: E-MTAB-2836). The data provided reads for 20,344 genes detected per sample, and 18,545 of these genes showed relevant expression signal corresponding to mean(FPKM) ≥ 1 in all the selected tissues. After normalization and comparative analysis of the expression distributions of the samples from this dataset, we selected a total of 116 samples with two to five biological replicates for the following 32 tissues: adrenal gland (3 replicates), appendices (3), bone marrow (4), brain (3), colon rectum (5), duodenum (2), endometrium (5), esophagus (3), fallopian tube (5), adipose tissue (3), gallbladder (3), heart (4), kidney (4), liver (3), lung (5), lymph node (5), ovary (3), pancreas (2), placenta (4), prostate (4), rectum (4), salivary gland (3), skeletal muscle (5), skin (3), small intestine (4), smooth muscle (3), spleen (4), stomach (3), testis (5), thyroid (4), tonsil (3) and urinary bladder (2).
Expression profiling and coexpression data analyses
For the evolutionary analysis and determination of the Lowest Common Ancestor (LCA), we used a database of orthologous proteins: Orthologous MAtrix (OMA, http://omabrowser.org/) [19, 20]. OMA includes a database and resource with methods for the inference of orthologues among complete genomes [19, 20]. We downloaded the OMA database into a local MySQL database and created Python scripts to search for the Ensembl IDs from our transcriptomic data in this local database and to calculate the LCA of each respective orthologous group. In this way we obtained a table with the number of protein-coding genes assigned to each clade in the human taxonomy lineage, as defined by the Taxonomy resource from NCBI (Taxonomy ID 9606 for human, Homo sapiens; database accessed 9 January 2016). Furthermore, to produce a time-scale, we integrated these data with the TimeTree of life (www.timetree.org) [21, 22], which includes the tree of living species calibrated to time. The analysis of the number of genes placed along the evolutionary time-scale allowed visualization of the profile of human gene origin for the whole genome (genome-wide) or for specific subsets of genes. In this genes/time profile we calculated the number of protein-coding genes that had an LCA corresponding to each clade (or level) in the taxonomy lineage (which for human includes 31 consecutive levels) and we identified certain levels where major changes occur. These are taken as the most significant stages and proposed as key evolutionary hallmarks, each including a specific, identified set of human protein-coding genes.
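The LCA assignment described above can be illustrated with a minimal sketch. It is not the authors' script: the function names, the toy gene identifiers and the truncated lineages are hypothetical, and real lineages would be taken from NCBI Taxonomy and real orthologous groups from OMA. The idea is simply that the LCA of a group is the deepest clade shared by the lineages of all species in that group.

```python
# Toy root-to-leaf lineages, truncated for illustration (assumption: real
# lineages would come from NCBI Taxonomy and contain many more levels).
HUMAN_LINEAGE = ["cellular organisms", "Eukaryota", "Opisthokonta", "Metazoa",
                 "Vertebrata", "Mammalia", "Primates", "Homo sapiens"]

LINEAGES = {
    "Homo sapiens": HUMAN_LINEAGE,
    "Mus musculus": ["cellular organisms", "Eukaryota", "Opisthokonta",
                     "Metazoa", "Vertebrata", "Mammalia", "Rodentia",
                     "Mus musculus"],
    "Saccharomyces cerevisiae": ["cellular organisms", "Eukaryota",
                                 "Opisthokonta", "Fungi",
                                 "Saccharomyces cerevisiae"],
    "Escherichia coli": ["cellular organisms", "Bacteria",
                         "Escherichia coli"],
}


def lowest_common_ancestor(species):
    """Return the deepest clade shared by the lineages of all given species."""
    lineages = [LINEAGES[name] for name in species]
    lca = None
    # Walk the lineages level by level from the root until they diverge.
    for clades in zip(*lineages):
        if all(clade == clades[0] for clade in clades):
            lca = clades[0]
        else:
            break
    return lca


def count_genes_per_clade(groups):
    """Count human genes per LCA clade.

    groups maps a (hypothetical) human gene id to the species present in its
    orthologous group; every group is assumed to include Homo sapiens.
    """
    counts = {clade: 0 for clade in HUMAN_LINEAGE}
    for species in groups.values():
        counts[lowest_common_ancestor(species)] += 1
    return counts


if __name__ == "__main__":
    toy_groups = {
        "geneA": ["Homo sapiens", "Escherichia coli"],
        "geneB": ["Homo sapiens", "Mus musculus"],
        "geneC": ["Homo sapiens"],
    }
    for clade, n in count_genes_per_clade(toy_groups).items():
        print(clade, n)
```

Because every group contains the human protein, the LCA always falls on the human lineage, which is what allows the counts to be laid out along a single time-scale.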
Functional enrichment analysis and identification of gene modules
For the functional enrichment analysis we used the DAVID (david.ncifcrf.gov) and GeneTerm-Linker (gtlinker.cnb.csic.es) bioinformatic tools with the list of genes from each evolutionary stage level of the human lineage. In all cases the enrichment analyses were done using a hypergeometric test and adjusting the p-values for multiple testing with the Benjamini-Hochberg procedure. In the same way, we also investigated the functional enrichment of the subnetworks generated by the clusters and modules found in the analysis of the gene coexpression network.
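As an illustration of the statistics named above, the following sketch computes an upper-tail hypergeometric p-value for one annotation term and applies the Benjamini-Hochberg adjustment to a list of p-values. It is a generic, standard-library implementation and does not reproduce the internals of DAVID or GeneTerm-Linker; the function and parameter names are our own.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes carrying the term,
    n: genes in the query list, k: query genes carrying the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Go from the largest p-value to the smallest so that adjusted values
    # stay monotonic (each is the minimum of p * m / rank seen so far).
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: 12 of 50 query genes carry a term that is carried
    # by 200 of 10000 background genes.
    p = hypergeom_pvalue(12, 50, 200, 10000)
    print(p)
    print(benjamini_hochberg([p, 0.04, 0.2]))
```

The exact integer arithmetic of `math.comb` avoids overflow in the intermediate binomials; for very large gene sets a log-space or library implementation would be the more practical choice.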
General calculations and statistics
All the data analyses and graphics were produced in the R statistical environment. General functions such as boxplot, image, qplot (from the ggplot2 library) or wilcox.test were applied to the different data presented. Some specific methods or algorithms are cited in the relevant sections of this manuscript.
Results and discussion
Human global transcriptome profile based on expression in multiple tissues
The unsupervised clustering analysis of the global expression correlation along 116 samples of 32 normal human tissues is presented in Fig. 1. This plot shows the color heatmap and dendrogram produced by the comparison of all the expressed genes (18,545) along all samples. The pair-wise distances were calculated as (1 – Spearman correlation coefficient), which is a non-parametric approach to calculating similarity and distance. The heatmap shows a clear relationship between the samples from the same tissue (i.e. the biological replicates cluster together) and also shows the proximity between tissues that have strong biological and physiological links, such as: spleen, lymph nodes and tonsils (all related to the lymphatic system); or stomach, duodenum, small intestine, colon and rectum (all related to the digestive system). A color bar with scales for the heatmap is included in Fig. 1, to indicate that dark-red corresponds to minimum distance (i.e. maximum correlation) and dark-blue to maximum distance (i.e. minimum correlation). White corresponds to medium values, and the distributions inside the color bars show that most of the comparisons have values around r = 0.80–0.85 (i.e. pale-red colors). The results also reveal that some tissues have transcriptomic profiles very different from the rest, testis being the most divergent and producing a clearly separated branch in the dendrogram.
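The distance used for this clustering, 1 minus the Spearman correlation coefficient, can be sketched as follows. This is a small standard-library illustration with hypothetical function names, not the R code used for Fig. 1; it ranks the expression values of two samples (averaging the ranks of ties) and computes the Pearson correlation of those ranks.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_distance(x, y):
    """Distance between two expression profiles: 1 - Spearman correlation."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical expression values of five genes in two samples.
    sample_a = [0.5, 12.0, 3.1, 250.0, 7.7]
    sample_b = [0.9, 20.0, 2.8, 900.0, 6.0]
    print(spearman_distance(sample_a, sample_b))
```

Because only ranks are compared, the distance is insensitive to monotonic transformations of the expression values (for example a log transform), which is one reason a rank-based measure suits data with the wide dynamic range of RNA-Seq. The resulting sample-by-sample distance matrix is what a hierarchical clustering routine would take as input to build the dendrogram.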
House-keeping and tissue-enriched genes
Human gene hallmarks on the evolutionary time-scale
For the evolutionary analysis, we used a phylostratigraphic method for reconstruction of macro-evolutionary trends based on the principle of “founder gene” formation. Typically, these methods first identify the homologues of a given gene and then use the divergence between the two most distant ones to determine the gene age. Historically, such studies have used BLAST for homology searches. However, this approach was shown to introduce some biases into the analyses [30, 31]. Another approach is to use orthologous groups to determine the age of a gene. Orthologues are believed to be functionally more similar than paralogues and by definition they trace back to an ancestral gene that was present in a common ancestor of the compared species. The parameters used for clustering orthologous groups affect the age estimations for a gene; for example, restrictive parameters tend to limit the set of possible progenitors. Both approaches used for dating the gene origin depend on the correct identification of homologues and/or orthologues, but in the second case the accurate reconstruction of orthologous families imposes a higher stringency and requires conservation along the evolutionary clades.
The analysis of the hallmarks also reveals that the HK genes are more ancient than the TE genes. The plots in Fig. 3 and the data in Fig. 4 show a clear difference between them. The HK genes present a major increase or expansion in stage 2 (Prokaryota to Eukaryota), with 1009 genes and a change of ≈ 30 % with respect to the total. By contrast, the TE genes show a major increase in stage 7 (Mammalia to Primates) with 799 genes and a change of ≈ 37 %. These observations indicate that house-keeping genes emerged early in evolution and are older, consistent with the fact that they reflect more essential and constitutive functions. The idea that gene essentiality is associated with older genes has been reported in several studies, for example on yeast and mammalian genes. By contrast, the observation that human tissue-specific genes emerged later in evolution may reveal that human-specific cellular or physiological roles are implemented at the molecular level by the appearance of newer genes.
Gene age data comparison
As we indicated above, there are some studies that use the phylostratigraphic method to explore the age of human genes, but most of these studies use sequence similarity search (with algorithms like BLAST) to look for the oldest homologues of the human genes [12, 16, 34]. To compare the human gene age assignment done in this work with other available age assignments, we took the published data from Domazet-Lošo (2008) and from Neme (2013), and we represented the information about allocation to the Lowest Common Ancestor (LCA) of the human genes in phylogenetic clades of the evolutionary tree. The assignments were done using 15 common phylostrata to allow the comparison of the data. The results of this comparison are included in Additional file 1: Figure S4 and they show a general similarity but some important differences. The most significant difference is that both Domazet-Lošo and Neme placed a very large number of genes in the first stage of the evolutionary time-scale, which goes from the origin of life to the first cellular organisms (i.e. pre-eukaryota): 8285 of 22,845 (36 %) and 7309 of 22,154 (33 %), respectively. This result denotes a bias that, as we indicated above, may be due to the use of the homology search approach. In any case, the suggestion that one third of the human proteome may have emerged before the origin of eukaryotic cells needs deeper study, and it is not what we observed in our analyses.
Another important difference is that the proposed age mapping allocates the largest number of genes, first, to the Chordata-Vertebrata-Euteleostomi phylostrata (with 5070 genes) and, second, to the Mammalia-Eutheria phylostrata (with 2172 genes) (Additional file 1: Figure S4). From the evolutionary point of view these results are plausible, since the time-scale of life [21, 22] reveals two large expansions of species precisely around the time of the vertebrates (between 600 and 400 Mya) and around the time of the appearance of the mammals (between 250 and 100 Mya). These expansions are well reflected in our time-scale profile (Fig. 3). Finally, it is important to indicate that the age mapping presented in our study only considers human protein-coding genes that are included in orthologous families (mapping a total of 17,437) and therefore has a lower coverage of human genes than the other reported studies, which include more than 22,000 genes in each case [12, 34].
Functional enrichment of the genes at different evolutionary hallmarks
We performed functional enrichment analyses of the sets of protein-coding genes included in each one of eight major stages found in the evolutionary study. The full results of these analyses are provided as Additional file 5. In all the stages the functional enrichment makes clear biological sense and provides a strong support to the allocation of many biological processes in evolutionary time. In brief, we comment and discuss below some interesting functions enriched in each stage.
Stage-1, from the origin of life to first cellular organisms (i.e. Prokaryota). This stage comprises the genes occurring over two major domains of life: Archaea and Bacteria. Determining the LCA, our data shows that human has 6.76 % (1178) of the protein-coding genes assigned to Prokaryotic age. Prokaryotes are organisms that lack both membrane-bound organelles and nucleus. Functional enrichment analysis showed that this stage involved many basic metabolic processes like glycolysis (GO:0006007, glucose catabolic process), the Krebs cycle (GO:0006099, tricarboxylic acid cycle), and lipid oxidation (GO:0009062, fatty acid catabolic process). The enrichment also shows the appearance of the oldest cellular organelle, the mitochondria, and the oldest macromolecular machine, the ribosome, that are well reported to be dated to Prokaryotic times.
Stage-2, Cellular organisms to Eukaryota. According to basic literature, the defining feature of eukaryotic cells is that they have membrane-bound organelles, especially the nucleus, which contains the genetic material and is enclosed by the nuclear envelope. Protists, fungi, animals, and plants all consist of eukaryotic cells. Eukaryotic cells also contain other membrane-bound organelles such as the Golgi apparatus. Eukaryotic organisms can be unicellular or multicellular. The functional enrichment analysis for the 2178 genes that emerged along this stage clearly showed the formation of the principal complexes expected in Eukaryotes. Notable is the enrichment in nuclear pore proteins, nuclear import proteins, nucleosome and chromatin proteins, as well as many proteins involved in DNA and RNA activity: mRNA and rRNA processing, mRNA splicing, DNA unwinding, DNA polymerase, DNA/RNA helicase. This stage also marks in time the appearance and biogenesis of some major molecular complexes: the proteasome (GO:0005839, proteasome core complex), the spliceosome, and the ribosome (at this stage mainly the proteins of the large subunit, RPLs, in contrast to the ribosomal proteins of the small subunit, RPSs, which were mostly allocated to the Prokaryotic age).
Stage-3, Eukaryota to Metazoa. The third stage comprises organisms from the Opisthokonta and Metazoa clades, with 1395 protein-coding genes (27.25 % cumulative). The Opisthokonts are a broad group of eukaryotes, including both the animal and fungi kingdoms, sometimes referred to as the “fungi/metazoan group”. This stage comprises metazoan, fungal and protistan taxa, and other multicellular taxa (such as plants, and red and brown algae). They also include known fungi and/or parasites of plants like: Ascomycota, Basidiomycota, Chytridiomycetes, Glomeromycota, Microsporidia, Urediniomycetes, Ustilaginomycetes and Zygomycota. According to our functional enrichment analysis, this stage involves different genes responsible for signal transduction, like the GTPases. Some of the enriched terms are: pyrophosphatase activity, nucleoside-triphosphatase activity, GTP binding, transferring phosphorus-containing groups. All these functions indicate that this may be the time when phosphorus and phosphate acquired a key role in protein function and regulation. Other enriched functions, like post-translational protein modification, calcium-binding EF-hand, protein transport and localization, also indicate cellular protein regulation.
Stage-4, Metazoa to Vertebrata. This stage includes organisms from Eumetazoa, Bilateria, Deuterostomia, Chordata, Craniata and Vertebrata with 2333 genes (40.63 % cumulative). The main novelties of this stage are the appearance of protein kinase activity, and the presence of growth factors and some specific signaling proteins like WNT. All biochemically characterized members of the WNT superfamily encode enzymes that transfer organic acids, typically fatty acids, onto hydroxyl groups of membrane-embedded targets . Other enriched terms in this stage, like sarcomere and contractile fiber part, may indicate the emergence of the muscular structures present in vertebrates .
Stage-5, Vertebrata to Euteleostomi. This stage represents the largest step in the human lineage according to the number of protein-coding genes assigned (5070), corresponding to 29 % of the total. The stage comprises organisms from Gnathostomata (jawed vertebrates), Teleostomi (bony fish and tetrapods) and Euteleostomi (bony vertebrates). The enrichment analysis shows a large functional expansion including new biological systems, like the neural and the vascular-circulatory systems, represented in enriched terms like: neurogenesis, neuron differentiation, axogenesis, voltage-gated channels, neuromuscular junction development, blood vessel development, vasculature development, mesenchymal cell development and differentiation, etc. Many other genes are assigned to biological regulation and regulation of cellular processes, including cell death and apoptosis. Finally, the appearance of the large family of homeobox proteins seems to be placed at this stage.
Stage-6, Euteleostomi to Mammalia. In this stage there are organisms from Sarcopterygii (lobe-finned fishes) , Dipnotetrapodomorpha (new taxon from NCBI comprising lungfishes), Tetrapoda (four-legged vertebrates), Amniota (comprising the reptiles, birds and mammals that lay their eggs on land or retain the fertilized egg within the mother) and up to Mammalia clades. With 1953 genes at this stage, the human lineage achieves 80 % of its gene composition. The most relevant enriched terms are related to the hematologic system, marking the appearance of the leukocytes and the lymphocytes. Previous phylogenetic analyses based on gene expression data also placed the date of many proteins from leukocytes around the time of the mammals’ clade .
Stage-7, Mammalia to Primates. This stage comprises the clades Theria, Eutheria, Boreoeutheria, Euarchontoglires and Primates, representing organisms that give birth to live young without using a shelled egg up to placental mammals. A total of 2821 genes emerged at this stage, bringing the cumulative profile of the human gene lineage to 97.08 %. A large number of these genes are enriched in the terms regulation of gene expression and transcription. Other more specific terms are related to the skin (epidermal and epithelial cell differentiation, keratinization) or to the sexual reproductive system (male gamete generation, spermatogenesis and sexual reproduction). This stage also includes a family of cytochrome P450 proteins (around 23) and the mammalian defensins (6): DEFA1B, DEFA3, DEFA4, DEFA5, DEFA6, and DEFB4A. Defensins are a family of antimicrobial peptides and vital contributors to the host immune response. Being constitutively or inducibly expressed genes, they have been shown to contribute to innate host defense via direct bactericidal activity, as well as to adaptive immunity through effector and regulatory functions.
Stage-8, Primates to Homo sapiens. The last stage of the human lineage, with 509 genes, presents a group of quite specific functions played by specific protein families, such as: somatotropin hormone, cytochrome P450, GTPase activator activity, and defense response to fungus and bacterium provided by histatins. HIS1 and HIS3 (histatin proteins) have been found only in the saliva of humans, macaques and some other primates but not in any other mammals. They are a family of histidine-rich polypeptides that probably function as part of the non-immune host defense system and appeared very late in evolution. Cytochromes P450 constitute a superfamily of proteins that exists in virtually all species from prokaryotes to humans. Most of the proteins in the CYP1, CYP2, CYP3 and CYP4 families are enzymes involved in the metabolism and elimination of potentially toxic compounds like drugs or foreign xenobiotics, and are inducible by various environmental stimuli. This last stage includes a small subset of six cytochromes P450 that seem to be very specific to the primate-human clades: CYP2A7, CYP2C9, CYP2D6, CYP2J2, CYP2S1 and CYP3A43. The late appearance in evolution of some of these genes may reflect their functional specificity, and it is known that they play a key role in human health.
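The per-stage and cumulative percentages quoted for the eight stages can be checked with a few lines of arithmetic. The sketch below uses only the gene counts reported in the stage descriptions above; the variable and function names are our own.

```python
# Number of protein-coding genes assigned to each stage, as reported in the
# stage descriptions of this section.
STAGE_GENES = [
    ("Stage-1: origin of life to cellular organisms", 1178),
    ("Stage-2: cellular organisms to Eukaryota", 2178),
    ("Stage-3: Eukaryota to Metazoa", 1395),
    ("Stage-4: Metazoa to Vertebrata", 2333),
    ("Stage-5: Vertebrata to Euteleostomi", 5070),
    ("Stage-6: Euteleostomi to Mammalia", 1953),
    ("Stage-7: Mammalia to Primates", 2821),
    ("Stage-8: Primates to Homo sapiens", 509),
]


def stage_profile(stage_genes):
    """Return (name, genes, percent of total, cumulative percent) per stage."""
    total = sum(genes for _, genes in stage_genes)
    profile = []
    running = 0
    for name, genes in stage_genes:
        running += genes
        profile.append((name, genes,
                        100.0 * genes / total,
                        100.0 * running / total))
    return profile


if __name__ == "__main__":
    for name, genes, percent, cumulative in stage_profile(STAGE_GENES):
        print(f"{name}: {genes} genes, {percent:.2f} %, "
              f"cumulative {cumulative:.2f} %")
```

The eight counts sum to 17,437, the number of human protein-coding genes mapped to orthologous families, and the cumulative values reproduce the figures given in the text for stages 3, 4 and 7.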
Network analysis reveals evolutionary age conservation of coexpressed proteins
Finally, we did a functional enrichment analysis of the proteins forming the 11 subnetworks which again showed a coherent biological enrichment in specific functions: (subnetwork 1) immune response; (2) cell cycle; (3) cytoskeleton; (4) RNA splicing; (5) ribosome; (6) extracellular matrix; (7) muscle and contraction; (8) gametes and reproductive process; (9) cell junction and cell adhesion; (10) mitochondria and ATP synthesis; (11) angiogenesis and vasculogenesis. More detailed results for this analysis are included in Additional file 1: Figure S6. Thus, we observed that age-related proteins are predisposed to present expression coregulation and to have close functional links.
Combining age data, functional data and coexpression data can provide a deeper view of the links and roles of the human protein-coding genes. We observed, for example, that subnetwork 5 (which contains proteins related to ribosome and translation) presented, as expected, an overwhelming majority of ancient genes from the Prokaryotic or Eukaryotic age (stages 1 and 2). On the other hand, subnetworks 1 (immune response, leukocyte/lymphocyte activation), 8 (gametes and reproductive process) and 11 (angiogenesis and vasculogenesis) showed a higher proportion of recent genes dated after Vertebrata (Additional file 1: Figure S5). These results agree with studies based on yeast protein physical interaction networks, which argue that proteins preferentially interact with proteins of the same age and origin. Moreover, it was previously shown that coexpression networks can be conserved over evolutionary history, and that the genes involved tend to be functionally related and provide selective advantages. It has also been reported that coexpression networks are associated with functions like cell adhesion, cell cycle, DNA replication and DNA repair, and this is in agreement with the functions found enriched in subnetworks 2 and 9 of our analyses (Additional file 1: Figure S5).
The transcriptomic study of the human gene expression distributions and profiles along 32 tissues provided a global mapping of the activity of most human genes and of the links between them, showing the expected association of samples from common physiological regions, i.e.: the gastrointestinal tract (stomach, duodenum, small intestine, colon and rectum), the hematopoietic and lymphatic system (bone marrow, lymph node, spleen, tonsil and appendix) and the muscle (cardiac and skeletal muscle).
The evolutionary study of the human protein-coding genes placed them in the time-scale of the living species and revealed eight distinct hallmarks along such time-scale (i.e. eight major steps), showing that the HK genes are more ancient than the TE genes. The HK genes present the major emergence in stage 2 of the evolutionary profile, while the TE genes have the major emergence in stage 7. The functional enrichment study found coherent groups of terms and annotations assigned to the genes placed at each evolutionary stage. For example, in stage 1 there were many functional terms on essential metabolic processes, like aerobic respiration and mitochondrial activity; and in stage 2 there were enriched functions related to the nucleus and genome regulation, like chromatin and nucleosome assembly, DNA replication, mRNA processing.
Finally, the study of the pair-wise correlation of the gene expression profiles along tissues allowed building human gene coexpression networks and finding modules with functional and biological meaning. The mapping of the age of the protein-coding genes on these networks demonstrated the existence of tight links between age-related proteins.
We acknowledge the funding provided to Dr. J. De Las Rivas group by the Local Government, “Junta de Castilla y Leon” (JCyL, Valladolid, Spain, grant number BIO/SA08/14); and by the Spanish Government, “Ministerio de Economia y Competitividad” (MINECO) with grants of the ISCiii co-funded by FEDER (grant references PI12/00624 and PI15/00328). We also acknowledge a research grant to K.P. Lopes as visiting PhD student at the CiC-IMBCC (from January 2015 to January 2016) provided by the Brazilian National Council of Technological and Scientific Development (CNPq). We also acknowledge a PhD research grant to F.J. Campos-Laborie (“Ayudas a la contratación de Personal Investigador”) provided by the JCyL with the support of the “Fondo Social Europeo” (FSE).
About this supplement
This article has been published as part of BMC Genomics Volume 17 Supplement 8: Selected articles from the Sixth International Conference of the Iberoamerican Society for Bioinformatics on Bioinformatics and Computational Biology for Innovative Genomics. The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/volume-17-supplement-8.
The publication costs for this article were funded by the research grant PI12/00624, from the Instituto de Salud Carlos III (ISCiii) co-funded by the Fondo Europeo de Desarrollo Regional (FEDER).
Availability of data and materials
The datasets supporting the results of this article, provided as Additional Files, are also available at: http://bioinfow.dep.usal.es/evolutionaryhallmarks.
KPL carried out the data collection, the databases construction and the computational analyses and comparisons. She also helped to write the manuscript. KPL and JDLR designed the study. JDLR supervised the data analysis, coordinated the study and corrected the manuscript. FJCL helped in the RNA-seq data analysis and in the coexpression study. RAV helped in the reconstruction of the human gene time-scale. JMO participated in the coordination of the study and helped to draft the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Not applicable. Our work only uses human data from the open public databases: ArrayExpress DB (https://www.ebi.ac.uk/arrayexpress); Human Protein Atlas (http://www.proteinatlas.org); Orthologous MAtrix (OMA, http://omabrowser.org/); Taxonomy from NCBI (http://www.ncbi.nlm.nih.gov/taxonomy); TimeTree of life (http://www.timetree.org).
- 21. Hedges SB, Kumar S. Discovering the timetree of life. In: Hedges SB, Kumar S, editors. The Timetree of Life. New York: Oxford University Press; 2009. p. 3–18.
- 27. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statist Soc B. 1995;57:289–300.
- 43. Myers P, Espinosa R, Parr CS, Jones T, Hammond GS, Dewey TA. The Animal Diversity Web. 2006, University of Michigan. http://animaldiversity.org/.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.