Architecture of polymorphisms in the human genome reveals functionally important and positively selected variants in immune response and drug transporter genes
Genetic polymorphisms can contribute to phenotypic differences amongst individuals, including disease risk and drug response. Characterization of genetic polymorphisms that modulate gene expression and/or protein function may facilitate the identification of the causal variants. Here, we present the architecture of genetic polymorphisms in the human genome, focusing on those predicted to be potentially functional or under natural selection and on the pathways in which they reside.
In the human genome, polymorphisms that directly affect protein sequences and potentially affect function are the most constrained variants with the lowest single-nucleotide variant (SNV) density, least population differentiation and most significant enrichment of rare alleles. SNVs which potentially alter various regulatory sites, e.g. splicing regulatory elements, are also generally under negative selection.
Interestingly, genes that regulate the expression of transcription/splicing factors and histones are conserved, as a higher proportion of these genes are non-polymorphic, contain ultra-conserved elements (UCEs) and/or have no non-synonymous SNVs (nsSNVs) or coding INDELs. On the other hand, major histocompatibility complex (MHC) genes are the most polymorphic, with SNVs that potentially affect the binding of transcription/splicing factors and microRNAs (miRNAs) exhibiting recent positive selection (RPS). The drug transporter genes carry the largest number of potentially deleterious nsSNVs and exhibit signatures of RPS and/or population differentiation. These observations suggest that genes that interact with the environment are highly polymorphic and targeted by RPS.
In conclusion, selective constraints are observed in coding regions, master regulator genes, and potentially functional SNVs. In contrast, genes that modulate response to the environment are highly polymorphic and under positive selection.
Keywords: Single-nucleotide variant; Natural selection; Potentially functional SNV; Immune response genes; Drug transporters
3′UTR: 3′ Untranslated region
5′UTR: 5′ Untranslated region
DAF: Derived allele frequency
ESE/ESS: Exon splicing enhancer or silencer
ISRE: Intronic splicing regulatory element
MHC: Major histocompatibility complex
miRBS: MicroRNA binding site
pfSNV: Potentially functional SNV
RPS: Recent positive selection
TFBS: Transcription factor binding site
Genetic polymorphisms may contribute to the differences in disease risks and drug responses amongst different individuals. Different forms of genetic variants are found in the human genome. Single-nucleotide variants (SNVs) account for more than 90% of genomic variants and are the major form of genetic polymorphisms.
Some polymorphisms can affect phenotype. These polymorphisms are likely to alter gene expression or protein function, leading to modulation of cellular function and influencing disease risk or drug response. However, identifying the single causal variant, or group of causal variants, for a particular phenotype from a pool of more than 100 million polymorphisms is like ‘finding a needle in a haystack’ and remains a great challenge, since not all genetic variants are functionally important.
While non-synonymous SNVs (nsSNVs) have been extensively investigated as they are the most likely to modulate phenotypes by changing the amino acid composition of proteins, synonymous SNVs (sSNVs) and non-coding variants can also account for phenotypic differences, since these variants can affect mRNA stability and transcriptional or translational efficiency and have been associated with gene expression levels in various cell lines and tissues [2, 3, 4, 5, 6, 7, 8, 9, 10]. While it may not be feasible to experimentally test every single polymorphism for its function, a variety of bioinformatics tools is now available. These tools can reasonably predict the potential functions of genetic variants, including the likelihood of nsSNVs to disrupt protein structures and/or functions [11, 12, 13, 14, 15, 16, 17, 18, 19], SNVs that potentially modify splicing [20, 21] or transcription, and SNVs in 3′ untranslated regions (3′UTRs) with the potential to alter miRNA target sites [23, 24, 25]. There are also comprehensive web tools for predicting various potential functions of both regulatory and coding SNVs, e.g. pfSNP and PupaSNP finder. They can facilitate our understanding of how polymorphisms lead to phenotypic change and help us prioritize the potentially functional SNVs (pfSNVs) for further investigation.
In addition to the above-mentioned predictive bioinformatics tools, signatures of natural selection can also facilitate the identification of causal variants, since variants under natural selection are likely to be functionally significant. Patterns of population differentiation were employed to identify 174 candidate gene loci showing signatures of purifying or positive selection. ‘Long-range haplotype’ methods have been employed to identify a list of targets under recent positive selection (RPS). Another study utilizing HapMap Phase II data found that negative selection preferentially targets non-synonymous sites, while both non-synonymous sites and 5′ untranslated regions (5′UTRs) show an excess of highly differentiated SNVs, suggesting positive selection as well. The authors also reported that variants under selective pressure (either positive or negative) occur more frequently in disease-related genes and are more likely to contribute to disease phenotypes.
Although previous reports examined the association of SNVs in regulatory regions with natural selection, these studies were limited. They focussed either on only one class of regulatory SNVs (e.g. SNVs within miRNA binding sites, miRBS), on SNVs residing in non-coding regions, or on SNVs within regulatory elements, without predicting whether these SNVs alter function (e.g. whether an SNV will abolish or create a regulatory site).
In this study, we present the architecture of all genetic polymorphisms of the human genome, focusing on SNVs that are potentially functional and/or positively selected and on the pathways in which they reside.
Polymorphisms are most constrained in coding regions
To investigate the regions within genes that may be most subject to negative selection pressure, the derived allele frequencies (DAFs) of SNVs in different regions are compared using allele frequency data from the International HapMap Project individuals. As evident in Fig. 1d, coding regions (red) contain a higher percentage of rare SNVs, defined as having DAF < 0.05 in all the three population groups, namely African, East Asian and European populations. nsSNVs (brown) within the coding region are also enriched with rare alleles compared to sSNVs (orange) (Fig. 1e). As negative selection increases the fraction of rare alleles, our results from the analysis of allele frequency data again suggest that coding SNVs, especially nsSNVs, tend to be targeted by negative selection.
Signatures of natural selection are also examined by determining population differentiation using the FST statistic across the different population groups (African, East Asian and European), since high FST is associated with positive selection, while low FST is associated with negative selection. As shown in Fig. 1f, coding SNVs have lower median FST than SNVs in other regions including 5′UTRs, 3′UTRs and introns (Bonferroni corrected p values < 0.001 by Mann-Whitney test). In fact, zero-FST SNVs are significantly over-represented in coding exons (Bonferroni corrected p value < 0.001 by Fisher’s exact test) (Fig. 1g, non-shaded bars). Patterns of RPS are examined using linkage disequilibrium (LD) and haplotype-based methods. As shown in Fig. 1g (shaded bars), exonic regions, i.e. 5′UTRs, coding regions and 3′UTRs, are significantly less enriched with RPS SNVs (Bonferroni corrected p values < 0.001 by Fisher’s exact test), while introns are more enriched with RPS SNVs (Bonferroni corrected p value < 0.001 by Fisher’s exact test).
Taken together, coding regions are generally under strong negative selection pressure, as they show the lowest densities of SNVs and INDELs (especially frame-shift INDELs), the highest proportion of rare alleles and less enrichment of RPS SNVs. Notably, coding SNVs are also the least population differentiated.
Potentially functional SNVs are under natural selection
To evaluate if pfSNVs are selectively constrained, the proportions of rare alleles (DAF < 0.05) of pfSNVs and non-functional SNVs (nfSNVs) in a specified region are compared. As evident in Fig. 2c, except for pfSNVs predicted to alter TFBS, most pfSNVs are enriched with rare alleles. Notably, pfSNVs predicted to be deleterious to protein function are more than 1.5-fold more enriched with rare alleles compared to nfSNVs in coding regions indicating that these pfSNVs may be under the strongest negative selection pressure.
Conversely, pfSNVs predicted to be deleterious to protein function are found to be the least enriched with RPS SNVs (Fig. 2d, brown) (Bonferroni corrected p value < 0.001 by Fisher’s exact test), consistent with the earlier observation indicating that these pfSNVs are under the strongest negative selection. The other pfSNVs are not under any significant RPS except for pfSNVs predicted to alter ISRE (Bonferroni corrected p value = 0.019 by Fisher’s exact test) (Fig. 2d). In addition, more than half of the RPS pfSNVs are predicted to affect ISRE (~ 53%), followed by TFBS (~ 30%), while the fewest RPS pfSNVs (3%) are predicted to be deleterious to protein function (Fig. 2e).
Highly polymorphic vs conserved genes in the human genome
The density of SNVs within each gene, normalized against gene length, is determined for all > 20,000 protein-coding genes in the human genome. Most genes have approximately four SNVs per kilobase. Although ~ 97% of genes carry at least one SNV, 149 genes contain no polymorphism at all, whether SNV or INDEL. More than half of these 149 non-polymorphic genes were yet to be annotated. Nonetheless, the annotated non-polymorphic genes are significantly over-represented in the histone 2A and 2B families and are involved in nucleosome assembly (Fig. 3b, grey shaded; Additional file 1: Table S1).
We then focus on polymorphisms within the coding region since this region encodes the functional protein. Nearly 20% (4389) of genes in the human genome are found to be functionally conserved, with no nsSNVs or coding INDELs. These genes are enriched in various categories including GTPases and translational elongation (Fig. 3b, unshaded; Additional file 1: Table S2). Seventy genes are found to carry ultra-conserved elements (UCEs) in their coding regions; hence, these genes are evolutionarily conserved. These ultra-conserved genes include the homeobox proteins and are primarily involved in transcription factor activity, RNA splicing and pattern specification (Fig. 3b, shaded black; Additional file 1: Table S3).
Taken together, genes involved in fundamental biological processes, for example gene regulation, are highly conserved during evolution, being the least polymorphic within the human species as well as between species.
Highly polymorphic genes are mainly involved in immune responses
A total of 512 highly polymorphic genes (see Additional file 1: Supplementary Methods) are identified. These genes are the most significantly over-represented in immune response pathways, as well as in the pathogenesis of a number of autoimmune diseases, including graft-versus-host disease and type I diabetes mellitus (Fig. 3c, shaded grey; Additional file 1: Table S4). They are primarily in the MHC class I-related and class II-related protein families, involved in antigen presentation and processing. The majority of the MHC class I and class II genes are located on chromosome 6p21.3, which is the most polymorphic region in the human genome and facilitates the generation of diverse antigens to confer a selective advantage in fighting infection. The number of pfSNVs in the MHC genes (2234) is higher than the average number of pfSNVs in the other human genes (306). To determine if pfSNVs are significantly over-represented in this MHC gene family of 23 genes, sampling of 23 random human genes with similar gene length is performed 1000 times. The number of pfSNVs in the 23 random genes for each cycle is plotted to obtain an empirical distribution. The MHC family of genes, with 2234 pfSNVs, is found to carry significantly more pfSNVs than the 1000 random samples of 23 human genes (empirical p value < 0.001 by random sampling test) (Fig. 3d). Interestingly, despite the enrichment of pfSNVs in the MHC gene family, there are significantly fewer nsSNVs predicted to be deleterious in the MHC family of genes, compared to other human genes (Bonferroni corrected p value < 0.001 by Fisher’s exact test) (Fig. 3e). Hence, this family of proteins can nimbly respond to different infections through a diversity of regulatory mechanisms, including differential transcription factor binding, miRNA binding and splicing.
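The random sampling test described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `empirical_p`, the ±20% length tolerance used to define "similar gene length", and the toy background data are all assumptions, and for simplicity the same background gene may be drawn for more than one target gene.

```python
import random

def empirical_p(target_lengths, observed_total, background,
                n_iter=1000, tolerance=0.2, seed=0):
    """Fraction of random length-matched gene sets whose summed pfSNV count
    is at least as large as the observed total.

    target_lengths: lengths (bp) of the genes in the family of interest.
    background: list of (gene_length_bp, pfSNV_count) for all other genes.
    tolerance: hypothetical definition of 'similar length' (+/- 20% here).
    """
    rng = random.Random(seed)
    # For each target gene, collect the pfSNV counts of length-matched genes.
    pools = []
    for length in target_lengths:
        lo, hi = length * (1 - tolerance), length * (1 + tolerance)
        pool = [count for gene_len, count in background if lo <= gene_len <= hi]
        if not pool:
            raise ValueError(f"no length-matched background gene for {length} bp")
        pools.append(pool)
    # Draw one matched gene per target gene, n_iter times.
    hits = 0
    for _ in range(n_iter):
        total = sum(rng.choice(pool) for pool in pools)
        if total >= observed_total:
            hits += 1
    return hits / n_iter

# Toy background: 500 hypothetical genes carrying 0 to 6 pfSNVs each.
toy_background = [(1000 + 10 * i, i % 7) for i in range(500)]
print(empirical_p([2000, 3000, 4000], observed_total=100,
                  background=toy_background))
```

With 1000 iterations, a result of zero corresponds to the "empirical p value < 0.001" reported in the text, since no random set reached the observed count.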
Lastly, highly polymorphic genes are also significantly enriched in drug metabolism, cytochrome P450 (CYP450), arachidonic acid and caffeine metabolism pathways (Fig. 3c, Additional file 1: Table S4).
Drug response genes are most affected by potentially deleterious polymorphisms in coding regions
Notably, not only are the ABC transporters significantly enriched in predicted deleterious coding SNVs (Fig. 4b), they are also enriched with nsSNVs that are predicted to cause nonsense-mediated decay (NMD), resulting in the degradation of mRNA transcripts with a premature stop codon (Fig. 4c, clear bars; Additional file 1: Table S6). Genes containing nsSNVs predicted to cause NMD are also significantly enriched in cell cycle processes including mitosis (Fig. 4c, clear bars; Additional file 1: Table S6).
While the ABC transporters are significantly enriched with nsSNVs predicted to cause NMD, the other family of genes involved in drug response, the CYP450 family, is significantly enriched with genes having another form of deleterious polymorphism, namely frame-shift INDELs, which have a deleterious effect on protein function (Fig. 4c, black bars; Additional file 1: Table S7). Taken together, genes involved in the xenobiotic response, including drug transport and metabolism, are significantly enriched with potentially deleterious coding polymorphisms.
Signatures of natural selection on the nsSNVs in drug-response genes are investigated. Interestingly, unlike the CYP450 family (5/57 genes; p value = 0.24 by Fisher’s exact test), the ABC transporters are not only enriched with potentially deleterious coding polymorphisms but are also significantly enriched with genes carrying nsSNVs under RPS (11/45 genes; p value < 0.001 by Fisher’s exact test) compared to the other genes in the human genome (Fig. 4d). As genes under positive selection also show significant population differentiation, we evaluate if the drug response genes are also enriched with nsSNVs that show significant population differentiation (FST > 0.3). Similar to the above observations, the ABC transporters (9/45; p value < 0.001 by Fisher’s exact test) but not the CYP450 genes (3/57; p value = 0.76 by Fisher’s exact test) are significantly enriched with nsSNVs that show significant population differentiation (Fig. 4d). Hence, the nsSNVs in the ABC transporter family are under strong positive selection pressure.
As evident from the table in Fig. 4e, all, except one (rs4968839), of the nsSNVs at the ABC transporter family, which showed evidence of RPS or significant population differentiation, are predicted to either have a potentially deleterious effect on protein function or alter ESE/ESS modulating the proportion of the different splice forms. Notably, > 40% of these nsSNVs have been reported to be significantly associated with various phenotypes including clinically relevant ones [43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59] (Fig. 4e), highlighting the functional importance of the nsSNVs under natural selection at the ABC transporter gene family.
In this study, we comprehensively investigate the architecture of genetic polymorphisms in the human genome and demonstrate that polymorphisms in coding regions, especially those affecting protein sequences and/or functions, are the most constrained in the human genome, consistent with previous observations. In particular, frame-shift INDELs in coding regions are under strong purifying selection, consistent with the previous observation of the strongest depletion of frame-shift INDELs in coding regions, which are enriched with gene expression association possibly contributed by NMD.
Through the interrogation of nine global populations, we demonstrate that the median FST of SNVs in the coding regions is lower than that of the other regions of the genome (Fig. 1f). Moreover, the observations that coding regions have the lowest SNV density (Fig. 1b), an excess of rare alleles (Fig. 1d) and an enrichment of SNVs with no population differentiation (Fig. 1g) all indicate that coding SNVs are constrained by purifying selection. This is further strengthened by the observation that potentially deleterious nsSNVs show enrichment of rare alleles, compared to non-deleterious nsSNVs (Fig. 2c). Furthermore, coding regions contain significantly fewer INDELs that cause frame-shift (Fig. 1c). Hence, polymorphisms predicted to be deleterious to protein functions are under the strongest purifying selection.
In addition to the potentially deleterious nsSNVs, the potential functions and signatures of natural selection in the other polymorphisms are also investigated. Through computational prediction of the potential functions of SNVs, we observe that significantly more SNVs are predicted to alter TFBS than to code for a potentially deleterious nsSNV (Fig. 2b). Additionally, except for SNVs affecting TFBS, the other pfSNVs show more significant enrichment of rare alleles than nfSNVs in the same regions (Fig. 2c). Hence, pfSNVs are more constrained than the other SNVs in the same region, perhaps because they affect the functionally important regulatory sites. This observation is congruent with previous studies that reported stronger negative selection on conserved miRBS than on other conserved 3′UTR sequences, though different prediction algorithms and SNV data were used. Levenstien and Klein also reported a similar observation that SNVs in a few functional classes, e.g. non-synonymous sites, methylation sites and miRBS, are under negative selection compared to the genome as a whole, and suggested that they are promising candidates for functional characterization. While these previous studies examined SNVs residing within regulatory consensus sites of the promoter, this study focuses on SNVs that are predicted to either disrupt or create regulatory sites. Hence, the negative selective pressure on several different classes of pfSNVs suggests that pfSNVs are likely to influence gene functions and contribute to phenotypic changes.
This study also highlights that ‘master regulators’ of gene expression tend to be functionally conserved and maintained during evolution, while regulation of specific target genes is less constrained and more flexible. This is evident from the observation that ‘master regulators’ involved in general gene regulation, including epigenetics (e.g. histones), transcription/translation and splicing, are significantly enriched with non-polymorphic, functionally conserved and ultra-conserved genes (Fig. 3b). In contrast, an average gene contains more SNVs affecting its own regulation than altering its function, as evident from the observed enrichment of SNVs predicted to alter TFBS at promoters, ESE/ESS in coding regions and ISRE within introns compared to SNVs predicted to result in deleterious non-synonymous amino acid changes (Fig. 2b). This is consistent with the previous observation that genetic variants occur more frequently in the miRNA target regions, compared to the functional regions within miRNAs. In addition, Hsiao et al. demonstrated that alternative splicing events regulated by intronic genetic variants tend to be under positive selection, which is consistent with our results that intronic SNVs that potentially affect splicing mechanisms show enrichment of RPS SNVs, compared to the other functional classes (Fig. 2d). On the other hand, splicing factors are more conserved during evolution, and our study demonstrates that UCEs are enriched in the genes involved in RNA splicing (Fig. 3b).
On the other hand, genes that modulate response to environmental changes are the most polymorphic. The immune response MHC class I and class II genes, implicated in the pathogenesis of several autoimmune diseases, reside in the most polymorphic region of the human genome [36, 64] and carry the highest density of SNVs (Fig. 3c). Notably, this family of genes is significantly enriched in SNVs predicted to alter various regulatory elements including TFBS, ESE/ESS, ISRE and miRBS rather than protein function (Fig. 3e). In fact, while none of the RPS pfSNVs in the MHC family is predicted to cause a deleterious effect on protein function, 87 pfSNVs are found to display signature of RPS (Additional file 1: Table S8). Hence, the regulatory regions of the MHC family of genes are likely to be under strong positive selection, as previously suggested, and are functionally significant, regulating gene expression to modulate phenotypes. For example, a very well-studied polymorphism, rs9378249 upstream of the HLA-B gene, has previously been associated with bipolar disorder [66, 67] and hypertension. This polymorphism is predicted to alter TFBS and exhibits the signature of RPS; hence, it may be a causal variant for the various diseases although the underlying molecular mechanism requires further validation.
Another class of genes that modulates response to the environment is the drug/xenobiotic response families of genes, including the ABC transporter and the CYP450 metabolism families of genes. Unlike the MHC immune genes, which are significantly enriched in regulatory SNVs predicted to modulate gene expression, these drug response gene families are enriched in SNVs that affect the functions of the proteins, namely nsSNVs predicted to be deleterious (Fig. 4b). Previous reports also highlighted the high SNV density and excess of rare nsSNVs of the CYP450 pathway [68, 69], with 90–95% of individuals carrying at least one actionable variant in CYP450 genes. Another report predicted that ~ 32% (1949/6165) of SNVs at the CYP450 loci are putatively functional, with CYP4F12 carrying amongst the most novel putatively functional variants, which is consistent with our observation that CYP4F12 carries the highest number of pfSNVs (Additional file 1: Table S9). In addition to the pfSNVs, RPS and highly population differentiated (FST > 0.3) SNVs are significantly represented in the ABC transporter genes but not in the CYP450 genes (Fig. 4d), suggesting that the ABC transporter genes may be under stronger positive selection than the CYP450 genes. For example, rs17822931, a coding variant at the ABCC11 earwax determinant gene, is found to be highly differentiated amongst populations, and the A allele is positively associated with adaptation to cold climate. Greater than 40% of these RPS and/or population differentiated SNVs in the ABC transporter genes have been associated with phenotype modulation and even diseases, e.g. Alzheimer’s disease and schizophrenia (table in Fig. 4e).
Nearly all the coding SNVs in the ABC transporter gene family that display the signature of RPS or are significantly population differentiated are also predicted to alter ESE/ESS, suggesting that differential splicing may also play an important role in the ABC genes by generating diverse splice forms that respond to different environments.
Hence, adaptive genes that respond to environmental changes are likely to be highly polymorphic and subjected to strong positive selection pressures, consistent with previous reports that variants associated with inflammatory diseases show evidence of RPS and that genes associated with pharmacogenomics show a higher level of population differentiation as a signature of positive selection. Regulation of gene expression through variants that alter TFBS in the MHC gene family, as well as modulation of protein function and/or splicing pattern in the ABC gene families, highlights the different ways in which different families of genes adapt to the environment.
In conclusion, this study elucidates the overall architecture of the genetic polymorphisms, namely SNVs and INDELs, in the human genome. The coding region is found to be under strong negative selection: it is the least population differentiated and shows the lowest densities of SNVs and INDELs (especially frame-shift INDELs), the highest proportion of rare alleles and less enrichment of RPS SNVs. SNVs predicted to be functional are found to be under negative selection with enrichment of rare alleles. Families of genes which are ‘master regulators’ of gene expression, including those involved in epigenetics, transcription, translation or splicing, are found to be the least polymorphic, functionally conserved and/or enriched with ultra-conserved elements. Finally, genes that modulate response to the environment are the most polymorphic, with the MHC gene family, which is involved in immune response, being the most polymorphic, while genes involved in drug/xenobiotic response, including ABC transporter and CYP450 genes, are the most enriched with functional nsSNVs.
Polymorphisms in the human genome
Polymorphisms from the dbSNP database (Build 131) were mapped to different genic regions (5′UTRs, coding regions, introns and 3′UTRs) of the human genes (NCBI Genome Build 37.1) with those residing outside genes classified as intergenic variants.
To minimize false-positive SNPs originating from highly paralogous sequences, which were estimated to be ~ 8% of biallelic coding SNVs in dbSNP129, only polymorphisms which mapped to a single location in the genome and have been validated using a non-computational method or have allele frequency information (e.g. from the 1000 Genomes project) were included in this study. In the 1000 Genomes project, the variant assignment was restricted to the ‘accessible genome’, whereby ambiguously placed reads or unexpectedly high or low numbers of aligned reads were excluded (~ 15% of the genome) to minimize the detection of false-positive variants. To evaluate if our data is valid, SNV density data of this study was compared and found to be comparable to the SNV density data calculated from whole-genome sequencing of 179 HapMap individuals of the 1000 Genomes project. For example, similar to our observations using dbSNP data, the MHC gene loci from the 1000 Genomes sequencing data were also found to be significantly more polymorphic than other human genes (p < 0.001 by Mann-Whitney test). Hence, results from sequencing data from the 1000 Genomes project were consistent with the findings in this study using dbSNP data, suggesting that, in spite of the potential ascertainment biases and sequencing artefacts inherent in the dbSNP database, our findings about the enrichment of SNPs in MHC genes are valid.
Two major forms of genetic polymorphisms, SNVs and INDELs, were investigated. SNV/INDEL density within a particular genic region, e.g. 5′UTR, was calculated as the number of SNVs/INDELs divided by the length of that region. For genes with multiple transcripts, the mean densities were taken. Genes lacking polymorphism in all genic regions (promoter, 5′UTR, coding, intron, 3′UTR) were regarded as non-polymorphic genes. Highly polymorphic genes were identified based on a binomial model as described in Additional file 1.
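As a minimal illustration of the density calculation described above (variant count divided by region length, averaged over transcripts), consider the following sketch. The function names and the toy transcript data are hypothetical, and the per-kilobase scaling is only a presentation choice matching the "SNVs per kilobase" unit used in the Results.

```python
from statistics import mean

def region_density(n_variants: int, region_length_bp: int) -> float:
    """Variants per kilobase for one genic region of one transcript."""
    if region_length_bp <= 0:
        raise ValueError("region length must be positive")
    return 1000.0 * n_variants / region_length_bp

def gene_density(transcripts: list[tuple[int, int]]) -> float:
    """Mean per-kilobase density over all transcripts of a gene.

    Each tuple is (variant count, region length in bp) for one transcript.
    """
    return mean(region_density(n, length) for n, length in transcripts)

# Hypothetical gene with two transcripts whose coding regions differ in length:
# 4 SNVs in 1000 bp (4.0 per kb) and 6 SNVs in 2000 bp (3.0 per kb).
print(gene_density([(4, 1000), (6, 2000)]))  # 3.5
```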
Allele frequency of SNVs in the human genome was determined in the three population (East Asian, African and European) groups (HapMap release 28) as described in Additional file 1.
The FST statistic, using the pooled allele frequencies in the three population groups, was then calculated for each of the genotyped and polymorphic loci. Two groups of SNVs, namely, (1) zero-FST SNVs (FST = 0) and (2) high-FST SNVs (FST > 0.3), were further analysed. The fold enrichment of zero-FST or high-FST SNVs in a specific genic region (e.g. coding region) was determined by calculating the percentage of these SNVs in the coding region divided by the percentage of all the genotyped SNVs in the same region, and the significance of enrichment was determined using Fisher’s exact test. Fold enrichment which significantly deviates from one indicates that these SNVs are under- or over-represented in these regions.
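The classification and fold-enrichment steps can be sketched as below. The exact FST estimator is not specified in this section, so the sketch assumes the simple Wright formulation, FST = (H_T − H_S)/H_T, with equal population weights; this is an illustrative assumption and may differ from the estimator actually used. All function names and example counts are hypothetical.

```python
def fst(freqs):
    """Wright's fixation index for one biallelic SNV, computed from the allele
    frequency in each population group (equal weights assumed)."""
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)   # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                  # monomorphic locus; no differentiation
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-group
    return (h_t - h_s) / h_t

def classify(freqs, high_cutoff=0.3, eps=1e-12):
    """Assign an SNV to the zero-FST or high-FST group used in the text."""
    value = fst(freqs)
    if abs(value) < eps:
        return "zero-FST"
    if value > high_cutoff:
        return "high-FST"
    return "other"

def fold_enrichment(subset_in_region, subset_total, all_in_region, all_total):
    """Share of the subset (e.g. zero-FST SNVs) falling in a region, divided by
    the share of all genotyped SNVs falling in that region."""
    return (subset_in_region / subset_total) / (all_in_region / all_total)

# Identical frequencies in the three groups give no differentiation.
print(classify([0.5, 0.5, 0.5]))   # zero-FST
# Strongly diverging frequencies exceed the 0.3 cutoff.
print(classify([0.1, 0.5, 0.9]))   # high-FST
# Hypothetical counts: 3% of zero-FST SNVs vs 2% of all SNVs are coding,
# giving a fold enrichment of roughly 1.5.
print(fold_enrichment(30, 1000, 200, 10000))
```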
Genic regions that display signatures of negative selection were previously reported to have excess rare derived alleles. Hence, to identify the regions of genes subjected to negative selection, we determined if there is a statistical enrichment of rare SNVs (DAF < 0.05) in each genic region using Fisher’s exact test.
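For readers unfamiliar with the test, the following is a minimal standard-library sketch of a two-sided Fisher’s exact test on a 2 × 2 table of rare versus common SNVs inside versus outside a region. The function name and the example counts are hypothetical; for genome-scale counts an established statistics library would be a more practical choice than this direct enumeration.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    a: rare SNVs in the region      b: common SNVs in the region
    c: rare SNVs elsewhere          d: common SNVs elsewhere
    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))

# Hypothetical counts: 60 of 100 coding SNVs are rare (DAF < 0.05),
# compared with 450 of 1000 SNVs elsewhere.
p_value = fisher_exact_two_sided(60, 40, 450, 550)
print(p_value)
```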
SNVs displaying signatures of RPS were identified using LD- or haplotype-based methods as described in [26, 77]. To identify the regions enriched in RPS SNVs, the percentage of RPS SNVs within the region was compared with the percentage of RPS SNVs in the whole genome, and the significance of the difference was determined using Fisher’s exact test.
UCEs are sequences within the human genome that are 100% identical to the corresponding sequences in the mouse and the rat genomes, hence displaying evolutionary conservation and signatures of strong negative selection. A total of 481 UCEs have been identified, of which 70 are evolutionarily conserved coding sequences, overlapping with coding regions.
Potential functions of SNVs
The pfSNP database (http://pfs.nus.edu.sg/) , which integrates a variety of bioinformatics prediction algorithms, was used to evaluate potential functions of all the SNVs in the human genome that alter TFBS, protein functions, splicing events and miRBS. The prediction algorithms employed in this study are described in  and Additional file 1.
The Database for Annotation, Visualization and Integrated Discovery (DAVID) [78, 79] was utilized for functional annotation of the genes of interest. The enrichment of the genes in PANTHER protein family, GO-molecular function, GO-biological process and KEGG pathway categories was investigated. A Benjamini-Hochberg-corrected p value < 0.05 was taken to indicate statistical significance.
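To make the multiple-testing correction concrete, the standard Benjamini-Hochberg step-up adjustment can be sketched as follows. This is a generic textbook version written for illustration, not the annotation tool's own code; the function name and the example p values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw enrichment p values for four annotation categories.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```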
This work was supported by a grant from the National Medical Research Council (NMRC) 1131/2007 and BioMedical Research Council - Science and Engineering Research Council (BMRC-SERC) 112 148 0008, as well as Block Funding from the National Cancer Centre Singapore and Duke-NUS Graduate Medical School Singapore to A/P GL Lee.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
CGLL and SSC conceived and designed the study. YJ conducted the statistical analyses. JW predicted the potential functions of the SNVs and identified the SNVs under RPS. MB analysed the population differentiation levels of the SNVs. YJ and CGLL wrote the manuscript, and SSC helped edit the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
43. Abellan R, Mansego ML, Martinez-Hervas S, Martin-Escudero JC, Carmena R, Real JT, Redon J, Castrodeza-Sanz JJ, Chaves FJ. Association of selected ABC gene family single nucleotide polymorphisms with postprandial lipoproteins: results from the population-based Hortega study. Atherosclerosis. 2010;211:203–9.
48. Jordan de Luna C, Herrero Cervera MJ, Sanchez Lazaro I, Almenar Bonet L, Poveda Andres JL, Alino Pellicer SF. Pharmacogenetic study of ABCB1 and CYP3A5 genes during the first year following heart transplantation regarding tacrolimus or cyclosporine levels. Transplant Proc. 2011;43:2241–3.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.