
Introduction

Diabetes is a leading health issue worldwide, causing severe morbidity and imposing a substantial economic burden [1, 2]. Many epidemiological studies have assessed the causes of diabetes to provide an evidence base for disease prevention. For example, for type 2 diabetes, an exposure-wide umbrella review of 142 factors identified a wide range of biomarkers, medical conditions, and dietary, lifestyle, environmental and psychosocial factors associated with the risk of disease [3]. The picture differs for type 1 diabetes, which has a strong genetic contribution and is less influenced by external factors: beyond genetic factors, only a few environmental factors, including birthweight and childhood obesity, have been linked to type 1 diabetes [4]. While observational studies have provided initial evidence on potential exposures associated with diabetes, residual confounding and reverse causation limit our understanding of the complex set of factors underlying its development. Thus, whether the factors identified in previous observational studies are causally associated with the risk of diabetes remains unconfirmed. A clear appraisal of the causal risk factors for diabetes is therefore of great importance for disease prevention.

Mendelian randomisation (MR) is an epidemiological method that can strengthen causal inference by using genetic variants as instrumental variables [5]. An instrumental variable must satisfy three main conditions: (1) it is associated with the exposure (relevance assumption); (2) it does not share a common cause with the outcome (independence assumption); and (3) it is related to the outcome only through the exposure (exclusion restriction assumption) (Fig. 1). The text box summarises the common terms used in MR studies, with their key concepts and limitations. Because genetic variants are randomly allocated at conception and are therefore generally unassociated with environmental and self-selected behavioural factors, MR estimates are expected to be less affected by measured and unmeasured confounding than conventional observational estimates. This narrative review aims to summarise the evidence on potential causal risk factors for diabetes by integrating published MR studies on type 1 and type 2 diabetes, and to reflect on future perspectives for MR studies of diabetes.
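To make the instrumental variable logic concrete, the following is a minimal sketch of how two-sample MR combines summary statistics, assuming independent SNPs. Each SNP yields a Wald ratio (SNP-outcome association divided by SNP-exposure association), and the fixed-effect inverse-variance weighted (IVW) estimate pools these ratios. The sketch also computes the approximate per-SNP F-statistic often used to check the relevance assumption (F > 10 is a common rule of thumb). The function names and all SNP values are hypothetical, and the standard errors use the first-order approximation that ignores uncertainty in the SNP-exposure estimates.

```python
import math

def wald_ratio(beta_x, beta_y, se_y):
    """Single-SNP ratio estimate of the effect of exposure X on outcome Y.

    beta_x: SNP-exposure association; beta_y, se_y: SNP-outcome association
    (e.g. log odds ratio) and its standard error. First-order standard error.
    """
    return beta_y / beta_x, se_y / abs(beta_x)

def ivw(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted (IVW) estimate over independent SNPs."""
    ratios = [wald_ratio(bx, by, sy) for bx, by, sy in zip(beta_x, beta_y, se_y)]
    weights = [1.0 / se ** 2 for _, se in ratios]
    estimate = sum(w * est for w, (est, _) in zip(weights, ratios)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))

def approx_f_stat(beta_x, se_x):
    """Approximate per-SNP F-statistic for instrument strength: (beta_x / se_x)^2."""
    return (beta_x / se_x) ** 2

# Hypothetical summary statistics for three independent SNPs.
beta_x = [0.10, 0.05, 0.08]      # per-allele effect on the exposure (SD units)
se_x = [0.01, 0.01, 0.01]
beta_y = [0.020, 0.010, 0.016]   # per-allele effect on type 2 diabetes (log OR)
se_y = [0.005, 0.004, 0.006]

est, se = ivw(beta_x, beta_y, se_y)
f_stats = [approx_f_stat(b, s) for b, s in zip(beta_x, se_x)]
print(f"IVW log OR per SD: {est:.3f} (SE {se:.3f}); OR {math.exp(est):.2f}")
print("F-statistics:", [round(f) for f in f_stats])
```

The IVW estimate is equivalent to a weighted regression of the SNP-outcome associations on the SNP-exposure associations through the origin, which is why it is only unbiased when all instruments satisfy the three assumptions.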

Fig. 1

Study design and assumptions of MR analysis. The process of MR analysis is shown from top to bottom. MR analysis draws on genome-wide association analyses of the exposure and the outcome. Genetic instruments for the exposure are independent SNPs that are strongly associated with the exposure of interest in a genome-wide association analysis of an unselected sample, such as a general population. Likewise, summary-level data on the outcome are obtained from a genome-wide association analysis of a binary phenotype that divides the population into cases and controls. The directed acyclic graph represents the study design and assumptions of MR analysis: G indicates the genetic instruments, X the exposure of interest, Y the outcome of interest and U the confounders. MR analysis rests on three important assumptions. Assumption 1: the genetic variants used as the instrumental variable should be robustly associated with the exposure. Assumption 2: the instrumental variable should not be associated with any confounders. Assumption 3: the instrumental variable should affect the risk of the outcome only through the risk factor, not through alternative pathways. For causal inference, the MR design resembles that of an RCT: the random allocation of genetic variants mimics the randomisation process of RCTs, which minimises confounding. Source: Manhattan plot reproduced from Ikram et al [75], available under a CC BY 2.5 licence (https://creativecommons.org/licenses/by/2.5/).

Causal exposures and risk factors for type 1 diabetes

Because there is a strong genetic component in type 1 diabetes, MR studies of type 1 diabetes are limited and only a few potentially modifiable risk factors have been identified (Table 1). Low birthweight [6], childhood obesity [6, 7] and a higher abundance of the Bifidobacterium genus [8] have been associated with an increased risk of type 1 diabetes. MR studies have found no associations of adult body size [6], features of the liver or pancreas [9] and serum 25-hydroxyvitamin D levels [10] with type 1 diabetes.

Table 1 MR studies on the causes of type 1 diabetes

A protein-wide MR study examined the associations of 1611 circulating protein biomarkers with the risk of type 1 diabetes and identified associations for signal regulatory protein gamma, IL-27 subunit Epstein–Barr virus-induced 3 and chymotrypsinogen B1 [11]. These findings are consistent with recent observational studies linking certain viral infections, particularly enterovirus infections (e.g. coxsackievirus), with the risk of type 1 diabetes [12], and thus provide an avenue to better understand and prevent this disease.

Causal exposures and risk factors for type 2 diabetes

Most MR studies of glycaemic outcomes have focused on type 2 diabetes. Our previous exposure-wide MR study examined the associations of 97 exposures with the risk of type 2 diabetes using data from the DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) consortium (74,124 cases and 824,006 controls) and identified 34 factors that were possibly causally associated with the risk of type 2 diabetes [13]. Table 2 summarises and updates the associations of a wide range of exposures with type 2 diabetes reported in MR studies.

Table 2 MR studies on the causes of type 2 diabetes

Somatic and psychological health status

The results of MR studies of somatic and psychological health status in relation to type 2 diabetes are summarised in Table 2. Opposite associations have been reported for LDL-cholesterol and type 2 diabetes: an inverse association in a European population and a positive association in an African population [13,14,15]. A recent study further suggested that the diabetogenic effect of low LDL-cholesterol levels might be mediated by increased BMI [16]. Lower levels of bilirubin (a marker of liver function) [17], testosterone [18] and thyrotropin [19] were associated with an increased risk of type 2 diabetes in some MR studies, but not in all [13, 20,21,22]. Sex-specific associations were observed for testosterone [23, 24]: higher testosterone levels were associated with an increased risk of type 2 diabetes in women but a decreased risk in men [23]. Insomnia, but no other sleep-related trait, was associated with type 2 diabetes [13].
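Mediation in MR is often assessed with a two-step (network) approach, in which the indirect effect is the product of the exposure-to-mediator and mediator-to-outcome MR estimates and the direct effect is the remainder of the total effect. The Python sketch below illustrates only that arithmetic; it is not necessarily the method used in [16], the function name is hypothetical, and every input value is invented for illustration. It assumes linear effects and no exposure-mediator interaction; on the log-odds scale the decomposition is approximate.

```python
def two_step_mediation(total_effect, beta_exposure_mediator, beta_mediator_outcome):
    """Product-of-coefficients mediation from two-step MR (point estimates only).

    total_effect: MR estimate of exposure -> outcome (e.g. log OR per SD)
    beta_exposure_mediator: MR estimate of exposure -> mediator (SD per SD)
    beta_mediator_outcome: MR estimate of mediator -> outcome, ideally from
        multivariable MR adjusting for the exposure (log OR per SD)
    """
    indirect = beta_exposure_mediator * beta_mediator_outcome
    direct = total_effect - indirect
    proportion_mediated = indirect / total_effect
    return indirect, direct, proportion_mediated

# Hypothetical inputs, NOT taken from any cited study:
# exposure = 1-SD genetically lower LDL-cholesterol, mediator = BMI (SD).
indirect, direct, proportion = two_step_mediation(
    total_effect=0.20, beta_exposure_mediator=0.05, beta_mediator_outcome=0.90
)
print(f"indirect={indirect:.3f}, direct={direct:.3f}, mediated={proportion:.1%}")
```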

Adiposity-related factors

Consistent with the large body of evidence from prospective observational studies, childhood obesity, adulthood overall and central obesity, excess liver fat, and whole-body and visceral fat mass were all associated with an increased risk of type 2 diabetes [9, 25,26,27,28,29]. Plasma levels of adiponectin, an adipocyte-secreted hormone, are decreased in individuals with obesity, and genetically predicted lower adiponectin levels were associated with an increased risk of type 2 diabetes [30]. However, this association was inconsistent across MR sensitivity analyses [30], suggesting that it may be biased by pleiotropy (e.g. via fat mass). Several MR studies have found that lower birthweight, independent of adult body weight, is associated with a higher risk of type 2 diabetes [31], which may suggest a role for the intrauterine environment and fetal development in the aetiology of type 2 diabetes.

Lifestyle and nutritional factors

MR studies have strengthened the evidence for a causal role of cigarette smoking in type 2 diabetes but have failed to convincingly confirm effects of physical activity, alcohol consumption or coffee consumption on type 2 diabetes risk [13, 32, 33]. Although alcohol consumption instrumented by 83 SNPs was not associated with type 2 diabetes, the main SNP associated with higher alcohol consumption and alcohol abuse in European populations (rs1229984 in the ADH1B gene) was significantly associated with an increased risk of disease [13]. A robust inverse association between coffee consumption and type 2 diabetes risk has been reported in many observational studies [34]. However, genetically predicted higher coffee consumption was not associated with a decreased risk of type 2 diabetes in MR studies [13, 35]. This lack of association may reflect pleiotropic effects of the SNPs used (e.g. via fat mass or via consumption of other hot or caffeine-containing beverages) and the inverse relationship between genetically proxied coffee consumption and plasma caffeine levels (i.e. the genetic variants most strongly associated with higher coffee consumption are associated with lower plasma caffeine levels) [36].

An MR study found an inverse association between circulating 25-hydroxyvitamin D levels and type 2 diabetes risk [37], which might be driven by variants in the vitamin D synthesis pathway [37,38,39]. Lower levels of vitamin K1 (phylloquinone) [40] and higher levels of iron [41] were associated with an increased risk of type 2 diabetes. Eight out of ten plasma fatty acids were associated with type 2 diabetes; however, all of these associations except that for palmitoleic acid were driven by SNPs in the FADS1/2 genes [42]. Because these pleiotropic genes encode key enzymes in fatty acid metabolism (fatty acid desaturases), whether the associations are biased by pleiotropy remains unknown [43].

Despite the popularity of MR studies of dietary and lifestyle exposures in diabetes and cardiometabolic diseases, these time-varying, compositional and intercorrelated exposures pose unique challenges [44]. For example, MR analyses of nutritional exposures that use genetic instruments derived from a single dietary assessment in midlife implicitly assume that, on average, this assessment is representative of long-term habitual intake. Furthermore, like many behavioural exposures, nutrition is correlated with numerous other lifestyle and environmental factors. Recent studies have documented that the confounding and reverse causation affecting traditional epidemiological studies may also affect genetic associations [45]. One recent study showed that half of the genetic variants associated with diet are a consequence of increased BMI, and that genetics can be used to correct for this confounding and reverse causation, thereby refining genetic correlation estimates and strengthening causal inference [45].

IGF-1 and inflammatory biomarkers

Genetically predicted higher levels of IGF-1, a peptide hormone structurally similar to insulin, were positively associated with the risk of type 2 diabetes [46]. Because IGF-1-associated SNPs have heterogeneous effects on type 2 diabetes, a recent MR analysis examined several clusters of these SNPs separately and suggested that the overall positive association might be explained by pathways related to amino acid metabolism and genomic integrity [47]. In contrast, the main cluster of IGF-1-associated SNPs, which was associated with a decreased risk of type 2 diabetes, mapped to the growth hormone signalling pathway [47]; this inverse association may reflect pleiotropic effects via fat mass, as growth hormone secretion is decreased in obesity [48].

As for inflammatory biomarkers, the IL-1 and IL-6 pathways may be involved in the development of type 2 diabetes [13, 49], although the evidence is weak. Each additional minor allele of the IL6R SNP rs7529229 (corresponding to the effect of taking tocilizumab 4–8 mg/kg every 4 weeks) was suggestively associated with a reduced risk of type 2 diabetes (OR 0.97, 95% CI 0.94, 1.00), suggesting a possible role for IL-6 receptor blockade in type 2 diabetes prevention.
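Because the upper confidence limit sits at 1.00, it may help to see why this result is described as only suggestive. Assuming a normal approximation on the log-odds scale, the standard error can be recovered from the reported 95% CI and converted into an approximate two-sided p value. The helper names below are hypothetical, and because the published OR and CI are rounded to two decimals, the resulting p value (close to the conventional 0.05 threshold) is only approximate.

```python
import math

def se_from_ci(lower, upper, z_crit=1.959964):
    """Standard error of a log odds ratio recovered from its 95% CI bounds."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

def wald_test_from_or(odds_ratio, lower, upper):
    """Approximate z statistic and two-sided p value for a reported OR and 95% CI."""
    se = se_from_ci(lower, upper)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value under a normal approximation
    return se, z, p

# Per-allele estimate for IL6R rs7529229, using the rounded values quoted in the text.
se, z, p = wald_test_from_or(0.97, 0.94, 1.00)
print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, p = {p:.3f}")
```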

Circulating metabolites and proteins

One of the first demonstrations of MR applied to circulating metabolites concerned the previously reported epidemiological association between plasma levels of branched-chain amino acids (BCAAs) and the risk of type 2 diabetes [50]. In an MR analysis using genetic variation at the PPM1K locus (which encodes a mitochondrial phosphatase that activates the branched-chain α-ketoacid dehydrogenase [BCKD] complex), higher leucine, isoleucine and valine levels were associated with increased odds of type 2 diabetes [50]. However, because BCKD has a range of substrates besides leucine, isoleucine and valine, it is challenging to determine which of these substrates causes type 2 diabetes. A separate MR analysis of BCAAs showed that higher BCAA levels have no causal effect on insulin resistance; rather, genetically raised insulin resistance drives higher fasting circulating BCAA levels [51]. A metabolome-wide MR approach provided further evidence of this reverse causal effect, indicating that genetic predisposition to type 2 diabetes may trigger early changes in valine and leucine levels [52]. Other products of amino acid catabolism, such as 2-aminoadipic acid (2-AAA) and α-hydroxybutyrate, are strongly associated with incident type 2 diabetes in observational studies [53], but MR studies have failed to demonstrate evidence of causality [54]. The discrepancies between observational and MR studies may have several explanations; one is that observational studies have included a mixture of individuals with normoglycaemia and impaired glucose tolerance, so that metabolite associations may partly reflect existing dysglycaemia. A study in the Framingham cohort restricted to individuals with strict normoglycaemia at baseline (fasting glucose <5.6 mmol/l) provided evidence for a subset of 19 metabolites associated with the risk of diabetes among apparently healthy individuals [55]. Pathway enrichment analyses and MR showed that metabolites in the nitrogen metabolism pathway are causally related to the development of diabetes [55].

Integrating genomic and small-molecule data across platforms enables the discovery of regulators of human metabolism and their translation into clinical insights. A recent genome-wide meta-analysis of 174 metabolite levels across six cohorts, including up to 86,507 participants, identified ~500 genetic loci influencing metabolite levels [56]. Among many findings relevant to dysglycaemia, the study provided evidence that a missense variant, p.Asp470Asn (rs17681684), in the GLP2R gene, which encodes the receptor for glucagon-like peptide 2, was associated with a 4% higher type 2 diabetes risk. A metabolome-wide MR analysis further identified new metabolites that potentially play a causal role in type 2 diabetes, including betaine, glutamic acid, lysine, alanine and mannose [52].

High-throughput detection and quantification of serum proteins in large human populations can provide insight into the molecular processes underlying diabetes risk. One protein-wide MR study, using genetic instruments for 164 proteins derived from genome-wide association summary statistics from the independent INTERVAL study, identified 16 proteins as potentially having a causal effect on the development of type 2 diabetes [57]. A more recent protein-wide MR study examined the associations of 1089 circulating protein biomarkers with the risk of type 2 diabetes and identified 20 proteins that might be causally associated with the disease [58]. These findings may help to prioritise therapeutic targets in type 2 diabetes.

MR studies of circulating metabolites and proteins usually use a cis-variant located in or near the encoding gene as the instrumental variable, which makes the three key MR assumptions more plausible. However, these MR associations can still be affected by the genome-wide association analyses of metabolites and proteins and the corresponding profiling process (e.g. bias caused by batch effects) [59], as well as by differences between high-throughput platforms [60]. Of note, using cis-variants as instrumental variables may not completely rule out horizontal pleiotropy, especially when one gene regulates several metabolites or proteins that do not lie in a common pathway. In this case, multivariable MR analysis or the removal of pleiotropic SNPs may help to reduce bias.
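Multivariable MR can be sketched as a weighted regression of SNP-outcome associations on the associations with several exposures at once (multivariable IVW). The Python sketch below uses simulated, noiseless data with hypothetical effect sizes: two exposures (say, a metabolite and a co-regulated protein) share instruments, only the second affects the outcome, so the univariable estimate for the first exposure is spurious while the multivariable estimates recover the simulated direct effects. Standard errors and checks of conditional instrument strength, which real analyses need, are omitted.

```python
import numpy as np

def mvmr_ivw(beta_x, beta_y, se_y):
    """Multivariable IVW MR: weighted regression of SNP-outcome on SNP-exposure effects.

    beta_x: array (n_snps, n_exposures); beta_y, se_y: arrays (n_snps,).
    No intercept; weights are 1 / se_y**2. Returns one direct effect per exposure.
    """
    X = np.asarray(beta_x, dtype=float)
    y = np.asarray(beta_y, dtype=float)
    w = 1.0 / np.asarray(se_y, dtype=float) ** 2
    XtW = X.T * w  # scale each SNP's contribution by its weight
    return np.linalg.solve(XtW @ X, XtW @ y)

rng = np.random.default_rng(1)
n_snps = 30
# Hypothetical setting: instruments shared by exposure 1 (metabolite) and
# exposure 2 (co-regulated protein); only exposure 2 affects the outcome.
bx1 = rng.normal(0.10, 0.03, n_snps)
bx2 = 0.5 * bx1 + rng.normal(0.0, 0.02, n_snps)
X = np.column_stack([bx1, bx2])
true_effects = np.array([0.0, 0.4])
beta_y = X @ true_effects            # noiseless outcome associations for clarity
se_y = rng.uniform(0.005, 0.02, n_snps)

univariable = mvmr_ivw(X[:, :1], beta_y, se_y)[0]   # exposure 1 analysed alone
multivariable = mvmr_ivw(X, beta_y, se_y)
print(f"univariable estimate for exposure 1: {univariable:.3f}")
print("multivariable estimates:", np.round(multivariable, 3))
```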

Gut microbiota and related metabolites

With increasing evidence suggesting that the human gut microbiome plays a role in immune function and metabolic disease, there is a need to distinguish microbiome features that are causal for disease from those that are a consequence of disease or its treatment. A study combining genome-wide genetic data, gut metagenomic sequencing and measurements of faecal short-chain fatty acids showed that a host-genetics-driven increase in gut butyrate production was associated with an improved insulin response after an oral glucose tolerance test. In contrast, abnormalities in the production or absorption of propionate were causally related to an increased risk of type 2 diabetes [61]. Another two-sample MR study identified seven gut microbiota genera nominally associated with type 2 diabetes [62]. For gut microbiota-related metabolites, a separate study found that genetically predicted higher trimethylamine N-oxide and carnitine levels were not associated with higher odds of type 2 diabetes, but reported possible associations of high choline and low betaine levels with an increased risk of type 2 diabetes [63]. Of note, although many genome-wide association analyses of the gut microbiome have been carried out, high-quality MR studies of the gut microbiome in relation to diabetes are limited [8], and it remains uncertain whether host genetic variants are adequate instrumental variables for the function of the gut microbiome.

Assessment of included MR studies on diabetes

The overall quality of the included MR studies was satisfactory, with careful criteria for selecting genetic instruments, comparatively large sample sizes and a range of approaches to testing the robustness of the findings. Regarding the MR assumptions, assumption 1 was usually addressed by selecting genetic variants associated with the exposure of interest at genome-wide significance. However, no consistent linkage disequilibrium (LD) threshold was used to select independent SNPs: a permissive (high r²) threshold can retain correlated SNPs and inflate the type 1 error rate, whereas a strict (low r²) threshold discards informative SNPs and increases the type 2 error rate. Although MR can minimise confounding, its estimates are not completely immune to this bias, especially when genetic instruments have strong pleiotropic effects. Except in studies using individual-level data, whether genetic instruments were primarily associated with other phenotypes or with confounders was rarely examined. The most common bias in MR analysis is horizontal pleiotropy, caused by violation of assumption 3 (the exclusion restriction assumption), whereby genetic variants affect the outcome through alternative pathways rather than only through the exposure of interest. The associations with type 1 and type 2 diabetes summarised in this review were robust in sensitivity analyses, and most studies used MR-Egger or MR pleiotropy residual sum and outlier (MR-PRESSO) methods to detect potential horizontal pleiotropy. Of note, even though statistical methods can detect and minimise the influence of horizontal pleiotropy, instrument selection remains crucial for reducing this bias. Using variants in genes with well-understood biological functions as instrumental variables usually makes the MR assumptions more plausible and thus tends to yield less biased estimates. However, it is difficult to identify specific genetic variants for certain exposures, especially health behaviours and complex phenotypes. Therefore, pleiotropy should be examined carefully in analyses using multiple genetic instruments. Evidence from observational studies and clinical trials should inform the interpretation of MR findings, and robust MR findings, in turn, should be tested in clinical trials. In addition, MR results can be difficult to interpret, especially for binary exposures. Because the exposure in MR analysis is not directly measured but is proxied by the effects of genetic variants on a trait, the genetically proxied exposure usually reflects a lifelong, chronic effect, which hinders the exploration of time-specific associations.
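To illustrate how MR-Egger detects directional pleiotropy, the sketch below fits its point estimates on simulated summary statistics. Each SNP is first oriented so that its association with the exposure is positive; a weighted regression of SNP-outcome on SNP-exposure associations with an intercept then gives an intercept (average directional pleiotropy) and a slope (the causal estimate, valid under the InSIDE assumption that pleiotropic effects are independent of instrument strength). The data are simulated with a hypothetical causal effect of 0.3 and a constant pleiotropic effect of 0.01; standard errors and the intercept test are omitted, and published analyses typically rely on established implementations such as the R packages MendelianRandomization or TwoSampleMR.

```python
import numpy as np

def mr_egger(beta_x, beta_y, se_y):
    """MR-Egger point estimates: weighted regression of beta_y on beta_x with an intercept.

    SNPs are first oriented so that every SNP-exposure association is positive.
    Returns (intercept, slope): intercept = average directional pleiotropy,
    slope = causal estimate under the InSIDE assumption.
    """
    bx = np.asarray(beta_x, dtype=float)
    by = np.asarray(beta_y, dtype=float)
    w = 1.0 / np.asarray(se_y, dtype=float) ** 2
    sign = np.where(bx < 0, -1.0, 1.0)
    bx, by = bx * sign, by * sign          # orient to the exposure-increasing allele
    X = np.column_stack([np.ones_like(bx), bx])
    XtW = X.T * w
    intercept, slope = np.linalg.solve(XtW @ X, XtW @ by)
    return intercept, slope

def ivw_slope(beta_x, beta_y, se_y):
    """Fixed-effect IVW estimate (weighted regression through the origin), for comparison."""
    bx, by = np.asarray(beta_x, float), np.asarray(beta_y, float)
    w = 1.0 / np.asarray(se_y, float) ** 2
    return np.sum(w * bx * by) / np.sum(w * bx ** 2)

rng = np.random.default_rng(7)
n_snps = 25
bx_pos = rng.uniform(0.02, 0.15, n_snps)   # effects of exposure-increasing alleles
by_pos = 0.01 + 0.3 * bx_pos               # causal effect 0.3 plus directional pleiotropy 0.01
flip = rng.choice([-1.0, 1.0], n_snps)     # arbitrary allele coding in the input data
beta_x, beta_y = flip * bx_pos, flip * by_pos
se_y = rng.uniform(0.005, 0.02, n_snps)

intercept, slope = mr_egger(beta_x, beta_y, se_y)
print(f"MR-Egger intercept {intercept:.3f}, slope {slope:.3f}")
print(f"IVW estimate {ivw_slope(beta_x, beta_y, se_y):.3f}")
```

In this simulation the IVW estimate is pulled upwards by the directional pleiotropy, whereas MR-Egger recovers the simulated slope at the cost, in real data, of much wider confidence intervals.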

Future perspectives

  • The null findings in previous MR studies may have been caused by inadequate power, particularly for weak associations of exposures proxied by a few SNPs that explain a small proportion of phenotypic variance. For exposures with robust associations in traditional observational studies, null MR associations deserve to be re-examined in well-powered studies with strong genetic instruments for the exposures and large sample sizes for the diabetes outcomes.

  • Most previous MR studies were based on summary-level data, which do not allow the exploration of potential non-linear associations (e.g. J- or U-shaped); rather, it can only be assumed that the association is linear without a threshold effect. MR analysis using individual-level data from large-scale biobanks and studies is needed to examine the non-linearity of the associations.

  • More effort should be put into MR studies of weakly heritable exposures and of exposures for which genetic association data are not yet available. For example, further MR analyses of the associations of diet and physical activity with diabetes risk are warranted.

  • Most MR studies have been based on data from European populations. As more data become available from other populations, such as Asian and African populations, future MR studies are encouraged to include data from multi-ancestry cohorts.

  • Even though the associations between protein biomarkers and diabetes risk were examined in a few MR studies [11, 58], more independent verification is needed to confirm these findings. In addition, the intermediate roles of blood proteins and metabolites in the pathways from environmental exposure to diabetes should be investigated to provide evidence for treatment and intervention.

  • Even though many statistical approaches, such as the weighted median, MR-Egger, MR-PRESSO, MR-Clust and contamination mixture methods, have been developed to detect pleiotropy and to verify associations under different assumptions, new statistical approaches are still needed to handle pleiotropy and other limitations.
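Of the methods listed above, the weighted median is simple enough to sketch in a few lines. It orders the single-SNP ratio estimates, weights them by their inverse variance, and takes the weighted median with linear interpolation, so it remains consistent provided that valid instruments contribute at least half of the total weight. The Python sketch below uses hypothetical summary statistics in which three of ten equally weighted SNPs are pleiotropic; it returns a point estimate only, without the standard errors (often obtained by bootstrapping) used in practice.

```python
import numpy as np

def weighted_median(beta_x, beta_y, se_y):
    """Weighted median MR estimate with first-order inverse-variance weights.

    Consistent if SNPs contributing at least 50% of the weight are valid instruments.
    """
    bx = np.asarray(beta_x, dtype=float)
    by = np.asarray(beta_y, dtype=float)
    se = np.asarray(se_y, dtype=float)
    ratio = by / bx
    w = (bx / se) ** 2                       # inverse variance of each ratio estimate
    order = np.argsort(ratio)
    ratio, w = ratio[order], w[order] / w.sum()
    s = np.cumsum(w) - 0.5 * w               # weighted percentile of each ordered ratio
    p = np.searchsorted(s, 0.5)              # first position with s >= 0.5
    if p == 0:
        return ratio[0]
    if p == len(s):
        return ratio[-1]
    lo, hi = p - 1, p                        # interpolate between neighbouring ratios
    return ratio[lo] + (ratio[hi] - ratio[lo]) * (0.5 - s[lo]) / (s[hi] - s[lo])

# Hypothetical example: 7 valid SNPs (ratio 0.3) and 3 pleiotropic SNPs (ratio 1.0),
# all with equal weight.
beta_x = np.full(10, 0.1)
beta_y = beta_x * np.array([0.3] * 7 + [1.0] * 3)
se_y = np.full(10, 0.01)

wm = weighted_median(beta_x, beta_y, se_y)
ivw = np.sum(beta_x * beta_y / se_y**2) / np.sum(beta_x**2 / se_y**2)
print(f"weighted median {wm:.3f} vs IVW {ivw:.3f}")
```

Because the pleiotropic SNPs carry only 30% of the weight, the weighted median stays at the valid ratio, whereas the IVW estimate averages in the biased ratios.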

Conclusion

This review has integrated data from published MR studies on type 1 and type 2 diabetes to highlight many possible causal risk factors for dysglycaemia. While few studies have been conducted for type 1 diabetes, most MR analyses support causal roles for social, demographic, metabolic and lifestyle factors in the development of type 2 diabetes. More MR studies in multi-ancestry cohorts are needed to examine the role of diet in the development of diabetes. MR investigations based on data on metabolites, protein biomarkers and the gut microbiome may help to elucidate the molecular pathology of diabetes.