Single-cell gene expression analysis reveals β-cell dysfunction and deficit mechanisms in type 2 diabetes
Type 2 diabetes (T2D) is one of the most common chronic diseases. Studies on T2D are mainly built upon bulk-cell data analysis, which measures the average gene expression levels for a population of cells and cannot capture the inter-cell heterogeneity. The single-cell RNA-sequencing technology can provide additional information about the molecular mechanisms of T2D at single-cell level.
In this work, we analyze three datasets of single-cell transcriptomes to reveal β-cell dysfunction and deficit mechanisms in T2D. Focusing on the expression levels of key genes, we discriminated healthy from T2D β-cells using five machine learning classifiers and extracted the major influential factors by calculating correlation coefficients and mutual information. Our analysis shows that T2D β-cells express the insulin gene normally under low cellular stress (especially oxidative stress) but appear dysfunctional under high cellular stress. Notably, oxidative stress plays an important role in affecting the expression of the insulin gene. In addition, by analyzing the genes related to apoptosis, we found that the TNFR1-, BAX-, CAPN1- and CAPN2-dependent pathways may be crucial for β-cell apoptosis in T2D. Finally, personalized analysis indicates cell heterogeneity and individual-specific insulin gene expression.
Oxidative stress is an important influential factor on insulin gene expression in T2D. Based on the uncovered mechanism of β-cell dysfunction and deficit, targeting key genes in the apoptosis pathway along with alleviating oxidative stress could be a potential treatment strategy for T2D.
Keywords: Single-cell; Hyperglycaemia; Type 2 diabetes; β-cell dysfunction; β-cell deficit; Insulin expression; Apoptosis; Oxidative stress
FFA: Free fatty acids
PCA: Principal component analysis
ROS: Reactive oxygen species
RPKM: Reads per kilobase of transcript per million mapped reads
scRNA-seq: Single-cell RNA sequencing
SVM: Support vector machine
T2D: Type 2 diabetes
TEDECs: Total expression of the death executioner caspases
TPM: Transcripts per million
Type 2 diabetes (T2D) is one of the major causes of death worldwide. It is characterized by insulin resistance and impaired insulin secretion [2, 3]. Insulin resistance denotes reduced insulin sensitivity in insulin-targeted cells or tissues, while insufficient insulin secretion is related to pancreatic β-cell dysfunction and the loss of β-cell mass [4, 5]. β-cells are located in the islets of Langerhans, i.e. the endocrine regions of the pancreas. The main function of β-cells is to synthesize, store and secrete insulin, a peptide hormone that lowers the blood glucose level. It has been reported that β-cell function declines even before the diagnosis of T2D. In addition, a β-cell deficit of about 20% to 65% has been demonstrated for T2D in several studies [7, 8, 9]. Kahn investigated the contribution of insulin resistance and β-cell dysfunction to the pathophysiology of T2D. Yoon et al. measured β-cell mass in T2D. Although these works set out to study the β-cell dysfunction and deficit mechanisms in T2D, they were mainly built upon bulk-cell analysis, which can only provide average information about a population of cells.
Since transcriptomes were first measured at the single-cell level by Tang et al. in 2009, the technique of single-cell RNA sequencing (scRNA-seq) has developed explosively over the past 10 years [12, 13, 14, 15]. Compared with bulk-based approaches, scRNA-seq can provide crucial insights into cellular heterogeneity and bring profound new discoveries in biology [16, 17, 18, 19, 20, 21]. For example, Deng et al. reported stochastic expression of monoallelic genes in mammalian cells; Buettner et al. detected hidden subpopulations of cells by analyzing scRNA-seq data. The technique of scRNA-seq has also been applied to transcriptome profiling of human pancreatic cells from both healthy and T2D donors [24, 25, 26, 27]. Xin et al. and Segerstolpe et al. showed the expression heterogeneity of human islet cells (e.g. α-cells, β-cells and δ-cells). They also analyzed the alterations of gene expression patterns as well as the enriched signaling pathways in T2D compared with healthy people.
In this study, we aim to unravel the β-cell dysfunction and deficit mechanisms in T2D by analyzing the single-cell transcriptomic data of β-cells. Three single-cell transcriptomic datasets were adopted because each of them contains more than 100 β-cells (we found a few other available datasets but did not use them because they contain only limited numbers of β-cells). We named the three datasets dataset 1, dataset 2 and dataset 3, respectively. All three datasets consist of β-cells obtained from T2D donors and healthy donors. The analysis was carried out from three aspects, i.e. β-cell dysfunction, β-cell deficit and personalized analysis. First, we focused on the mechanisms of β-cell dysfunction in T2D. It is well known that the major role of pancreatic β-cells is to produce insulin. Thus, we analyzed the expression levels of INS (i.e. the gene that encodes the preproinsulin precursor of active insulin) in β-cells from the healthy and T2D donors of each dataset. Different patterns of INS expression were detected in the three datasets. To explore the reasons, we examined the cellular stress in the β-cells of the three datasets, and applied different machine learning algorithms to discriminate healthy and T2D β-cells using the stress-related features. By modeling the vulnerability of T2D β-cells to cellular stress, we found that oxidative stress could be a major influential factor on INS expression. Second, to study the mechanisms of β-cell deficit in T2D, we investigated the expression levels of the genes in the apoptosis pathway, conducted principal component analysis and carried out mutual information calculation. As a result, genes and pathways that are crucial for β-cell apoptosis in T2D were detected. In the last part, we performed personalized analysis of INS expression and the expression of death executioner caspases.
Based on the analysis of the three datasets of β-cell transcriptomes, we obtained the following main results. Some β-cells in T2D donors have INS expression levels comparable with those in healthy donors; β-cells in T2D have normal INS expression under low cellular stress, but they are dysfunctional under high cellular stress; β-cells in healthy people can deal with cellular stress, maintaining normal INS expression; oxidative stress could be a major influential factor on INS expression; the TNFR1-, BAX-, CAPN1- and CAPN2-dependent pathways may be crucial for β-cell apoptosis in T2D; INS and death executioner caspases are differentially expressed among donors. Note that some of the above results could hardly be obtained from bulk-cell analysis.
Pancreatic β-cell dysfunction in T2D
Single-cell INS expression
Table: The numbers of donors and β-cells of each dataset. Columns: gene expression dataset of β-cells; number of donors; number of β-cells.
Plenty of evidence indicates that prolonged exposure of β-cells to hyperglycemia and high free fatty acids (FFA) causes endoplasmic reticulum (ER) stress, oxidative stress and increased β-cell apoptosis [6, 30, 31, 32, 33, 34]. ER stress develops under the continuous demand for insulin, which increases the burden on β-cells and leads to the accumulation of misfolded proteins in the ER lumen. ER stress is mediated by IRE1, EIF2AK3 and ATF6 [35, 36]. Reactive oxygen species (ROS) accumulate to cytotoxic levels during chronic glucose and fatty acid metabolism [5, 37, 38, 39]. Besides, hyperglycemia may disrupt the electron transport chain in mitochondria, which is also a main source of free radicals [40, 41]. Oxidative stress (cumulative ROS) promotes the activation of ASK1, JNK and P38ALPHA. Hyperglycemia and high FFA increase β-cell apoptosis through several mechanisms, including promoting proapoptotic gene expression and increasing ER stress, oxidative stress and inflammatory stress. We use the expression levels of CASP3, CASP6 and CASP7 (i.e. the death executioner caspases) to represent the rate of apoptosis.
Discrimination of healthy and T2D β-cells
To test the above analysis results, we employed different classifiers to discriminate healthy and T2D β-cells in each dataset, using expression data of genes that are related to ER stress, oxidative stress and apoptosis. In addition, the genes related to INS expression were also included as features of the two groups of cells [42, 43]. Overall, 45 genes were selected (Additional file 1: Table S1) [35, 36, 37, 38, 39, 40, 41]. Then, we chose the genes that were expressed in more than 35% of all the cells in each dataset (40% for dataset 1); these thresholds were derived from the proportion of healthy β-cells among all the cells of each dataset. We conducted this step because a feature (gene) cannot contribute to distinguishing healthy and T2D β-cells if it is barely expressed in the cells. Here, expressed genes are those with expression levels no less than 1. As a result, 17, 25 and 31 genes met the conditions in datasets 1, 2 and 3, respectively. For a fair comparison, we also applied the 25 genes selected in dataset 2 to datasets 1 and 3.
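The gene filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `select_expressed_genes`, the dictionary layout and the toy values are all hypothetical, and only the two thresholds (expression level no less than 1, more than 35% of cells) come from the text.

```python
def select_expressed_genes(expression, min_fraction=0.35, min_level=1.0):
    """Return the genes expressed in more than `min_fraction` of the cells.

    `expression` maps a gene symbol to its list of per-cell expression values.
    A gene counts as expressed in a cell when its level is >= `min_level`.
    """
    selected = []
    for gene, levels in expression.items():
        n_expressed = sum(1 for value in levels if value >= min_level)
        if n_expressed / len(levels) > min_fraction:
            selected.append(gene)
    return selected


# Toy values (hypothetical, not taken from any of the three datasets).
toy = {
    "INS":   [950.0, 1200.0, 870.0, 0.0, 1020.0],
    "ATF6":  [0.0, 2.5, 0.4, 0.0, 1.0],
    "CASP3": [0.0, 0.0, 0.2, 0.0, 3.1],
}
kept_35 = select_expressed_genes(toy)                     # 35% threshold
kept_40 = select_expressed_genes(toy, min_fraction=0.40)  # threshold used for dataset 1
print(kept_35, kept_40)
```

Raising the threshold from 35% to 40% drops genes that sit exactly on the boundary, because the comparison is strict ("more than").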
Vulnerability of T2D β-cells
Major influential factors for INS expression
Table: Entropy, joint entropy and mutual information for the variable pairs INS and OXID, INS and ER, and INS and CASP.
β-cell apoptosis in T2D
Expression of death executioner caspases
Pathway analysis of apoptosis in T2D
To summarize the above analyses, the death receptor TNFR1-mediated pathway, mitochondrial BAX-related pathway, as well as the CAPN1- and CAPN2-dependent pathway may be crucial in T2D.
We also analyzed the genes related to apoptosis in dataset 3. Unlike in dataset 2, TP53 is highly expressed in dataset 3. The results are provided in the supplementary documents (Additional file 2: Table S3, Additional file 3: Figures S2 and S3).
Personalized analysis of β-cells
We also analyzed the TEDECs of each donor (Additional file 3: Figure S4). Similar to the INS expression, the TEDECs are also different among donors. Note that, in dataset 2, the median values of the TEDECs of T2D patients are all larger than those of the healthy donors.
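A per-donor summary of the TEDECs could be computed along the following lines. This is a sketch under assumptions: "total expression" is taken to mean the sum of CASP3, CASP6 and CASP7 within a cell, and the record layout, donor labels and values are hypothetical.

```python
from statistics import median

EXECUTIONER_CASPASES = ("CASP3", "CASP6", "CASP7")


def tedec(cell):
    """Total expression of the death executioner caspases in one cell.

    Assumption: the total is the plain sum of the three expression levels.
    """
    return sum(cell.get(gene, 0.0) for gene in EXECUTIONER_CASPASES)


def median_tedec_by_donor(cells):
    """Group cells by donor and return the median TEDEC of each donor."""
    per_donor = {}
    for cell in cells:
        per_donor.setdefault(cell["donor"], []).append(tedec(cell))
    return {donor: median(values) for donor, values in per_donor.items()}


# Hypothetical cells: "H1" stands for a healthy donor, "T1" for a T2D donor.
cells = [
    {"donor": "H1", "CASP3": 1.0, "CASP6": 0.0, "CASP7": 2.0},
    {"donor": "H1", "CASP3": 0.5, "CASP6": 0.5, "CASP7": 0.0},
    {"donor": "T1", "CASP3": 4.0, "CASP6": 1.0, "CASP7": 2.0},
    {"donor": "T1", "CASP3": 3.0, "CASP6": 0.0, "CASP7": 2.0},
    {"donor": "T1", "CASP3": 6.0, "CASP6": 1.0, "CASP7": 3.0},
]
medians = median_tedec_by_donor(cells)
print(medians)
```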
In this work, we conducted single-cell data analysis to decipher pancreatic β-cell dysfunction and deficit mechanisms in T2D. Three single-cell transcriptomic datasets were employed in our study. Different from bulk-cell data analysis, single-cell data analysis allows us to capture inter-cell heterogeneity and explore the data deeply to unravel the mechanisms of diseases. It is well known that a major function of pancreatic β-cells is to produce secretory insulin. Thus, we firstly examined the INS expression levels in the β-cells of each dataset. In datasets 2 and 3, INS expression in the T2D β-cells is generally lower than that in the healthy cells, but the expression is similar in healthy and T2D β-cells of dataset 1. To explain the observation of INS expression, we checked genes that are related to cellular stress, and found that these genes were lowly expressed in both the healthy and T2D groups of β-cells of dataset 1. In dataset 2, T2D β-cells had high cellular stress while healthy β-cells experienced low stress. Moreover, in dataset 3, the cellular stress in both groups of cells was high.
Considering the INS expression levels and cellular stresses of the three datasets, we obtained the following results. T2D β-cells perform normally in INS expression under low cellular stress (dataset 1), but they behave dysfunctionally under high cellular stress (dataset 2); healthy β-cells can deal with high cellular stress, maintaining INS expression at normal levels despite the stress (dataset 3). To further validate our analysis results, we employed five classifiers to predict the cellular conditions (i.e. healthy or T2D) of the β-cells, using the expression levels of stress- and INS-related genes. We also proposed that β-cells in T2D are vulnerable to stress-induced dysfunction. In other words, under similar cellular stresses, T2D β-cells have abnormal INS expression while healthy β-cells perform normally. This may be caused by the toxic effects of hyperglycaemia and high FFA. Besides, our analysis showed that oxidative stress could be an important influential factor on INS expression. This is consistent with previously reported experimental results, which show that MAFA and PDX1 are inactivated under oxidative stress, resulting in decreased insulin secretion by T2D β-cells. Meanwhile, the impaired β-cell function can be repaired by relieving oxidative stress. For instance, as reported in [47, 48], insulin secretion in T2D can be improved in vitro upon treatment with the antioxidant bis(1-hydroxy-2,2,6,6-tetramethyl-4-piperidinyl) decandioate di-hydrochloride (IAC).
T2D is also characterized by a relative deficit of pancreatic β-cells [9, 33]. It has long been demonstrated that β-cell apoptosis increases in T2D patients and T2D mouse models [33, 34]. As apoptosis measurements are not available for the three single-cell datasets, we used the TEDECs (i.e. CASP3, CASP6 and CASP7) to estimate the rate of β-cell apoptosis. However, increased apoptosis of T2D β-cells is only observed in dataset 2, whereas in dataset 3 the median value of the TEDECs of the healthy β-cells is higher than that of the T2D β-cells. This striking observation needs further clarification in future work. In addition, we conducted personalized analysis of INS and the TEDECs, and showed that INS and the TEDECs differ among donors, with T2D patients having lower INS expression and higher apoptosis in β-cells than healthy donors.
In this work, to uncover the mechanisms of β-cell dysfunction and deficit in T2D, we conducted single-cell transcriptomic data analysis. By analyzing the INS expression levels and cellular stresses of three β-cell transcriptomic datasets, we observed that T2D β-cells perform normally in INS expression under low cellular stress but behave dysfunctionally under high stress. Healthy β-cells can deal with high cellular stress and keep INS expression at normal levels. In addition, analyses of correlation and mutual information showed that oxidative stress could be a critical influential factor on INS expression in T2D. This is consistent with some experimental results in the literature. Moreover, we analysed genes related to the regulation of β-cell death and, adopting the TEDECs as an estimate of the apoptosis rate, observed increased apoptosis in T2D cells only in dataset 2. The TNFR1-mediated pathway, the mitochondrial BAX-related pathway, as well as the CAPN1- and CAPN2-dependent pathway may play important roles in T2D. Finally, personalized analysis showed some diversity of INS expression among donors.
Materials and methods
Experimental data of single-cell transcriptomes
The data we analysed here were obtained from three published works by Xin et al., Segerstolpe et al. and Lawlor et al. [27]. The gene expression levels reported by Xin et al. and Segerstolpe et al. were given in reads per kilobase of transcript per million mapped reads (RPKM), while those reported by Lawlor et al. were quantified as transcripts per million (TPM). Because the units differ, we only compared gene expression of cells within the same dataset.
Classification of cells
To predict cellular states (i.e. healthy or T2D β-cells), we employed five classifiers: Bayesian network, support vector machine (SVM), random forest, logistic regression and neural network (NN). A Bayesian network is a probabilistic graphical model represented by a directed acyclic graph. It contains a set of variables as well as their conditional probability distributions. SVM maps the features into a high-dimensional space and conducts classification using hyperplane(s). A random forest is composed of an ensemble of decision trees, and a voting strategy is employed for the final prediction. In logistic regression, a logistic function is used to compute the probability of the dependent variable and to determine the potential class of a sample. An NN is constructed from a group of interconnected neurons, which are organized as the input layer, hidden layer and output layer. Detailed information on these algorithms is provided in [49, 50, 51, 52, 53, 54]. We employed the algorithms in Weka 3.8.1 to conduct the classification [55], and leave-one-out cross-validation was used for model validation.
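The validation scheme can be illustrated with a short sketch of leave-one-out cross-validation: each cell is held out once, the model is fitted on the remaining cells, and the held-out cell is predicted. The classifier here is a 1-nearest-neighbour stand-in chosen only to keep the example short; it is not one of the five classifiers used in the study, and the feature values and function names are hypothetical.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """Stand-in classifier: return the label of the closest training sample."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i], x))
    return train_y[best]


def leave_one_out_accuracy(features, labels, predict=nearest_neighbour_predict):
    """Hold out each sample once and report the fraction predicted correctly."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical two-gene feature vectors for six cells.
features = [[1.0, 0.2], [1.2, 0.1], [0.9, 0.3], [3.0, 2.1], [3.2, 1.9], [2.8, 2.2]]
labels = ["healthy", "healthy", "healthy", "T2D", "T2D", "T2D"]
accuracy = leave_one_out_accuracy(features, labels)
print(accuracy)
```

Any function with the same `(train_x, train_y, x)` signature can be passed as `predict`, so the loop itself does not depend on the choice of classifier.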
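Precision and recall are defined as precision = TP/(TP + FP) and recall = TP/(TP + FN), where TP, FP and FN denote the numbers of true positives, false positives and false negatives, respectively.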
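As a small worked example, precision and recall can be computed from the confusion counts as below. The combination into an F-measure (the harmonic mean of precision and recall) is an assumption: it is the standard measure built from these two quantities, but the text here does not confirm that it is the one the authors reported. The function name and counts are hypothetical.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion counts.

    Assumption: the F-measure is the harmonic mean of precision and recall.
    Empty denominators are mapped to 0.0 to avoid division by zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: 8 T2D cells found, 2 false alarms, 2 T2D cells missed.
p, r, f = precision_recall_f(tp=8, fp=2, fn=2)
print(p, r, f)
```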
H(X) and H(Y) are the entropies of X and Y, while H(X, Y) stands for the joint entropy of the two variables. The derivation of Eq. (4) from Eq. (3) can be found in [56]. Because the entropy was calculated for discrete variables, we discretized the gene expression data by taking the floor of the values. In addition, base 2 was employed for the logarithms to compute entropy, so the mutual information is measured in bits.
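The calculation described above can be sketched as follows, assuming the mutual information is obtained from the standard identity I(X; Y) = H(X) + H(Y) − H(X, Y), with floor discretization and base-2 logarithms as stated in the text. The function names and the toy vectors are hypothetical.

```python
from collections import Counter
from math import floor, log2


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) after flooring the expression values."""
    xd = [floor(v) for v in x]
    yd = [floor(v) for v in y]
    joint = list(zip(xd, yd))  # each pair is one symbol of the joint variable
    return entropy(xd) + entropy(yd) - entropy(joint)


# Hypothetical expression values: y is fully determined by x after flooring.
x = [0.2, 0.7, 1.1, 1.9]
y = [5.5, 5.1, 6.3, 6.8]
mi_dependent = mutual_information(x, y)
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(mi_dependent, mi_independent)
```

With real RPKM or TPM values, flooring yields many distinct symbols, so the estimate depends on the number of cells available; the sketch does not correct for that.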
Spearman’s rank correlation coefficient
To measure the monotonic relationship between cellular stress and INS expression, we calculated Spearman’s rank correlation coefficient, i.e. the correlation coefficient computed on the ranks of the two variables.
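A minimal sketch of Spearman's rank correlation is given below: both variables are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is returned. The helper names and the toy values are hypothetical, and the sketch assumes neither variable is constant (otherwise the denominator is zero).

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average position of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical stress score and INS level: monotonically decreasing relation.
stress = [0.5, 1.0, 2.0, 4.0]
ins = [900.0, 700.0, 650.0, 100.0]
rho = spearman(stress, ins)
print(rho)
```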
Principal component analysis (PCA)
PCA was implemented based on an orthogonal linear transformation, which decorrelates samples of possibly correlated variables. After the transformation, the first principal component has the largest variance, the second one holds the second largest variance, and so on. Thus, the fundamental goal of PCA is a change of basis, after which a small number of principal components can be identified to provide a reasonable description of the original data. The derivation and instructions for implementation of PCA are available in [58].
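To make the "largest variance" property concrete, the sketch below finds the first principal component of a small data matrix. It uses power iteration on the sample covariance matrix, which is a swapped-in technique chosen to avoid external libraries, not necessarily how the analysis was implemented; the function name and data are hypothetical.

```python
def first_principal_component(samples, iterations=200):
    """Return (direction, variance) of the first principal component.

    Power iteration on the sample covariance matrix. Assumes the starting
    vector is not orthogonal to the leading eigenvector and that the data
    are not all identical.
    """
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    # Sample covariance matrix (d x d), normalized by n - 1.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero start
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    # Variance captured along v is the Rayleigh quotient v^T C v (|v| = 1).
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                   for a in range(d))
    return v, variance


# Hypothetical two-gene data lying exactly on the line y = 2x.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, variance = first_principal_component(data)
print(direction, variance)
```

For these points all of the variance lies along one direction, so the first component alone describes the data completely.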
Comparison of INS expression
We compared the INS expression levels between two groups of cells using Student’s t-test. The difference is considered as statistically significant if the p-value is less than 0.05.
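The test statistic behind this comparison can be sketched as follows, assuming the two-sample, equal-variance (pooled) form of Student's t-test; the text does not state which variant was used. The sketch returns only the t statistic and the degrees of freedom, since converting them to a p-value requires the t-distribution CDF, which the Python standard library does not provide (a library routine such as `scipy.stats.ttest_ind` could be used for that). The function name and the values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in both groups
    and at least two observations per group.
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical INS levels for two small groups of cells.
healthy_ins = [2.0, 4.0, 6.0]
t2d_ins = [1.0, 2.0, 3.0]
t_stat, dof = student_t(healthy_ins, t2d_ins)
print(t_stat, dof)
```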
Publication costs are funded by Start-Up Grant of ShanghaiTech University.
Availability of data and materials
All data generated or analyzed during this study are included in this published article.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 19 Supplement 19, 2018: Proceedings of the 29th International Conference on Genome Informatics (GIW 2018): bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-19-supplement-19.
L.M. performed β-cell gene expression analysis, decoded β-cell dysfunction and deficit mechanisms in T2D, and drafted the manuscript. J.Z. initiated the project, participated in the design of the study, and helped draft the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable. We used three published datasets in this article.
Consent for publication
The authors declare that they have no competing interests
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 1. World Health Organization. Global Report on Diabetes. France: World Health Organization; 2016.
- 6. Popa S, Mota M. Type 2 Diabetes. In: Masuo K, editor. Rijeka: IntechOpen; 2013.
- 20. Janiszewska M, Liu L, Almendro V, Kuang Y, Paweletz C, Sakr RA, Weigelt B, Hanker AB, Chandarlapaty S, King TA, et al. In situ single-cell analysis identifies heterogeneity for PIK3CA mutation and HER2 amplification in HER2-positive breast cancer. Nat Genet. 2015; 47(10):1212.
- 27. Lawlor N, George J, Bolisetty M, Kursawe R, Sun L, Sivakamasundari V, Kycia I, Robson P, Stitzel ML. Single-cell transcriptomes identify human islet cell signatures and reveal cell-type-specific expression changes in type 2 diabetes. Genome Res. 2017; 27(2):208–22.
- 35. Cnop M, Ladriere L, Hekerman P, Ortis F, Cardozo AK, Dogusan Z, Flamez D, Boyce M, Yuan J, Eizirik DL. Selective inhibition of eukaryotic translation initiation factor 2 α dephosphorylation potentiates fatty acid-induced endoplasmic reticulum stress and causes pancreatic β-cell dysfunction and apoptosis. J Biol Chem. 2007; 282(6):3989–97.
- 44. Apoptosis KEGG Pathway Database. http://www.kegg.jp/kegg-bin/highlight_pathway?scale=1.0%26map=map04210%26keyword=apoptosis Accessed 15 Mar 2017.
- 45. GeneGo. https://portal.genego.com Accessed 02 Feb 2017.
- 47. Lupi R, Del Guerra S, Mancarella R, Novelli M, Valgimigli L, Pedulli G, Paolini M, Soleti A, Filipponi F, Mosca F, et al. Insulin secretion defects of human type 2 diabetic islets are corrected in vitro by a new reactive oxygen species scavenger. Diabetes Metab. 2007; 33(5):340–5.
- 48. D’Aleo V, Del Guerra S, Martano M, Bonamassa B, Canistro D, Soleti A, Valgimigli L, Paolini M, Filipponi F, Boggi U, et al. The non-peptidyl low molecular weight radical scavenger IAC protects human pancreatic islets from lipotoxicity. Mol Cell Endocrinol. 2009; 309(1-2):63–6.
- 50. Cortes C, Vapnik V. Support-vector networks. Mach Learning. 1995; 20(3):273–97.
- 55. Witten IH, Frank E, Hall MA, Pal CJ. Data Mining: Practical Machine Learning Tools and Techniques. Burlington: Morgan Kaufmann; 2016.
- 56. Cover TM, Thomas JA. Elements of Information Theory. New Jersey: Wiley; 2012.
- 58. Shlens J. A tutorial on principal component analysis. 2014. arXiv preprint arXiv:1404.1100.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.