Deep proteomic network analysis of Alzheimer’s disease brain reveals alterations in RNA binding proteins and RNA splicing associated with disease
The complicated cellular and biochemical changes that occur in brain during Alzheimer’s disease are poorly understood. In a previous study we used an unbiased label-free quantitative mass spectrometry-based proteomic approach to analyze these changes at a systems level in post-mortem cortical tissue from patients with Alzheimer’s disease (AD), asymptomatic Alzheimer’s disease (AsymAD), and controls. We found modules of co-expressed proteins that correlated with AD phenotypes, some of which were enriched in proteins identified as risk factors for AD by genetic studies.
The amount of information that can be obtained from such systems-level proteomic analyses is critically dependent upon the number of proteins that can be quantified across a cohort. We report here a new proteomic systems-level analysis of AD brain based on 6,533 proteins measured across AD, AsymAD, and controls using an analysis pipeline consisting of isobaric tandem mass tag (TMT) mass spectrometry and offline prefractionation.
Our new TMT pipeline allowed us to more than double the depth of brain proteome coverage. This increased depth of coverage greatly expanded the brain protein network to reveal new protein modules that correlated with disease and were unrelated to those identified in our previous network. Differential protein abundance analysis identified 350 proteins that had altered levels between AsymAD and AD not caused by changes in specific cell type abundance, potentially reflecting biochemical changes that are associated with cognitive decline in AD. RNA binding proteins emerged as a class of proteins altered between AsymAD and AD, and were enriched in network modules that correlated with AD pathology. We developed a proteogenomic approach to investigate RNA splicing events that may be altered by RNA binding protein changes in AD. The increased proteome depth afforded by our TMT pipeline allowed us to identify and quantify a large number of alternatively spliced protein isoforms in brain, including AD risk factors such as BIN1, PICALM, PTK2B, and FERMT2. Many of the new AD protein network modules were enriched in alternatively spliced proteins and correlated with molecular markers of AD pathology and cognition.
Further analysis of the AD brain proteome will continue to yield new insights into the biological basis of AD.
Keywords: Alzheimer’s disease, Proteomics, Proteogenomics, RNA binding protein, RNA splicing
Abbreviations
alt-EEjxn: Alternative exon-exon junction
BIN1: Myc box-dependent-interacting protein 1
BLSA: Baltimore Longitudinal Study of Aging
CERAD: Consortium to Establish a Registry for Alzheimer’s Disease
DLPFC: Dorsolateral prefrontal cortex
ERLIC: Electrostatic repulsion-hydrophilic interaction chromatography
FABP7: Fatty acid-binding protein, brain
FERMT2: Fermitin family homolog 2
GIS: Global internal standard
GSTM1: Glutathione S-transferase mu 1
GWAS: Genome-wide association study
HPLC: High pressure liquid chromatography
IGAP: International Genetics of Alzheimer’s Project
LC-MS/MS: Liquid chromatography tandem mass spectrometry
MCI: Mild cognitive impairment
MMSE: Mini-mental status examination
PICALM: Phosphatidylinositol-binding clathrin assembly protein
PTK2B: Protein-tyrosine kinase 2-beta
SMOC1: SPARC-related modular calcium-binding protein 1
snRNP: Small nuclear ribonucleoprotein
SPS-MS3: Synchronous precursor selection-MS3
TDP-43: TAR DNA-binding protein 43
TMEM106B: Transmembrane protein 106B
TMT: Tandem mass tag
TMT-MS: Tandem mass tag mass spectrometry
TPI1: Triosephosphate isomerase 1
U1-70K: U1 small nuclear ribonucleoprotein 70 kDa
VGF: Neurosecretory protein VGF
WGCNA: Weighted gene co-expression network analysis
WPCNA: Weighted protein co-expression network analysis
Alzheimer’s disease (AD) is the most common age-related neurodegenerative disease, and currently affects more than 46 million people worldwide. The burden of this disease is rapidly growing as the population ages, and interventions to treat or prevent the disease are urgently needed. While AD is currently defined by cognitive decline in the presence of amyloid plaque and tau tangle accumulation within the brain, the altered biochemical and cellular processes that eventually lead to changes in cognition and pathology are not well understood. A better understanding of these altered processes may yield insight into new drug targets and biomarkers for AD. Systems-based approaches such as weighted gene co-expression network analysis (WGCNA) can be used to analyze biochemical and cellular changes in brain, and are useful to help capture the complexity of perturbations in biological networks that are related to disease [2, 3, 4]. We recently described a weighted protein correlational network analysis (WPCNA) of post-mortem brains from patients with AD, asymptomatic AD (AsymAD), and controls. We found protein network modules that correlated with both cognition and AD pathology. These modules were enriched for AD risk loci identified by genome-wide association studies (GWAS), and contained a large number of glial proteins. Many of the modules we identified were distinct from mRNA network modules generated from a separate AD post-mortem brain cohort, suggesting that mRNA and protein network analyses can generate both complementary and unique information.
The number of proteins that can be quantified in a sample cohort is a fundamental limiting factor in the depth and complexity of any network built from proteomic data, and consequently the amount of information that can be gleaned from such networks. In our previous analysis of AD, AsymAD, and control brains from the Baltimore Longitudinal Study of Aging (BLSA) cohort, we were able to quantify only 2,736 proteins across 97 dorsolateral prefrontal cortex (DLPFC) and precuneus brain tissues using label-free quantification (LFQ) by liquid chromatography tandem mass spectrometry (LC-MS/MS), despite the fact that we were able to identify > 5000 proteins by LC-MS/MS across the set of brain samples. This reduction in quantifiable proteins by LFQ LC-MS/MS is a consequence of the stochastic nature of data-dependent acquisition techniques that leads to the well-known “missing value” problem, where the same ions are not consistently chosen for MS/MS analysis across all runs, or the peptide precursor ions are not matched effectively across runs. One strategy to minimize the missing value problem is to measure peptide and protein levels using a multiplex tagging approach with isobaric tandem mass tags (TMTs) [8, 9, 10, 11]. The most recent generation of TMTs can be used to report the relative levels of a given peptide from a pool consisting of up to 11 separate and independent samples. Using an appropriate pooled sample study design and mass spectrometry instrumentation that can perform MS3 reporter quantitation, missing values can be minimized within an experimental cohort using a TMT approach while avoiding dynamic range compression effects. In this study, we used a new pipeline with TMTs, coupled with offline prefractionation, to profile a much deeper proteome in the same BLSA DLPFC tissues previously analyzed by online “single-shot” LFQ. This approach allowed us to quantify 6,533 proteins across the entire cohort, over double the depth achieved in our previous study.
The increased depth of proteome coverage allowed us to build a protein network that consisted of approximately threefold more protein modules, two-thirds of which shared little overlap with the modules previously identified in our LFQ network. One of the most distinctive modules contained strong enrichment in AD risk loci identified by the International Genetics of Alzheimer’s Project (IGAP) GWAS, correlated with tau tangle burden, and contained more glial than neuronal proteins. We also used differential expression analysis on the enlarged proteomic dataset to identify proteins that have altered levels among AD, AsymAD, and control brains, even after accounting for changes in cellular abundance. RNA binding proteins emerged as a family of proteins that was increased in abundance in AD, and these proteins were enriched in modules that correlated with tau tangle pathology. Based on this finding, we explored changes in RNA splicing manifested at the protein level that may occur due to potential RNA binding protein dysfunction in AD. To do so, we developed a new proteogenomic pipeline that used RNA-seq data from control and AD brain to predict alternative exon-exon junction splicing events not present in conventional protein databases. This proteogenomic approach, coupled with the increased depth of proteome coverage and superior quantitation afforded by our TMT pipeline compared to our previous LFQ approach, allowed us to identify and quantify a number of alternative exon-exon splicing events in brain at the protein level, including alternative exon-exon junctions in AD risk factor proteins such as BIN1, PICALM, PTK2B, and FERMT2. Many of the identified alternative exon-exon junction splicing events were highly enriched in modules unique to the TMT network, and correlated with disease, suggesting a potential role for aberrant RNA splicing in AD pathogenesis.
Fresh frozen brain tissue blocks from dorsolateral prefrontal cortex (Brodmann area 9) were used for analysis, as described previously. Frozen aliquots from the same brain homogenate were used for LFQ and TMT analysis. Symptomatic AD (n = 20), asymptomatic AD (AsymAD) (n = 14), and control (n = 13) cases were processed and analyzed. In addition to these n = 47 cases, mild cognitive impairment (MCI) cases (n = 11) were homogenized separately on a different day and included in the batched TMT-MS design, but were later excluded from the analysis due to a preparation batch effect that was refractory to post-hoc correction. Sample information is given in Additional file 1: Table S1 and Additional file 2: Table S2. The TMT-MS experimental design is shown in Additional file 3: Table S3.
Each tissue piece (approx. 100 mg wet weight) was homogenized in 500 μL of urea lysis buffer (8 M urea, 100 mM NaH2PO4, pH 8.5), supplemented with 5 μL (100× stock) HALT protease and phosphatase inhibitor cocktail (Pierce) using a Bullet Blender (Next Advance) and 750 mg of steel beads (Next Advance). Protein supernatants were then transferred to new 1.5 mL Eppendorf tubes and sonicated (Sonic Dismembrator, Fisher Scientific) 3 times for 5 s with 15 s intervals of rest at 30% amplitude. Protein concentration was determined by the bicinchoninic acid (BCA) method, and samples were frozen in aliquots at − 80 °C. Protein integrity was checked by one-dimensional SDS-PAGE (Additional file 8: Figure S1). The MCI case samples were homogenized on a later day than the control, AsymAD, and AD cases, but digestion prior to TMT labeling was performed at the same time.
Protein homogenates (100 μg) were mixed with Laemmli sample buffer and β-mercaptoethanol (3% v/v), and incubated for 5 min at 95 °C. After cooling, 10 μg protein was loaded into Bolt 10% Bis-Tris Plus gels (Invitrogen) and electrophoresed for 30 min at 160 V. Gels were then stained with Coomassie Blue for protein visualization.
Protein digestion, TMT labeling, and ERLIC fractionation
Protein homogenates (100 μg) were treated with 1 mM dithiothreitol (DTT) at 25 °C for 30 min, followed by 5 mM iodoacetamide (IAA) at 25 °C for 30 min in the dark. Protein was digested with 1:100 (w/w) lysyl endopeptidase (Wako) at 25 °C overnight. Resulting peptides were desalted with a Sep-Pak C18 column (Waters). All samples were dried down completely using a Savant SpeedVac (ThermoFisher Scientific). In addition to the 58 case samples, a global internal standard (GIS) mixture of case sample homogenates (n = 60, 30 control and 30 AD) taken from multiple different patient cohorts was generated by mixing each sample equally by protein amount prior to TMT labeling on a designated reporter channel. TMT labeling was performed per the manufacturer’s protocol and as previously described. Briefly, the reagents were equilibrated to room temperature. Dried peptide samples (100 μg each) were resuspended in 100 μl of 100 mM TEAB buffer (supplied with the kit). Anhydrous acetonitrile (ACN) (41 μl) was added to each labeling reagent tube and the peptide solutions were transferred into their respective channel tubes. The reaction was incubated for 1 h and quenched for 15 min afterward with 8 μl of 5% hydroxylamine. Samples were combined according to the batch design shown in Additional file 3: Table S3, and dried down to 100 μl to remove ACN. The combined samples were then desalted using a Sep-Pak C18 column (Waters) and dried down to approximately 5 μl. The labeled peptide sample batches were each further diluted with 100 μl of 90% ACN and 0.1% acetic acid (buffer A) and loaded onto an offline electrostatic repulsion–hydrophilic interaction chromatography (ERLIC) fractionation HPLC system [10, 13]. A total of 40 fractions were collected over a 40-min gradient from 0 to 28% Buffer B (30% ACN and 0.1% formic acid). The 40 fractions were combined down to 20 and dried down to completeness.
Dried peptide fractions were resuspended in 30 μl of peptide loading buffer (0.1% formic acid, 0.03% trifluoroacetic acid, 1% acetonitrile). Peptide mixtures (2 μl) were separated on a self-packed C18 (1.9 μm Dr. Maisch, Germany) fused silica column (25 cm × 75 μm internal diameter; New Objective) by a Dionex Ultimate 3000 RSLCNano and monitored on a Fusion mass spectrometer (ThermoFisher Scientific). Elution was performed over a 140-min gradient at a rate of 300 nl/min with buffer B ranging from 3 to 80% (buffer A: 0.1% formic acid in water, buffer B: 0.1% formic acid in acetonitrile). The mass spectrometer was programmed to collect at the top speed for 3 s cycles in synchronous precursor selection (SPS)-MS3 mode [10, 14]. The MS scans (380–1500 m/z range, 200,000 AGC, 50 ms maximum ion time) were collected at a resolution of 120,000 at m/z 200 in profile mode. CID MS/MS spectra (2 m/z isolation width, 35% collision energy, 10,000 AGC target, 35 ms maximum ion time) were detected in the ion trap. HCD MS/MS/MS spectra (2 m/z isolation width, 65% collision energy, 100,000 AGC target, 120 ms maximum ion time) of the top 5 MS/MS product ions were collected in the Orbitrap at a resolution of 60,000. Dynamic exclusion was set to exclude previously sequenced precursor ions for 30 s within a 10 ppm window. Precursor ions with + 1 and + 8 or higher charge states were excluded from sequencing.
Database search and quantification via TMT SPS-MS3 intensities
MS/MS spectra were searched against a Uniprot human database (downloaded on 04/15/2015) with Proteome Discoverer 2.1 (ThermoFisher Scientific). The database included all Swiss-Prot-curated (canonical) plus TrEMBL (unreviewed) sequences, totaling 90,411 target FASTA sequence entries. Methionine oxidation (+ 15.9949 Da), asparagine and glutamine deamidation (+ 0.9840 Da), and protein N-terminal acetylation (+ 42.0106 Da) were variable modifications (up to 3 allowed per peptide); static modifications included cysteine carbamidomethyl (+ 57.0215 Da), peptide N-terminus TMT (+ 229.16293 Da), and lysine TMT (+ 229.16293 Da). Only peptides resulting from LysC digestion were considered, with up to two miscleavages, in the database search. A precursor mass tolerance of ±20 ppm and a fragment mass tolerance of 0.6 Da were applied. Spectra matches were filtered by Percolator to a peptide-spectrum match false discovery rate of < 1%. Strict parsimony was observed for peptide to protein matching, and only razor and unique peptides were used for abundance calculations. Log2 ratio of sample over the GIS was used for comparison across all samples.
TMT quantitative data normalization
GIS mixture (MS3 TMT reporter channel m/z 126) provided as Proteome Discoverer 2.1 script output was checked for extreme outlier values of log2(0.01) and log2(100), i.e. ±6.64; these values were excluded from analysis. Furthermore, proteins with more than 4 unquantifiable batches (out of a total of 8 batches) due to 0 or NA value for the GIS channel 126 reporter Proteome Discoverer 2.1-normalized value (pre-ratio calculation) were excluded from consideration. Finally, proteins with more than 23 missing log2(ratio) values were excluded from analysis, and then 11 MCI cases were dropped, leaving a matrix of n = 47 control, AsymAD, and AD cases with no more than 23 missing values (< 50%) per protein measurement, for a total of 6532 proteins. Amyloid-β log2(ratio) represented by TMT peptide level quantitation of the APP LVFFAEDVGSNK peptide was added to the final 6533 × 47 protein abundance matrix.
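The ratio and filtering rules described above can be illustrated with a minimal Python sketch. It is not the authors' script (the analysis was run from Proteome Discoverer output and in R); the data layout, the function names `log2_ratio` and `filter_proteins`, and the choice to treat values at or beyond ±log2(100) as missing are assumptions made for illustration.

```python
import math

OUTLIER_LIMIT = math.log2(100)  # about 6.64; log2(0.01) is the negative of this
MAX_MISSING = 23                # missing-value ceiling stated in the text

def log2_ratio(sample, gis):
    """Return log2(sample / GIS), or None when the ratio cannot be used."""
    if sample is None or gis is None or sample <= 0 or gis <= 0:
        return None  # zero or NA reporter values are unquantifiable
    ratio = math.log2(sample / gis)
    # Assumption: ratios at or beyond +/-log2(100) count as extreme outliers.
    if abs(ratio) >= OUTLIER_LIMIT:
        return None
    return ratio

def filter_proteins(matrix, max_missing=MAX_MISSING):
    """Keep proteins whose number of missing log2 ratios is at most max_missing.

    matrix: hypothetical dict mapping protein ID -> list of log2 ratios,
    with None marking a missing value.
    """
    return {
        protein: values
        for protein, values in matrix.items()
        if sum(v is None for v in values) <= max_missing
    }
```

In this sketch a protein with exactly 23 missing values is retained and one with 24 is dropped, matching the "no more than 23 missing values" wording.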
Digital sorting algorithm for cell type weight analysis of tissue proteomes
The covariate-unregressed, normalized abundance matrix described above was collapsed to average protein abundance measurements for unique gene symbols (n = 5,839) using the WGCNA::collapseRows() function. A total of 2,132 cell type marker gene symbols from purified cell types of mouse brain (referred to as the Sharma dataset), which we previously defined via thresholding and used for cell type enrichment analyses of human proteome coexpression modules [5, 17], were converted from mouse to human gene symbols using the biomaRt R interface to the public Ensembl datamart as of July 2017. From this set, 895 gene symbols representing collapsed and averaged protein abundances with no missing quantification values across the 47 BLSA case tissue samples overlapped the Sharma quantitative dataset. The overlapping marker measurements from Sharma purified brain cell types and our BLSA middle frontal gyrus samples were input into the DSA v1.0 R package, and estimated weights were found using the DSA::EstimateWeight() function.
Regression for covariates
A naïve first pass regression was performed by considering age, sex, post-mortem interval (PMI), and disease status group contributions to each sample-specific protein abundance measurement set (n = 47), explicitly modeled using 1000 iterations of ordinary nonparametric bootstrap regression. Then age, sex, and PMI covarying components of the measurement were subtracted to arrive at a regressed protein abundance measurement set. This approach was repeated for all 6,533 proteins in the abundance matrix.
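A minimal sketch of this kind of bootstrap covariate regression is given below in plain Python. It is an illustration under stated assumptions, not the authors' R code: the function names are hypothetical, each coefficient is summarized by its median across bootstrap fits (the text does not specify the summary), and missing values are handled simply by resampling only the cases with an observed abundance.

```python
import random
import statistics

def solve_least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-9:
            return None  # singular resample, e.g. a covariate without variation
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * z for x, z in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

def regress_covariates(abundance, design, nuisance_cols, n_boot=1000, seed=0):
    """Subtract bootstrap-estimated nuisance covariate effects for one protein.

    abundance: log2 ratios per case (None = missing, no imputation needed).
    design: one row per case, e.g. [1.0, age, sex, PMI, group indicators...].
    nuisance_cols: indices of the columns whose fitted effect is removed.
    """
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(abundance) if v is not None]
    fits = []
    for _ in range(n_boot):
        sample = [rng.choice(observed) for _ in observed]  # resample cases
        beta = solve_least_squares([design[i] for i in sample],
                                   [abundance[i] for i in sample])
        if beta is not None:
            fits.append(beta)
    # Assumption of this sketch: summarize each coefficient by its median.
    coef = [statistics.median(b[j] for b in fits) for j in range(len(design[0]))]
    return [None if v is None
            else v - sum(coef[j] * design[i][j] for j in nuisance_cols)
            for i, v in enumerate(abundance)]
```

Because disease status stays in the model but is not listed in `nuisance_cols`, its contribution is estimated jointly with age, sex, and PMI yet left in the regressed values.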
A second, two-pass regression scheme was performed by first considering DSA estimated cell type weight for the four Sharma dataset brain cell types (microglia, astrocytes, neurons, and oligodendroglia) as four sets of variables for regression. Following normalization of cell type abundance variation across the samples, the prior age, sex, and PMI regression scheme was used to remove these covariate effects. Only the first pass regressed protein abundance matrix was used for WPCNA. Importantly, missing values did not require imputation for bootstrap covariate regression.
Weighted protein correlation network analysis (WPCNA)
Threshold power Beta for reduction of false positive correlations (i.e. the beneficial effect of enforcing scale free topology) was sampled in increments of 0.5 and selected as the lowest power at which scale free topology R2 was approximately 0.80, or in the case of the cell type weight-regressed network, the power at which a horizontal asymptote (plateau) was nearly approached, near a scale free topology R2 of 0.80. Other parameters were selected as previously optimized for protein abundance networks. Thus, for the signed network built on protein abundances after naïve age, sex, and PMI regression, parameters were input into the WGCNA::blockwiseModules() function as follows: Beta (power) 8.0, mergeCutHeight 0.07, pamStage TRUE, pamRespectsDendro TRUE, reassignThreshold p < 0.05, deepSplit 2, minModuleSize 17, replaceMissing TRUE, corType bicor, maxBlockSize greater than the total number of proteins (6,533), and TOMDenominator mean.
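As a small illustration of the power selection rule, the Python sketch below shows the standard signed-network adjacency transform and a helper that picks the lowest power reaching a target scale-free fit. The fit values themselves would come from WGCNA; the helper name `pick_power`, the dictionary input, and the strict "R2 at least 0.80" cutoff (the text says "approximately 0.80") are assumptions of this sketch.

```python
def signed_adjacency(correlation, power):
    """Signed-network adjacency: ((1 + cor) / 2) ** power."""
    return ((1.0 + correlation) / 2.0) ** power

def pick_power(fit_by_power, target_r2=0.80):
    """Return the lowest power whose scale-free fit R^2 reaches target_r2.

    fit_by_power: hypothetical dict mapping candidate power -> scale-free
    topology R^2, e.g. sampled in increments of 0.5. Returns None if no
    candidate reaches the target.
    """
    passing = [power for power, r2 in fit_by_power.items() if r2 >= target_r2]
    return min(passing) if passing else None
```

Raising the power pushes weak correlations toward zero adjacency while leaving strong positive correlations near one, which is why a higher Beta suppresses spurious edges.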
Gene ontology (GO) functional analysis of WPCNA modules
GO analysis for module membership was performed using GO-Elite with the background set to all 5,839 gene symbols quantified in this study. Gene lists per module were subjected to Fisher exact overlap test in the python command line version of GO-Elite v1.2.5 for species setting Hs against the current (downloaded June 2017) annotation database for Biological Process, Molecular Function, and Cellular Component terms. Cytoscape with the EnrichmentMAP app was used to visualize ontology representation, overlap, and relatedness.
Differential expression analysis was performed as previously described. Briefly, differentially expressed proteins were found using one-way ANOVA followed by Tukey’s post hoc comparison (p value < 0.05). Volcano plots were generated with the ggplot2 package in R. Custom R scripts were used to visualize overlap of differentially expressed targets with WPCNA modules.
MAGMA analysis for p value calculation of GWAS target enrichment in WPCNA modules was performed as previously described. Hypergeometric overlap significance tests, namely one-tailed Fisher exact and two-tailed overrepresentation analysis, were performed as previously described.
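For readers unfamiliar with the overlap test, a one-tailed Fisher exact test for enrichment is equivalent to the upper tail of a hypergeometric distribution. The Python sketch below computes that tail exactly; the function and argument names are hypothetical, and it illustrates the general test rather than the authors' previously described implementation.

```python
from math import comb

def hypergeom_enrichment_p(overlap, module_size, marker_size, background_size):
    """One-tailed P(X >= overlap) for the overlap of a module with a marker list.

    X counts marker genes among module_size genes drawn without replacement
    from background_size genes, of which marker_size are markers.
    """
    upper = min(module_size, marker_size)
    total = comb(background_size, module_size)
    tail = sum(
        comb(marker_size, k) * comb(background_size - marker_size, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For example, with a background of 10 genes, 4 markers, and a module of 3 genes that are all markers, the function returns 4/120, the probability of drawing three markers by chance.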
Proteogenomic RNA alternative splicing analysis based on gapped transcriptome reads
The GSNAP algorithm with the novel splicing flag (-N) enabled was used to realign raw short paired end RNA-Seq reads of 3 control and 3 AD cases from the University of Kentucky brain bank, originally published in Bai et al., to the GRCh37 human genome build with contigs and the 16,569 nucleotide (nt) mitochondrial genome. Then all exon-exon junctions represented by 2 or more gapped reads across the 6 case sample cDNA libraries, with a minimum exonic overlap of 4 nt, were summarized using the R spliceSites bioconductor package. A custom R script and Excel formulas for string manipulation were used to extract LysC [K|P] peptides spanning exon-exon junctions (both with and without miscleavage at proline). All junction-spanning peptides considered were ones that had alternative events represented by other gapped reads that shared a left (5′) or right (3′) end with another set of gapped reads, and not “singleton” or brain constitutive exon-exon junctions. Peptides from different genomic sites that were 100% homologous to the junction-spanning peptides were considered duplicates and were removed from consideration. The resulting list of annotated alternative exon-exon junction-spanning peptides (N > 58,319) detected in brain transcriptome were concatenated as FASTA entries to the April 2015 human Uniprot database, and then Proteome Discoverer 2.1 was used to search and quantify peptide reporter channels across all 8 batches of TMT data with parameters otherwise as described above for the initial search. Peptide summary output for each of the 8 batches was opened in Excel, and all peptides annotated in the expanded human database as brain-specific alternative exon-exon junction peptides (including different modified forms of the same fully LysC digested peptides) were found and summed using the Excel sumif() function.
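The peptide extraction step can be sketched in Python as follows. This is an illustrative stand-in for the custom R script and Excel formulas, which are not reproduced in the text: the function names are hypothetical, the inputs are assumed to be amino acid sequences already translated from the exons on either side of a junction, and codons split across the junction are ignored for simplicity.

```python
def lysc_peptides(protein, cleave_before_proline=True):
    """Return (start, end) half-open intervals of LysC peptides (cleave after K).

    With cleave_before_proline=False, K-P bonds are left uncleaved, which
    models the "miscleavage at proline" variant.
    """
    intervals, start = [], 0
    for i, residue in enumerate(protein):
        if residue != "K":
            continue
        next_is_proline = i + 1 < len(protein) and protein[i + 1] == "P"
        if next_is_proline and not cleave_before_proline:
            continue
        intervals.append((start, i + 1))
        start = i + 1
    if start < len(protein):
        intervals.append((start, len(protein)))  # C-terminal remainder
    return intervals

def junction_spanning_peptide(upstream, downstream, cleave_before_proline=True):
    """Return the LysC peptide that crosses the exon-exon junction, if any."""
    protein = upstream + downstream
    junction = len(upstream)  # index of the first downstream-exon residue
    for start, end in lysc_peptides(protein, cleave_before_proline):
        if start < junction < end:
            return protein[start:end]
    return None  # a cleavage site falls exactly on the junction
```

Calling the function twice, once with each value of `cleave_before_proline`, yields both peptide forms for junctions where a lysine is followed by proline.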
These unified quantitations were performed over the different post-translationally modified states of the same peptide (e.g., N-terminal acetylation, or N/Q deamidation, or M oxidation) for all alternative exon-exon junction peptides in the peptide-level summary output for each of the 8 10-plex batches of ERLIC fractions. Quantitations of within-batch normalized abundances were then scaled across batches to set the average of all GIS measurements within batch to be identical across batches. The scaled, normalized, summed peptide abundances were log2-transformed; 9 negative values (< 1 before log2 transformation) were removed from the matrix. Regression for age, sex, and PMI covariation was performed in R on all log2 transformed values except for 781 that could not be regressed due to a high number of missing values. After regression, ANOVA with Tukey post hoc correction was performed on both regressed and unregressed values. The regressed alternative exon-exon junction peptide abundances were matched to the 50 WPCNA eigenproteins by calculating kME (correlation to module eigenprotein) for each peptide and assigning the peptide to the module with the highest correlation. For the purposes of avoiding spurious correlations, no more than 25 out of 47 missing values were allowed for any peptide. Venn and volcano plots were produced in R using vennDiagram, ggplot2, and/or plotly R packages.
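The module assignment by kME can be sketched as below. The sketch uses a plain Pearson correlation over pairwise-complete observations, which is an assumption (the network itself was built with bicor and the text does not state which correlation was used for peptide kME); the function names and data layout are likewise hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation over positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant vector
    return sxy / sqrt(sxx * syy)

def assign_module(peptide, eigenproteins, max_missing=25):
    """Assign a peptide to the module whose eigenprotein it correlates with best.

    peptide: regressed log2 abundances per case (None = missing).
    eigenproteins: dict mapping module name -> eigenprotein values per case.
    Returns (module, kME), or None when the peptide has too many missing values.
    """
    if sum(v is None for v in peptide) > max_missing:
        return None
    kme = {module: pearson(peptide, eigen) for module, eigen in eigenproteins.items()}
    kme = {module: r for module, r in kme.items() if r is not None}
    if not kme:
        return None
    best = max(kme, key=kme.get)
    return best, kme[best]
```

The `max_missing` guard mirrors the rule that no more than 25 of 47 values may be missing, so that a high correlation cannot arise from only a handful of cases.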
TMT quantification pipeline increases the depth of proteomic network analysis of human brain tissues
In our previous analysis of dorsolateral prefrontal cortex (DLPFC) brain tissue from AD, asymptomatic AD (AsymAD), and control cases from the Baltimore Longitudinal Study of Aging (BLSA) cohort, we were able to identify 3,069 proteins with 10% or less missing values across 47 DLPFC brain samples (excluding precuneus samples) using “single-shot” one-dimensional online reverse-phase HPLC fractionation and label-free quantitation (LFQ). This represented a reduction from 5138 total proteins identified across all DLPFC samples due to missing peptide quantitative values in greater than 10% of the samples. In order to address the limitation of LFQ by data-dependent LC-MS/MS when analyzing protein levels across multiple samples, we reprocessed and reanalyzed the same DLPFC homogenates using a multiplex isobaric tandem mass tag (TMT) labeling approach and synchronous precursor selection-based mass spectrometry (SPS-MS3) quantitation on a tribrid mass spectrometer, coupled with orthogonal offline prefractionation [8, 10]. As part of the new analysis approach, we also relaxed the data inclusion criteria to require missing values in < 50% rather than < 10% of the samples, given that the WGCNA algorithm for coexpression network analysis tolerates missing values of up to 50% well. We subsequently refer to this quantitation and analysis approach as our “TMT pipeline.” Using the TMT pipeline, we were able to identify and quantify 6,533 proteins, compared to 3,069 proteins using the previous single-shot LFQ strategy. The large majority of the increase in protein coverage was due to the superior quantitation provided by TMT labeling and prefractionation rather than the relaxed missing values tolerance threshold (Additional file 9: Figure S2). To validate that protein quantitation was similar using the two different quantitation approaches, we compared the relative levels of the amyloid-β (Aβ)17–28 peptide in each sample quantified by LFQ and TMT.
The Aβ17–28 peptide is a proteolytic fragment of Aβ generated by both trypsin and LysC enzymatic digestion of the full-length Aβ peptide, and therefore represents a peptide with a very large change in abundance across the sample cohort due to aggregation of Aβ into amyloid plaques in AsymAD and AD cases. An illustration of Aβ17–28 quantitation by TMT is shown in Additional file 10: Figure S3A, with correlation of this Aβ peptide measurement to cerebral amyloid plaque load in each case shown in Additional file 10: Figure S3C. We found a strong correlation (r = 0.85) between Aβ levels measured by LFQ and TMT quantitation approaches (Additional file 10: Figure S3B), suggesting that TMT with SPS-MS3 quantification was able to reliably quantify proteins over a large dynamic range, similar to the LFQ approach employed in our previous analysis.
AD genetic risk factors cluster in glial modules
Brain cell type changes and protein abundance differences between asymptomatic and symptomatic AD
We next asked whether these changes in cell type abundance are the primary drivers of changes in protein abundance among control, AsymAD, and AD, or whether there are changes in protein abundance by disease state that are independent of changes in cell type. TMT proteomic analysis allowed us to identify 1147 proteins that showed changes in abundance among control, AsymAD, and AD cases (Fig. 3b). Most of the proteins with altered abundance were observed when comparing control with AD cases, or AsymAD with AD cases, with relatively fewer proteins that differed between control and AsymAD. To account for changes in cell type on changes in protein abundance between groups, we used our estimates of cell type changes to deconvolute this effect from changes in protein abundance [19, 29], and then reanalyzed our pairwise group comparisons of differentially abundant proteins after deconvolution. This approach has previously been applied to transcriptomic data to remove the confounding effects of cell type changes on gene expression , but to our knowledge has not previously been applied to proteomic data. Deconvolution of cell type changes reduced the number of proteins with significantly different abundance levels between disease states (Fig. 3c). The number of proteins with different abundance levels between control and AD was reduced after deconvolution by approximately a factor of six, suggesting that most of the changes in protein abundance observed between control and AD are driven by changes in brain cell type. A similar reduction in abundance changes was observed between control and AsymAD after deconvolution. Notably, however, the number of proteins with unique changes in abundance between AsymAD and AD showed only a small reduction—from 290 to 263 proteins—after deconvolution for cell type, suggesting that most of the changes in protein abundance between AsymAD and AD are not driven primarily by changes in brain cell type. 
Instead, these changes may reflect a “biochemical phase” of AD . There were slightly more proteins that were significantly lower in abundance compared to those that were higher in abundance in AD compared to AsymAD after cell type deconvolution (Additional file 14: Figure S7). Proteins that were elevated in AD compared to AsymAD included FABP7, SMOC1, and LTF, and tended to cluster in modules M4, M7, and M8. Those that were lower in AD compared to AsymAD included NPTX2, VGF, and GSTM1, and tended to cluster in modules M1, M2, and M3. Most of the modules in which the differentially abundant proteins between AsymAD and AD tended to cluster correlated with case status or AD pathology (Supplementary Data). GO network analysis of differentially abundant proteins between AsymAD and AD showed that many more protein ontologies became significant after cell type deconvolution, and existing ontologies identified in the unregressed analysis such as “cytoskeleton” became more significant (Fig. 3d and e). A GO network analysis of differentially abundant proteins between control and AsymAD cases before and after cell type deconvolution, representing protein changes early in the AD process, is provided in Additional file 15: Figure S8. In summary, these findings suggest that a majority of the differences in protein abundance between AsymAD and AD appear to be independent of simple brain cell type abundance changes, in contrast to the protein abundance differences between control and AsymAD and control and AD. Furthermore, proteins that change in abundance between AD and AsymAD are contained within modules that correlate with AD traits.
RNA binding protein enrichment in the AD brain TMT proteomic network
After cell type deconvolution of protein abundance changes, we noted with keen interest the preservation of RNA binding proteins as hubs of differentially abundant proteins between control and AsymAD (Additional file 15: Figure S8), and between AsymAD and AD (Fig. 3e and Additional file 16: Figure S9). We have previously reported that aggregation of RNA binding proteins that are a part of the cellular pre-mRNA splicing machinery, especially the U1 small nuclear ribonucleoproteins (snRNPs) such as U1-70K, is an early event in AD pathogenesis . The observation that RNA binding proteins emerged as hubs of differentially abundant proteins after cell type deconvolution prompted us to investigate whether certain TMT network modules were enriched in RNA binding proteins, and if so, whether these modules were associated with AD pathology. Upon examination of a number of different classes of RNA binding proteins, we found that modules 10, 15, 17, 18, 29, and 40 were significantly enriched with RNA binding proteins (Additional file 17: Figure S10A). Most of the RNA binding protein-enriched modules correlated with tau tangle burden as measured by Braak stage (Additional file 17: Figure S10B). Interestingly, our previous studies demonstrated that many of the U1 snRNPs colocalize with neurofibrillary tangles and paired helical filaments in AD brain [24, 32, 33, 34], and that accumulation of insoluble snRNPs correlates strongly with both amyloid and Tau pathology [32, 33, 34, 35, 36, 37]. Collectively, these data support the relevance of this class of proteins to AD pathogenesis. The finding of strong RNA binding protein enrichment in certain modules within the AD TMT network led us to question whether these same modules contained more alternatively spliced proteins whose abundances may change as a consequence of AD pathophysiology. 
Changes in RNA splicing leading to the expression of different protein isoforms may be a useful indicator of AD pathology and cause downstream cellular and network dysfunction leading to cognitive decline.
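Enrichment of a protein class within a network module is commonly assessed with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below shows how such a test could be set up. It is an assumption about a reasonable approach, not a description of the enrichment statistics used in this study, and the function name and all counts are invented for illustration.

```python
from math import comb

def hypergeom_enrichment_p(n_total, n_category, n_module, n_overlap):
    """One-sided P(X >= n_overlap) when a module of n_module proteins is
    drawn without replacement from n_total background proteins, of which
    n_category belong to the class of interest (e.g. RNA binding proteins)."""
    upper = min(n_category, n_module)
    denominator = comb(n_total, n_module)
    tail = sum(
        comb(n_category, k) * comb(n_total - n_category, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denominator

# Hypothetical counts: 1000 background proteins, 100 in the class,
# a module of 50 proteins containing 15 class members (5 expected by chance).
p_value = hypergeom_enrichment_p(
    n_total=1000, n_category=100, n_module=50, n_overlap=15
)
```

When many modules and protein classes are tested, the resulting p values would normally be corrected for multiple comparisons before a module is called enriched.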
A proteogenomic approach for the identification and quantification of alternative RNA splicing events in AD brain
Alternative splicing events associated with AD pathology and cognitive function
To examine which alt-EEjxn splicing events may be associated with progression of cognitive dysfunction from AsymAD to AD, given the RNA binding protein abundance differences between these two disease states after cell type deconvolution, we performed a differential abundance analysis of alt-EEjxn peptides between AsymAD and AD. As shown in Fig. 4d, more alt-EEjxn peptides were reduced than were increased in AD compared to AsymAD, similar to the total protein abundance differences between AD and AsymAD. Alt-EEjxns that were increased in AD were enriched in modules M4, M7, and M35, with M35 containing alt-EEjxns with the largest average change from AsymAD (Additional file 21: Figure S14). All of these modules were strongly glial in nature, with M35 a strongly astrocytic module. Tau had a number of alt-EEjxn peptides that were significantly increased in AD, and these mapped to the 3- and 4-repeat microtubule binding domain isoforms of the protein. Both isoforms were included in this analysis because either can be considered constitutively expressed in humans. Alt-EEjxns that were decreased in AD were most abundant in module M36, a module unique to the TMT network and without strong cell type character. We also specifically analyzed alt-EEjxns derived from the top twenty most significant common variant AD risk factor proteins identified from GWAS. We observed alt-EEjxn peptides from a total of five of these proteins (Additional file 6: Table S6). Three of the five GWAS proteins, BIN1, PTK2B, and FERMT2, had alt-EEjxns that differed in abundance by case status (Additional file 7: Table S7). In summary, we identified a number of alternative splicing decisions at the protein level in brain that significantly change in AD, including in AD risk factor proteins identified from GWAS. Those that were increased in AD tended to cluster in astroglial modules.
Alternative splicing events associated with modules enriched in AD risk factor proteins
Correlation of alternatively spliced proteins with TMT protein network modules enriched in AD risk factors
In this study we extended the depth of our proteomic network analysis of AD brain by more than a factor of two using a new TMT-based analysis pipeline. The deeper protein coexpression network analysis revealed new protein modules that correlated with pathological measures of AD and were enriched in AD risk factors identified by GWAS. With this improved proteome coverage we were able to estimate the percentage of four different cell types within the brain and analyze how the abundance of these cell types changes in asymptomatic and symptomatic AD. We were also able to use these estimates of cell type changes to remove this potential confound from the analysis of differential protein abundance in AsymAD and AD, and observed that most protein abundance changes between AsymAD and AD are not due to cell type changes. From this differential protein abundance analysis we observed that RNA binding proteins were altered between AsymAD and AD, which led us to further analyze RNA binding proteins and alternatively spliced proteins within the TMT protein network. We found that RNA binding proteins clustered within specific network modules, and that some of these modules strongly correlated with molecular markers of AD and cognitive decline. Alternative exon-exon splicing events also tended to cluster within certain network modules, and some of these modules correlated with molecular markers of AD and cognitive decline. We identified a number of alt-EEjxn splicing events in AD GWAS risk factor proteins that were significantly altered in AD, as well as splicing events in other proteins that correlated with network modules enriched in AD risk factor proteins and were altered in AD.
The use of TMTs allowed us to perform an orthogonal offline prefractionation step prior to LC-MS/MS analysis while keeping MS analysis time within reasonable parameters through the ability to pool up to 11 tagged samples into a single batch prior to LC-MS/MS analysis. This approach has distinct advantages over standard “single-shot” LFQ analysis. Prefractionation significantly increases the depth of proteome coverage achievable by LC-MS/MS of complex tissues such as brain. TMTs also allow for relative protein measurements across multiple case groups within a single batch, minimizing the missing value problem for quantification across case groups. However, missing values are not eliminated in the TMT approach as it still relies on data-dependent acquisition techniques within each batch, and therefore not all batches contain the protein measurement of interest. Alternative approaches to protein quantification by mass spectrometry, such as data-independent acquisition [7, 40], may soon help to address the limitations on protein quantitation posed by data-dependent approaches. Nevertheless, we anticipate that further increases in the depth of proteome coverage in brain will be possible using a data-dependent TMT approach through advances in chromatography techniques and mass spectrometry instrumentation.
The increased depth of proteome coverage allowed us to build a protein coexpression network of AD brain that was significantly larger than our previous LFQ-based network. It is notable that the TMT-LysC protein coexpression network nearly completely recapitulated the LFQ network we previously published, despite the fact that the TMT-LysC network was generated using an entirely different analysis pipeline with a new quantification approach and different mass spectrometry instrumentation. This finding lends validity to the previous LFQ network generated from the BLSA cohort, and by extension to LFQ-based networks of other cohorts we have analyzed (unpublished data). Many of the new modules in the TMT network were not strongly associated with a particular cell type, indicating that most cell type specific modules were captured in the previous LFQ network. However, a few unique modules did show significant cell type character, including M27, which was largely microglial in cell type character, and by protein membership was the most distinct module in the TMT-LysC network compared to the previous LFQ-trypsin network. This module was also enriched for AD GWAS risk factors and correlated with AD pathology, demonstrating that further increases in the depth of brain proteome coverage have the potential to reveal additional protein coexpression modules that are relevant to AD pathophysiology. In the TMT network we also observed a number of new modules that appeared to be anti-correlated with disease, potentially reflecting AD “resilience” modules. One such area of the network was the related cluster of modules M47 to M26. This cluster tended to be associated with improved cognition and lower levels of tau tangles, p-tau, U1-70K, and TDP-43. Further mechanistic investigation into the drivers of these protein coexpression changes may provide insights into factors that protect against AD.
From the cell type analysis, we found that astrocytes and microglia increase in relative proportion between AsymAD and AD, suggesting that immune system activation or dysfunction may be a primary driver of cognitive decline in the setting of AD pathology. Astrocytes and microglia also correlated more strongly with tau tangle burden than with amyloid-β plaque load, illustrating the connection between inflammation and tangle formation. The correlation between inflammation and tangle formation has also been noted in other tauopathies, such as frontotemporal dementia and chronic traumatic encephalopathy [42, 43, 44, 45]. Interestingly, the neuron population decreased between control and AsymAD, with a further decrease between AsymAD and AD. It is not clear whether synaptodendritic rarefaction is driving this decrease in the measured cell population, or whether it reflects actual neuron loss. Frank neuronal loss is often associated with late stages of the disease, and synapse loss in AD is thought to correlate with cognitive dysfunction. We expected the neuron population to correlate more strongly with tau tangle burden than with amyloid-β plaque burden, given that tangle burden is more closely correlated with cortical atrophy and cognitive decline, but we observed that neurons correlated more strongly with amyloid plaques. Therefore, a discrepancy remains between the neuronal cell type data and disease state that warrants future investigation in a separate study cohort. We also noted an increase in oligodendrocytes in AsymAD, which is consistent with recent transcriptomic data suggesting alterations in oligodendrocyte and myelination biology in AD brain [30, 47]. Protein abundance differences between AsymAD and AD were largely preserved after adjustment for cell type changes, suggesting that perhaps these changes reflect a more “biochemical” phase of AD associated with cognitive dysfunction rather than a “cellular” phase of AD.
One cause of such biochemical changes may be alterations in RNA binding proteins, as identified in our GO network analysis and as previously described by our group. One of the most interesting RNA binding protein-enriched modules was M18. This module contained proteins often found in RNA granules, as well as proteins with low complexity domains such as U1-70K that bind to RNA and that have been associated with other neurodegenerative conditions such as frontotemporal dementia. M18 was significantly correlated with phosphorylated tau, U1-70K, TDP-43, and cognitive decline, but did not contain an overabundance of proteins from any of the four cell types we tested, unlike modules such as M4 and M7, which we have previously found to be astroglial and strongly associated with AD. One caveat regarding module correlation with cognitive decline in this analysis is that MMSE scores were skewed towards 30, suggesting that the cognitive time points captured from these individuals in the BLSA study were significantly removed from later disease stages. Future analyses using cohorts with more evenly distributed cognitive performance will be important to verify the cognitive associations reported here. Nevertheless, the MMSE-based cognitive correlations are likely correct in direction given their internal consistency with our previously published finding, validated here, that M4 and M7 correlate with progression from AsymAD to AD.
We developed an analytical pipeline to identify and quantify alternative exon-exon junctions at the protein level in brain. The databases we generated to identify alt-EEjxn peptides from brain were based on RNAseq data from relatively few control and AD brains from the University of Kentucky Brain Bank. However, most of the common alternatively spliced transcripts present in DLPFC control and AD brain were likely represented in this database. Adding RNAseq data from additional brains would perhaps uncover more rare local splicing variations, and will be a focus of future work. In our analysis of alt-EEjxns we observed a number of local splice variants that have not yet been documented to exist at the protein level in any human tissue. Because brain contains a large number of alternatively spliced proteins, we consider it likely that deeper characterization of protein splice variants in brain will uncover even more local splice variations that are translated into protein, with some being potentially relevant to disease. Our comparison between LFQ-trypsin and TMT-LysC analytical pipelines found that the TMT-LysC approach was slightly superior to LFQ-trypsin for quantification of alt-EEjxns, and even better when quantifying alt-EEjxns across case groups. However, for simply validating the existence of a particular alt-EEjxn at the protein level, LFQ-trypsin was superior to TMT-LysC. This is likely the case because trypsin digestion is more efficient than LysC and provides deeper coverage of the proteome, despite the fact that the number of peptides containing exon-exon splice junctions is reduced with trypsin digestion due to the overabundance of basic amino acid residues at splice junctions. A significantly deeper “bottom-up” analysis of local splicing variation at the proteomic level will likely require multiple and orthogonal enzymatic digestion approaches.
As a case-in-point, we observed only 4 alt-EEjxns at the protein level out of a possible 74 alt-EEjxns at the mRNA transcript level in PICALM. It is unclear how many of these local splice variants are translated into protein rather than undergo nonsense-mediated decay, but it seems likely based on abundant steady state transcript levels that a majority of these splice isoforms are translated into protein. The use of an orthogonal digestion approach for better splice junction coverage is also supported by the observation that trypsin and LysC identified largely separate subsets of alt-EEjxns, both for junctions currently annotated in protein databases and for those observed only in RNAseq or expressed sequence tag data. It should be noted that our quantitative analysis of alt-EEjxns was necessarily limited to the peptide level, and therefore the analysis is best considered to represent quantification of alternative splicing “decisions” in brain, whereby many separate splicing decisions may contribute in a combinatorial fashion to the generation of different protein isoforms.
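The effect of enzyme choice on splice junction coverage can be illustrated with a simple in silico digestion. The Python sketch below cleaves C-terminal to lysine and arginine for trypsin, and to lysine only for LysC, and then checks whether any resulting peptide spans a given exon-exon junction. It is a simplified illustration under stated assumptions: missed cleavages and the proline rule are ignored, and the function names, the minimum peptide length, and the toy sequence are hypothetical and do not correspond to any real protein or to the search pipeline used in this study.

```python
def digest(sequence, cleave_after):
    """Cleave C-terminal to every residue in `cleave_after`.
    Returns half-open (start, end) index pairs, one per peptide."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append((start, i + 1))
            start = i + 1
    if start < len(sequence):
        peptides.append((start, len(sequence)))
    return peptides

def junction_peptide(sequence, junction, cleave_after, min_len=6):
    """Return the peptide containing residues from both sides of the
    junction, or None if the enzyme cleaves exactly at the junction or
    the spanning peptide is shorter than `min_len`.
    `junction` is the index of the first residue of the downstream exon."""
    for start, end in digest(sequence, cleave_after):
        if start < junction < end:
            peptide = sequence[start:end]
            return peptide if len(peptide) >= min_len else None
    return None

# Hypothetical toy sequence with an arginine as the last upstream residue.
toy_sequence = "MSDEAVLKGTPELRSIDFVAQKLLE"
toy_junction = 14  # downstream exon starts at the serine after "...ELR"

trypsin_hit = junction_peptide(toy_sequence, toy_junction, "KR")
lysc_hit = junction_peptide(toy_sequence, toy_junction, "K")
```

In this toy example trypsin cleaves directly at the junction and yields no junction-spanning peptide, whereas LysC leaves the arginine intact and produces a peptide that covers the junction.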
From the alt-EEjxns we identified, those that were elevated in AD compared to AsymAD tended to cluster into modules that were microglial or astrocytic in nature. It is possible that the increase in these cell types in AD leads to a relative increase in alt-EEjxns that are otherwise translated at a low baseline level, and development of an algorithm to potentially exclude this effect, similar to cell type deconvolution for total protein levels, would be a welcome advance for alt-EEjxn analysis. Alternatively, splicing decisions may change systematically and may also underlie phenotype changes among the astroglial population of cells in brain. A future analysis to probe the extent of splice decision “switching” in AD, whereby an alt-EEjxn is favored at the expense of the canonical junction, would also be informative. A number of alt-EEjxns in AD GWAS risk factor proteins were elevated in AD, including in BIN1 and PTK2B. The functional relevance of these alternative exon-exon splicing decisions in these and other AD risk factor proteins remains to be determined. We found that alt-EEjxns at a global level tended to cluster into the unique area of the TMT-LysC network, but did not significantly overlap with modules enriched in RNA binding proteins. Although we have previously observed that snRNP alterations are associated with deficits in RNA splicing in AD brain [24, 32] (a finding recently confirmed by others), the fact that there was little overlap with RNA binding proteins at the network level suggests that abundance levels of RNA binding proteins do not correlate directly with levels of alternative splicing. Rather, it is likely that only certain types of RNA binding proteins directly affect alternative splicing decisions. We assessed for protein components of the U1 spliceosome complex in our enrichment analysis, but we did not find strong enrichment of these proteins in the network.
This may be due to the fact that U1 spliceosome proteins undergo a dramatic shift in solubility in AD brain, and aggregate in close proximity to neurofibrillary tangles [24, 32, 33, 34]. Module 29 contained five snRNPs and was annotated as being involved in mRNA splicing by GO analysis, but did not show enrichment of alt-EEjxn peptides. The relationship between RNA binding protein abundance and alternative splicing remains an area for future investigation.
We developed a TMT-based quantification pipeline for proteomic analysis of brain tissue that significantly increased our depth of proteome coverage of control and AD brain and led to additional insights into the protein changes that characterize AD pathophysiology, including changes in RNA splicing. Future advances in alternative protein isoform analysis by mass spectrometry will undoubtedly shed further light on this “dark matter” of the proteome and its role in AD.
We are grateful to participants in the Baltimore Longitudinal Study of Aging for their invaluable contribution. This study was supported in part by the intramural program of the National Institute on Aging (NIA).
Support for this study was provided by grants from the Accelerating Medicine Partnership AD (U01AG046161–02), the National Institute on Aging (R21AG054206, 5R01AG053960, RF1AG057470, and RF1AG057471), the NINDS Emory Neuroscience Core (P30NS055077), the Johns Hopkins Alzheimer’s Disease Research Center (P50AG05146), and the Emory Alzheimer’s Disease Research Center (P50AG025688). N.T.S. was supported in part by a Biomarkers Across Neurodegenerative Diseases grant (11060) funded by the Alzheimer’s Association (ALZ), Alzheimer’s Research UK (ARUK), The Michael J. Fox Foundation for Parkinson’s Research (MJFF), and the Weston Brain Institute. J.C.T. was supported by the BrightFocus Foundation (A2015332S). This research was also supported in part by the Intramural Research Program of the NIH, National Institute on Aging.
Availability of data and materials
Protein and peptide master tables, protein and alternative exon-exon junction correlations, and TMT network module correlations are available at the Synapse Web Portal (https://www.synapse.org; https://www.synapse.org/#!Synapse:syn16816734/wiki/583834; https://doi.org/10.7303/syn16816734). All raw proteomic data generated for the described work are also deposited electronically at the Synapse Web Portal (https://doi.org/10.7303/syn2580853) in accordance with data sharing policies established by the NIH Accelerating Medicine Partnership (AMP) AD consortium. Specific software will be made available upon request.
Conceptualization, ECBJ, EBD, NTS, AIL, and JJL; Methodology, DMD, EBD, ECBJ, and NTS; Investigation, DMD and LY; Formal Analysis, EBD and ECBJ; Writing – Original Draft, ECBJ and EBD; Writing – Review & Editing, ECBJ, EBD, DMD, NTS, JJL, and AIL; Funding Acquisition, AIL and NTS; Resources, JCT and MT. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 1. Prince M, Wimo A, Guerchet M, Ali G, Wu Y, Prina M. World Alzheimer report 2015: the global impact of dementia. Alzheimer's Disease International; 2015.
- 10. Ping L, Duong DM, Yin L, Gearing M, Lah JJ, Levey AI, Seyfried NT. Global quantitative analysis of the human brain proteome in Alzheimer’s and Parkinson’s Disease. Nature Scientific Data. 2018.
- 28. Sperling RA, Aisen PS, Beckett LA, Bennett DA, Craft S, Fagan AM, Iwatsubo T, Jack CR Jr, Kaye J, Montine TJ, et al. Toward defining the preclinical stages of Alzheimer's disease: recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease. Alzheimers Dement. 2011;7:280–92.
- 34. Hales CM, Seyfried NT, Dammer EB, Duong D, Yi H, Gearing M, Troncoso JC, Mufson EJ, Thambisetty M, Levey AI, Lah JJ. U1 small nuclear ribonucleoproteins (snRNPs) aggregate in Alzheimer's disease due to autosomal dominant genetic mutations and trisomy 21. Mol Neurodegener. 2014;9:15.
- 39. Davison EJ, Pennington K, Hung CC, Peng J, Rafiq R, Ostareck-Lederer A, Ostareck DH, Ardley HC, Banks RE, Robinson PA. Proteomic analysis of increased Parkin expression and its interactants provides evidence for a role in modulation of mitochondrial function. Proteomics. 2009;9:4284–97.
- 40. Hu A, Noble WS, Wolf-Yadlin A. Technical advances in proteomics: new developments in data-independent acquisition. F1000Res. 2016;5.
- 52. Raj T, Li Y, Wong G, Ramdhani S, Wang Y-c, Ng B, Wang M, Gupta I, Haroutunian V, Zhang B, et al. Integrative analyses of splicing in the aging brain: role in susceptibility to Alzheimer's disease. bioRxiv. 2017.
- 53. Serrano-Pozo A, Qian J, Muzikansky A, Monsell SE, Montine TJ, Frosch MP, Betensky RA, Hyman BT. Thal amyloid stages do not significantly impact the correlation between neuropathological change and cognition in the Alzheimer disease continuum. J Neuropathol Exp Neurol. 2016;75:516–26.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.