1 Introduction

Alzheimer’s disease (AD) is a chronic neurodegenerative disease. Its clinical manifestations include amnesia and the loss of mobility and language ability [1]. The World Alzheimer Report 2016, issued by Alzheimer’s Disease International, projected that the number of people living with dementia worldwide, of which AD is the most common cause, would increase from 47 million to 132 million by 2050 [2]. The pathogenesis of AD remains elusive and the course of the disease is irreversible: no available drug can cure AD or completely halt its progression. Therefore, early diagnosis of AD is of great significance for developing new drugs and measures to prevent further deterioration of the disease.

Mild cognitive impairment (MCI) is an intermediate state between normal cognition (healthy controls, HC) and AD. MCI patients can be subdivided into those who will convert to AD (MCIc) and those who will not (MCInc). Previous studies have demonstrated that individuals with MCI are more likely to convert to AD than cognitively normal individuals [3]. Many researchers have therefore attempted to identify MCIc early, and accurately diagnosing the current stage of the disease has become the focus of early AD diagnosis.

With the rapid advancement of neuroimaging technology, magnetic resonance imaging (MRI) has been widely applied in the diagnosis of AD. In recent years, deep learning has been successfully applied in multiple domains. It combines low-level features into more abstract high-level representations in order to discover distributed feature representations of data [4]. Deep learning models include the stacked autoencoder (SAE) [5], the deep belief network (DBN) [6] and the deep convolutional neural network (CNN) [7]. In this study, the 8-layer CNN structure proposed in [8] was used for differential diagnosis based on MR images, with the aim of improving the accuracy and stability of the model for early diagnosis of AD.

Imaging genomics is an emerging research field driven by the development of high-throughput sequencing and multimodal neuroimaging techniques. Its main purpose is to identify associations between traits, e.g. multimodal imaging features, and genetic variants such as single nucleotide polymorphisms (SNPs) [9]. Research has suggested that an individual’s genetic makeup is very likely to influence his or her susceptibility to AD-related traits, so the search for genetic biomarkers of AD is of clinical significance. Here, on the basis of the MR imaging and genotype data of the enrolled subjects, specific loci and genes were identified as candidate genetic biomarkers of AD with the help of deep learning and genome-wide association studies (GWAS).

2 Materials and Methods

2.1 Dataset and Pre-processing

ADNI Database. Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD).

In this study, the following steps were followed to discover the potential genetic biomarkers for early diagnosis of AD: (1) to determine the quantitative traits, including the brain regions involved in GWAS; (2) to select the subjects involved in the GWAS; (3) to obtain the phenotypic values (i.e. volumes) of the determined brain regions for the selected subjects; (4) to acquire the genotype data (i.e. SNPs) for the selected subjects; (5) to perform the quality control of genetic data; (6) to complete the GWA studies and generate the plots and tables.

Here, the early diagnosis of AD was divided into three binary classification problems, i.e. AD vs. HC, MCIc vs. HC and MCIc vs. MCInc. An ensemble model of multi-slice classifiers based on CNN was proposed both to make an early diagnosis of AD and to identify the brain regions significantly related to AD. In each binary classification problem, for each orientation (coronal, sagittal or transverse), the five single-slice base classifiers with the best generalization performance on the validation set were selected to form a multi-slice classifier. The resulting three multi-slice classifiers (coronal, sagittal and transverse) constituted the ensemble model. The slices selected in the three orientations intersect at multiple points in the MNI space. With the help of the Brainnetome Atlas, the brain regions into which the most slice intersection points were mapped across the three binary classification experiments were identified as those contributing most to the early diagnosis of AD. Subsequently, the morphological data of these brain regions and the genotype data were used to carry out GWAS to explore potential genetic biomarkers of AD.
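The selection and combination steps described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes that each single-slice base classifier outputs a probability for the positive class, that base classifiers are ranked by validation accuracy, and that both the multi-slice classifiers and the final ensemble combine their members by averaging probabilities (soft voting). The combination rule is not stated in the text, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence

import numpy as np


def select_top_slices(val_acc: Dict[int, float], k: int = 5) -> List[int]:
    """Return the k slice indices whose base classifiers scored best on the validation set."""
    return sorted(val_acc, key=lambda s: val_acc[s], reverse=True)[:k]


def multi_slice_prob(slice_probs: Dict[int, np.ndarray], selected: Sequence[int]) -> np.ndarray:
    """ASSUMED soft vote: average P(positive) over the selected single-slice base classifiers."""
    return np.mean([slice_probs[s] for s in selected], axis=0)


def ensemble_predict(per_orientation: Dict[str, np.ndarray], threshold: float = 0.5) -> np.ndarray:
    """ASSUMED: average the coronal, sagittal and transverse multi-slice probabilities, then threshold."""
    p = np.mean(list(per_orientation.values()), axis=0)
    return (p >= threshold).astype(int)
```

Averaging probabilities is only one option; majority voting over hard labels would fit the description equally well.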

MRI Data and Pre-processing. In this study, the MRI data were downloaded from the ADNI database according to the ImageIDs listed in the appendix of [10], including 137 AD subjects, 76 MCIc subjects, 134 MCInc subjects and 162 HC subjects. Descriptions of the 509 subjects, including gender, age, weight, MMSE (Mini-Mental State Examination) score, CDR (Clinical Dementia Rating) score and GDS (Geriatric Depression Scale) score, are shown in Table 1. This MRI data set served as the training and test set: it was used to train the single-slice base classifiers and to test the ensemble model of multi-slice classifiers. In addition, it was used to generate the phenotypic values for the quantitative traits (i.e. the volumes of the selected brain regions).

Table 1. Descriptions of the subjects in the training and test dataset.

In addition, the MR images of another 100 AD, 100 HC, 39 MCIc and 39 MCInc subjects were downloaded from the ADNI database as the validation set. Because the ensemble model of multi-slice classifiers was built in a data-driven way, a separate validation set was required to screen out the single-slice base classifiers, and hence the corresponding 2D slices, with the best generalization capabilities. Descriptions of the 278 subjects in the validation set are shown in Table 2.

Table 2. Descriptions of the subjects in the validation dataset.

Here, the CAT12 toolkit was first used for MR image pre-processing, which included skull stripping, registration to the MNI standard space and image smoothing, with the default parameter values for all pre-processing procedures. After pre-processing, all MR images were \(121\times 145\times 121\) voxels in size with a spatial resolution of 1.5 mm. Then, the gray-scale values of each MR image were normalized to reduce differences in the absolute gray-scale values of different tissues while preserving the gray-scale differences of diagnostic value, which made the CNN model easier to converge.
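The normalization method is not specified above. A minimal sketch, assuming per-volume min-max scaling to [0, 1]; because this is a linear map, relative contrasts between tissues are preserved. The function name is hypothetical.

```python
import numpy as np


def minmax_normalize(volume: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """ASSUMED scheme: linearly rescale one pre-processed MR volume to [0, 1]."""
    v = volume.astype(np.float32)
    vmin, vmax = float(v.min()), float(v.max())
    # eps guards against division by zero for a constant volume
    return (v - vmin) / (vmax - vmin + eps)
```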

Next, 2D slices were used as the training data, so the 3D MR images were re-sliced. For convenience of description, the axes perpendicular to the sagittal, coronal and transverse planes of the 3D MR images were denoted as the X-axis, Y-axis and Z-axis, with coordinate ranges of \([1,121]\), \([1,145]\) and \([1,121]\), respectively. Each pre-processed 3D MR image (\(121\times 145\times 121\)) was thus re-sliced into three 2D image sets: 121 sagittal, 145 coronal and 121 transverse slices, of sizes \(145\times 121\), \(121\times 121\) and \(121\times 145\), respectively. Each 2D slice was then zero-padded at its edges to \(145\times 145\) so that it became square, while its center and spatial resolution remained unchanged. The overall pre-processing procedure is illustrated in Fig. 1.
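The re-slicing and padding steps can be sketched with NumPy as follows. This assumes the volume is stored as an array indexed (x, y, z) with shape (121, 145, 121) and uses 0-based indices in place of the 1-based coordinates above; the function names are hypothetical.

```python
import numpy as np


def pad_to_square(slice2d: np.ndarray, size: int = 145) -> np.ndarray:
    """Zero-pad a 2D slice symmetrically to size x size so that its centre is unchanged."""
    h, w = slice2d.shape
    top, left = (size - h) // 2, (size - w) // 2
    return np.pad(slice2d, ((top, size - h - top), (left, size - w - left)), mode="constant")


def reslice(volume: np.ndarray):
    """Split an (x, y, z) volume into padded sagittal, coronal and transverse slices."""
    sagittal = [pad_to_square(volume[i, :, :]) for i in range(volume.shape[0])]    # 145 x 121 before padding
    coronal = [pad_to_square(volume[:, j, :]) for j in range(volume.shape[1])]     # 121 x 121 before padding
    transverse = [pad_to_square(volume[:, :, k]) for k in range(volume.shape[2])]  # 121 x 145 before padding
    return sagittal, coronal, transverse
```

For a 121-pixel side, 12 zero rows or columns are added on each side (145 − 121 = 24), which is what keeps the slice centred.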

Fig. 1. Pre-processing procedure.

Phenotype Data. The Brainnetome Atlas [11] was imported into FreeSurfer [12] to automatically calculate the morphological parameters of the brain regions of the 509 subjects from their MR images, yielding the volumes of 246 brain regions for each subject. Because FreeSurfer failed to process the MR images of two subjects, the volumes of the 246 brain regions were eventually obtained for only 507 subjects, and these 507 MR images served as the basis of the phenotype data for the GWA studies.

2.2 Gene Data and Pre-processing

In this study, the genetic data in PLINK [13] format were also downloaded from the ADNI database. This dataset contains 620,901 SNPs genotyped in 757 subjects, stored in three PLINK binary files: .bed (genotype calls), .bim (SNP information) and .fam (subject information).

For GWAS, both genotype and phenotype data must be available for each subject. Thus, a total of 458 subjects with both SNP data and morphological parameters were selected for the subsequent GWA studies.
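Matching the two data sources and writing the phenotypes in a form PLINK can read might look like the following sketch. It assumes, as a hypothetical simplification, that the individual ID (IID, the second column of a .fam line) matches the subject IDs of the FreeSurfer volume table directly. The output is a whitespace-delimited phenotype file with an FID IID header, the layout accepted by the --pheno and --pheno-name options of PLINK 1.9.

```python
from typing import Dict, List, Tuple


def read_fam_ids(fam_lines: List[str]) -> List[Tuple[str, str]]:
    """Return (FID, IID) pairs; each PLINK .fam line starts with FID and IID."""
    return [tuple(line.split()[:2]) for line in fam_lines if line.strip()]


def build_pheno_file(fam_ids, volumes: Dict[str, Dict[str, float]], regions: List[str]) -> str:
    """Keep subjects present in both sources and emit one phenotype column per region."""
    rows = ["FID IID " + " ".join(regions)]
    for fid, iid in fam_ids:
        if iid in volumes:  # subject has both genotypes and MRI-derived volumes
            rows.append(" ".join([fid, iid] + [str(volumes[iid][r]) for r in regions]))
    return "\n".join(rows) + "\n"
```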

To limit nuisance factors such as missing data and population stratification, quality control of the genetic data was performed with PLINK in the following eight steps: (1) screening subjects by heterozygosity rate; (2) screening subjects by genotype missing rate; (3) screening SNPs by missing rate; (4) filtering SNPs that deviate from Hardy-Weinberg equilibrium; (5) filtering SNPs based on linkage disequilibrium; (6) screening subjects for individual independence (relatedness); (7) obtaining an eigenvector matrix with principal component analysis; (8) correcting for population stratification using the eigenvector matrix. The resulting genetic data were then used for the GWAS analysis.
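Steps (1) to (3) are simple enough to express directly. The sketch below is illustrative only: it works on an in-memory subjects × SNPs matrix of additive genotype codes rather than on PLINK files, and its thresholds (5% missing rate, heterozygosity rate within ±3 SD of the mean) are common conventions assumed here, not values reported in this study. PLINK 1.9 provides --het, --mind and --geno for these checks; steps (4) to (8) are omitted.

```python
import numpy as np


def qc_filter(G: np.ndarray, mind: float = 0.05, geno: float = 0.05, het_sd: float = 3.0):
    """G: subjects x SNPs additive genotypes (0/1/2), NaN = missing call.
    Returns boolean masks (keep_subjects, keep_snps). Thresholds are ASSUMED defaults."""
    missing = np.isnan(G)
    called = (~missing).sum(axis=1)
    # Step (1): heterozygosity rate = heterozygous calls / non-missing calls per subject
    het_rate = (G == 1).sum(axis=1) / np.maximum(called, 1)
    keep_subj = np.abs(het_rate - het_rate.mean()) <= het_sd * het_rate.std()
    # Step (2): drop subjects whose genotype missing rate exceeds `mind`
    keep_subj &= missing.mean(axis=1) <= mind
    # Step (3): drop SNPs whose missing rate among the retained subjects exceeds `geno`
    keep_snp = missing[keep_subj].mean(axis=0) <= geno
    return keep_subj, keep_snp
```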

2.3 Experiment

Imaging Datasets. Of the 787 MR images downloaded from 787 subjects in the ADNI database, 509 were used as the training and test set and 278 as the validation set. A five-fold cross-validation scheme was adopted. The validation set was NOT involved in training the single-slice base classifiers or testing the ensemble model of multi-slice classifiers; it was ONLY used to screen the trained base classifiers, in order to prevent potential data leakage in the three binary classification tasks. The subjects were divided into four groups, i.e. AD, MCIc, MCInc and HC.
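How the five folds were formed is not stated. A minimal sketch, assuming stratified folds built at the subject level, so that all slices of one subject (and their augmented copies) fall in the same fold and each diagnostic group is spread evenly across folds; the function name is hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


def stratified_subject_folds(labels: Dict[str, str], k: int = 5, seed: int = 0) -> List[List[str]]:
    """Assign subject IDs (not slices) to k folds, balancing each diagnostic group."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sid, group in sorted(labels.items()):
        by_group[group].append(sid)
    folds: List[List[str]] = [[] for _ in range(k)]
    offset = 0
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)
        for i, sid in enumerate(ids):
            folds[(offset + i) % k].append(sid)
        offset += len(ids)  # continue the round-robin so that fold sizes stay balanced
    return folds
```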

Data Augmentation. A CNN model with good generalization capability usually requires a large number of training images, and the original slices alone were far from sufficient to train the base classifiers. Hence, data augmentation (DA) was employed: new slices were generated from the original slices by six operations, namely rotation, translation, gamma correction, random noise addition, scaling and random affine transformation. For example, in the MCIc vs. HC experiment, ten new slices were generated from each HC slice with each of the six operations, so the number of HC slices increased 61-fold (1 original + 6 × 10 new) after DA.
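The six operations and the 1 + 6 × 10 = 61 bookkeeping can be sketched with NumPy alone. The parameter ranges (±10° rotation, ±5-pixel translation, scaling by 0.9 to 1.1, gamma in 0.8 to 1.2, Gaussian noise with σ = 0.01, ±0.05 affine perturbation) are assumptions, not values from this study, and nearest-neighbour resampling with zero fill is used for simplicity; an interpolating library routine would usually be preferred in practice.

```python
import numpy as np


def affine_nn(img, A, t=(0.0, 0.0)):
    """Output pixel p takes source x = A^-1 (p - c - t) + c (c = centre); nearest neighbour, zero fill."""
    h, w = img.shape
    c = np.array([(h - 1) / 2.0, (w - 1) / 2.0])
    rr, cc = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    p = np.stack([rr.ravel(), cc.ravel()]).astype(float)
    src = np.linalg.inv(A) @ (p - c[:, None] - np.asarray(t, float)[:, None]) + c[:, None]
    sr, sc = np.rint(src).astype(int)
    ok = (sr >= 0) & (sr < h) & (sc >= 0) & (sc < w)
    out = np.zeros(h * w, dtype=img.dtype)
    out[ok] = img[sr[ok], sc[ok]]
    return out.reshape(h, w)


def rotate(img, rng):
    a = np.deg2rad(rng.uniform(-10, 10))
    return affine_nn(img, np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]]))


def translate(img, rng):
    return affine_nn(img, np.eye(2), t=rng.integers(-5, 6, size=2))


def scale(img, rng):
    return affine_nn(img, rng.uniform(0.9, 1.1) * np.eye(2))


def random_affine(img, rng):
    return affine_nn(img, np.eye(2) + rng.uniform(-0.05, 0.05, size=(2, 2)))


def gamma(img, rng):  # assumes intensities already scaled to [0, 1]
    return np.clip(img, 0.0, 1.0) ** rng.uniform(0.8, 1.2)


def noise(img, rng):
    return img + rng.normal(0.0, 0.01, size=img.shape)


OPS = (rotate, translate, scale, gamma, noise, random_affine)


def augment_slice(img, rng, n_per_op=10):
    """Original slice plus n_per_op variants per operation: 1 + 6 * 10 = 61 slices."""
    return [img] + [op(img, rng) for op in OPS for _ in range(n_per_op)]
```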

Base Classifier. A base classifier is a model for learning the features from a single 2D slice, and its structure is the 8-layer CNN classifier described in [8], as demonstrated in Fig. 2.

Fig. 2. Base classifier structure.

Genome-Wide Association Study (GWAS). Because each MRI slice was used to train its own single-slice base classifier, the slices contributing most to AD classification could be selected separately in each orientation (coronal, sagittal or transverse) with the help of the validation set. The selected slices in the three orientations form a slice grid in the MNI space and, accordingly, multiple intersection points. Since each intersection point lies simultaneously on the most discriminative slices of all three orientations, it is reasonable to use it as a proxy for the ability of the brain region containing it to classify AD. The number of such points falling in each brain region can therefore be used to rank the brain regions by their contribution to AD classification.

In this way, 125 points (5 × 5 × 5) in the MNI space were determined by the top five sagittal, coronal and transverse slices in each binary classification experiment. These points were then mapped onto brain regions using the Brainnetome Atlas, and the ability of a brain region to help classify AD was assessed by the number of intersection points located in it. The brain regions containing intersection points in the three binary classification experiments were summarized and ranked together, and the ten brain regions with the most mapped intersection points were taken as the brain regions significantly associated with AD. The volumes of these 10 brain regions, selected from the 246 brain regions, for the 458 subjects were used as the phenotypes in the subsequent GWA studies.
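Counting intersection points per region can be sketched as follows. This assumes the Brainnetome label volume has been resampled to the same 121 × 145 × 121 voxel grid as the pre-processed images (label 0 = background) and that slice indices are 0-based; the function name and the optional `names` mapping are hypothetical.

```python
from collections import Counter
from itertools import product

import numpy as np


def count_region_hits(atlas: np.ndarray, sag, cor, tra, names=None) -> Counter:
    """Count slice-grid intersection points per atlas label.
    sag/cor/tra: selected sagittal (x), coronal (y) and transverse (z) slice indices."""
    hits = Counter()
    for x, y, z in product(sag, cor, tra):  # 5 x 5 x 5 = 125 points for five slices per orientation
        label = int(atlas[x, y, z])
        if label != 0:  # points in the background are not assigned to any region
            hits[names[label] if names else label] += 1
    return hits
```

Adding the counters of the three binary classification experiments (e.g. `sum(counters, Counter())`) and calling `most_common(10)` yields the ranking used to pick the top 10 regions.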

After the phenotype and genotype data were converted to PLINK format files, the GWAS was performed with the PLINK package. In such analyses, a logistic regression model is typically used when the phenotype is qualitative (binary), and a linear regression model when it is quantitative. Since the volume of a brain region is quantitative, a linear regression model was chosen in this study.
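The per-SNP linear model can be illustrated with ordinary least squares. This is a sketch of the standard additive model (phenotype ~ intercept + genotype dosage + covariates such as the eigenvectors from QC step (7)), not PLINK's own code; for brevity, the two-sided P value uses a normal approximation to the t distribution, which is assumed adequate for sample sizes in the hundreds.

```python
import math

import numpy as np


def snp_linear_test(y, g, covars=None):
    """OLS of phenotype y on additive genotype g (0/1/2) plus optional covariate columns.
    Returns (beta_g, t, p); p is a two-sided normal approximation (ASSUMED adequate here)."""
    n = len(y)
    cols = [np.ones(n), g]
    if covars is not None:
        cols += [covars[:, j] for j in range(covars.shape[1])]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])    # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # covariance of the coefficient estimates
    t = beta[1] / math.sqrt(cov[1, 1])
    p = math.erfc(abs(t) / math.sqrt(2.0))       # P(|Z| > |t|)
    return float(beta[1]), float(t), p
```

Running this once per SNP and per brain-region volume gives the P values plotted in the Manhattan plots.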

3 Results and Discussion

3.1 Phenotype Results

Prior to the GWAS experiments, the phenotypes to be analyzed had to be obtained. On the test set, the ensemble model of multi-slice classifiers achieved classification accuracies of 81% for AD vs. HC, 79% for MCIc vs. HC and 62% for MCIc vs. MCInc. Meanwhile, the brain regions containing mapped slice intersection points in the three binary classification experiments were summarized and ranked, as illustrated in Fig. 3, where the Y-axis denotes the brain region labels from the Brainnetome Atlas and the X-axis represents the total number of slice intersection points mapped into each brain region across the three binary classification experiments.

Fig. 3. The number of intersection points in each significant brain region.

According to the statistics in Fig. 3, the top 10 brain regions were R.rHipp, L.rHipp, R.mAmyg, L.A21r, L.A22r, L.A20cv, L.mAmyg, R.34, L.A37lv and R.lAmyg.

3.2 GWAS Results

The volumes of the top 10 brain regions were used as the phenotypes, and 10 corresponding GWAS experiments were conducted. A linear regression model was applied to the genotype and phenotype data to obtain the P value of the association between each SNP and each phenotype. The Manhattan plots of the resulting P values for the associations between SNPs and the volumes of the 10 brain regions are shown in Figs. 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13, respectively.

Fig. 4. Genome-wide association analysis results of region R.rHipp.

Fig. 5. Genome-wide association analysis results of region L.rHipp.

Fig. 6. Genome-wide association analysis results of region R.mAmyg.

Fig. 7. Genome-wide association analysis results of region L.A21r.

Fig. 8. Genome-wide association analysis results of region L.A22r.

Fig. 9. Genome-wide association analysis results of region L.A20cv.

Fig. 10. Genome-wide association analysis results of region L.mAmyg.

Fig. 11. Genome-wide association analysis results of region R.34.

Fig. 12. Genome-wide association analysis results of region L.A37lv.

Fig. 13. Genome-wide association analysis results of region R.lAmyg.

SNPs strongly associated with the volumes of the ten brain regions were summarized from the 10 GWAS experiments, as shown in Table 3; these may provide clues to the pathogenesis of this complex brain disorder. The negative logarithm of the P value, \(-\log_{10}(P)\), was used to measure the statistical significance of the associations between variants and traits, and a suggestive significance threshold of \(P < 1\times 10^{-5}\) was adopted (the conventional genome-wide significance threshold is \(5\times 10^{-8}\)).
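Summarizing the 10 GWAS runs at the \(1\times 10^{-5}\) threshold (equivalently \(-\log_{10}(P) > 5\)) and counting how often each SNP recurs across runs can be sketched as follows. The input layout, a dict mapping each region to a {SNP: P value} dict, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def neg_log10(p: float) -> float:
    """Significance on the Manhattan-plot scale."""
    return -math.log10(p)


def significant_snps(results: Dict[str, Dict[str, float]],
                     threshold: float = 1e-5) -> Tuple[Dict[str, List[str]], Counter]:
    """Return per-region SNPs with P < threshold and, per SNP, the number of runs it passed in."""
    hits = {region: sorted(snp for snp, p in pvals.items() if p < threshold)
            for region, pvals in results.items()}
    recurrence = Counter(snp for snps in hits.values() for snp in snps)
    return hits, recurrence
```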

According to the experimental results, the SNP rs2451078, located in the gene transmembrane phosphoinositide 3-phosphatase and tensin homolog 2 (TPTE2), was highly significant in seven of the ten GWAS experiments, with very high \(-\log_{10}(P)\) values. This suggests that rs2451078 may be closely associated with the volumes of these brain regions and could serve as a potential genetic biomarker of AD. In the study “Genome-Wide association study identifies candidate genes for Parkinson’s disease in an Ashkenazi Jewish population” [14], rs2451078 reached genome-wide significance (\(\mathrm{p} < 1.94 \times 10^{-10}\)) in the CIDR/Pankratz et al. 2009 dataset [15], although the association was not replicated in the Ashkenazi Jewish dataset [14] or in a second dataset. In addition, rs2451078 was reported among SNPs that are nominally located on autosomes but exhibit significantly different genotype frequencies in men and women [16].

Besides, rs10496214 and rs17016520, both located in the LOC107985902 gene, were observed in two of the ten GWAS experiments, which may be worthy of further investigation. Moreover, in the patent “Genes associated with schizophrenia identified using a whole genome scan” [17], rs10496214 was identified as a SNP significantly associated with schizophrenia, with concordant results in both the Munich collection (438 cases and 414 controls) and the Aberdeen collection (440 cases and 453 controls). To our knowledge, rs17016520 has not been studied so far.

In summary, rs2451078, rs10496214 and rs17016520, as well as the genes TPTE2 and LOC107985902, were identified in this study as potential genetic biomarkers of AD; they deserve further in-depth research and validation and may offer insights for AD studies.

Table 3. The details of significant SNPs.

4 Conclusion

First, to obtain the phenotypes, a CNN-based ensemble model of multi-slice classifiers for early diagnosis of AD was proposed, which could also be used for computer-aided diagnosis in clinical settings. The main features of the model are as follows:

(1) Six different operations were used to augment the original MRI slices, which significantly increased the number of training samples and, after augmentation, made the sample sizes of the two classes in each binary classification experiment almost equal.

(2) Conventionally, 2D CNN-based models for early diagnosis of AD have been built on a specific MR slice selected according to prior experience, which depends heavily on domain knowledge and has certain limitations. In the proposed model, multiple selected sagittal, coronal and transverse slices were used to train an ensemble model of multi-slice classifiers. Moreover, these slices did NOT need to be pre-specified on the basis of domain knowledge but were chosen in a data-driven way. Because the same brain region can look markedly different from one orientation to another, combining the sagittal, coronal and transverse slices further improved the classification accuracy and stability.

(3) Compared with models based on 3D images, the proposed ensemble model based on 2D slices does NOT have high hardware requirements. In addition, because each base classifier is trained independently, the base classifiers can be trained in parallel, which considerably improves training efficiency.

Second, as a tool for phenotype acquisition, the proposed model for early diagnosis of AD can pinpoint the brain regions significantly associated with AD. Here, an intersection point determined by the discriminative sagittal, coronal and transverse slices served as a proxy for the ability of the brain region containing it to classify AD, so the brain regions with the most intersection points were considered to contribute most to the early diagnosis of AD. In this way, the 10 brain regions most strongly associated with AD were identified.

Finally, using the morphological data (i.e. volumes) of these 10 brain regions as phenotypes in the GWA studies, three loci (rs2451078, rs10496214 and rs17016520) and two genes (TPTE2 and LOC107985902) were identified, which might serve as potential genetic biomarkers of AD and offer clues for subsequent AD research. Moreover, rs2451078 and rs10496214 have been studied in research on other brain disorders, namely Parkinson’s disease and schizophrenia, respectively, whereas rs17016520 has not been investigated so far.