1 Introduction

Obsessive-compulsive disorder (OCD) is a mental disorder characterized by compulsive thoughts or behaviors, which often have a negative impact on patients' daily lives [1]. Clinical studies indicate that OCD is heritable and that siblings of patients often show similar symptoms. Approximately 2% to 3% of the world's population is affected by this disorder. However, there are still no accurate physiological or biochemical indicators for the clinical diagnosis of OCD. Moreover, OCD often co-occurs with depression and anxiety, which may lead to misdiagnosis [2].

For accurate and objective OCD diagnosis, resting-state functional magnetic resonance imaging (rs-fMRI) can reveal steady-state patterns of brain co-activation. To exploit this, a brain functional connectivity network (BFCN) is first built from rs-fMRI to capture the functional interactions among brain areas. Many BFCN construction methods have been proposed. For example, Sen et al. [2] combined Pearson's correlation (PC) networks with adjacency-matrix features selected by the minimum redundancy maximum relevance method for OCD diagnosis. However, this method only considers pairwise relationships between brain regions, ignoring the relationship between a target brain region and multiple other regions. To address this, Xing et al. [3] proposed a Riemannian kernel to build the BFCN and used principal component analysis (PCA) to reduce the feature dimension; however, the resulting BFCN is too dense to represent features well. To construct a sparser BFCN, Wee et al. [4] proposed a group-constrained sparse (GCS) model for mild cognitive impairment identification. Although this method removes much irrelevant information, the dimension of the BFCN features remains very high, and the method ignores the similarity among subjects. Commonly used dimensionality reduction methods (e.g., Lasso, PCA) cannot learn the intrinsic relationships within BFCN features. To address these issues, we first propose a smoothing sparse network (SSN), which builds on the GCS method to construct the BFCN while controlling its density and adding similarity constraints among subjects.

Deep learning methods have achieved great success in addressing the curse of dimensionality. For example, Chen et al. [5] used a sparse auto-encoder (SAE) to reduce the data dimension of polarimetric synthetic aperture radar images. A stacked sparse auto-encoder (SSAE) stacks multiple SAEs to learn high-level features while reducing the data dimension, and has achieved good results in nuclei detection [6]. Inspired by these methods, we propose a novel method that combines traditional machine learning and deep learning techniques for OCD diagnosis. Specifically, the features extracted from the BFCN are fed into an \( \ell_{2} \)-regularized SSAE to learn high-level features, which expresses disease-related information while reducing the data dimension. Our method learns the nonlinear relationships within the features, and the resulting high-level features are exploited for OCD diagnosis. In this way, it not only accounts for the similarity among subjects but also learns high-level BFCN features of reduced dimensionality. Experimental results on our self-collected dataset show that our method achieves quite promising performance.

2 Methodology

2.1 Proposed Framework

Figure 1 shows our framework, which combines traditional machine learning and deep learning techniques for OCD diagnosis. First, we preprocess the original rs-fMRI data in a standard way. Second, we construct the BFCN with the SSN method. Then, an SSAE is applied to learn high-level features, which effectively reduces the feature dimension and enhances the feature representation ability for the final classification.

Fig. 1.

The flow chart of our method.

2.2 Data Acquisition and Image Preprocessing

A Philips Medical Systems 3.0-T MR scanner was used for data acquisition. Subjects were instructed to relax with their eyes closed and remain awake without moving. The scanning parameters were as follows: TR = 2000 ms; TE = 60 ms; flip angle = 90°; 33 slices; field of view = 240 mm × 240 mm; matrix = 64 × 64; slice thickness = 4.0 mm. The Statistical Parametric Mapping toolbox (SPM8) and the Data Processing Assistant for Resting-State fMRI (DPARSFA, version 2.2) were used to preprocess the data. We discarded the first 10 rs-fMRI volumes of each subject before any further processing to allow the magnetization to reach equilibrium. The remaining 170 volumes were corrected for the staggered order of slice acquisition, taking advantage of the echo-planar scan to ensure that the data on each slice correspond to the same time point. The image preprocessing included: slice timing correction; head motion correction; realignment with the corresponding T1 volume; nuisance covariate regression (six head motion parameters, white matter signal, and cerebrospinal fluid signal); spatial normalization into the stereotactic space of the Montreal Neurological Institute and resampling at 3 × 3 × 3 mm³; spatial smoothing with a 6-mm full-width half-maximum isotropic Gaussian kernel; and band-pass filtering (0.01–0.08 Hz).

The brain is parcellated into 116 regions of interest (ROIs) using the automated anatomical labeling (AAL) template. In addition, a high-pass filter is used to refine the average rs-fMRI time series of each brain region. Furthermore, we regress out the head movement parameters, the cerebrospinal fluid signal, and the mean BOLD time series of the white matter. We extract the mean BOLD signal of each ROI as the original rs-fMRI signal [7].
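To illustrate this step, the following NumPy sketch (our own illustration, not the authors' released code) averages the BOLD signal over the voxels of each AAL ROI and, for the PC baseline discussed in Sect. 1, builds a Pearson-correlation BFCN:

```python
import numpy as np

def mean_roi_series(bold, labels, n_rois=116):
    """Average the BOLD signal over the voxels of each ROI.

    bold:   (n_voxels, n_timepoints) voxel-wise time series
    labels: (n_voxels,) AAL label per voxel (1..n_rois; 0 = background)
    Returns a (n_timepoints, n_rois) matrix of mean ROI time series.
    """
    n_timepoints = bold.shape[1]
    series = np.zeros((n_timepoints, n_rois))
    for r in range(1, n_rois + 1):
        mask = labels == r
        if mask.any():
            series[:, r - 1] = bold[mask].mean(axis=0)
    return series

def pearson_bfcn(series):
    """PC baseline: one node per ROI, edge weight = Pearson correlation."""
    return np.corrcoef(series.T)
```

In practice the voxel labels come from resampling the AAL atlas to the normalized functional space; the function names here are hypothetical.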

2.3 Smoothing Sparse Network

In this paper, matrices are denoted by bold capital letters, vectors by bold lowercase letters, and scalars by normal italic letters. Assume that there are \( N \) subjects and that the AAL template is utilized to divide the brain into \( R \) ROIs. For the \( n \)-th subject, \( {\mathbf{X}}^{n} = \left[ {{\mathbf{x}}_{1}^{n} , \ldots ,{\mathbf{x}}_{r}^{n} , \ldots ,{\mathbf{x}}_{R}^{n} } \right] \in {\mathbb{R}}^{M \times R} \) denotes the input data, where the \( r \)-th ROI is represented by a BOLD regional mean time series of length \( M \), \( {\mathbf{x}}_{r}^{n} = \left[ {x_{1r}^{n} ,x_{2r}^{n} , \ldots ,x_{Mr}^{n} } \right]^{\text{T}} \in {\mathbb{R}}^{M \times 1} \). \( {\mathbf{A}}_{r}^{n} = \left[ {{\mathbf{x}}_{1}^{n} , \ldots ,{\mathbf{x}}_{r - 1}^{n} ,{\mathbf{x}}_{r + 1}^{n} , \ldots ,{\mathbf{x}}_{R}^{n} } \right] \) denotes the signal matrix of all ROIs except \( {\mathbf{x}}_{r}^{n} \), \( {\mathbf{w}}_{r}^{n} \in {\mathbb{R}}^{R - 1} \) is a weighting regression coefficient vector, and \( {\mathbf{W}}_{r} = \left[ {{\mathbf{w}}_{r}^{1} , \ldots ,{\mathbf{w}}_{r}^{n} , \ldots ,{\mathbf{w}}_{r}^{N} } \right] \in {\mathbb{R}}^{(R - 1) \times N} \). The sparse networks used to represent brain functional connectivity can be constructed with GCS, which is defined as

$$ J({\mathbf{W}}_{r} ) = \mathop {\hbox{min} }\nolimits_{{W_{r} }} \frac{1}{2}\mathop \sum \nolimits_{n = 1}^{N} \left| {\left| {{\mathbf{x}}_{r}^{n} - {\mathbf{A}}_{r}^{n} {\mathbf{w}}_{r}^{n} } \right|} \right|_{2}^{2} + R_{g} ({\mathbf{W}}_{r} ), $$
(1)

where \( R_{g} ({\mathbf{W}}_{r} ) \) is a group regularization and defined as

$$ R_{g} ({\mathbf{W}}_{r} ) = \lambda_{1} \left| {\left| {{\mathbf{W}}_{r} } \right|} \right|_{2,1} = \lambda_{1} \mathop \sum \nolimits_{d = 1}^{R - 1} \left| {\left| {{\mathbf{w}}_{r}^{d} } \right|} \right|_{2} , $$
(2)

where \( \lambda_{1} \) is the group regularization parameter and \( \left| {\left| {{\mathbf{W}}_{r} } \right|} \right|_{2,1} \) is the sum of the \( l_{2} \)-norms of the rows of \( {\mathbf{W}}_{r} \), with \( {\mathbf{w}}_{r}^{d} \) denoting the \( d \)-th row vector of \( {\mathbf{W}}_{r} \). In this way, information is jointly selected through the weights of the \( R - 1 \) ROIs. As a sparse regression method, GCS enforces an identical connection topology across subjects: the \( l_{2} \)-norm is imposed on corresponding elements across the subject-specific columns of \( {\mathbf{W}}_{r} \), which forces the weights of the same connection in different subjects to be grouped together. This constraint imposes a common connection topology among subjects while allowing the connection weights to vary across them. The model thus reconstructs each target ROI from the remaining ROIs, and the reconstruction of each ROI is independent of the others. However, the existing GCS model ignores the smoothness across subjects within the model. To overcome this drawback, we devise a novel model that jointly learns the shared functional brain networks of all subjects through a group sparsity regularization and a smoothness regularization. The objective function is defined as

$$ J({\mathbf{W}}_{r} ) = \mathop {\hbox{min} }\nolimits_{{W_{r} }} \frac{1}{2}\mathop \sum \nolimits_{n = 1}^{N} \left| {\left| {{\mathbf{x}}_{r}^{n} - {\mathbf{A}}_{r}^{n} {\mathbf{w}}_{r}^{n} } \right|} \right|_{2}^{2} + R_{g} ({\mathbf{W}}_{r} ) + R_{s} ({\mathbf{W}}_{r} ), $$
(3)

where \( R_{g} ({\mathbf{W}}_{r} ) \) is the group regularization of Eq. (2), and \( R_{s} ({\mathbf{W}}_{r} ) \) denotes the smoothness regularization, which is defined as

$$ R_{s} ({\mathbf{W}}_{r} ) = \lambda_{2} \mathop \sum \nolimits_{n = 1}^{N - 1} \left| {\left| {{\mathbf{w}}_{r}^{n} - {\mathbf{w}}_{r}^{n + 1} } \right|} \right|_{1} , $$
(4)

where \( \lambda_{2} \) is the smoothness regularization parameter. The term \( \left| {\left| {{\mathbf{w}}_{r}^{n} - {\mathbf{w}}_{r}^{n + 1} } \right|} \right|_{1} \) constrains the difference between two consecutive subjects' weighting vectors to be as small as possible. When \( \lambda_{2} \) is zero, the proposed method reduces to the original GCS method. Because the \( l_{1} \)-norm in the fused smoothness term encourages sparsity in the differences of the weight vectors, many components of the difference vectors will be zero, and the informative features are selected through the non-zero weights. In short, the smoothness term smooths the connectivity coefficients across subjects, and fusing the two regularization terms imposes stronger constraints. We call this sparse learning model the smoothing sparse network. Since an asymmetric BFCN does not contribute to the final classification accuracy, each subject's network \( {\mathbf{W}}_{n} \) is symmetrized as \( {\mathbf{W}}^{ *} = \left( {{\mathbf{W}}_{n} + {\mathbf{W}}_{n}^{\text{T}} } \right)/2 \). The local clustering coefficients of the weighted graph are then used to extract features from each constructed BFCN [8, 9].
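To make the objective concrete, the following NumPy sketch (our illustration under stated assumptions; the actual optimization in this paper relies on the SLEP toolbox) evaluates the SSN objective of Eqs. (3)–(4) for a given \( {\mathbf{W}}_{r} \), together with the symmetrization step:

```python
import numpy as np

def ssn_objective(X, W, r, lam1, lam2):
    """Evaluate J(W_r) of Eq. (3): data-fit term plus R_g (Eq. 2)
    and the fused smoothness term R_s (Eq. 4).

    X:    (N, M, R) mean BOLD time series (N subjects, M time points, R ROIs)
    W:    (R-1, N) coefficients W_r = [w_r^1, ..., w_r^N]
    r:    index of the target ROI
    lam1: group (l_{2,1}) regularization parameter lambda_1
    lam2: smoothness regularization parameter lambda_2
    """
    N = X.shape[0]
    J = 0.0
    for n in range(N):
        x_r = X[n, :, r]                      # target ROI series
        A_r = np.delete(X[n], r, axis=1)      # remaining ROIs, (M, R-1)
        J += 0.5 * np.sum((x_r - A_r @ W[:, n]) ** 2)
    J += lam1 * np.sum(np.linalg.norm(W, axis=1))   # R_g: row-wise l2 norms
    J += lam2 * np.sum(np.abs(np.diff(W, axis=1)))  # R_s: l1 of consecutive diffs
    return J

def symmetrize(W_n):
    """W* = (W_n + W_n^T) / 2, yielding a symmetric BFCN."""
    return 0.5 * (W_n + W_n.T)
```

With `lam2 = 0` the function reproduces the GCS objective of Eq. (1), consistent with the text above.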

2.4 Stacked Sparse Auto-Encoder

An auto-encoder (AE) is a symmetric neural network with a single hidden layer, consisting of an encoding network and a decoding network. The encoder converts the input data from a high-dimensional space to a lower-dimensional feature space, and the decoder reconstructs the input data from that feature space. Multiple auto-encoders can be stacked to learn high-level features from the original features. The SAE is a classic AE variant that learns relatively sparse features by penalizing deviations of the hidden-unit activations; it improves on the traditional AE and is more practical in applications. Our proposed SSAE not only learns high-level features but also controls their sparsity, which further benefits the classification performance. The objective function of the SSAE is defined as:

$$ {\mathbf{J}}\left( {\varvec{W},\varvec{b}} \right) = \frac{1}{N}\mathop \sum \nolimits_{n = 1}^{N} \mathop \sum \nolimits_{f = 1}^{F} \left( {y_{nf} - x_{nf} } \right)^{2} + \beta_{1} \varvec{S}_{1} \left( {{\mathbf{w}}_{nf} } \right) + \beta_{2} \varvec{S}_{2} \left( {\rho ||\rho_{k}^{\prime} } \right), $$
(5)
$$ \varvec{S}_{1} \left( {{\mathbf{w}}_{nf} } \right) = \frac{1}{2}\left\| {{\mathbf{w}}_{nf} } \right\|_{2}^{2} , $$
(6)
$$ \varvec{S}_{2} (\rho ||\rho_{k}^{\prime} ) = \mathop \sum \nolimits_{k = 1}^{K} \rho \log \frac{\rho }{{\rho_{k}^{\prime} }} + \left( {1 - \rho } \right)\log \frac{{1 - \rho }}{{1 - \rho_{k}^{\prime} }}. $$
(7)

The first term is the mean squared reconstruction error. The second term is the \( \ell_{2} \) regularization on the encoding weights, with penalty coefficient \( \beta_{1} \). The third term is the sparsity constraint, with coefficient \( \beta_{2} \), where \( \varvec{S}_{2} \left( {\rho ||\rho_{k}^{\prime} } \right) \) is the Kullback-Leibler (KL) divergence. \( {\mathbf{X}} \in {\mathbb{R}}^{N \times F} \) denotes the input data and \( {\mathbf{Y}} \in {\mathbb{R}}^{N \times D} \) the reconstructed data, where \( F \) and \( D \) are the feature dimensions of the input and reconstructed data of the \( N \) subjects, respectively. \( {\mathbf{Z}} \in {\mathbb{R}}^{N \times K} \) is the activation matrix of the hidden layer, which has \( K \) nodes. The weights \( {\mathbf{w}}_{1} \) and bias \( {\mathbf{b}}_{1} \) encode the input data \( {\mathbf{X}} \) into the activation matrix \( {\mathbf{Z}} = f\left( {{\mathbf{w}}_{1} {\mathbf{X}} + {\mathbf{b}}_{1} } \right) \), and the weights \( {\mathbf{w}}_{2} \) and bias \( {\mathbf{b}}_{2} \) decode it as \( {\mathbf{Y}} = f\left( {{\mathbf{w}}_{2} {\mathbf{Z}} + {\mathbf{b}}_{2} } \right) \). \( \rho_{k}^{\prime} = \frac{1}{N}\mathop \sum \nolimits_{n = 1}^{N} \left[ {\varvec{Z}_{k} \left( {{\mathbf{x}}_{n} } \right)} \right] \) is the average activation of the \( k \)-th hidden node, and \( \rho \) is a constant sparsity target. The weights \( {\mathbf{w}} \) and biases \( {\mathbf{b}} \) are optimized by the scaled conjugate gradient algorithm. The output \( {\mathbf{Z}} \) of each AE is used as the input of the next layer; in this paper, the \( \ell_{2} \)-regularized SSAE contains two AEs.
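The loss of Eqs. (5)–(7) for a single sparse AE can be sketched as follows (a NumPy illustration under our assumptions — sigmoid activations for \( f \) and row-vector samples — not the authors' implementation, which uses scaled conjugate gradient training in MATLAB):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def sae_loss(X, W1, b1, W2, b2, beta1, beta2, rho=0.05):
    """Sparse auto-encoder loss of Eqs. (5)-(7).

    X: (N, F) input features; Z = f(X W1 + b1) is the hidden activation,
    Y = f(Z W2 + b2) the reconstruction.
    """
    Z = sigmoid(X @ W1 + b1)                    # (N, K) hidden activations
    Y = sigmoid(Z @ W2 + b2)                    # (N, F) reconstruction
    mse = np.mean(np.sum((Y - X) ** 2, axis=1))           # reconstruction error
    l2 = 0.5 * (np.sum(W1 ** 2) + np.sum(W2 ** 2))        # Eq. (6)
    rho_hat = Z.mean(axis=0)                    # average activation rho'_k
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))  # Eq. (7)
    return mse + beta1 * l2 + beta2 * kl
```

Stacking two such AEs, with each hidden activation \( {\mathbf{Z}} \) fed to the next AE, gives the two-layer SSAE described above.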

3 Experiments and Results

3.1 Experimental Setup

In this paper, we collected rs-fMRI data of 180 subjects from a local hospital, comprising 62 OCD patients, 53 siblings of OCD patients, and 65 normal control (NC) subjects. All data were collected from the Chinese Han population and were labeled by two highly trained and experienced clinical psychiatrists and psychologists.

Since the amount of data is small, the leave-one-out cross-validation (LOOCV) strategy is used to assess our proposed method. Specifically, given \( N \) subjects, one of them is left out for testing and the remaining \( N-1 \) subjects are used for training. The hyperparameters of each method are set empirically by greedy search to identify the optimal values. Three quantitative measurements are used to evaluate the diagnostic performance: accuracy (ACC), area under the receiver operating characteristic curve (AUC), and sensitivity (SEN). The experiments are conducted in MATLAB 2018a; the SLEP and LibSVM toolboxes are used for sparse representation and classification, respectively.
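The evaluation protocol can be sketched as follows (our illustration; it substitutes scikit-learn's libsvm-backed `SVC` for the LibSVM toolbox used in the paper, and the function name is hypothetical):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

def loocv_evaluate(features, labels, C=1.0):
    """Leave-one-out cross-validation with an SVM classifier.

    features: (N, D) BFCN-derived feature matrix; labels: (N,) in {0, 1}.
    Returns ACC, AUC, and SEN computed over the N held-out predictions.
    """
    preds, scores = [], []
    for train_idx, test_idx in LeaveOneOut().split(features):
        clf = SVC(kernel='linear', C=C)
        clf.fit(features[train_idx], labels[train_idx])
        preds.append(clf.predict(features[test_idx])[0])
        scores.append(clf.decision_function(features[test_idx])[0])
    acc = accuracy_score(labels, preds)
    auc = roc_auc_score(labels, scores)
    sen = recall_score(labels, preds)  # sensitivity = recall on the positive class
    return acc, auc, sen
```

In the paper's setting, LOOCV would be run once per binary task (OCD vs. NC, Sibling vs. NC, OCD vs. Sibling) with the hyperparameters tuned on the training folds only.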

3.2 Classification Performance

The experimental results are shown in Table 1 (boldface indicates the best performance), and the receiver operating characteristic (ROC) curves are shown in Fig. 2. To demonstrate the effectiveness of our proposed SSN approach, our BFCN construction method is compared with typical BFCN methods such as PC and GCS. Our classification results are also compared with typical dimensionality reduction (DR) methods such as PCA, Lasso, and SAE.

Table 1. Classification performance of all methods used in this study (%).
Fig. 2.

The receiver operating characteristic curves of different tasks.

  • PCP: PCP uses PC for BFCN construction and PCA to reduce the features.

  • PCL: PCL uses PC to construct BFCN and Lasso for feature selection.

  • PCS: PCS utilizes PC to generate the BFCN and SAE to reduce the data dimension.

  • PCSS: PCSS uses PC to generate the BFCN and SSAE to reduce the data dimension.

  • GCSP: GCSP uses GCS to get BFCN and PCA to reduce the features.

  • GCSL: GCSL uses GCS to get BFCN and Lasso for feature selection.

  • GCSS: GCSS uses GCS to get BFCN and SAE to reduce the features.

  • GCSSS: GCSSS uses GCS to get the BFCN and SSAE to reduce the data dimension.

  • SSNP: SSNP uses SSN to get BFCN and PCA to reduce the features.

  • SSNL: SSNL utilizes SSN to get BFCN and Lasso for feature selection.

  • SSNS: SSNS uses SSN to get BFCN and SAE to reduce the data dimension.

  • SSNSS: SSNSS uses SSN to get BFCN and SSAE for feature dimension reduction.

The SSNSS method clearly achieves the best performance. In the OCD vs. NC classification task, SSNSS attains the highest accuracy of 88.82%, which is 6.30% higher than the best competing method. Similarly, for the Sibling vs. NC task, our SSNSS model achieves an accuracy of 79.15%, 2.03% higher than the best competing method. For the OCD vs. Sibling task, it obtains an accuracy of 79.48%, 1.74% higher than the best competing method. These results demonstrate that our SSNSS model is effective and outperforms the competing methods.

We use the proposed method to build the functional connectivity networks of the OCD, Sibling, and NC groups. The feature maps of the BFCNs are shown in Fig. 3, where the lower image shows the whole-brain BFCN and the upper image shows a magnified partial area. It can be seen that the network constructed by SSN is sparse.

Fig. 3.

The feature map of BFCN of different groups.

The brain functional connectivity networks are shown in Fig. 4. Our method can clearly express brain activity, as verified by the classification results. It can be clearly seen from the BFCNs that there are differences between OCD and NC, while OCD patients show characteristics similar to their siblings. The regions whose activity differs between OCD and NC include frontal_sup_orb (the orbital part of the superior frontal gyrus), the hippocampus, the caudate, and the putamen. These ROIs are consistent with the OCD-related areas found by previous researchers [1, 10].

Fig. 4.

The brain functional connectivity network of different groups.

4 Conclusions

In this paper, a novel method for diagnosing OCD patients and their siblings has been proposed, which integrates the merits of both traditional machine learning and deep learning techniques. Specifically, an SSN model based on sparse learning has been proposed to construct the BFCN, which not only controls the density of the BFCN but also takes the similarity among subjects into account. An SSAE is then used to learn high-level, discriminative features for the final classification. In future work, we will consider more modalities and constraints to further improve the diagnostic accuracy. Dynamic and high-order BFCNs can also be incorporated into our framework to enhance its overall performance.