Multi-task Sparse Low-Rank Learning for Multi-classification of Parkinson’s Disease

Lei, Haijun; Zhao, Yujia; Lei, Baiying

doi:10.1007/978-3-030-00889-5_41


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11045)


Abstract

Identifying the prodromal stages of Parkinson’s disease (PD) is drawing increasing attention, because non-motor symptoms may appear before the classical clinical diagnosis based on motor signs. Neuroimaging has been widely applied in computer-aided diagnosis of multiple disease progression stages because it conveniently reveals the intricate structure of the brain. However, the high dimensionality of neuroimaging features and the limited sample size are the main challenges for the diagnosis task. To address them, a multi-task sparse low-rank learning framework is proposed to unveil the underlying relationships between input data and output targets by building a matrix-regularized feature network. Multiple tasks are learned simultaneously to capture the intrinsic relatedness of their features. By discarding irrelevant features and preserving discriminative structured features, the proposed method selects the most relevant features and identifies different stages of PD with different multi-classification models. Extensive experimental results on the Parkinson’s Progression Markers Initiative (PPMI) dataset demonstrate that the proposed method achieves promising classification performance and outperforms conventional algorithms.


1 Introduction

PD has gained increasing attention as the population ages. The chronic progression and imperceptible neurodegeneration of PD make treatment comparatively difficult [1]. There is suggestive evidence that olfaction changes, sleep behavior disorder, subtle cognitive changes and depression can be present at early PD stages, indicating a high likelihood of developing PD [2]. By the time motor symptoms permit a clinical diagnosis of PD, about 50% or more of the dopaminergic neurons of the substantia nigra have degenerated. The time span between the onset of neurodegeneration and the manifestation of the typical motor symptoms is referred to as the prodromal phase of PD (PROD) [3]. The term SWEDD (scans without evidence for dopaminergic deficit) refers to the absence of an imaging abnormality in patients clinically presumed to have PD [4]. PROD and SWEDD are different PD-related conditions whose patients require targeted treatment. Therefore, early PD diagnosis enables timely preventive treatment.

Using the rich information provided by neuroimaging techniques, we can monitor minor neural changes that are not easy to perceive in conventional symptom-based clinical diagnosis. Common neuroimaging techniques include magnetic resonance imaging (MRI) and diffusion tensor imaging (DTI). Recently, many machine learning methods have been applied to neuroimages for the computer-aided diagnosis of neurodegenerative disease. For example, a robust feature-sample selection scheme was developed for PD diagnosis [5]. Because of the high dimensionality and limited sample size, overfitting can occur in the data analysis. Recent studies have demonstrated that feature selection is capable of overcoming this issue. An l₁-regularizer (i.e., a sparse term) is introduced in the estimation model for feature selection when the sample size is significantly smaller than the feature dimension [6]. However, sparsity regularization alone is insufficient for the multi-classification application, since there are four progressive classification targets: normal control (NC), SWEDD, PROD and PD.

In fact, the relationship between input data (i.e., MRI images) and output targets (i.e., prediction results) deserves further exploration. Inspired by the fact that the brain is organized into modular structures, we intend to find the most representative features for training our multi-class classifiers by exploiting both the low-rank structure and the sparseness of the matrix-regularized feature network.

In addition, gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) are the most significant biomarkers in the brain, and they are later used as features. Conventional feature extraction methods apply a simple linear combination of the three tissue types without considering their individual contributions. We model this problem as a multi-task learning framework by proposing a model that efficiently leverages the multi-modal data [7]. Our model treats the multi-classification of disease stages using each modality as one task. We assume that these tasks are related and can benefit each other for the classification purpose. We then perform the three tasks simultaneously to capture their intrinsic relatedness and achieve better classification performance.

Moreover, clinical symptoms have been considered as a vital indicator of PD diagnosis. The judgement results of clinicians are reflected on the clinical assessment scores for each potential PD patient. The combination of constructive information with the neuroimaging information provides sufficient information for computer-aided analytical diagnosis. For this reason, we propose a multi-task sparse low-rank learning (MSLRL) framework for multi-classification of PD. The proposed MSLRL framework combines the sparsity and low-rank constraints together for each task to select the most PD related features. To the best of our knowledge, this is the first work to introduce multi-task sparse low-rank learning to PD diagnosis using neuroimages. Experimental results demonstrate the prominent performance of our proposed method on the PPMI dataset.

2 Method

The proposed method intends to find a subset of features that are most related to PD. The multi-task sparse low-rank learning framework is shown in Fig. 1. We extract our input features from MRI images. To predict accurate labels, we add low-rank and sparse constraints to the matrix-regularized feature network and extract the respective weighted significance for each task. Each task applies the same feature selection method within a joint multi-task framework. The shared weight matrix yields selected features of reduced dimension, which are used to train support vector machine (SVM) based classifiers.

Suppose that we have m subjects, each with n features, across k tasks. In the linear regression model $ {\mathbf{Y}}^{\left( i \right)} = {\mathbf{X}}^{\left( i \right)} {\mathbf{W}}^{\left( i \right)} $, $ {\mathbf{Y}}^{\left( i \right)} \in {\mathbb{R}}^{m \times 1} $ is the ground truth label vector of the i-th task, $ {\mathbf{X}}^{\left( i \right)} \in {\mathbb{R}}^{m \times n} $ is the input data matrix of the i-th task, and $ {\mathbf{W}}^{\left( i \right)} \in {\mathbb{R}}^{n \times 1} $ is the weight coefficient vector of the i-th task, with one entry per feature. We can obtain $ {\mathbf{W}}^{\left( i \right)} $ by solving the following objective function:

$$ \text{min}_{{{\mathbf{W}}^{\left( i \right)} }} \left\| {{\mathbf{Y}}^{\left( i \right)} - {\mathbf{X}}^{\left( i \right)} {\mathbf{W}}^{\left( i \right)} } \right\|_{F}^{2} , $$

(1)

where $ \left\| {\mathbf{A}} \right\|_{F} $ is the Frobenius norm (F-norm) of A, which is defined as $ \left\| {\mathbf{A}} \right\|_{F} = \sqrt {\sum\nolimits_{i} {\left\| {{\mathbf{A}}_{i} } \right\|_{2}^{2} } } $, where $ {\mathbf{A}}_{i} $ is the i-th row vector of A. For a vector, the F-norm coincides with the l₂-norm. Equation (1) is a simple and straightforward linear regression model without constraint on any variable. However, it does not consider the properties of the weight matrix, which results in inferior performance. In most machine learning applications, over-fitting is a common problem when the data matrix is unbalanced, i.e., when the feature dimension far exceeds the sample size. Especially in the field of neuroimaging-aided diagnosis, brain images are scarce, and yet they provide extensive information, leading to high dimensionality. A sparse term such as the l₁-regularizer is generally adopted to regularize the weight matrix by setting certain entries to zero for sparseness. Let $ \left\| {\mathbf{A}} \right\|_{1} $ be the l₁-norm of A, defined as $ \left\| {\mathbf{A}} \right\|_{1} = \sum\nolimits_{i = 1}^{N} {\left| {a_{i} } \right|} $, where $ a_{i} $ runs over the N entries of A. We can then formulate the objective function using sparse representation as:

$$ \text{min}_{{{\mathbf{W}}^{\left( i \right)} }} \left\| {{\mathbf{Y}}^{\left( i \right)} - {\mathbf{X}}^{\left( i \right)} {\mathbf{W}}^{(i)} } \right\|_{F}^{2} + \lambda \left\| {{\mathbf{W}}^{\left( i \right)} } \right\|_{1} , $$

(2)

Equation (2) selects the most representative features under the assumption that $ {\mathbf{W}}^{\left( i \right)} $ is sparse, subject to the first data-fitting term. In the model, we aim to find a weight matrix that represents the feature significance. We further explore the low-rank structure between features. It is well known that the brain is divided into different parts known as regions of interest (ROIs), and we extract different features from these regions. Since PD is a neurodegenerative disease, it is influenced by a block of brain regions that are responsible for certain human actions or emotions. For this reason, we assume that a group of features are dependent on each other, leading to a low-rank structure of the coefficient weight matrix because certain rows are dependent. The sparse low-rank learning framework for each task is built on the assumption that features are closely related within groups of features, while the relevance between these groups may be sparse. Multiple tasks share the same low-rank and sparse weight coefficients. Thus, the objective function for each task is reformulated as:

$$ \text{min}_{{{\mathbf{W}}^{\left( i \right)} }} \left\| {{\mathbf{Y}}^{\left( i \right)} - {\mathbf{X}}^{\left( i \right)} {\mathbf{W}}^{\left( i \right)} } \right\|_{F}^{2} +\uplambda_{1} \left\| {{\mathbf{W}}^{\left( i \right)} } \right\|_{1} +\uplambda_{2} rank\left( {{\mathbf{W}}^{\left( i \right)} } \right), $$

(3)

where $ rank\left( {{\mathbf{W}}^{\left( i \right)} } \right) $ is the rank function of $ {\mathbf{W}}^{\left( i \right)} $. Low-rank learning has been utilized in matrix recovery and network modeling. The weight matrix $ {\mathbf{W}}^{\left( i \right)} $ in Eq. (3) has n rows representing the respective feature significance. The rank minimization of $ {\mathbf{W}}^{\left( i \right)} $ explores the low-rank structure among features to obtain their intrinsic relationship. However, it is difficult to solve for $ {\mathbf{W}}^{\left( i \right)} $ since the $ rank $ function is non-convex and rank minimization is an NP-hard problem. It has been proved that the trace norm is the convex envelope of the rank function over the domain $ \left\| {{\mathbf{W}}^{\left( i \right)} } \right\|_{2} \le 1 $, i.e., it provides the tightest convex lower bound of the $ rank $ function [11]. The trace norm $ \left\| {\mathbf{W}} \right\|_{*} $ is defined as:

$$ \left\| {\mathbf{W}} \right\|_{*} = \sum\nolimits_{i = 1}^{\min \left\{ {n,k} \right\}} {\sigma_{i} } = Tr\left( {\left( {{\mathbf{W}}^{T} {\mathbf{W}}} \right)^{{\frac{1}{2}}} } \right), $$

(4)

where $ \sigma_{i} $ is the i-th singular value of $ {\mathbf{W}} $ and can be obtained by singular value decomposition (SVD). Thus, we can establish the final objective function with a $ l_{1} $-norm $ \left\| {\mathbf{W}} \right\|_{1} $ and a trace norm $ \left\| {\mathbf{W}} \right\|_{*} $ as:

$$ \text{min}_{{\mathbf{W}}} \sum\nolimits_{i = 1}^{k} {\left\| {{\mathbf{Y}}^{\left( i \right)} - {\mathbf{X}}^{\left( i \right)} {\mathbf{W}}^{\left( i \right)} } \right\|_{F}^{2} } + \alpha \left\| {\mathbf{W}} \right\|_{1} + \beta \left\| {\mathbf{W}} \right\|_{*} , $$

(5)

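where $ {\mathbf{W}} = \left[ {{\mathbf{W}}^{\left( 1 \right)} , \ldots ,{\mathbf{W}}^{\left( k \right)} } \right] \in {\mathbb{R}}^{n \times k} $ stacks the weight vectors of the k tasks, and $ \alpha $ and $ \beta $ are the parameters controlling the degree of sparsity and the degree of low-rankness, respectively. When $ \alpha = 0 $, Eq. (5) has only the low-rank constraint. Equation (2) on its own is the classic least absolute shrinkage and selection operator (LASSO). When we add a squared $ l_{2} $-norm $ \left\| {\mathbf{W}} \right\|_{2}^{2} $ to Eq. (2), we get the standard elastic net formulation. Moreover, if we change the $ l_{1} $-norm $ \left\| {\mathbf{W}} \right\|_{1} $ in Eq. (2) to the $ l_{2,1} $-norm $ \left\| {\mathbf{W}} \right\|_{2,1} $, we get a row-wise (joint) sparse formulation.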
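To make the three terms of Eq. (5) concrete, the following minimal sketch evaluates the objective for a given weight matrix. It assumes that the task-wise weight vectors are stacked as the columns of an n × k matrix and that labels are stored as length-m vectors; the function name `mslrl_objective` and the synthetic data are illustrative and do not come from the paper.

```python
import numpy as np

def mslrl_objective(X_tasks, Y_tasks, W, alpha, beta):
    """Evaluate the objective of Eq. (5) for a stacked weight matrix W (n x k).

    Assumption: column i of W is the weight vector of task i.
    """
    # Data-fitting term: sum over tasks of ||Y_i - X_i w_i||^2.
    fit = sum(
        float(np.sum((Y - X @ W[:, i]) ** 2))
        for i, (X, Y) in enumerate(zip(X_tasks, Y_tasks))
    )
    # Sparse term: entrywise l1-norm of W.
    l1 = float(np.abs(W).sum())
    # Low-rank term: trace norm as the sum of singular values, Eq. (4).
    sigma = np.linalg.svd(W, compute_uv=False)
    trace_norm = float(sigma.sum())
    return fit + alpha * l1 + beta * trace_norm

# Synthetic example: m = 20 subjects, n = 6 features, k = 3 tasks.
rng = np.random.default_rng(0)
m, n, k = 20, 6, 3
X_tasks = [rng.standard_normal((m, n)) for _ in range(k)]
Y_tasks = [rng.standard_normal(m) for _ in range(k)]
W = rng.standard_normal((n, k))
value = mslrl_objective(X_tasks, Y_tasks, W, alpha=0.5, beta=0.5)
```

Setting `alpha=0` leaves only the data-fitting and low-rank terms, which mirrors the special case mentioned above.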

To optimize Eq. (5), we notice that the $ l_{1} $-norm and the trace norm are non-differentiable. Thus, we solve for $ {\mathbf{W}} $ using the proximal gradient descent method, owing to its effectiveness in solving problems involving the $ l_{1} $-norm. Since Eq. (5) has three terms, we update $ {\mathbf{W}} $ with respect to each term in turn. First, we find the proximal operator of $ \alpha \left\| {\mathbf{W}} \right\|_{1} $ according to:

$$ prox_{{\alpha \left\| \cdot \right\|_{1} }} \left( {\mathbf{W}} \right) = \left[ {sign\left( {w_{ij} } \right) \cdot { \hbox{max} }\left\{ {\left| {w_{ij} } \right| - \alpha ,0} \right\}} \right]_{n \times k} , $$

(6)

where $ prox\left( \right) $ denotes the proximal operator and $ sign\left( \right) $ is the sign function. Similarly, we can obtain the proximal operator of $ \beta \left\| {\mathbf{W}} \right\|_{*} $ using:

$$ prox_{{\upbeta\left\| \cdot \right\|_{ *} }} \left( {\mathbf{W}} \right) = {\text{U}}diag\left( {\text{max}\left\{ {\widehat{\sigma }_{1} ,0} \right\}, \cdots ,\hbox{max} \left\{ {\widehat{\sigma }_{l} ,0} \right\}} \right){\text{V}}^{\text{T}} , $$

(7)

where U and V are the unitary matrices in the SVD of $ {\mathbf{W}} $, so that $ {\mathbf{W}} = {\text{U}}diag\left( {\sigma_{1} , \cdots ,\sigma_{l} } \right){\text{V}}^{\text{T}} $, with $ \widehat{\sigma }_{i} = \sigma_{i} - \beta $ and $ l = { \hbox{min} }\left\{ {n,k} \right\} $. Then, we consider the first data-fitting term $ \left\| {{\mathbf{Y}}^{\left( i \right)} - {\mathbf{X}}^{\left( i \right)} {\mathbf{W}}^{\left( i \right)} } \right\|_{F}^{2} $. Given $ f_{1} \left( {{\mathbf{W}}^{\left( i \right)} } \right) = \left\| {{\mathbf{Y}}^{\left( i \right)} - {\mathbf{X}}^{\left( i \right)} {\mathbf{W}}^{\left( i \right)} } \right\|_{F}^{2} $, its gradient with respect to $ {\mathbf{W}}^{\left( i \right)} $ is $ \nabla f_{1} \left( {{\mathbf{W}}^{\left( i \right)} } \right) = 2\left( {{\mathbf{X}}^{{\left( i \right){\text{T}}}} {\mathbf{X}}^{\left( i \right)} {\mathbf{W}}^{\left( i \right)} - {\mathbf{X}}^{{\left( i \right){\text{T}}}} {\mathbf{Y}}^{\left( i \right)} } \right) $, where the constant factor 2 can be absorbed into the step size. Consequently, we can solve for W by iteratively applying the gradient update and the two proximal operators until convergence.
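The update scheme described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a fixed step size derived from the largest singular value of the data matrices, a fixed number of iterations instead of a convergence test, and it applies the two proximal operators of Eqs. (6) and (7) one after the other, which is a common heuristic and not the exact proximal operator of the summed penalty. The names `soft_threshold`, `singular_value_threshold` and `fit_mslrl` are hypothetical.

```python
import numpy as np

def soft_threshold(W, tau):
    """Entrywise proximal operator of tau * ||W||_1, as in Eq. (6)."""
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)

def singular_value_threshold(W, tau):
    """Proximal operator of tau * ||W||_*, as in Eq. (7)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Scale column j of U by the shrunken singular value max(s_j - tau, 0).
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def fit_mslrl(X_tasks, Y_tasks, alpha, beta, n_iter=200):
    """Proximal gradient sketch for Eq. (5); W has one column per task."""
    n, k = X_tasks[0].shape[1], len(X_tasks)
    W = np.zeros((n, k))
    # Assumed step size 1/L, with L the Lipschitz constant of the gradient
    # of the data-fitting term: 2 * (largest singular value of X)^2.
    L = max(2.0 * np.linalg.norm(X, 2) ** 2 for X in X_tasks)
    step = 1.0 / L
    for _ in range(n_iter):
        # Gradient of the data-fitting term, computed column by column.
        G = np.column_stack([
            2.0 * (X.T @ (X @ W[:, i] - Y))
            for i, (X, Y) in enumerate(zip(X_tasks, Y_tasks))
        ])
        W = W - step * G
        W = soft_threshold(W, step * alpha)            # sparse term
        W = singular_value_threshold(W, step * beta)   # low-rank term
    return W

# Synthetic usage example: 30 subjects, 8 features, 3 tasks.
rng = np.random.default_rng(0)
X_tasks = [rng.standard_normal((30, 8)) for _ in range(3)]
Y_tasks = [rng.standard_normal(30) for _ in range(3)]
W_hat = fit_mslrl(X_tasks, Y_tasks, alpha=1.0, beta=1.0)
```

In practice, the loop would typically stop once the change in W between iterations falls below a tolerance; the fixed iteration count here only keeps the sketch short.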

3 Experiments

3.1 Experimental Settings

We validate our method by classifying different stages of PD subjects. We choose SVM classifiers to construct a multi-class classification model because of their efficiency in separating samples of different classes with the maximum margin [8]. Another classifier we apply is the capped $ l_{p} $-norm SVM [9]. This upgraded classifier can deal with both light and heavy outliers, boosting classification performance. The main parameters are $ \upalpha $ and $ \upbeta $ in Eq. (5), where $ \upalpha $ controls the sparse term $ \left\| {\mathbf{W}} \right\|_{1} $ and $ \upbeta $ controls the low-rank term $ \left\| {\mathbf{W}} \right\|_{ *} $. The candidate values are $ \alpha \in \left\{ {2^{ - 5} , \ldots ,2^{5} } \right\} $ and $ \upbeta \in \left\{ {2^{ - 5} , \ldots ,2^{5} } \right\} $. The final parameter values are selected by a 5-fold cross-validation strategy. The results are evaluated using accuracy (ACC), sensitivity (SEN), specificity (SPEC), and area under the receiver operating characteristic curve (AUC). For fair evaluation, the classification performance of the proposed method is evaluated via a 10-fold cross-validation strategy.
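As a rough illustration of the settings above, the sketch below enumerates the candidate parameter grid and computes ACC, SEN and SPEC from predicted labels. Two assumptions are made that the text does not state: the candidate sets contain every integer power of two from 2⁻⁵ to 2⁵, and sensitivity and specificity are computed in a one-vs-rest fashion for a chosen positive class. The function name `binary_metrics` and the toy labels are hypothetical.

```python
from itertools import product

# Candidate grid for (alpha, beta), assuming integer exponents -5..5.
exponents = range(-5, 6)
grid = [(2.0 ** a, 2.0 ** b) for a, b in product(exponents, exponents)]

def binary_metrics(y_true, y_pred, positive):
    """Return (ACC, SEN, SPEC), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    acc = (tp + tn) / len(pairs)
    sen = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return acc, sen, spec

# Toy labels for illustration only.
y_true = ["PD", "PD", "NC", "NC", "PD"]
y_pred = ["PD", "NC", "NC", "PD", "PD"]
acc, sen, spec = binary_metrics(y_true, y_pred, positive="PD")
```

Each of the 121 grid pairs would be scored on the inner 5-fold split, and the best pair would then be used within the outer 10-fold evaluation.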

3.2 Data Preprocessing

The data used in this experiment are MRI images from the PPMI dataset. All the original images are preprocessed by anterior commissure-posterior commissure correction and skull-stripping for later operations. We then segment the images into GM, WM, and CSF using Statistical Parametric Mapping (SPM) [10]. Following the automated anatomical labeling atlas, which parcels the brain into 116 regions, we compute the mean tissue density value of each region as a feature. In this work, we collect 643 subjects (127 NC, 380 PD, 56 SWEDD and 34 PROD). For each subject, the feature dimension is 116 for each tissue modality (116 GM, 116 WM, 116 CSF). Apart from these features, we also collect four clinical scores as features, namely sleep scores, olfaction scores, depression scores, and Montreal cognitive assessment scores. These clinical scores are the clinical assessment results derived from the clinicians’ experience and diagnosis. With the guidance of these clinical scores as features, we can build a more reliable classification model.
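The region-wise feature computation can be sketched as below. The sketch assumes that the atlas is available as an integer label volume aligned with the tissue density map, with labels 1 to 116 for the regions and 0 for background; the function name `roi_mean_density` and the toy arrays are illustrative only.

```python
import numpy as np

def roi_mean_density(tissue_map, atlas, n_regions=116):
    """Mean tissue density per atlas region (labels 1..n_regions, 0 = background)."""
    features = np.zeros(n_regions)
    for region in range(1, n_regions + 1):
        mask = atlas == region
        if mask.any():  # regions absent from the volume keep a value of 0
            features[region - 1] = tissue_map[mask].mean()
    return features

# Toy volumes standing in for a segmented GM map and a label atlas.
rng = np.random.default_rng(0)
atlas = rng.integers(0, 117, size=(16, 16, 16))
gm_map = rng.random((16, 16, 16))
gm_features = roi_mean_density(gm_map, atlas)  # one value per region
```

Running the same function on the WM and CSF maps would give the three 116-dimensional feature vectors, one per tissue modality.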

3.3 Classification Performance

To further validate the effectiveness of our MSLRL method, we compare it with other similar methods. Apart from the elastic net and LASSO methods, we compare MSLRL with two further sparsity-based methods: multi-modal multi-task (M3T) learning [11] and joint sparse learning [12]. We additionally compare MSLRL with low-rank learning, sparse learning, and sparse low-rank learning (SLRL). The classification performance results are summarized in Table 1. The MSLRL method achieves higher accuracy than the classical elastic net and LASSO as well as the sparsity-based M3T and joint sparse learning with both SVM classifiers. SLRL turns out to be more effective than low-rank learning and sparse learning alone, which validates the strategy of combining the $ l_{1} $-norm $ \left\| {\mathbf{W}} \right\|_{1} $ and the trace norm $ \left\| {\mathbf{W}} \right\|_{ *} $ to exploit both sparsity and low-rank structure. MSLRL outperforms SLRL with both classifiers, which indicates that multi-task learning successfully explores the intrinsic relations within multi-modal features. Receiver operating characteristic (ROC) curves for algorithm comparison are shown in Fig. 2. MSLRL obtains the best performance among all competing methods with each classifier, which shows its advantage and potential for early PD diagnosis.

Table 1. Classification performance of all competing methods with different classifiers.


3.4 Most Distinctive Brain Regions

The identification of PD-related features and the monitoring of progression are of great significance in early diagnosis. We utilize the weight coefficient matrix generated in feature selection to study the discriminative brain regions most related to PD. The regions most related to PD are visualized in Fig. 3. The selected brain regions differ slightly between the two methods. The higher relevance obtained by MSLRL compared with SLRL suggests that MSLRL is more effective than SLRL for PD diagnosis. These distinctive brain regions can be further investigated for clinical practice.
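One simple way to read region relevance off the weight coefficient matrix is sketched below. It assumes that each row of W corresponds to one brain region and that relevance is scored by the l₂-norm of the row across tasks; the paper does not specify its scoring rule, so this choice, the function name `top_regions`, and the region names are all illustrative.

```python
import numpy as np

def top_regions(W, region_names, top_k=10):
    """Rank regions by the l2-norm of their row in W (largest first)."""
    scores = np.linalg.norm(W, axis=1)
    order = np.argsort(scores)[::-1][:top_k]
    return [(region_names[i], float(scores[i])) for i in order]

# Toy weight matrix: 3 regions (rows) and 2 tasks (columns).
W_demo = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
names = ["region_a", "region_b", "region_c"]
ranking = top_regions(W_demo, names, top_k=2)
```

Rows that the sparse and low-rank penalties drive to zero receive a score of zero and therefore drop to the bottom of the ranking.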

4 Conclusion

In this paper, we introduce a multi-task sparse low-rank learning framework for early PD diagnosis across four progression stages. Specifically, for each task we add sparsity and low-rank regularization to the weight coefficients with an $ l_{1} $-norm and a trace norm to unveil the underlying relationships within the data. By exploring the intrinsic relationships between multiple tasks, this framework can select the most representative features by jointly considering the dimension reduction of neuroimaging feature vectors and the dependency properties of PD-related brain region features. Using multi-modal data from the PPMI neuroimaging dataset, experiments demonstrate that our method achieves the best multi-class classification results among all competing methods.

References

1. Simons, J.A., Fietzek, U.M., Waldmann, A., Warnecke, T., Schuster, T., Ceballos-Baumann, A.O.: Development and validation of a new screening questionnaire for dysphagia in early stages of Parkinson’s disease. Park. Relat. Disord. 20(9), 992–998 (2014)
2. Postuma, R.B., et al.: Identifying prodromal Parkinson’s disease: pre-motor disorders in Parkinson’s disease. Mov. Disord. Off. J. Mov. Disord. Soc. 27(5), 617–626 (2012)
3. Gaenslen, A., Swid, I., Liepelt-Scarfone, I., Godau, J., Berg, D.: The patients’ perception of prodromal symptoms before the initial diagnosis of Parkinson’s disease. Mov. Disord. Off. J. Mov. Disord. Soc. 26(4), 653–658 (2011)
4. Erro, R., Schneider, S.A., Quinn, N.P., Bhatia, K.P.: What do patients with scans without evidence of dopaminergic deficit (SWEDD) have? New evidence and continuing controversies. J. Neurol. Neurosurg. Psychiatry (2015)
5. Adeli, E., et al.: Joint feature-sample selection and robust diagnosis of Parkinson’s disease from MRI data. NeuroImage 141, 206–219 (2016)
6. Peng, J., An, L., Zhu, X., Jin, Y., Shen, D.: Structured sparse kernel learning for imaging genetics based Alzheimer’s disease diagnosis. In: MICCAI, pp. 70–78 (2016)
7. Zhou, J., Chen, J., Ye, J.: Multi-task learning: theory, algorithms, and applications. SDM Tutor. (2012)
8. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 27 (2011)
9. Nie, F., Wang, X., Huang, H.: Multiclass capped p-Norm SVM for robust classifications. In: AAAI, pp. 2415–2417 (2017)
10. Friston, K.J.: Statistical parametric mapping (1994)
11. Zhang, D., Shen, D.: Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer’s disease. NeuroImage 59(2), 895–907 (2012)
12. Lei, H., et al.: Joint detection and clinical score prediction in Parkinson’s disease via multi-modal sparse learning. Expert Syst. Appl. 80(1), 284–296 (2017)

Author information

Authors and Affiliations

College of Computer Science and Software Engineering, Guangdong Province Key Laboratory of Popular High-Performance Computers, Shenzhen University, Shenzhen, China
Haijun Lei & Yujia Zhao
National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
Baiying Lei


Corresponding author

Correspondence to Haijun Lei.

Editor information

Editors and Affiliations

University College London, London, UK
Danail Stoyanov
University of Leeds, Leeds, UK
Zeike Taylor
University of Adelaide, Adelaide, SA, Australia
Gustavo Carneiro
IBM Research – Almaden, San Jose, CA, USA
Tanveer Syeda-Mahmood
Sunnybrook Health Science Centre, Toronto, ON, Canada
Anne Martel
Deutsches Krebsforschungszentrum (DKFZ), Heidelberg, Germany
Lena Maier-Hein
University of Porto, Porto, Portugal
João Manuel R.S. Tavares
Queensland University of Technology, Brisbane, QLD, Australia
Andrew Bradley
Universidade Estadual Paulista, Bauru, São Paulo, Brazil
João Paulo Papa
OSRAM (Germany), Garching b. München, Germany
Vasileios Belagiannis
University of Lisbon, Lisboa, Portugal
Jacinto C. Nascimento
ReFUEL4, Singapore, Singapore
Zhi Lu
German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
Sailesh Conjeti
IBM Research – Almaden, San Jose, CA, USA
Mehdi Moradi
Tel Aviv University, Tel Aviv, Israel
Hayit Greenspan
Case Western Reserve University, Cleveland, OH, USA
Anant Madabhushi

Cite this paper

Lei, H., Zhao, Y., Lei, B. (2018). Multi-task Sparse Low-Rank Learning for Multi-classification of Parkinson’s Disease. In: Stoyanov, D., et al. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. DLMIA ML-CDS 2018. Lecture Notes in Computer Science, vol 11045. Springer, Cham. https://doi.org/10.1007/978-3-030-00889-5_41


DOI: https://doi.org/10.1007/978-3-030-00889-5_41
Published: 20 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00888-8
Online ISBN: 978-3-030-00889-5
eBook Packages: Computer Science, Computer Science (R0)
