Machine Learning EEG to Predict Cognitive Functioning and Processing Speed Over a 2-Year Period in Multiple Sclerosis Patients and Controls

  • Original Paper
  • Published in Brain Topography

Abstract

Event-related potentials (ERPs) show promise as objective indicators of cognitive functioning. The aim of this study was to examine whether ERPs recorded during an oddball task would predict cognitive functioning and information processing speed in multiple sclerosis (MS) patients and controls at the individual level. Seventy-eight participants (35 MS patients, 43 healthy age-matched controls) completed visual and auditory 2- and 3-stimulus oddball tasks with 128-channel EEG, and a neuropsychological battery, at baseline (Month 0) and at Months 13 and 26. ERPs from 0 to 700 ms and across the whole scalp were transformed into 1728 individual spatio-temporal datapoints per participant. A machine learning method that included penalized linear regression used the entire spatio-temporal ERP to predict composite scores of both cognitive functioning and processing speed at baseline, Month 13 and Month 26. The results showed that ERPs recorded during the visual oddball tasks could predict cognitive functioning and information processing speed at baseline and a year later in a sample of MS patients and healthy controls. In contrast, ERPs recorded during the auditory tasks were not predictive of cognitive performance. These objective neurophysiological indicators of cognitive functioning and processing speed, together with machine learning methods that can interrogate high-dimensional data, show promise for outcome prediction.


References

Cawley GC, Talbot NLC (2010) On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res 11:2079–2107

Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67:301–320

Funding

This study was partly funded by an Enterprise Ireland grant (eBiomed: eHealthCare based on Biomedical Signal Processing and ICT for Integrated Diagnosis and Treatment of Disease), a Science Foundation Ireland grant to R.B. Reilly (09/RFP/NE2382), IRCSET grants to H. Kiiski and S. Ó Donnchadha (http://www.ircset.ie), Health Service Executive funding to M.C. O’Brien, and a Science Foundation Ireland grant to R. Whelan (16/ERCD/3797). The study sponsors had no involvement in the collection, analysis, and interpretation of data or in the writing of the manuscript.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Robert Whelan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee (the Ethics and Medical Research Committee of the St. Vincent’s Healthcare Group) and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent

Written informed consent was obtained from all individual participants included in the study on each testing occasion.

Additional information

Handling Editor: Stefano Seri.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Fig. 1

Cognitive functioning composite score (z) in RRMS, SPMS and control participants. (TIFF 312 KB)

Supplementary Fig. 2

Processing speed and working memory composite score (z) in RRMS, SPMS and control participants. (TIFF 323 KB)

Supplementary material 3 (EEGJOB 7 KB)

Supplementary material 4 (DOCX 13 KB)

Supplementary material 5 (DOCX 15 KB)

Supplementary material 6 (DOCX 12 KB)

Supplementary material 7 (DOCX 15 KB)

Supplementary Video 1 ERP activity over the scalp during visual 2-stimulus oddball task (0–700 ms) that predicted cognitive functioning at Month 0. Higher beta choice frequency values denote better accuracy in predicting cognitive functioning score. (AVI 2023 KB)

Supplementary Video 2 ERP activity over the scalp during visual 2-stimulus oddball task (0–700 ms) that predicted cognitive functioning at Month 13. Higher beta choice frequency values denote better accuracy in predicting cognitive functioning score. (AVI 2109 KB)

Supplementary Video 3 ERP activity over the scalp during visual 3-stimulus oddball task (0–700 ms) that predicted cognitive functioning at Month 13. Higher beta choice frequency values denote better accuracy in predicting cognitive functioning score. (AVI 1947 KB)

Supplementary Video 4 ERP activity over the scalp during visual 2-stimulus oddball task (0–700 ms) that predicted processing speed and working memory performance at Month 0. Higher beta choice frequency values denote better accuracy in predicting processing speed and working memory score. (AVI 1935 KB)

Supplementary Video 5 ERP activity over the scalp during visual 2-stimulus oddball task (0–700 ms) that predicted processing speed and working memory performance at Month 13. Higher beta choice frequency values denote better accuracy in predicting processing speed and working memory score. (AVI 1990 KB)

Supplementary Video 6 ERP activity (µV) in multiple sclerosis and healthy control participants during visual 2-stimulus oddball task at Month 0. (AVI 1574 KB)

Appendix

The Regularized Adaptive Feature Thresholding Algorithm (RAFT)

Nested Cross-Validation

The dataset is initially divided into ten cross-validation (CV) folds. The entire analysis is performed ten times, each time using 90% of the dataset (the training set) to create a regression model, which is then tested on the remaining 10% of the data (the test set). Within the training set, additional ‘nested’ cross-validation with ten partitions is used to support the analyses at the feature selection and model optimisation level. Finally, results from all ten CV folds are aggregated, using the frequency with which a variable appears in models from different CV folds as a measure of its robustness.
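As an illustration, the fold structure can be sketched in Python with scikit-learn (the variable names and the use of KFold are ours, not taken from the published analysis code):

```python
import numpy as np
from sklearn.model_selection import KFold

# X: (n_participants, n_features) spatio-temporal ERP matrix; y: composite score
def nested_cv_skeleton(X, y, n_outer=10, n_inner=10, seed=0):
    outer = KFold(n_splits=n_outer, shuffle=True, random_state=seed)
    for m, (train_idx, test_idx) in enumerate(outer.split(X)):
        X_train, y_train = X[train_idx], y[train_idx]
        inner = KFold(n_splits=n_inner, shuffle=True, random_state=seed + m)
        for n, (fit_idx, val_idx) in enumerate(inner.split(X_train)):
            # feature filtering and model optimisation use only the outer
            # training data, split again into fit/validation partitions
            pass
        # the final model for fold m is trained on all of X_train and
        # evaluated once on the held-out 10% (X[test_idx])
```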

Threshold Creation

Each feature of the dataset is individually evaluated to assess its utility in predicting the continuous target variable, in what can be considered a filtering step. A simple linear regression model relating that feature to the target variable is fit on the training set (81% of the data), and the resulting regression weight is used to make outcome predictions for the test set (9% of the data). A root-mean-squared-error criterion (denoted mse) is used to quantify the prediction error.

$$mse=\sqrt{\operatorname{mean}\left((truth-prediction)^{2}\right)}$$
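A minimal sketch of this univariate filtering step (the helper name and signature are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def feature_mse(x_fit, y_fit, x_val, y_val):
    """Prediction error (the paper's 'mse', a root mean squared error)
    for a single feature, fit on 81% and evaluated on 9% of the data."""
    model = LinearRegression().fit(x_fit.reshape(-1, 1), y_fit)
    pred = model.predict(x_val.reshape(-1, 1))
    return np.sqrt(np.mean((y_val - pred) ** 2))
```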

Based on the range of mse values estimated across all CV folds, a set of ten prediction error thresholds is created. These thresholds rank features according to their mse. At each prediction error threshold ($t_{mse}$) and for each nested CV partition n in every main CV fold m there is a subset of features f which have smaller prediction error values than that threshold.

$$s_{m,n}\left(t_{mse}\right)=\left\{\,f \mid mse_{m,n}(f)<t_{mse}\,\right\}$$

A set of ten stability thresholds ($t_{stability}$) is also used to assess how stable the mse value assigned to each feature is across subsamples. This is quantified as the number of nested CV partitions n in which a feature had a prediction error value lower than each mse threshold.

$$Stability_{m,\,t_{mse}}(f)=\left|\left\{\,n \mid f\in s_{m,n}\left(t_{mse}\right)\,\right\}\right|$$

The ten prediction error thresholds and ten stability thresholds jointly define 100 new summary datasets, each of which includes all features that had a smaller mse value than $t_{mse}$ in at least the number of CV partitions specified by $t_{stability}$.

$$D_{m}\left(t_{mse},\,t_{stability}\right)=\left\{\,f \mid Stability_{m,\,t_{mse}}(f)\ge t_{stability}\,\right\}$$
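In code, the stability count and the resulting summary dataset reduce to a few array operations (a sketch; the `mse_table` array of per-feature errors is hypothetical):

```python
import numpy as np

# mse_table[n, f]: error of feature f in nested partition n of one main fold m
def summary_dataset(mse_table, t_mse, t_stability):
    """D_m(t_mse, t_stability): features whose error beats t_mse in at
    least t_stability of the nested CV partitions."""
    below = mse_table < t_mse          # boolean, shape (n_partitions, n_features)
    stability = below.sum(axis=0)      # Stability_{m, t_mse}(f)
    return np.flatnonzero(stability >= t_stability)
```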

The prediction error thresholds are chosen based on the range of prediction error and feature stability across the sample. The most liberal prediction error threshold $t_{mse}(max)$ is chosen such that in each main CV fold m there is at least one feature which is common to all nested CV partitions, i.e. $t_{mse}(max)$ is the smallest prediction error value at which the following is true:

$$\left|\left\{\,m \mid D_{m}\left(t_{mse}(max),\,10\right)\ne \varnothing\,\right\}\right|=10$$

The strictest prediction error threshold $t_{mse}(min)$ is set as the lowest prediction error value at which every nested CV partition n in every main CV fold m still contains at least one feature with a smaller prediction error value than that threshold. That is, $t_{mse}(min)$ is the smallest prediction error value at which the following is true:

$$\left|\left\{\,(m,n) \mid s_{m,n}\left(t_{mse}(min)\right)\ne \varnothing\,\right\}\right|=100$$
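Ignoring the strict-inequality boundary, both bounds can be read directly off the array of per-feature errors (a sketch; `mse_all` is a hypothetical name):

```python
import numpy as np

# mse_all[m, n, f]: error of feature f in nested partition n of main fold m
def threshold_bounds(mse_all):
    # t_mse(max): per fold, the best worst-case feature error, i.e. the
    # smallest value at which some feature survives in all nested partitions;
    # take the max over folds so the condition holds in every fold
    t_max = mse_all.max(axis=1).min(axis=1).max()
    # t_mse(min): the largest of the per-partition minima, so that every
    # (fold, partition) pair still retains at least one feature
    t_min = mse_all.min(axis=2).max()
    return t_min, t_max
```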

Taken together, $t_{mse}$ and $t_{stability}$ define how high the individual predictive power of each feature is, and how stable results with each feature are across subsets of the sample. Creating these thresholds integrates the choice of feature selection criterion into model selection, eliminating researcher input at this point.

Model Optimisation

The feature sets created in the thresholding step are used as inputs to a model optimisation algorithm. We chose Elastic Net regularisation (Zou and Hastie 2005), but the feature selection framework can be adapted for use with other optimisation algorithms. The Elastic Net uses two parameters: α, the weight of lasso vs. ridge regularisation, and λ, the regularisation coefficient. Both lasso and ridge regression penalise large regression coefficients, but lasso regularisation favours models with fewer features, making it more prone to excluding features. Eight values of each parameter are considered, so 64 models are built with each of the 100 feature sets, for a total of 6400 models. For each of these models the prediction error is measured using the same root-mean-squared-error criterion used to assess the feature mse. For each model an updated feature set is saved, which excludes any features that were removed by the Elastic Net.

$$d_{m,n}\left(t_{mse},\,t_{stability},\,\alpha,\,\lambda\right)\subseteq D_{m}\left(t_{mse},\,t_{stability}\right)$$
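A sketch of the 8 × 8 grid search using scikit-learn’s ElasticNet, where sklearn’s `l1_ratio` plays the role of the paper’s α and sklearn’s `alpha` the role of λ (this parameter mapping is ours; the exact grid values are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def fit_grid(X_fit, y_fit, X_val, y_val, feature_idx, alphas, lambdas):
    """Fit one Elastic Net per (alpha, lambda) pair on a candidate feature
    set; return its validation error and the updated (surviving) feature set."""
    results = {}
    for a in alphas:            # paper's alpha: lasso vs. ridge mix
        for lam in lambdas:     # paper's lambda: regularisation strength
            net = ElasticNet(alpha=lam, l1_ratio=a, max_iter=10_000)
            net.fit(X_fit[:, feature_idx], y_fit)
            pred = net.predict(X_val[:, feature_idx])
            err = np.sqrt(np.mean((y_val - pred) ** 2))
            results[(a, lam)] = (err, feature_idx[net.coef_ != 0])
    return results
```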

Bootstrap Aggregation

Calculations in the thresholding and model optimisation steps are validated using 25 iterations of bootstrap aggregation (bagging). Instead of performing the analysis once using all the data, summary datasets are created by sampling with replacement, so that each iteration contains on average about two-thirds of the unique data points. Results from each iteration are aggregated using the median value. A feature is removed after the model optimisation step if the Elastic Net excluded it in more than half of all bagging iterations. As bagging greatly increases the computational expense, we also tested the method without bagging.
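A sketch of the bagging loop (the `fit_once` callback, standing in for one thresholding-plus-optimisation pass that returns the surviving features, is hypothetical):

```python
import numpy as np

def bagged_feature_filter(X, y, feature_idx, fit_once, n_bags=25, seed=0):
    """Drop features the Elastic Net zeroes out in more than half of the
    bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n_excluded = np.zeros(len(feature_idx))
    for _ in range(n_bags):
        idx = rng.choice(len(y), size=len(y), replace=True)  # ~2/3 unique rows
        kept = fit_once(X[idx], y[idx], feature_idx)   # surviving features
        n_excluded += ~np.isin(feature_idx, kept)
    return feature_idx[n_excluded <= n_bags / 2]
```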

Model Validation

After the model optimisation step, the combination of model parameters and thresholds which resulted in the model with the lowest prediction error is identified for each nested CV partition. The optimal model parameters and thresholds from each nested CV partition are then used to determine the parameters for the final prediction model in each main CV fold, taking the most frequently occurring values of α, λ, $t_{mse}$ and $t_{stability}$. To select the features to include in the final model for each CV fold, the stability of all features that were included in the updated feature sets at the optimal prediction error and stability thresholds is re-calculated.

$$Stability_{m}(f)=\left|\left\{\,n \mid f\in d_{m,n}\left(t_{mse},\,t_{stability},\,\alpha,\,\lambda\right)\,\right\}\right|$$

Only those features that were included in at least as many of the ten optimal-parameter models as the optimal stability threshold specifies are used to create the feature set for the final model.

$$FeatureSet_{m}=\left\{\,f \mid Stability_{m}(f)\ge t_{stability}\,\right\}$$

It is possible that this implementation of the stability threshold leaves no features for inclusion in the model. Should this be the case, the closest possible parameter combination is used to create the feature set.

This feature set and the covariates are used as input to the Elastic Net, using the optimal values for α and λ and the entire training set (90% of the data). The beta weights generated by the Elastic Net are subsequently used to make outcome predictions for the final unseen portion of the data (10%). Each CV fold is thus used to make outcome predictions for 10% of the data, and the evaluation of model fit is carried out on the complete vector of outcome predictions from all CV folds.
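The final step then reduces to one fit and one prediction per fold, with model fit assessed on the concatenated out-of-fold predictions (a sketch; the `chosen` structure holding each fold’s selected feature set and parameters is hypothetical):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def out_of_fold_predictions(X, y, folds, chosen):
    """chosen[m] = (feature_set, alpha, lam) selected within fold m.
    Returns the complete vector of held-out predictions across folds."""
    y_pred = np.empty_like(y, dtype=float)
    for m, (train_idx, test_idx) in enumerate(folds):
        feats, a, lam = chosen[m]
        net = ElasticNet(alpha=lam, l1_ratio=a, max_iter=10_000)
        net.fit(X[np.ix_(train_idx, feats)], y[train_idx])
        y_pred[test_idx] = net.predict(X[np.ix_(test_idx, feats)])
    return y_pred   # compare against y, e.g. with RMSE or correlation
```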

Performing model selection as an integrated step within the CV framework is an essential step in preserving the external validity of the resultant model (Cawley and Talbot 2010).

See Fig. 8.

Fig. 8: Schematic description of the RAFT algorithm

Cite this article

Kiiski, H., Jollans, L., Donnchadha, S.Ó. et al. Machine Learning EEG to Predict Cognitive Functioning and Processing Speed Over a 2-Year Period in Multiple Sclerosis Patients and Controls. Brain Topogr 31, 346–363 (2018). https://doi.org/10.1007/s10548-018-0620-4
