Disentangling Multiple Sclerosis and depression: an adjusted depression screening score for patient-centered care

Journal of Behavioral Medicine

An Erratum to this article was published on 28 September 2015

Abstract

Screening for depression can be challenging in Multiple Sclerosis (MS) patients because depressive symptoms overlap with other MS symptoms, such as fatigue, cognitive impairment and functional impairment. The aim of this study was to understand these overlapping symptoms and subsequently develop an adjusted depression screening tool for better clinical assessment of depressive symptoms in MS patients. We evaluated 3,507 MS patients with a self-reported depression screening (PHQ-9) score using a multiple indicator multiple cause (MIMIC) modeling approach. Our models showed good fit and significant differential item functioning (DIF) effects, denoting significant overlap of depressive symptoms with all MS symptoms under study. The magnitude of the overlap was especially large for fatigue. Adjusted depression screening scales were formed based on factor scores and loadings; these will allow clinicians to assess depressive symptoms separately from other MS symptoms for improved patient care.

References

  • Aikens, J. E., Reinecke, M. A., Pliskin, N. H., Fischer, J. S., Wiebe, J. S., McCracken, L. M., et al. (1999). Assessing depressive symptoms in multiple sclerosis: Is it necessary to omit items from the original Beck Depression Inventory? Journal of Behavioral Medicine, 22, 127–142.

  • Alemayehu, D., Cappelleri, J. C., & Murphy, M. F. (2012). Conceptual and analytical considerations toward the use of patient-reported outcomes in personalized medicine. American Health & Drug Benefits, 5, 310–317.

  • Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16, 397–438.

  • Benedict, R. H., Fishman, I., McClellan, M. M., Bakshi, R., & Weinstock-Guttman, B. (2003). Validity of the Beck Depression Inventory-Fast Screen in multiple sclerosis. Multiple Sclerosis, 9, 393–396.

  • Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.

  • Blacker, D. (2009). Psychiatric rating scales. In B. J. Sadock, V. A. Sadock, & P. Ruiz (Eds.), Kaplan and Sadock’s comprehensive textbook of psychiatry (9th ed.). Philadelphia, PA: Lippincott Williams & Wilkins.

  • Bollen, K. A. (1989). Structural equations with latent variables. Hoboken, NJ: Wiley.

  • Bollen, K. A., & Long, J. S. (1993). Testing structural equation models. Newbury Park, CA: Sage Publications.

  • Brown, T. (2006). Confirmatory factor analysis for applied research (methodology in the social science). New York, NY: The Guilford Press.

  • Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models. Newbury Park, CA: Sage Publications.

  • Casella, G., & Berger, R. L. (2002). Statistical inference (2nd ed.). Pacific Grove, CA: Duxbury Press.

  • Chang, C. H., Nyenhuis, D. L., Cella, D., Luchetta, T., Dineen, K., & Reder, A. T. (2003). Psychometric evaluation of the Chicago Multiscale Depression Inventory in multiple sclerosis patients. Multiple Sclerosis, 9, 160–170.

  • Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.

  • Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10, 1–9.

  • Crawford, P. W. (2009). Assessment of depression in multiple sclerosis: Validity of including somatic items on the Beck Depression Inventory-II. International Journal of MS Care, 11, 167–173.

  • Ferrando, S. J., Samton, J., Mor, N., Nicora, S., Findler, M., & Apatoff, B. (2007). Patient health questionnaire-9 to screen for depression in outpatients with multiple sclerosis. International Journal of MS Care, 9, 99–103.

  • Fischer, J. S., Rudick, R. A., Cutter, G. R., & Reingold, S. C. (1999). The multiple sclerosis functional composite measure (MSFC): An integrated approach to MS clinical outcome assessment. Multiple Sclerosis, 5, 244–250.

  • Gilbody, S., Richards, D., Brealey, S., & Hewitt, C. (2007). Screening for depression in medical settings with the Patient Health Questionnaire (PHQ): A diagnostic meta-analysis. Journal of General Internal Medicine, 22, 1596–1602.

  • Goldman Consensus Panel. (2005). The Goldman Consensus statement on depression in multiple sclerosis. Multiple Sclerosis, 11, 328–337.

  • Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery and Psychiatry, 23, 56–62.

  • Hansson, M., Chotai, J., Nordström, A., & Bodlund, O. (2009). Comparison of two self-rating scales to detect depression: HADS and PHQ-9. British Journal of General Practice, 59, e283–e288.

  • Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179–185.

  • Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3, 424–453.

  • Huang, F. Y., Chung, H., Kroenke, K., Delucchi, K. L., & Spitzer, R. L. (2006). Using the Patient Health Questionnaire-9 to measure depression among racially and ethnically diverse primary care patients. Journal of General Internal Medicine, 21, 547–552.

  • Johnson, D. R., & Creech, J. C. (1983). Ordinal measures in multiple indicator models: A simulation study of categorization error. American Sociological Review, 48, 398–407.

  • Kalpakjian, C. Z., Toussaint, L. L., Albright, K. J., Bombardier, C. H., Krause, J. K., & Tate, D. G. (2009). Patient Health Questionnaire-9 in spinal cord injury: An examination of factor structure as related to gender. Journal of Spinal Cord Medicine, 32, 147–156.

  • Kline, R. B. (2010). Principles and practice of structural equation modeling (3rd ed.). New York, NY: Guilford.

  • Knowledge Program developed at Cleveland Clinic’s Neurological Institute. (2008–2013). Retrieved from http://my.clevelandclinic.org/neurological_institute/about/default.aspx

  • Kroenke, K., Spitzer, R. L., & Williams, J. B. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16, 606–613.

  • Kroenke, K., Spitzer, R. L., & Williams, J. B. (2003). The Patient Health Questionnaire-2: Validity of a two-item depression screener. Medical Care, 41, 1284–1292.

  • Krupp, L. B. (2004). Fatigue in multiple sclerosis. New York, NY: Demos Medical Publishing.

  • Marrie, R. A., & Goldman, M. (2007). Validity of performance scales for disability assessment in multiple sclerosis. Multiple Sclerosis, 13, 1176–1182.

  • Marsh, H. W., Hau, K. T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler’s (1999) findings. Structural Equation Modeling: A Multidisciplinary Journal, 11, 320–341.

  • Mellen Center for Multiple Sclerosis Treatment and Research, Cleveland Clinic, Neurological Institute. (2013). Retrieved from http://my.clevelandclinic.org/neurological_institute/mellen-center-multiple-sclerosis/default.aspx

  • Mohr, D. C., Goodkin, D. E., Likosky, W., Beutler, L., Gatto, N., & Langan, M. K. (1997). Identification of Beck Depression Inventory items related to multiple sclerosis. Journal of Behavioral Medicine, 20, 407–414.

  • Mohr, D. C., Hart, S. L., & Goldberg, A. (2003). Effects of treatment for depression on fatigue in multiple sclerosis. Psychosomatic Medicine, 65, 542–547.

  • Multiple Sclerosis Association of America. (2014). Retrieved from http://www.mymsaa.org/about-ms/faq/

  • Muthén, B., & Muthén, L. (2000). Integrating person-centered and variable-centered analysis: Growth mixture modeling with latent trajectory classes. Alcoholism, Clinical and Experimental Research, 24, 882–891.

  • Muthén, L. K., & Muthén, B. O. (2012). Mplus user’s guide (7th ed.). Los Angeles, CA: Muthén & Muthén.

  • Pinto-Meza, A., Serrano-Blanco, A., et al. (2005). Assessing depression in primary care with the PHQ-9: Can it be carried out over the telephone? Journal of General Internal Medicine, 20, 738–742.

  • Polman, C. H., & Rudick, R. A. (2010). The multiple sclerosis functional composite: A clinically meaningful measure of disability. Neurology, 74, S8–S15.

  • R Development Core Team. (2008). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

  • Raykov, T., & Marcoulides, G. (2011). Introduction to psychometric theory. New York, NY: Taylor and Francis Group.

  • Rudick, R. A., Antel, J., Confavreux, C., Cutter, G., Ellison, G., Fischer, J., et al. (1996). Clinical outcomes assessment in multiple sclerosis. Annals of Neurology, 40, 469–479.

  • SAS Institute Inc. (2008). SAS/STAT ® 9.2 user’s guide. Cary, NC: SAS Institute Inc.

  • Schwartz, C. E., Vollmer, T., et al. (1999). Reliability and validity of two self-report measures of impairment and disability for MS. Neurology, 52, 63–70.

  • Siegert, R. J., & Abernethy, D. A. (2005). Depression in multiple sclerosis: A review. Journal of Neurology, Neurosurgery and Psychiatry, 76, 469–475.

  • Sjonnesen, K., Berzins, S., Fiest, K. M., Bulloch, A. G., Metz, L. M., Thombs, B. D., et al. (2012). Evaluation of the 9-item Patient Health Questionnaire (PHQ-9) as an assessment instrument for symptoms of depression in patients with multiple sclerosis. Postgraduate Medicine, 124, 69–77.

  • Sudhahar, J. C., Israel, D., & Selvam, M. (2006). Banking service loyalty determination through SEM technique. Journal of Applied Sciences, 6, 1472–1480.

  • Tucker, L. R., & Lewis, C. (1973). The reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1–10.

  • Wallin, M. T., Wilken, J. A., Turner, A. P., Williams, R. M., & Kane, R. (2006). Depression and multiple sclerosis: Review of a lethal combination. Journal of Rehabilitation Research and Development, 43, 45–62.

  • Whitaker, J. N., McFarland, H. F., Rudge, P., & Reingold, S. C. (1995). Outcomes assessment in multiple sclerosis clinical trials: A critical analysis. Multiple Sclerosis, 1, 37–47.

  • Woods, C. M., Oltmanns, T. F., & Turkheimer, E. (2009). Illustration of MIMIC-Model DIF testing with the schedule for nonadaptive and adaptive personality. Journal of Psychopathological and Behavioral Assessment, 31, 320–330.

  • Zumbo, B. D., & Zimmerman, D. W. (1993). Is the selection of statistical methods governed by level of measurement? Canadian Psychology, 34, 390–400.

Acknowledgments

Financial support for this study was provided by a grant from NIH/NCRR CTSA KL2TR000440 and by a grant from Novartis. The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report. We appreciate the contributions from Drs. Randall Cebul, Thomas Love, Irene Katzan, and Neal Dawson, Center for Health Care Research and Policy; Drs. Richard Rudick and Francois Bethoux, Mellen Center; and Dr. Martha Sajatovic, Departments of Psychiatry and Neurology at Case Western Reserve University School of Medicine.

Conflict of interest

Douglas Gunzler, Adam Perzynski, Nathan Morris, Steven Lewis and Deborah Miller declare that they have no conflict of interest. Robert Bermel has received research grants from Novartis.

Human and Animal Rights and Informed Consent

All procedures followed were in accordance with ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2000. Informed consent was obtained from all patients for being included in the study.

Author information

Corresponding author

Correspondence to Douglas D. Gunzler.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOC 33 kb)

Appendix: More details about adjusted depression screening scale algorithms

Factor scores

Factor scores incorporate two adjustments: the CFA adjustment shown in Fig. 1 Panel C, which accounts for the nine PHQ-9 items loading unequally on depression, and the adjustment for overlapping symptoms via covariates and DIF paths from our E = 0 model in Fig. 1 Panel F. Using Mplus, we calculate factor scores within our sample using the maximum a posteriori method (Muthén and Muthén, 2012), which for continuous outcomes is the widely used regression method. The resulting factor scores have mean = 0, variance = 0.56, median = −0.21, range = (−0.84, 2.47), 1st quartile = −0.62, and 3rd quartile = 0.40. These factor scores can then be linearly transformed to fit any potentially useful range (e.g., 0–27 or 0–100).
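As an illustration of the linear transformation mentioned above, the following minimal sketch maps a set of scores onto a chosen range. It assumes a min-max reading of "linearly transformed" (sample minimum to the lower bound, sample maximum to the upper bound); the function name `rescale_linear` is hypothetical, and the input values are simply the summary statistics reported above, reused as sample data.

```python
def rescale_linear(scores, new_min=0.0, new_max=27.0):
    """Linearly map scores so min(scores) -> new_min and max(scores) -> new_max.

    Assumption: a min-max rescaling; other linear maps would also qualify.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("scores must not all be equal")
    slope = (new_max - new_min) / (hi - lo)
    return [new_min + slope * (s - lo) for s in scores]


# Reported min, 1st quartile, median, 3rd quartile and max, used only as sample input.
reported = [-0.84, -0.62, -0.21, 0.40, 2.47]
on_phq9_range = rescale_linear(reported)             # onto 0-27
on_percent_range = rescale_linear(reported, 0, 100)  # onto 0-100
```

A linear map preserves the ordering and relative spacing of the factor scores, but unlike the transformation described next it does not preserve the PHQ-9 thresholds.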

In our application, and for immediate clinical use, a probability integral transformation (Casella and Berger, 2002) is used to transform these factor scores into PHQ-9 scores within this population, which maintains the interpretation and thresholds of the PHQ-9. This is a one-to-one transformation, so the distribution of PHQ-9 scores in this population remains the same; only certain individuals are matched to different PHQ-9 scores. Because the transformation is built on continuous cumulative distribution functions, no special steps are required for tied adjusted scores between two or more individuals (which are unlikely for factor scores). To perform the transformation:

  1. Use a kernel density estimator to estimate the density of the PHQ-9 scores in this population.

  2. Obtain the cumulative distribution function by numerical integration.

  3. Numerically invert the cumulative distribution function for the PHQ-9.

  4. Repeat steps (1) and (2) for the factor scores.

  5. Transform the factor scores of each individual into a PHQ-9 score for this population by transforming the cumulative distribution function for the factor scores using the inverse of the cumulative distribution function for the PHQ-9.

The resulting transformed scores will be distributed the same as the PHQ-9 within this population. For ease of use in a clinical setting, we recommend rounding these transformed scores to the nearest whole number.
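The five steps above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' implementation: it uses a Gaussian kernel, a normal-reference rule-of-thumb bandwidth, and the closed-form cumulative distribution function of a Gaussian kernel estimate in place of numerical integration. The inversion is done by bisection, and the final clamp to 0–27 is also an assumption. All function names are hypothetical.

```python
from statistics import NormalDist, stdev

_STD_NORMAL = NormalDist()


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth (assumption: the source names no bandwidth)."""
    return 1.06 * stdev(sample) * len(sample) ** (-1 / 5)


def kde_cdf(x, sample, bandwidth):
    """CDF of a Gaussian kernel density estimate, evaluated at x (steps 1-2)."""
    return sum(_STD_NORMAL.cdf((x - s) / bandwidth) for s in sample) / len(sample)


def kde_quantile(p, sample, bandwidth, tol=1e-6):
    """Numerically invert kde_cdf by bisection (step 3)."""
    lo = min(sample) - 10 * bandwidth
    hi = max(sample) + 10 * bandwidth
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kde_cdf(mid, sample, bandwidth) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def transform_to_phq9_scale(factor_scores, phq9_scores):
    """Map each factor score to the PHQ-9 scale (steps 4-5), then round."""
    h_factor = rule_of_thumb_bandwidth(factor_scores)
    h_phq9 = rule_of_thumb_bandwidth(phq9_scores)
    transformed = []
    for f in factor_scores:
        p = kde_cdf(f, factor_scores, h_factor)         # percentile of the factor score
        q = kde_quantile(p, phq9_scores, h_phq9)        # PHQ-9 value at that percentile
        transformed.append(min(27, max(0, round(q))))   # clamp to 0-27 (assumption)
    return transformed
```

This direct version costs on the order of n² kernel evaluations, so for a sample of several thousand patients one would more likely evaluate both distribution functions once on a grid and interpolate. Note also that Python's `round` resolves exact .5 ties to the nearest even integer.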

Note that within an EHR-based database, we can compute the factor scores every time a new PHQ-9 score or series of scores is entered. Then either all subjects can be updated, or we can keep prior records the same and use the functions to transform only the new scores. Using this approach, only 44 % of subjects maintain the same transformed adjusted score as the original PHQ-9 score, and 16 % of subjects have a transformed adjusted score that differs from the original PHQ-9 score by two points or more (three subjects by up to 6 points). Further, 74 subjects had a PHQ-9 score ≥10 and now have a transformed adjusted score <10, while 65 subjects had a PHQ-9 score <10 and now have a transformed adjusted score ≥10.
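Agreement summaries of this kind can be tabulated with a short helper. The sketch below is hypothetical (the function name `compare_scores` and the toy data are not from the study) and assumes both score lists are whole numbers in the same patient order.

```python
def compare_scores(original, adjusted, threshold=10):
    """Summarize how adjusted scores differ from the original PHQ-9 scores."""
    if len(original) != len(adjusted):
        raise ValueError("score lists must have the same length")
    n = len(original)
    diffs = [abs(a - o) for o, a in zip(original, adjusted)]
    pairs = list(zip(original, adjusted))
    return {
        "pct_unchanged": 100 * sum(d == 0 for d in diffs) / n,
        "pct_diff_2_plus": 100 * sum(d >= 2 for d in diffs) / n,
        # Screened positive on the original score, negative on the adjusted score.
        "moved_below": sum(o >= threshold and a < threshold for o, a in pairs),
        # Screened negative on the original score, positive on the adjusted score.
        "moved_above": sum(o < threshold and a >= threshold for o, a in pairs),
    }


# Toy data for illustration only.
summary = compare_scores([4, 10, 12, 9, 15], [4, 8, 12, 10, 14])
```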

Item weights

Compared to the algorithm in this section (“Item weights”), which is based on factor loadings, the factor scores algorithm in “Factor scores” yields a more mathematically rigorous individualized score that takes into account item loadings, intercepts, correlation among observed variables, and a predictive equation. However, the factor scores algorithm does not quantify how much each item contributes.

We can directly build a scale using the factor loadings as weights:

$$ \begin{aligned} \text{Item } i \text{ weight} &= \text{Factor loading } i \\ \text{MS-adjusted PHQ-9 score} &= \sum_{i=1}^{9} \text{Item } i \text{ weight} \times \text{Item } i \text{ PHQ-9 score} \end{aligned} $$
(1)

In this case we use the factor loadings in Table 4 for E = 0 as item weights. Using (1) in our sample, the mean = 3.95 and standard deviation = 3.71. As with the factor scores algorithm in “Factor scores”, we can apply the probability integral transformation to this score. Here there will be ties (identical scores on the MS-adjusted PHQ-9), but as explained above, ties are not an issue with this transformation. The results change far less from the standard scoring: 64 % of subjects maintain the same score and only 2 % differ from the original PHQ-9 by two points or more. As a result, the transformed version of this score may not be particularly useful, since it is a naïve version of the “Factor scores” algorithm; it may become more useful once more rigorous psychometric evaluation of this score is performed.
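Equation (1) is a weighted sum of the nine item responses, which the following minimal sketch makes concrete. The loadings shown are placeholder values chosen for illustration only; they are not the Table 4 loadings, which are not reproduced here, and the function name is hypothetical.

```python
def weighted_phq9_score(item_scores, item_weights):
    """Apply Eq. (1): sum over the nine items of weight_i * score_i."""
    if len(item_scores) != 9 or len(item_weights) != 9:
        raise ValueError("the PHQ-9 has exactly nine items")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each PHQ-9 item is scored 0, 1, 2 or 3")
    return sum(w * s for w, s in zip(item_weights, item_scores))


# Placeholder loadings (hypothetical, NOT the values from Table 4).
placeholder_loadings = [0.8, 0.8, 0.5, 0.4, 0.6, 0.7, 0.5, 0.5, 0.5]
example_score = weighted_phq9_score([1, 2, 0, 3, 0, 1, 0, 0, 0], placeholder_loadings)
```

Because standardized loadings are typically below 1, the weighted score sits on a compressed scale relative to the raw PHQ-9 total, which is consistent with the sample mean of 3.95 reported above.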

Downweighting overlapping symptoms

We only downweight the influence of items of the PHQ-9 in which depressive symptoms overlap with other symptoms for MS patients:

$$ \text{Item } i \text{ weight} = \frac{E_{i}}{CFA_{i}} $$
(2)

where $E_{i}$ is the standardized factor loading of item $i$ for E = 0 and $CFA_{i}$ is the standardized factor loading of item $i$ for the one-factor CFA model (see Table 4).

We multiply a patient’s score on items for sleep problems by 0.79, fatigue by 0.52, poor concentration by 0.70 and psychomotor symptoms by 0.74. PHQ-9 items for appetite change, feelings of failure and self-harm maintain the same score. The items for anhedonia and feeling depressed have an almost negligible residual DIF effect, and we multiply a patient’s score on these items by 1.01. For our adjusted PHQ-9, we observe a population mean = 5.66, standard deviation = 5.30, and median = 4.07, with a range = (0, 23.32), 1st quartile = 1.31, and 3rd quartile = 8.53. As noted, one application of “Downweighting overlapping symptoms” is to subtract this score from the PHQ-9 as an estimate of the amount that items on the PHQ-9 overlap with other symptoms for MS patients.
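The multipliers above translate directly into a scoring routine. The following sketch uses the multipliers exactly as reported; the item keys, function names, and the example patient are hypothetical.

```python
# Multipliers as reported above, keyed by PHQ-9 item (key names are hypothetical).
MULTIPLIERS = {
    "anhedonia": 1.01,
    "feeling_depressed": 1.01,
    "sleep_problems": 0.79,
    "fatigue": 0.52,
    "appetite_change": 1.00,
    "feelings_of_failure": 1.00,
    "poor_concentration": 0.70,
    "psychomotor_symptoms": 0.74,
    "self_harm": 1.00,
}


def adjusted_phq9(responses):
    """Downweighted PHQ-9 score from a dict of item responses (each 0-3)."""
    if set(responses) != set(MULTIPLIERS):
        raise ValueError("responses must cover exactly the nine PHQ-9 items")
    if any(v not in (0, 1, 2, 3) for v in responses.values()):
        raise ValueError("each PHQ-9 item is scored 0, 1, 2 or 3")
    return sum(MULTIPLIERS[item] * score for item, score in responses.items())


def overlap_estimate(responses):
    """Raw PHQ-9 total minus the adjusted score: the estimated symptom overlap."""
    return sum(responses.values()) - adjusted_phq9(responses)


# Hypothetical patient reporting only fatigue (3) and sleep problems (2).
patient = dict.fromkeys(MULTIPLIERS, 0)
patient.update(fatigue=3, sleep_problems=2)
adjusted = adjusted_phq9(patient)    # 0.52 * 3 + 0.79 * 2
overlap = overlap_estimate(patient)  # 5 minus the adjusted score
```

With every item at its maximum of 3, these rounded multipliers give a ceiling of about 23.31, close to the upper end of the range reported above; the small gap is presumably due to rounding of the published multipliers.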

Using the PHQ-2

The PHQ-2 comprises the first two items of the PHQ-9 and has been used as a depression screening tool or as a pre-screener for the PHQ-9 (Kroenke et al., 2003). Thus, using this shortened scale bypasses the items for sleep problems, fatigue, poor concentration and psychomotor symptoms. A PHQ-2 score of three has been used as a threshold for depression for screening purposes (Kroenke et al., 2003). The PHQ-2 is highly correlated with the PHQ-9 in this MS study population (Pearson ρ ≈ 0.87). Further, a receiver operating characteristic (ROC) analysis of PHQ-9 ≥ 10 versus PHQ-2 ≥ 3 in this study population showed high specificity (95.7 %), a large positive predictive value (PPV = 91.2 %), and an area under the curve (AUC) of 0.852, though at the expense of sensitivity (63.8 %). In general, the PHQ-2 has shown wide variability in sensitivity in previous validation studies, and more research is needed to see if its diagnostic properties approach those of the PHQ-9 (Gilbody et al., 2007).
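The sensitivity, specificity and PPV in such a comparison come from a 2 × 2 table that treats PHQ-9 ≥ 10 as the reference classification and PHQ-2 ≥ 3 as the test. A minimal sketch follows; the function name and toy data are hypothetical, and the AUC is not computed here because it requires the full range of PHQ-2 cut points rather than a single threshold.

```python
def screening_metrics(phq2_scores, phq9_scores, phq2_cut=3, phq9_cut=10):
    """Sensitivity, specificity and PPV of PHQ-2 >= cut against PHQ-9 >= cut."""
    tp = fp = tn = fn = 0
    for s2, s9 in zip(phq2_scores, phq9_scores):
        test_positive = s2 >= phq2_cut
        reference_positive = s9 >= phq9_cut
        if test_positive and reference_positive:
            tp += 1
        elif test_positive:
            fp += 1
        elif reference_positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined when a margin of the 2 x 2 table is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Toy data for illustration only.
metrics = screening_metrics([0, 1, 3, 4, 2, 5, 6, 1], [2, 4, 12, 15, 11, 8, 20, 3])
```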

About this article

Cite this article

Gunzler, D.D., Perzynski, A., Morris, N. et al. Disentangling Multiple Sclerosis and depression: an adjusted depression screening score for patient-centered care. J Behav Med 38, 237–250 (2015). https://doi.org/10.1007/s10865-014-9574-5

  • DOI: https://doi.org/10.1007/s10865-014-9574-5
