Abstract
Screening for depression in patients with multiple sclerosis (MS) can be challenging because depressive symptoms overlap with other MS symptoms, such as fatigue, cognitive impairment and functional impairment. The aim of this study was to understand these overlapping symptoms and then develop an adjusted depression screening tool for better clinical assessment of depressive symptoms in MS patients. We evaluated 3,507 MS patients with a self-reported depression screening score (PHQ-9) using a multiple indicators multiple causes (MIMIC) modeling approach. Our models showed good fit and significant differential item functioning (DIF) effects, denoting significant overlap of depressive symptoms with all MS symptoms under study. The magnitude of the overlap was especially large for fatigue. Adjusted depression screening scales were formed based on factor scores and loadings; these will allow clinicians to assess depressive symptoms separately from other MS symptoms for improved patient care.
References
Aikens, J. E., Reinecke, M. A., Pliskin, N. H., Fischer, J. S., Wiebe, J. S., McCracken, L. M., et al. (1999). Assessing depressive symptoms in multiple sclerosis: Is it necessary to omit items from the original Beck Depression Inventory? Journal of Behavioral Medicine, 22, 127–142.
Alemayehu, D., Cappelleri, J. C., & Murphy, M. F. (2012). Conceptual and analytical considerations toward the use of patient-reported outcomes in personalized medicine. American Health & Drug Benefits, 5, 310–317.
Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16, 397–438.
Benedict, R. H., Fishman, I., McClellan, M. M., Bakshi, R., & Weinstock-Guttman, B. (2003). Validity of the Beck Depression Inventory-Fast Screen in multiple sclerosis. Multiple Sclerosis, 9, 393–396.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.
Blacker, D. (2009). Psychiatric rating scales. In B. J. Sadock, V. A. Sadock, & P. Ruiz (Eds.), Kaplan and Sadock’s comprehensive textbook of psychiatry (9th ed.). Philadelphia, PA: Lippincott Williams & Wilkins.
Bollen, K. A. (1989). Structural equations with latent variables. Hoboken, NJ: Wiley.
Bollen, K. A., & Long, J. S. (1993). Testing structural equation models. Newbury Park, CA: Sage Publications.
Brown, T. (2006). Confirmatory factor analysis for applied research (methodology in the social science). New York, NY: The Guilford Press.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models. Newbury Park, CA: Sage Publications.
Casella, G., & Berger, R. L. (2002). Statistical inference (2nd ed.). Pacific Grove, CA: Duxbury Press.
Chang, C. H., Nyenhuis, D. L., Cella, D., Luchetta, T., Dineen, K., & Reder, A. T. (2003). Psychometric evaluation of the Chicago Multiscale Depression Inventory in multiple sclerosis patients. Multiple Sclerosis, 9, 160–170.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10, 1–9.
Crawford, P. W. (2009). Assessment of depression in multiple sclerosis: Validity of including somatic items on the Beck Depression Inventory-II. International Journal of MS Care, 11, 167–173.
Ferrando, S. J., Samton, J., Mor, N., Nicora, S., Findler, M., & Apatoff, B. (2007). Patient health questionnaire-9 to screen for depression in outpatients with multiple sclerosis. International Journal of MS Care, 9, 99–103.
Fischer, J. S., Rudick, R. A., Cutter, G. R., & Reingold, S. C. (1999). The multiple sclerosis functional composite measure (MSFC): An integrated approach to MS clinical outcome assessment. Multiple Sclerosis, 5, 244–250.
Gilbody, S., Richards, D., Brealey, S., & Hewitt, C. (2007). Screening for depression in medical settings with the Patient Health Questionnaire (PHQ): A diagnostic meta-analysis. Journal of General Internal Medicine, 22, 1596–1602.
Goldman Consensus Panel. (2005). The Goldman Consensus statement on depression in multiple sclerosis. Multiple Sclerosis, 11, 328–337.
Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery and Psychiatry, 23, 56–62.
Hansson, M., Chotai, J., Nordstöm, A., & Bodlund, O. (2009). Comparison of two self-rating scales to detect depression: HADS and PHQ-9. British Journal of General Practice, 59, e283–e288.
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179–185.
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3, 424–453.
Huang, F. Y., Chung, H., Kroenke, K., Delucchi, K. L., & Spitzer, R. L. (2006). Using the Patient Health Questionnaire-9 to measure depression among racially and ethnically diverse primary care patients. Journal of General Internal Medicine, 21, 547–552.
Johnson, D. R., & Creech, J. C. (1983). Ordinal measures in multiple indicator models: A simulation study of categorization error. American Sociological Review, 48, 398–407.
Kalpakjian, C. Z., Toussaint, L. L., Albright, K. J., Bombardier, C. H., Krause, J. K., & Tate, D. G. (2009). Patient Health Questionnaire-9 in spinal cord injury: An examination of factor structure as related to gender. Journal of Spinal Cord Medicine, 32, 147–156.
Kline, R. B. (2010). Principles and practice of structural equation modeling (3rd ed.). New York, NY: Guilford.
Knowledge Program developed at Cleveland Clinic’s Neurological Institute. (2008–2013). Retrieved from http://my.clevelandclinic.org/neurological_institute/about/default.aspx
Kroenke, K., Spitzer, R. L., & Williams, J. B. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16, 606–613.
Kroenke, K., Spitzer, R. L., & Williams, J. B. (2003). The Patient Health Questionnaire-2: Validity of a two-item depression screener. Medical Care, 41, 1284–1292.
Krupp, L. B. (2004). Fatigue in multiple sclerosis. New York, NY: Demos Medical Publishing.
Marrie, R. A., & Goldman, M. (2007). Validity of performance scales for disability assessment in multiple sclerosis. Multiple Sclerosis, 13, 1176–1182.
Marsh, H. W., Hau, K. T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler’s (1999) findings. Structural Equation Modeling: A Multidisciplinary Journal, 11, 320–341.
Mellen Center for Multiple Sclerosis Treatment and Research, Cleveland Clinic, Neurological Institute. (2013). Retrieved from http://my.clevelandclinic.org/neurological_institute/mellen-center-multiple-sclerosis/default.aspx
Mohr, D. C., Goodkin, D. E., Likosky, W., Beutler, L., Gatto, N., & Langan, M. K. (1997). Identification of Beck Depression Inventory items related to multiple sclerosis. Journal of Behavioral Medicine, 20, 407–414.
Mohr, D. C., Hart, S. L., & Goldberg, A. (2003). Effects of treatment for depression on fatigue in multiple sclerosis. Psychosomatic Medicine, 65, 542–547.
Multiple Sclerosis Association of America. (2014). Retrieved from http://www.mymsaa.org/about-ms/faq/
Muthén, B., & Muthén, L. (2000). Integrating person-centered and variable-centered analysis: Growth mixture modeling with latent trajectory classes. Alcoholism, Clinical and Experimental Research, 24, 882–891.
Muthén, L. K., & Muthén, B. O. (2012). Mplus user’s guide (7th ed.). Los Angeles, CA: Muthén & Muthén.
Pinto-Meza, A., Serrano-Blanco, A., et al. (2005). Assessing depression in primary care with the PHQ-9: Can it be carried out over the telephone? Journal of General Internal Medicine, 20, 738–742.
Polman, C. H., & Rudick, R. A. (2010). The multiple sclerosis functional composite: A clinically meaningful measure of disability. Neurology, 74, S8–S15.
R Development Core Team. (2008). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Raykov, T., & Marcoulides, G. (2011). Introduction to psychometric theory. New York, NY: Taylor and Francis Group.
Rudick, R. A., Antel, J., Confavreux, C., Cutter, G., Ellison, G., Fischer, J., et al. (1996). Clinical outcomes assessment in multiple sclerosis. Annals of Neurology, 40, 469–479.
SAS Institute Inc. (2008). SAS/STAT ® 9.2 user’s guide. Cary, NC: SAS Institute Inc.
Schwartz, C. E., Vollmer, T., et al. (1999). Reliability and validity of two self-report measures of impairment and disability for MS. Neurology, 52, 63–70.
Siegert, R. J., & Abernethy, D. A. (2005). Depression in multiple sclerosis: A review. Journal of Neurology, Neurosurgery and Psychiatry, 76, 469–475.
Sjonnesen, K., Berzins, S., Fiest, K. M., Bulloch, A. G., Metz, L. M., Thombs, B. D., et al. (2012). Evaluation of the 9-item Patient Health Questionnaire (PHQ-9) as an assessment instrument for symptoms of depression in patients with multiple sclerosis. Postgraduate Medicine, 124, 69–77.
Sudhahar, J. C., Israel, D., & Selvam, M. (2006). Banking service loyalty determination through SEM technique. Journal of Applied Sciences, 6, 1472–1480.
Tucker, L. R., & Lewis, C. (1973). The reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1–10.
Wallin, M. T., Wilken, J. A., Turner, A. P., Williams, R. M., & Kane, R. (2006). Depression and multiple sclerosis: Review of a lethal combination. Journal of Rehabilitation Research and Development, 43, 45–62.
Whitaker, J. N., McFarland, H. F., Rudge, P., & Reingold, S. C. (1995). Outcomes assessment in multiple sclerosis clinical trials: A critical analysis. Multiple Sclerosis, 1, 37–47.
Woods, C. M., Oltmanns, T. F., & Turkheimer, E. (2009). Illustration of MIMIC-Model DIF testing with the schedule for nonadaptive and adaptive personality. Journal of Psychopathological and Behavioral Assessment, 31, 320–330.
Zumbo, B. D., & Zimmerman, D. W. (1993). Is the selection of statistical methods governed by level of measurement? Canadian Psychology, 34, 390–400.
Acknowledgments
Financial support for this study was provided by a grant from NIH/NCRR CTSA KL2TR000440 and by a grant from Novartis. The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report. We appreciate the contributions from Drs. Randall Cebul, Thomas Love, Irene Katzan, and Neal Dawson, Center for Health Care Research and Policy; Drs. Richard Rudick and Francois Bethoux, Mellen Center; and Dr. Martha Sajatovic, Departments of Psychiatry and Neurology at Case Western Reserve University School of Medicine.
Conflict of interest
Douglas Gunzler, Adam Perzynski, Nathan Morris, Steven Lewis and Deborah Miller declare that they have no conflict of interest. Robert Bermel has received research grants from Novartis.
Human and Animal Rights and Informed Consent
All procedures followed were in accordance with ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2000. Informed consent was obtained from all patients for being included in the study.
Appendix: More details about adjusted depression screening scale algorithms
Factor scores
Factor scores incorporate two adjustments: one for the CFA measurement model (Fig. 1 Panel C), since the nine PHQ-9 items load unequally on depression, and one for overlapping symptoms via the covariates and DIF paths of our E = 0 model (Fig. 1 Panel F). Using Mplus, we calculate factor scores within our sample using the maximum a posteriori method (Muthén and Muthén, 2012), which for continuous outcomes is the widely used regression method. In our sample, the factor scores have mean = 0, variance = 0.56, median = −0.21, range = (−0.84, 2.47), 1st quartile = −0.62, and 3rd quartile = 0.40. These factor scores can then be linearly transformed to fit any potentially useful range (e.g., 0–27 or 0–100).
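As a minimal sketch of such a linear transformation, the Python snippet below maps a set of factor scores onto the 0–27 range by min-max scaling. The function name `rescale` and the five input values (the reported minimum, quartiles, median and maximum, used here only as stand-ins for real factor scores) are illustrative assumptions, not part of the original analysis.

```python
def rescale(scores, new_min=0.0, new_max=27.0):
    """Linearly map scores so that min(scores) -> new_min and max(scores) -> new_max."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("scores must not all be equal")
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (s - lo) * scale for s in scores]


# Illustrative inputs only: the reported five-number summary of the factor scores.
factor_scores = [-0.84, -0.62, -0.21, 0.40, 2.47]
rescaled = rescale(factor_scores, 0.0, 27.0)
```

Because the map is linear, the ordering of patients and the relative spacing of their scores are unchanged; only the units differ.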
In our application, and for immediate clinical use, a probability integral transformation (Casella and Berger, 2002) is used to transform these factor scores into PHQ-9 scores within this population, which maintains the interpretation and thresholds of the PHQ-9. This is a one-to-one transformation, so the distribution of PHQ-9 scores in this population remains the same; only the matching of individuals to PHQ-9 scores changes. The transformation is built on continuous cumulative distribution functions, so no special steps are required for tied adjusted scores between two or more individuals (which are unlikely for factor scores). To perform the transformation:
1. Use a kernel density estimator to estimate the density of the PHQ-9 scores in this population.
2. Obtain the cumulative distribution function by numerical integration.
3. Numerically invert the cumulative distribution function for the PHQ-9.
4. Repeat steps (1) and (2) for the factor scores.
5. Transform the factor scores of each individual into a PHQ-9 score for this population by transforming the cumulative distribution function for the factor scores using the inverse of the cumulative distribution function for the PHQ-9.
The resulting transformed factor scores will have the same distribution as the PHQ-9 within this population. For ease of use in a clinical setting, we recommend rounding these transformed scores to the nearest whole number.
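The five steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the code used in the study: the Gaussian kernel, the rule-of-thumb bandwidth, the grid size, the clamping of rounded scores to 0–27, and the small input samples are all choices made here for the example.

```python
import math
from bisect import bisect_left


def kde_cdf(sample, grid_size=400):
    """Steps 1-2: Gaussian kernel density on a grid, then its CDF by
    cumulative trapezoidal integration. Returns (grid, cdf)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)  # rule-of-thumb bandwidth (an assumption)
    lo, hi = min(sample) - 4 * h, max(sample) + 4 * h
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    # Unnormalized density; the constant cancels when the CDF is rescaled below.
    dens = [sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in sample) for g in grid]
    cdf = [0.0]
    for i in range(1, grid_size):
        cdf.append(cdf[-1] + 0.5 * (dens[i - 1] + dens[i]) * step)
    return grid, [c / cdf[-1] for c in cdf]  # rescale so the CDF ends at 1


def interp(x, xs, ys):
    """Piecewise-linear interpolation on non-decreasing xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_left(xs, x)  # guarantees xs[j - 1] < x <= xs[j]
    t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])


def pit_transform(factor_scores, phq9_scores):
    """Steps 3-5: map each factor score u to F_phq9^{-1}(F_factor(u))."""
    f_grid, f_cdf = kde_cdf(factor_scores)
    p_grid, p_cdf = kde_cdf(phq9_scores)
    out = []
    for u in factor_scores:
        p = interp(u, f_grid, f_cdf)          # CDF of the factor scores at u
        out.append(interp(p, p_cdf, p_grid))  # numerical inverse of the PHQ-9 CDF
    return out


# Hypothetical data for illustration only (not study data).
phq9 = [0, 1, 2, 2, 4, 5, 7, 9, 12, 15, 20]
factor = [-0.84, -0.70, -0.62, -0.50, -0.30, -0.21, 0.10, 0.40, 0.90, 1.50, 2.47]

# Round to whole numbers; clamping to the 0-27 PHQ-9 range is our assumption.
adjusted = [min(27, max(0, round(v))) for v in pit_transform(factor, phq9)]
```

Swapping the roles of the two CDFs in `interp` is what performs the inversion: interpolating the grid as a function of the CDF values yields the quantile function without a separate root-finding step.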
Note that within an EHR-based database, the factor scores can be recomputed every time a new PHQ-9 score or series of scores is entered. Then either all subjects can be updated, or prior records can be kept the same and the functions used to transform only the new scores. Using this approach, only 44 % of subjects maintain the same transformed adjusted score as the original PHQ-9 score, and 16 % of subjects have a transformed adjusted score that differs from the original PHQ-9 score by two points or more (three subjects differ by up to 6 points). Further, 74 subjects had a PHQ-9 score ≥10 and now have a transformed adjusted score <10, while 65 subjects had a PHQ-9 score <10 and now have a transformed adjusted score ≥10.
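Summaries of this kind can be computed directly from paired original and adjusted scores. The sketch below assumes two equal-length lists of whole-number scores; the function name `compare_scores`, the dictionary keys and the eight example patients are hypothetical and are not the study data.

```python
def compare_scores(original, adjusted, threshold=10):
    """Summarize how adjusted scores differ from the original PHQ-9 scores."""
    pairs = list(zip(original, adjusted))
    n = len(pairs)
    diffs = [abs(a - o) for o, a in pairs]
    return {
        "pct_unchanged": 100.0 * sum(d == 0 for d in diffs) / n,
        "pct_diff_ge_2": 100.0 * sum(d >= 2 for d in diffs) / n,
        "max_abs_diff": max(diffs),
        # Patients who cross the screening threshold in either direction.
        "moved_below": sum(o >= threshold and a < threshold for o, a in pairs),
        "moved_above": sum(o < threshold and a >= threshold for o, a in pairs),
    }


# Hypothetical example scores for eight patients.
original = [3, 10, 12, 9, 0, 15, 7, 11]
adjusted = [3, 8, 12, 10, 0, 13, 7, 9]
summary = compare_scores(original, adjusted)
```

The two threshold counts are the clinically relevant part of the summary, since they identify patients whose screening result would change under the adjusted score.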
Item weights
Compared to the algorithm in “Item weights”, which is based on factor loadings, the factor scores algorithm in “Factor scores” results in a more mathematically rigorous individualized score that takes into account item loadings, intercepts, correlation among observed variables, and a predictive equation. However, the factor scores algorithm does not quantify how much each item contributes.
We can directly build a scale using the factor loadings as weights:
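Assuming the scale is the plain loading-weighted sum of the nine item responses, a form consistent with the description above is the following, where $\lambda_i$ (the standardized loading of item $i$), $x_i$ (the patient's 0–3 response to item $i$) and $S_{\text{weights}}$ are our notation rather than the authors':

```latex
S_{\text{weights}} = \sum_{i=1}^{9} \lambda_i \, x_i \tag{1}
```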
In this case we use the factor loadings in Table 4 for E = 0 as item weights. Using (1), in our sample the mean = 3.95 and standard deviation = 3.71. The probability integral transformation can be applied to this algorithm as described for the factor scores algorithm in “Factor scores”. Here there will be ties (identical scores on the MS-adjusted PHQ-9), but as explained above this is not an issue for the transformation. The results change much less from the standard scoring: 64 % of subjects maintain the same score and only 2 % differ from the original PHQ-9 by two points or more. As a result, this score may not be particularly useful in its transformed version, since it is a naïve form of the “Factor scores” algorithm; it may become more useful once a more rigorous psychometric evaluation of this score has been performed.
Downweighting overlapping symptoms
We only downweight the influence of items of the PHQ-9 in which depressive symptoms overlap with other symptoms for MS patients:
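A form consistent with the definitions that follow and with the multipliers reported below, offered as a reconstruction under the assumption that each item weight is the ratio of the two standardized loadings ($x_i$, our notation, is the patient's 0–3 response to item $i$):

```latex
\mathrm{PHQ9}_{\text{adj}} = \sum_{i=1}^{9} w_i \, x_i ,
\qquad
w_i =
\begin{cases}
E_i / \mathrm{CFA}_i & \text{if item } i \text{ overlaps with other MS symptoms,} \\
1 & \text{otherwise,}
\end{cases}
```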
where Ei is the item i standardized factor loading for E = 0 and CFAi is the item i standardized factor loading for the one factor CFA model (see Table 4).
We multiply a patient’s score on the items for sleep problems by 0.79, fatigue by 0.52, poor concentration by 0.70 and psychomotor symptoms by 0.74. The PHQ-9 items for appetite change, feelings of failure and self-harm keep the same score. The items for anhedonia and feeling depressed have an almost negligible residual DIF effect, and we multiply a patient’s score on these items by 1.01. For our adjusted PHQ-9 we observe a population mean = 5.66, standard deviation = 5.30, and median = 4.07, with range = (0, 23.32), 1st quartile = 1.31, and 3rd quartile = 8.53. As noted, one application of “Downweighting overlapping symptoms” is to subtract this score from the PHQ-9 as an estimate of the amount that items on the PHQ-9 overlap with other symptoms for MS patients.
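The downweighting can be sketched in a few lines of Python using the multipliers reported above. The item keys, function names and the example patient are hypothetical labels introduced for this illustration; only the numeric multipliers come from the text.

```python
# Multipliers as reported in the text, listed in standard PHQ-9 item order.
WEIGHTS = {
    "anhedonia": 1.01,
    "feeling_depressed": 1.01,
    "sleep_problems": 0.79,
    "fatigue": 0.52,
    "appetite_change": 1.00,
    "feelings_of_failure": 1.00,
    "poor_concentration": 0.70,
    "psychomotor_symptoms": 0.74,
    "self_harm": 1.00,
}


def adjusted_phq9(responses):
    """Weighted sum of the nine item responses (each scored 0-3)."""
    for item in WEIGHTS:
        if responses[item] not in (0, 1, 2, 3):
            raise ValueError(f"{item} must be scored 0-3")
    return sum(WEIGHTS[item] * responses[item] for item in WEIGHTS)


def overlap_estimate(responses):
    """Standard PHQ-9 total minus the adjusted score."""
    raw_total = sum(responses[item] for item in WEIGHTS)
    return raw_total - adjusted_phq9(responses)


# Hypothetical patient: mostly somatic symptoms.
patient = dict.fromkeys(WEIGHTS, 0)
patient.update({"anhedonia": 1, "sleep_problems": 2, "fatigue": 3})

score = adjusted_phq9(patient)       # 1.01*1 + 0.79*2 + 0.52*3
overlap = overlap_estimate(patient)  # raw total of 6 minus the adjusted score
max_score = adjusted_phq9(dict.fromkeys(WEIGHTS, 3))
```

With these rounded multipliers the maximum attainable score is about 23.31, close to the reported upper end of the range (23.32); the small gap is presumably due to rounding of the published multipliers.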
Using the PHQ-2
The PHQ-2 comprises the first two items of the PHQ-9 and has been used as a depression screening tool or as a pre-screener for the PHQ-9 (Kroenke et al., 2003). Using this shortened scale therefore bypasses the items for sleep problems, fatigue, poor concentration and psychomotor symptoms. A PHQ-2 score of three has been used as a threshold for depression for screening purposes (Kroenke et al., 2003). The PHQ-2 is highly correlated with the PHQ-9 in this MS study population (Pearson ρ ≈ 0.87). Further, a receiver operating characteristic (ROC) analysis of PHQ-9 ≥ 10 versus PHQ-2 ≥ 3 in this study population showed high specificity (95.7 %), a large positive predictive value (PPV = 91.2 %) and an area under the curve (AUC) of 0.852, though at the expense of sensitivity (63.8 %). In general, the PHQ-2 has shown wide variability in sensitivity in previous validation studies, and more research is needed to see whether its diagnostic properties approach those of the PHQ-9 (Gilbody et al., 2007).
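For readers who want to reproduce this type of comparison on their own data, the sketch below computes sensitivity, specificity and PPV of PHQ-2 ≥ 3 against PHQ-9 ≥ 10 as the reference. The function name `screening_metrics` and the ten paired scores are hypothetical; the AUC is not computed here because it depends on the full range of PHQ-2 cut-offs rather than a single threshold.

```python
def screening_metrics(phq2, phq9, phq2_cut=3, phq9_cut=10):
    """Sensitivity, specificity and PPV of PHQ-2 >= phq2_cut,
    taking PHQ-9 >= phq9_cut as the reference classification."""
    tp = fp = tn = fn = 0
    for s2, s9 in zip(phq2, phq9):
        test_pos = s2 >= phq2_cut
        ref_pos = s9 >= phq9_cut
        if test_pos and ref_pos:
            tp += 1
        elif test_pos:
            fp += 1
        elif ref_pos:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined when a denominator is empty (e.g. no reference positives).
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Hypothetical paired scores for ten patients (PHQ-2 is never above PHQ-9).
phq2 = [0, 1, 4, 5, 2, 3, 6, 1, 2, 0]
phq9 = [2, 4, 14, 18, 11, 8, 20, 3, 12, 1]
metrics = screening_metrics(phq2, phq9)
```

In this toy sample the two missed cases are patients whose PHQ-9 total is driven by items outside the PHQ-2, which is the same mechanism that lowers sensitivity in the study population.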
Cite this article
Gunzler, D.D., Perzynski, A., Morris, N. et al. Disentangling Multiple Sclerosis and depression: an adjusted depression screening score for patient-centered care. J Behav Med 38, 237–250 (2015). https://doi.org/10.1007/s10865-014-9574-5
DOI: https://doi.org/10.1007/s10865-014-9574-5