Applying Machine Learning Algorithms to Segment High-Cost Patient Populations
Efforts to improve the value of care for high-cost patients may benefit from care management strategies targeted at clinically distinct subgroups of patients.
To evaluate the performance of three different machine learning algorithms for identifying subgroups of high-cost patients.
We applied three different clustering algorithms—connectivity-based clustering using agglomerative hierarchical clustering, centroid-based clustering with the k-medoids algorithm, and density-based clustering with the OPTICS algorithm—to a clinical and administrative dataset. We then examined the extent to which each algorithm identified subgroups of patients that were (1) clinically distinct and (2) associated with meaningful differences in relevant utilization metrics.
Patients enrolled in a national Medicare Advantage plan, categorized in the top decile of spending (n = 6154).
Post hoc discriminative models comparing the importance of variables for distinguishing observations in one cluster from the rest. Variance in utilization and spending measures.
Connectivity-based, centroid-based, and density-based clustering identified eight, five, and ten subgroups of high-cost patients, respectively. Post hoc discriminative models indicated that density-based clustering subgroups were the most clinically distinct. The variance of utilization and spending measures was the greatest among the subgroups identified through density-based clustering.
Machine learning algorithms can be used to segment a high-cost patient population into subgroups of patients that are clinically distinct and associated with meaningful differences in utilization and spending measures. For these purposes, density-based clustering with the OPTICS algorithm outperformed connectivity-based and centroid-based clustering algorithms.
KEY WORDShigh-cost patients machine learning patient segmentation
This study was supported by a grant from the Anthem Public Policy Institute and, in part, under a grant with the Pennsylvania Department of Health.
Compliance with Ethical Standards
This study was approved by the Institutional Review Board of the University of Pennsylvania.
Conflict of Interest
Dr. Navathe reports that he has received grant support from Hawaii Medical Service Association, Anthem Public Policy Institute, and Oscar Health; personal fees from Navvis and Co, Navigant Inc., Lynx Medical, Indegene Inc., Agathos, Inc, and Sutherland Global Services; personal fees and equity from NavaHealth; serves on the board without compensation for Integrated Services, Inc., speaking fees from the Cleveland Clinic, and honoraria from Elsevier Press. Dr. Linn reports that she has received grant support from Hawaii Medical Service Association. Dr. Jain reports employment by Anthem, Inc.; stock ownership in Anthem, Inc., and honoraria from Elsevier Press. Ms. Kowalski reports employment by Anthem, Inc. and stock ownership in Anthem, Inc. and Amazon. Dr. Powers reports employment by Anthem, Inc. All other authors declare no conflicts of interest.
The Pennsylvania Department of Health specifically disclaims responsibility for any analyses, interpretations, or conclusions.
- 1.National Academy of Medicine. Effective Care for High-Need Patients. Washington, DC: National Academy of Medicine; 2017.Google Scholar
- 2.Hong CS, Siegel AL, Ferris TG. Caring for High-Need, High-Cost Patients: What Makes for a Successful Care Management Program? 2014; https://www.commonwealthfund.org/sites/default/files/documents/___media_files_publications_issue_brief_2014_aug_1764_hong_caring_for_high_need_high_cost_patients_ccm_ib.pdf. Accessed October 19, 2018.Google Scholar
- 8.Gan G, Ma C, Wu J. Data Clustering: Theory, Algorithms, and Applications. Society for Industrial and Applied Mathematics; 2007.Google Scholar
- 19.Powers BW, Yan J, Zhu J, et al. Subgroups of High-Cost Medicare Advantage Patients: An Observational Study. J Gen Intern Med 2018.Google Scholar
- 20.Bellman R. Adaptive control processes: a guided tour. Princeton, N.J.,: Princeton University Press; 1961.Google Scholar
- 21.Donoho DL. High-dimensional data analysis: The curses and blessings of dimensionality. AMS Math Challenges Lecture. 2000:1–32.Google Scholar
- 22.Van Der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res 2008;9(Nov):2579–2605.Google Scholar
- 23.Van Der Maaten L. Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 2014;15(1):3221–3245.Google Scholar
- 26.Kaufman L, Rousseeuw PJ. Clustering by means of medoids. Amsterdam: North-Holland/Elsevier; 1987.Google Scholar
- 27.Ester M, Kriegel H-P, Sander J, Xu X. A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining; 1996; Portland, Oregon.Google Scholar
- 30.Figueroa JF, Jha AK. Approach for Achieving Effective Care for High-Need Patients. JAMA Intern Med. 2018;178(6):845–846.Google Scholar