Regression analysis of heterogeneous samples with subgroup structure is essential to the development of precision medicine. In practice, this task is often challenging because the subgroup labels are not known a priori. Detecting subgroups with similar characteristics is therefore critical, as it often determines the accuracy of the regression analysis. In this article, we investigate a new framework for detecting subgroups that share similar characteristics in feature space and similar treatment effects. The key idea is to incorporate K-means clustering into the regression framework of concave pairwise fusion, so that regression and subgroup detection are performed simultaneously. Our method is specifically tailored to situations in which the sample is heterogeneous, in the sense that the response variables in different domains of feature space are generated through different mechanisms.
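To make the two tasks concrete, the Python sketch below shows a deliberately simpler two-stage baseline; it is not the method proposed in this article. It clusters observations in feature space with plain K-means (Lloyd's algorithm) and then fits ordinary least squares separately within each cluster, whereas the authors perform clustering and regression jointly through concave pairwise fusion. The function names `kmeans` and `two_stage_fit`, the simulated data, and the use of NumPy are illustrative assumptions.

```python
import numpy as np


def _assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                        for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment against the converged centroids.
    return _assign(X, centroids), centroids


def two_stage_fit(X, y, k, seed=0):
    """Hypothetical baseline: cluster in feature space, then run OLS within each cluster."""
    labels, _ = kmeans(X, k, seed=seed)
    coefs = {}
    for c in range(k):
        mask = labels == c
        if not mask.any():
            continue  # skip empty clusters
        # Design matrix: an intercept column followed by the features.
        design = np.column_stack([np.ones(mask.sum()), X[mask]])
        beta, *_ = np.linalg.lstsq(design, y[mask], rcond=None)
        coefs[c] = beta
    return labels, coefs


# Usage: two regions of feature space whose responses follow different linear mechanisms.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 0.5, 100), rng.normal(3, 0.5, 100)])
X = x[:, None]
y = np.where(x < 0, 1 + 2 * x, -1 - x) + rng.normal(0, 0.1, 200)
labels, coefs = two_stage_fit(X, y, k=2)
for c, beta in coefs.items():
    print(c, np.round(beta, 2))  # [intercept, slope] per detected subgroup
```

Because the clustering step never looks at the response, this baseline can only recover subgroups that are already well separated in feature space; it serves as a point of comparison rather than a substitute for a joint estimator.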
The authors thank the associate editor and two anonymous reviewers for their helpful comments and valuable suggestions on earlier versions of this article. The authors also thank Professor Shujie Ma for her constructive comments on this work at LICAS 2019. This research was supported by the Fundamental Research Funds for the Central Universities, the Beijing Natural Science Foundation (No. 1204031), and the National Natural Science Foundation of China (No. 11901013).
Electronic supplementary material
Supplementary material 1 (pdf 217 KB)
About this article
Cite this article
Liang, B., Wu, P., Tong, X. et al. Regression and subgroup detection for heterogeneous samples. Comput Stat 35, 1853–1878 (2020). https://doi.org/10.1007/s00180-020-00965-5
Keywords
- Concave fusion
- Heterogeneous problem
- K-means clustering
- Subgroup detection