Abstract
Logistic Generalized Estimating Equations (logistic-GEE) models have been used extensively for analyzing clustered binary data. However, assessing the goodness-of-fit and predictability of these models is problematic because no likelihood is available and observations within a cluster can be correlated. In this paper we propose a new measure for estimating the generalization performance of logistic-GEE models, namely ranking accuracy for models based on clustered data (RAMCD). We define RAMCD as the probability that a randomly selected positive observation is ranked higher than a randomly selected negative observation from another cluster. We propose a computationally efficient algorithm for computing RAMCD. The algorithm can be applied in two cases: (1) when RAMCD is estimated as a goodness-of-fit criterion and (2) when RAMCD is estimated as a predictability criterion. We demonstrate both cases experimentally on clustered data from a simulation study and a biomarker study.
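To make the definition of RAMCD concrete, the following is a minimal brute-force sketch that estimates it directly from the definition: it enumerates every pair of a positive and a negative observation drawn from different clusters and reports the fraction in which the positive receives the higher score. It is not the efficient algorithm proposed in the paper. The function name `ramcd_brute_force`, the argument layout, and the convention of counting tied scores as one half are assumptions made for illustration only.

```python
from typing import Hashable, Sequence


def ramcd_brute_force(
    scores: Sequence[float],
    labels: Sequence[int],
    clusters: Sequence[Hashable],
) -> float:
    """Estimate RAMCD by enumerating all cross-cluster (positive, negative) pairs.

    scores   -- model outputs, e.g. fitted probabilities (higher = more positive)
    labels   -- 1 for a positive observation, 0 for a negative one
    clusters -- cluster identifier of each observation
    Ties are counted as 0.5 (an assumption, not taken from the paper).
    """
    if not (len(scores) == len(labels) == len(clusters)):
        raise ValueError("scores, labels and clusters must have equal length")

    concordant = 0.0  # pairs where the positive is ranked higher (ties count half)
    total = 0         # all admissible cross-cluster (positive, negative) pairs

    for i in range(len(scores)):
        if labels[i] != 1:
            continue
        for j in range(len(scores)):
            # Keep only negatives that belong to a different cluster.
            if labels[j] != 0 or clusters[i] == clusters[j]:
                continue
            total += 1
            if scores[i] > scores[j]:
                concordant += 1.0
            elif scores[i] == scores[j]:
                concordant += 0.5

    if total == 0:
        raise ValueError("no cross-cluster (positive, negative) pairs available")
    return concordant / total


# Toy data: two clusters, each holding one positive and one negative observation.
example = ramcd_brute_force(
    scores=[0.9, 0.8, 0.4, 0.7],
    labels=[1, 0, 1, 0],
    clusters=["a", "a", "b", "b"],
)
print(example)
```

In the toy data only two cross-cluster pairs exist: the positive in cluster "a" (0.9) against the negative in cluster "b" (0.7), and the positive in cluster "b" (0.4) against the negative in cluster "a" (0.8). One of the two is ranked correctly, so the estimate is 0.5. The double loop is quadratic in the number of observations, so it serves only to illustrate the definition.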
Notes
1. We assume the use of an efficient sorting algorithm such as merge sort.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Davarzani, N., Peeters, R., Smirnov, E., Karel, J., Brunner-La Rocca, HP. (2016). Ranking Accuracy for Logistic-GEE Models. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XV. IDA 2016. Lecture Notes in Computer Science, vol 9897. Springer, Cham. https://doi.org/10.1007/978-3-319-46349-0_2
DOI: https://doi.org/10.1007/978-3-319-46349-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46348-3
Online ISBN: 978-3-319-46349-0
eBook Packages: Computer Science, Computer Science (R0)