Ranking Accuracy for Logistic-GEE Models

Davarzani, Nasser; Peeters, Ralf; Smirnov, Evgueni; Karel, Joël; Brunner-La Rocca, Hans-Peter

doi:10.1007/978-3-319-46349-0_2

Conference paper
First Online: 21 September 2016

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9897)

Abstract

Logistic Generalized Estimating Equations (logistic-GEE) models have been used extensively to analyze clustered binary data. However, assessing the goodness-of-fit and predictability of these models is problematic because no likelihood is available and observations can be correlated within a cluster. In this paper we propose a new measure for estimating the generalization performance of logistic-GEE models, namely ranking accuracy for models based on clustered data (RAMCD). We define RAMCD as the probability that a randomly selected positive observation is ranked higher than a randomly selected negative observation from another cluster. We propose a computationally efficient algorithm for estimating RAMCD. The algorithm can be applied in two cases: (1) when RAMCD is estimated as a goodness-of-fit criterion and (2) when it is estimated as a predictability criterion. This is shown experimentally on clustered data from a simulation study and a biomarker study.
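To make the RAMCD definition concrete, the following is a minimal Python sketch of one way to estimate it from predicted scores. It is not the authors' algorithm, which this page does not reproduce. The function names `ramcd` and `_concordant`, the 0/1 label coding, and the convention of counting tied scores as one half are all assumptions made for illustration. The sketch counts concordant positive/negative pairs over the whole sample with a sort and binary search, then subtracts the pairs that fall within the same cluster.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def _concordant(pos, neg):
    """Count (positive, negative) pairs in which the positive score is higher.

    Ties contribute 0.5 each (an assumed convention, borrowed from AUC).
    """
    neg = sorted(neg)
    total = 0.0
    for s in pos:
        lo = bisect_left(neg, s)   # negatives scored strictly below s
        hi = bisect_right(neg, s)  # negatives scored at or below s
        total += lo + 0.5 * (hi - lo)
    return total


def ramcd(scores, labels, clusters):
    """Estimate RAMCD: the share of positive/negative pairs drawn from
    different clusters in which the positive observation is ranked higher."""
    pos_all, neg_all = [], []
    by_cluster = defaultdict(lambda: ([], []))
    for s, y, c in zip(scores, labels, clusters):
        (pos_all if y == 1 else neg_all).append(s)
        by_cluster[c][0 if y == 1 else 1].append(s)

    # Start from all pairs, then remove the within-cluster pairs.
    concordant = _concordant(pos_all, neg_all)
    pairs = len(pos_all) * len(neg_all)
    for pos, neg in by_cluster.values():
        concordant -= _concordant(pos, neg)
        pairs -= len(pos) * len(neg)

    if pairs == 0:
        raise ValueError("no positive/negative pairs across clusters")
    return concordant / pairs


if __name__ == "__main__":
    # Hypothetical toy data: two clusters, one positive and one negative each.
    print(ramcd([0.3, 0.2, 0.8, 0.5], [1, 0, 1, 0], ["a", "a", "b", "b"]))
```

Because every step is either a sort or a binary search, the sketch runs in O(n log n) time for n observations, whereas enumerating all cross-cluster pairs directly would take O(n²).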

Notes

1. We assume the use of an efficient sorting algorithm such as merge sort.

Author information

Authors and Affiliations

Department of Data Science and Knowledge Engineering, Maastricht University, P.O. Box 616, 6200 MD, Maastricht, The Netherlands
Nasser Davarzani, Ralf Peeters, Evgueni Smirnov & Joël Karel
Department of Cardiology, Maastricht University Medical Center, Maastricht, The Netherlands
Hans-Peter Brunner-La Rocca

Corresponding author

Correspondence to Nasser Davarzani.

Editor information

Editors and Affiliations

Stockholm University, Stockholm, Sweden
Henrik Boström
Leiden University, Leiden, The Netherlands
Arno Knobbe
University of Porto, Porto, Portugal
Carlos Soares
Stockholm University, Stockholm, Sweden
Panagiotis Papapetrou

About this paper

Cite this paper

Davarzani, N., Peeters, R., Smirnov, E., Karel, J., Brunner-La Rocca, HP. (2016). Ranking Accuracy for Logistic-GEE Models. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XV. IDA 2016. Lecture Notes in Computer Science, vol 9897. Springer, Cham. https://doi.org/10.1007/978-3-319-46349-0_2

DOI: https://doi.org/10.1007/978-3-319-46349-0_2
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46348-3
Online ISBN: 978-3-319-46349-0
eBook Packages: Computer Science, Computer Science (R0)
