Ranking Accuracy for Logistic-GEE Models

  • Nasser Davarzani (corresponding author)
  • Ralf Peeters
  • Evgueni Smirnov
  • Joël Karel
  • Hans-Peter Brunner-La Rocca
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9897)

Abstract

Logistic generalized estimating equations (logistic GEE) models have been used extensively to analyze clustered binary data. However, assessing the goodness-of-fit and predictability of these models is problematic because no likelihood is available and observations within a cluster can be correlated. In this paper we propose a new measure for estimating the generalization performance of logistic GEE models: ranking accuracy for models based on clustered data (RAMCD). We define RAMCD as the probability that a randomly selected positive observation is ranked higher than a randomly selected negative observation from another cluster. We also propose a computationally efficient algorithm for estimating RAMCD. The algorithm applies in two cases: (1) when RAMCD is estimated as a goodness-of-fit criterion and (2) when it is estimated as a predictability criterion. We demonstrate both uses experimentally on clustered data from a simulation study and a biomarker study.
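The abstract does not spell out the authors' algorithm. As an illustration only, here is a minimal Python sketch of one efficient way to estimate RAMCD from model scores, assuming 0/1 labels (1 = positive), one cluster identifier per observation, and ties counted as 1/2 (the usual AUC convention). It counts concordant positive/negative pairs over the whole sample with a sorted search and then subtracts the within-cluster pairs, which takes O(n log n) time instead of the O(n²) pairwise loop. The names `ramcd`, `ramcd_bruteforce` and `_concordant_pairs` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def _concordant_pairs(pos_scores, neg_scores):
    """Sum over (positive, negative) pairs: 1 if pos > neg, 1/2 if tied, else 0."""
    neg_sorted = sorted(neg_scores)
    total = 0.0
    for p in pos_scores:
        below = bisect_left(neg_sorted, p)           # negatives strictly below p
        tied = bisect_right(neg_sorted, p) - below   # negatives equal to p
        total += below + 0.5 * tied
    return total


def ramcd(scores, labels, clusters):
    """Hypothetical RAMCD estimator: concordance over cross-cluster pos/neg pairs.

    Assumes labels are 0/1 (1 = positive) and ties count 1/2, as in the usual AUC.
    Runs in O(n log n) instead of the O(n^2) pairwise loop.
    """
    pos, neg = [], []
    per_cluster = defaultdict(lambda: ([], []))  # cluster -> (positives, negatives)
    for s, y, c in zip(scores, labels, clusters):
        idx = 0 if y == 1 else 1
        (pos if idx == 0 else neg).append(s)
        per_cluster[c][idx].append(s)

    # Count over all positive/negative pairs, then remove the within-cluster pairs.
    concordant = _concordant_pairs(pos, neg)
    n_pairs = len(pos) * len(neg)
    for c_pos, c_neg in per_cluster.values():
        concordant -= _concordant_pairs(c_pos, c_neg)
        n_pairs -= len(c_pos) * len(c_neg)

    if n_pairs == 0:
        raise ValueError("no positive/negative pairs from different clusters")
    return concordant / n_pairs


def ramcd_bruteforce(scores, labels, clusters):
    """Direct O(n^2) evaluation of the definition, for checking ramcd()."""
    num, den = 0.0, 0
    data = list(zip(scores, labels, clusters))
    for s_i, y_i, c_i in data:
        if y_i != 1:
            continue
        for s_j, y_j, c_j in data:
            if y_j != 1 and c_j != c_i:
                den += 1
                num += 1.0 if s_i > s_j else (0.5 if s_i == s_j else 0.0)
    return num / den


if __name__ == "__main__":
    scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7]
    labels = [1, 0, 1, 0, 1, 0]
    clusters = ["a", "a", "b", "b", "c", "c"]
    print(ramcd(scores, labels, clusters))             # prints 0.6666666666666666
    print(ramcd_bruteforce(scores, labels, clusters))  # prints 0.6666666666666666
```

In the toy example, 4 of the 6 cross-cluster positive/negative pairs are ranked correctly; the within-cluster pair in cluster "c" (0.6 vs 0.7) is excluded by definition. For the goodness-of-fit case the scores would be fitted probabilities from a model trained on all clusters; for the predictability case they would need to be out-of-sample predictions, and how those are produced (for example, cluster-wise cross-validation) is an assumption here, not something the abstract states.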

Keywords

Clustered data, Generalized Estimating Equation, Goodness-of-fit, Predictability, Ranking accuracy

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Nasser Davarzani (1)
  • Ralf Peeters (1)
  • Evgueni Smirnov (1)
  • Joël Karel (1)
  • Hans-Peter Brunner-La Rocca (2)
  1. Department of Data Science and Knowledge Engineering, Maastricht University, Maastricht, The Netherlands
  2. Department of Cardiology, Maastricht University Medical Center, Maastricht, The Netherlands
