Ranking Accuracy for Logistic-GEE Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9897)

Abstract

Logistic Generalized Estimating Equations (logistic-GEE) models have been used extensively for analyzing clustered binary data. However, assessing the goodness-of-fit and predictability of these models is problematic because no likelihood is available and observations can be correlated within a cluster. In this paper we propose a new measure for estimating the generalization performance of logistic-GEE models, namely ranking accuracy for models based on clustered data (RAMCD). We define RAMCD as the probability that a randomly selected positive observation is ranked higher than a randomly selected negative observation from another cluster. We propose a computationally efficient algorithm for estimating RAMCD. The algorithm can be applied in two cases: (1) when RAMCD is estimated as a goodness-of-fit criterion and (2) when RAMCD is estimated as a predictability criterion. We demonstrate this experimentally on clustered data from a simulation study and a biomarker study.
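To make the definition of RAMCD concrete, the following is a minimal brute-force sketch of an empirical estimate: the share of (positive, negative) pairs drawn from different clusters in which the positive observation receives the higher score. It is not the efficient algorithm proposed in the paper. The function name `ramcd_naive`, the argument names, the toy data, and the convention of counting tied scores as half a win are all assumptions made for illustration.

```python
from itertools import product


def ramcd_naive(scores, labels, clusters):
    """Brute-force empirical RAMCD (illustrative sketch, quadratic time).

    scores   -- model scores, e.g. predicted probabilities
    labels   -- 1 for a positive observation, 0 for a negative one
    clusters -- cluster identifier of each observation
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]

    wins = 0.0
    pairs = 0
    for i, j in product(pos, neg):
        if clusters[i] == clusters[j]:
            continue  # only pairs from different clusters are compared
        pairs += 1
        if scores[i] > scores[j]:
            wins += 1.0
        elif scores[i] == scores[j]:
            wins += 0.5  # assumption: a tie counts as half a win

    if pairs == 0:
        raise ValueError("no cross-cluster positive/negative pairs")
    return wins / pairs


# Hypothetical toy data: three clusters with two observations each.
scores = [0.9, 0.4, 0.7, 0.2, 0.6, 0.3]
labels = [1, 0, 1, 0, 0, 1]
clusters = ["a", "a", "b", "b", "c", "c"]
print(ramcd_naive(scores, labels, clusters))
```

In the toy data, five of the six cross-cluster pairs rank the positive observation higher, so the estimate is 5/6. The three within-cluster pairs are skipped entirely.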


Notes

  1. We assume the use of an efficient sorting algorithm such as merge sort.


Author information

Corresponding author

Correspondence to Nasser Davarzani.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Davarzani, N., Peeters, R., Smirnov, E., Karel, J., Brunner-La Rocca, HP. (2016). Ranking Accuracy for Logistic-GEE Models. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XV. IDA 2016. Lecture Notes in Computer Science, vol 9897. Springer, Cham. https://doi.org/10.1007/978-3-319-46349-0_2

  • DOI: https://doi.org/10.1007/978-3-319-46349-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46348-3

  • Online ISBN: 978-3-319-46349-0

  • eBook Packages: Computer Science, Computer Science (R0)
