Statistics in Biosciences, Volume 11, Issue 3, pp 567–596

Dealing with the Phenomenon of Quasi-complete Separation and a Goodness of Fit Test in Logistic Regression Models in the Case of Long Data Sets

  • V. G. Vassiliadis
  • I. I. Spyroglou
  • A. G. Rigas (corresponding author)
  • J. R. Rosenberg
  • K. A. Lindsay
Case Studies and Practice Articles


This paper considers the phenomenon of quasi-complete separation that arises when a logistic regression model is used to identify the muscle spindle, a neuromuscular system. The system responds when it is affected by a number of stimuli. Both the response and the stimuli are very long binary sequences of events. In the logistic model, three functions are of special interest: the threshold, the recovery and the summation functions. The maximum likelihood estimates are obtained quickly and efficiently by using the penalized likelihood function. A validity test for the fitted model, based on the randomized quantile residuals, is proposed. The validity test is then transformed into a goodness of fit test, and the use of a Q–Q plot is also discussed.
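The two techniques named in the abstract can be illustrated on a toy example. The sketch below is not the authors' implementation and the data are invented for illustration (the muscle spindle recordings are not reproduced here). It assumes a single covariate with an intercept, fits the logistic model by Firth's penalized likelihood using the modified score sum_i [y_i - p_i + h_i (1/2 - p_i)] x_i, and then computes randomized quantile residuals in the sense of Dunn and Smyth. The function names `firth_logistic` and `randomized_quantile_residuals` are hypothetical, and the iteration uses no step-halving, which a production fit would normally add.

```python
import math
import random
from statistics import NormalDist


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def firth_logistic(x, y, n_iter=100, tol=1e-8):
    """Fit logit P(y=1) = b0 + b1*x with Firth's penalized likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for the design columns (1, x).
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det
        # Hat-matrix diagonals h_i = w_i * (1, x_i) (X'WX)^{-1} (1, x_i)'.
        h = [wi * (v00 + 2.0 * v01 * xi + v11 * xi * xi)
             for wi, xi in zip(w, x)]
        # Modified score: the usual score plus the Firth correction term.
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        d0 = v00 * u0 + v01 * u1
        d1 = v01 * u0 + v11 * u1
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


def randomized_quantile_residuals(p, y, rng):
    """Randomized quantile residuals for binary responses.

    For y = 0 draw u ~ U(0, 1 - p); for y = 1 draw u ~ U(1 - p, 1);
    the residual is the standard normal quantile of u.
    """
    nd = NormalDist()
    out = []
    for pi, yi in zip(p, y):
        lo, hi = (0.0, 1.0 - pi) if yi == 0 else (1.0 - pi, 1.0)
        u = rng.uniform(lo, hi)
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf in its domain
        out.append(nd.inv_cdf(u))
    return out


# Toy data with quasi-complete separation: y = 0 for x < 4, y = 1 for
# x > 4, and both outcomes at x = 4. Ordinary ML has no finite slope here.
x = [1.0, 2.0, 3.0, 4.0, 4.0, 5.0, 6.0, 7.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = firth_logistic(x, y)
fitted = [sigmoid(b0 + b1 * xi) for xi in x]
residuals = randomized_quantile_residuals(fitted, y, random.Random(0))
print("intercept:", b0, "slope:", b1)
print("residuals:", residuals)
```

If the fitted model is adequate, residuals of this kind behave approximately like a standard normal sample, which is what makes a Q–Q plot or a normality-based goodness of fit test applicable.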


Penalized likelihood function; Randomized quantile residuals; Q–Q plot; Binary data; Muscle spindle



We would like to express our gratitude to the Editor, the Associate Editor and the two anonymous reviewers for their helpful and constructive comments, which improved the quality of this paper.

Supplementary material

Supplementary material 1: 12561_2019_9249_MOESM1_ESM.pdf (PDF, 2611 KB)



Copyright information

© International Chinese Statistical Association 2019

Authors and Affiliations

  • V. G. Vassiliadis (1)
  • I. I. Spyroglou (1)
  • A. G. Rigas (1, corresponding author)
  • J. R. Rosenberg (2)
  • K. A. Lindsay (3)
  1. Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
  2. Division of Neuroscience and Biomedical Systems, University of Glasgow, Glasgow, UK
  3. Department of Mathematics, University Gardens, University of Glasgow, Glasgow, UK
