Estimating the Number of Clusters in Logistic Regression Clustering by an Information Theoretic Criterion

  • Guoqi Qian
  • C. Radhakrishna Rao
  • Yuehua Wu
  • Qing Shao

This paper studies the problem of estimating the number of clusters in logistic regression clustering. The classification likelihood approach is employed to tackle this problem. An information theoretic criterion for selecting the number of logistic curves is then proposed, and its asymptotic property is investigated.

The paper is arranged as follows. In Section 2, notation is introduced and an information theoretic criterion is proposed for estimating the number of clusters. In Section 3, the small-sample performance of the proposed criterion is studied by Monte Carlo simulation. In Section 4, the asymptotic property of the criterion proposed in Section 2 is investigated. Some lemmas needed in Section 4 are given in the appendix.
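As a rough illustration of the general idea only, the sketch below clusters binomial responses around k logistic curves by alternating per-cluster maximum likelihood fits with reassignment of each observation to its best-fitting curve (a classification likelihood scheme), and then scores each k with a penalized log-likelihood. Everything specific here is an assumption made for illustration: the function names, the common number of trials `m`, the random-restart strategy, and in particular the BIC-style penalty `k * p * log(n)`, which stands in for the paper's criterion and is not taken from it.

```python
import numpy as np


def simulate(n, m, betas, seed=0):
    """Draw binomial(m, p) responses from randomly chosen logistic curves (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2, 2, size=n)
    X = np.column_stack([np.ones(n), x])            # intercept + one covariate
    z = rng.integers(len(betas), size=n)            # true cluster labels
    eta = np.sum(X * np.asarray(betas)[z], axis=1)  # linear predictor per observation
    y = rng.binomial(m, 1.0 / (1.0 + np.exp(-eta)))
    return X, y, z


def fit_logistic(X, y, m, n_iter=25, ridge=1e-6):
    """Binomial logistic fit by Newton-Raphson; the tiny ridge keeps the Hessian invertible."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = X.T @ (y - m * p)
        w = m * p * (1.0 - p)
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def loglik_matrix(X, y, m, betas):
    """n x k binomial log-likelihoods (binomial coefficient dropped) under each curve."""
    eta = np.clip(X @ betas.T, -30, 30)
    return y[:, None] * eta - m * np.log1p(np.exp(eta))


def classification_fit(X, y, m, k, n_starts=5, n_iter=50, seed=0):
    """Alternate per-cluster fits and hard reassignment; keep the best of several random starts."""
    rng = np.random.default_rng(seed)
    n = len(y)
    best_ll, best_labels = -np.inf, None
    for _ in range(n_starts):
        labels = rng.integers(k, size=n)
        for _ in range(n_iter):
            betas = np.array([fit_logistic(X[labels == j], y[labels == j], m)
                              for j in range(k)])
            ll = loglik_matrix(X, y, m, betas)
            new_labels = ll.argmax(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        total = ll[np.arange(n), labels].sum()
        if total > best_ll:
            best_ll, best_labels = total, labels.copy()
    return best_ll, best_labels


def criterion(X, y, m, k, **kwargs):
    """Penalized classification log-likelihood; the BIC-style penalty is a placeholder assumption."""
    ll, _ = classification_fit(X, y, m, k, **kwargs)
    n, p = X.shape
    return -2.0 * ll + k * p * np.log(n)


# Usage: score k = 1, 2, 3 on data simulated from two logistic curves.
X_demo, y_demo, _ = simulate(200, 20, [(-1.0, 2.0), (1.0, -2.0)], seed=1)
for k in (1, 2, 3):
    print(k, round(criterion(X_demo, y_demo, 20, k), 1))
```

The estimated number of clusters in this sketch would be the k that minimizes the printed score. Hard reassignment can stop at a local optimum, which is why several random starts are used; the penalty term would need to be replaced by the criterion defined in Section 2 to reproduce the paper's procedure.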


Keywords (added by machine, not by the authors): Logistic Regression; Binomial Distribution; Maximum Likelihood Estimator; Asymptotic Property; Linear Predictor





Copyright information

© Physica-Verlag Heidelberg 2008

Authors and Affiliations

  • Guoqi Qian: Department of Mathematics and Statistics, University of Melbourne, Australia
  • C. Radhakrishna Rao: Department of Statistics, Penn State University, University Park, USA
  • Yuehua Wu: Department of Mathematics and Statistics, York University, Toronto, Canada
  • Qing Shao: Biostatistics and Statistical Reporting, Novartis Pharmaceuticals Corporation, East Hanover, USA
