Abstract
In a longitudinal setup, the so-called generalized estimating equations approach was a popular inference technique to obtain efficient regression estimates until it was discovered that this approach may in fact yield less efficient estimates than an independence assumption-based estimating equation approach. In this paper, we revisit this inference issue in a semi-parametric longitudinal setup for binary data and find that the semi-parametric generalized estimating equations encounter similar efficiency drawbacks when compared with the independence assumption-based approach. This makes the generalized estimating equations approach unacceptable for correlated data analysis. We analyze the repeated binary data by fitting a semi-parametric binary dynamic model. The non-parametric function and the regression parameters involved in the semi-parametric regression function are estimated by using a semi-parametric quasi-likelihood and a semi-parametric generalized quasi-likelihood approach, respectively, whereas the dynamic dependence, that is, the correlation index parameter of the model, is estimated by a semi-parametric method of moments. Asymptotic and finite sample properties of the estimators are discussed. The proposed model and the estimation methodology are also illustrated by reanalyzing the well-known respiratory disease data.
References
Amemiya, T. (1985). Advanced Econometrics. Cambridge: Harvard University Press.
Bahadur, R.R. (1961). A representation of the joint distribution of responses to n dichotomous items. In Studies in Item Analysis and Prediction. Stanford Mathematical Studies in the Social Sciences; Solomon, H., Ed.; Vol. 6, 158-168.
Breiman, L. (1968). Probability. Addison-Wesley Pub. Co.
Breslow, N.E. and Clayton, D.G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88, 9–25.
Diggle, P.J., Liang, K.Y. and Zeger, S.L. (1994). Analysis of Longitudinal Data, Oxford, U. K.: Oxford University Press.
Horowitz, J.L. (2009). Semiparametric and Nonparametric Methods in Econometrics. New York: Springer.
Jowaheer, V. and Sutradhar, B.C. (2002). Analysing longitudinal count data with overdispersion. Biometrika, 89, 389–399.
Kaufmann, H. (1987). Regression models for nonstationary categorical time series: Asymptotic estimation theory. The Annals of Statistics, 15, 79–98.
Liang, K. and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
Lin, X. and Carroll, R.J. (2001). Semiparametric regression for clustered data using generalized estimating equations. Journal of the American Statistical Association, 96, 1045–1056.
Lin, X. and Carroll, R.J. (2006). Semi-parametric estimation in general repeated measures problems. Journal of the Royal Statistical Society, 68, 69–88.
Pagan, A. and Ullah, A. (1999). Nonparametric Econometrics. Cambridge: Cambridge University Press.
Powell, J.L. and Stoker, T.M. (1996). Optimal bandwidth choice for density weighted averages. Journal of Econometrics, 75, 291–316.
Severini, T.A. and Staniswallis, J.G. (1994). Quasi-likelihood estimation in semiparametric models. Journal of the American Statistical Association, 89, 501–511.
Sutradhar, B.C. (2003). An overview on regression models for discrete longitudinal responses. Statistical Science, 18, 377–393.
Sutradhar, B.C. (2011). Dynamic Mixed Models for Familial Longitudinal Data. New York: Springer.
Sutradhar, B.C. (2010). Inferences in generalized linear longitudinal mixed models. Canadian Journal of Statistics, Special issue, 38, 174–196.
Sutradhar, B. and Das, K. (1999). On the efficiency of regression estimators in generalized linear models for longitudinal data. Biometrika, 86, 459–465.
Sutradhar, B.C. and Kovacevic, M. (2000). Analysing ordinal longitudinal survey data: Generalized estimating equations approach. Biometrika, 87, 837–848.
Thall, P.F. and Vail, S.C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics, 46, 657–671.
Warriyar, K.V.V. and Sutradhar, B.C. (2014). Estimation with improved efficiency in semi-parametric linear longitudinal models. Brazilian Journal of Probability and Statistics, 28, 561–586.
Wedderburn, R.W.M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61, 439–447.
Zeger, S.L. and Diggle, P.J. (1994). Semi-parametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics, 50, 689–699.
Zeger, S.L. and Karim, M.R. (1991). Generalized linear models with random effects: A Gibbs sampling approach. Journal of the American Statistical Association, 86, 79–86.
Zeger, S.L., Liang, K.Y. and Self, S.G. (1985). The analysis of binary longitudinal data with time independent covariates. Biometrika, 72, 31–38.
Acknowledgements
This research was partially supported by an NSERC grant. The authors wish to thank two referees for their valuable comments and suggestions leading to the improvement of the paper.
Appendix: Asymptotic Properties of the Estimators of the SLDCP Model
The following regularity conditions and/or assumptions (A) are needed to study the asymptotic properties of the estimators of the non-parametric function ψ(⋅), and the parameters β and ρ, for the SLDCP model (2.1).
A.1.
The mean function μij(⋅) in model (2.1) is continuous and
A.2.
For i = 1,…,K, the estimating functions
from Eq. 3.12, with \(V^{*}_{K}={\sum }^{K}_{i = 1}\text {cov}[f_{i}(\beta )],\) satisfy the Lindeberg condition (Amemiya, 1985, Theorem 3.3.6), that is,
for all 𝜖 > 0, with g(⋅) being the probability distribution of fi(β). [A proof of the Lindeberg condition in the context of categorical/binary time series data is available in Kaufmann (1987, pp. 89, 93).]
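For orientation only, one standard way of writing a Lindeberg-type condition for independent, mean-zero vectors \(f_{i}(\beta )\) with total covariance \(V^{*}_{K}\) is sketched below. This is the textbook form, stated as an assumption; the authors' own display may be arranged differently. For binary responses the integral reduces to a sum over the support of \(f_{i}\).

```latex
\[
\lim_{K \to \infty} \sum_{i=1}^{K}
\int_{\{ f_i \,:\, f_i^{\prime} V_K^{*-1} f_i \,>\, \epsilon \}}
f_i^{\prime}\, V_K^{*-1} f_i \; g(f_i)\, df_i \;=\; 0
\qquad \text{for all } \epsilon > 0 .
\]
```

Read informally, the condition says that no single individual's standardized contribution dominates the total variability as K grows.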
A.3.
Recall from the estimating equation (3.14) for ρ that the standardized residuals are defined as \(y_{ij}^{*} = [y_{ij} - \mu _{ij}(\beta ,x_{ij}, \psi (z_{ij}))]/\sqrt {\sigma _{i,jj}(\beta ,x_{ij}, \psi (z_{ij}))}\). For two fixed quantities M1 and M2, we now assume that the lag 1 sum of products and sum of squares have bounded variances satisfying
respectively.
Consistency of \(\hat {\psi }(\cdot )\)
For convenience, in Eq. 3.6, we have shown the estimation of ψ(z0) for z0 ≡ zℓu, for a selected value of ℓ (ℓ = 1,…,K) and u (u = 1,…,nℓ). For notational simplicity, here we use μij(z0) for μij(β,xij,ψ(z0)). Now for known β, and for the true binary mean μij ≡ μij(β,xij,ψ(zij)) given by Eq. ??, a Taylor expansion of \(f(\psi (z_{0}),\beta )={\sum }_{i = 1}^{K} {\sum }^{n_{i}}_{j = 1}w_{ij}(z_{0}) [y_{ij}- \mu _{ij}(\beta ,x_{ij}, \psi (z_{0}))]\) from Eq. 3.6, under assumption A.1, gives
where
with \(p_{ij}(z_{0}) \equiv p_{ij}(\frac {z_{0}-z_{ij}}{b})\) as the kernel density defined in Eq. 3.4. Here, \(b \propto K^{-\alpha }\) for a suitable α (Lin and Carroll, 2001; Pagan and Ullah, 1999, p. 28). Notice that E[AK] = 0 and
with \(Q_{K} = \frac {1}{{B_{K}^{2}}}\frac {1}{K}{\sum }_{i = 1}^{K} \text {Var}\left [{\sum }_{j = 1}^{n_{i}}p_{ij}(z_{0}) Y_{ij}\right ] = O(1)\). It then follows, for example, from Amemiya (1985, Theorem 14.4-1) that
Next by using
after some algebra, the second term in Eq. 6.1 may be expressed as
because E[zij − z0] = 0 for a symmetric kernel such as the Gaussian kernel. Thus, by using Eqs. 6.2 and 6.3 in Eq. 6.1, one obtains
showing that \(\hat {\psi }(z_{0};\beta )\) is consistent for ψ(z0) provided \(Kb^{4} \rightarrow 0\) as K → ∞, that is, \(K\frac {1}{K^{4\alpha }}=\frac {1}{K^{4\alpha -1}}\rightarrow 0\), yielding the condition α > 1/4. Note that this convergence result is obtained by minimizing the bias of the estimator (see Eq. 6.1).
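The kernel-weighted estimating function \(f(\psi (z_{0}),\beta )\) discussed above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation. It assumes a logistic form for the mean, μij = expit(x′ijβ + ψ(z0)), a Gaussian kernel with normalized weights, and observations pooled over i and j into flat arrays; none of these details are spelled out in this appendix. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_weights(z, z0, b):
    """Normalized Gaussian kernel weights w(z0) for bandwidth b (assumed form)."""
    u = (z0 - z) / b
    p = np.exp(-0.5 * u ** 2)
    return p / p.sum()

def estimate_psi_at(z0, y, x, z, beta, b, tol=1e-10, max_iter=100):
    """Solve sum_k w_k(z0) * [y_k - mu_k(beta, x_k, psi)] = 0 for psi by Newton steps.

    y, z: 1-d arrays of pooled responses and secondary covariates.
    x: 2-d array of primary covariates; beta is treated as known.
    The logistic mean is an assumption made for this sketch.
    """
    w = gaussian_kernel_weights(z, z0, b)
    eta = x @ beta
    psi = 0.0
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(eta + psi)))
        f = np.sum(w * (y - mu))                 # kernel-weighted estimating function
        f_prime = -np.sum(w * mu * (1.0 - mu))   # its derivative with respect to psi
        step = f / f_prime
        psi -= step
        if abs(step) < tol:
            break
    return psi
```

With a bandwidth such as b = K^{-1/3}, the requirement α > 1/4 stated above is met. When x′β = 0 for every observation, the solution reduces to the logit of the kernel-weighted mean of the responses, which gives a quick sanity check on the solver.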
Asymptotic Normality and Consistency of \(\hat {\beta }_{SGQL}\)
Recall that the SGQL estimator \(\hat {\beta }_{SGQL}\) of β is obtained by solving the estimating equation (3.12). Suppose that for true β, the estimating function in Eq. 3.12 is denoted as
Thus, \(\hat {\beta }_{SGQL}\) must satisfy \(D_{K}(\hat {\beta }_{SGQL})= 0\) which by a linear Taylor expansion about true β provides
yielding
where
Notice that in Eq. 6.6, one may write
Next by using \(Z_{1i} = \frac {\partial \tilde {\mu }_{i}^{\prime }(\beta , \hat {\psi }(z_{i};\beta ) )}{\partial \beta }\,{\tilde {{\Sigma }}_{i}}^{-1}(\beta , \hat {\psi }(z_{i}; \beta ), {\rho }),\) and \(v_{1i}^{jk}\) as the (j,k)th element of \({\tilde {{\Sigma }}_{i}}^{-1},\) the estimating function DK(β) in Eq. 6.6 may be expressed as
by Eq. 6.4, where \(Z_{2i} = \left (Z_{2i1},\cdots , Z_{2in_{i}}\right )\) with
where BK and the kernel density pij(z0) are defined in Eq. 6.1. Hence by using Eq. 6.7 in Eq. 6.6, one obtains
Now because E[Yi − μi] = 0, and cov[Yi] = Σi, by applying the assumption A.2, we can use the Lindeberg-Feller central limit theorem (Amemiya, 1985, Theorem 3.3.6) for independent random variables with non-identical distributions, and obtain
where
Consistency of \(\hat {\tilde {\rho }}\)
We prove the consistency for known β and ψ(zij). The result remains valid when β and ψ(zij) are replaced by their respective consistent estimates. Notice that for known β and ψ(zij), the moment estimator of ρ is given by Eq. 3.14. Now because Yij’s are independent for different i, and also because the variances of the lag 1 sum of products and sum of squares are bounded by assumption A.3, for K →∞, we may apply the law of large numbers for independent random variables (Breiman, 1968, Theorem 3.27) and obtain
Similarly,
Dividing Eq. 6.10 by Eq. 6.11, after some algebra, one obtains
The consistency for \(\hat {\tilde {\rho }}\) in Eq. 3.15 follows from Eq. 6.12 because of the fact that \(\hat {\tilde {\rho }}\) was constructed by putting consistent estimates for β and ψ(zij) in the formula for \(\hat {\rho }\) in Eq. 3.14.
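The law-of-large-numbers argument above can be checked by simulation. The sketch below is a simplified illustration, not a reproduction of Eq. 3.14, whose exact form is not shown in this appendix and may include additional scaling. It assumes equal cluster sizes, a constant known mean μ, and a linear conditional probability P(Yij = 1 | yi,j−1) = μ + ρ(yi,j−1 − μ), under which the lag 1 correlation equals ρ. Both function names are hypothetical.

```python
import numpy as np

def lag1_moment_rho(y, mu):
    """Ratio of the averaged lag 1 sum of products to the averaged sum of squares
    of standardized residuals (simplified moment estimator, assumed form)."""
    r = (y - mu) / np.sqrt(mu * (1.0 - mu))   # standardized residuals y*_ij
    K, n = r.shape
    num = np.sum(r[:, 1:] * r[:, :-1]) / (K * (n - 1))
    den = np.sum(r ** 2) / (K * n)
    return num / den

def simulate_ldcp(K, n, mu, rho, rng):
    """Simulate K independent binary series of length n from the assumed
    linear conditional probability model with constant mean mu."""
    y = np.empty((K, n))
    y[:, 0] = rng.random(K) < mu
    for j in range(1, n):
        p = mu + rho * (y[:, j - 1] - mu)     # conditional success probability
        y[:, j] = rng.random(K) < p
    return y
```

For example, simulating a few thousand individuals with μ = 0.4 and ρ = 0.3 and passing the data with `np.full(y.shape, 0.4)` to `lag1_moment_rho` should return a value close to 0.3, with the gap shrinking as K grows.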
Cite this article
Sutradhar, B.C., Zheng, N. Inferences in Binary Dynamic Fixed Models in a Semi-parametric Setup. Sankhya B 80, 263–291 (2018). https://doi.org/10.1007/s13571-018-0160-7
DOI: https://doi.org/10.1007/s13571-018-0160-7
Keywords and phrases
- Dynamic models for repeated binary responses
- GEE approach in semi-parametric setup
- Non-parametric function in secondary covariates
- Parametric regression in primary covariates
- Semi-parametric quasi-likelihood and semi-parametric generalized quasi-likelihood estimation
- Time dependent covariates