Abstract
A gradient-like statistic, recently introduced as an influence measure, has been shown to work well in large samples, thanks to its asymptotic properties. In this work, small-scale simulation schemes are used to further investigate the performance of this diagnostic measure in terms of its concordance with the main influence measures used for outlier identification. The simulation studies use generalized linear mixed models (GLMMs).
References
Bates, D., Maechler, M., Bolker, B.: lme4: Linear mixed-effects models using S4 classes. R package version 0.999999-2. http://CRAN.R-project.org/package=lme4 (2013)
Broström, G., Holmberg, H.: glmmML: Generalized linear models with clustering. R package version 0.82-1. http://CRAN.R-project.org/package=glmmML (2011)
Cook, R.D.: Detection of influential observations in linear regression. Technometrics 19, 15–18 (1977)
Cook, R.D.: Assessment of local influence. J. R. Stat. Soc. B Met. 48(2), 133–169 (1986)
Cook, R.D., Weisberg, S.: Residuals and Influence in Regression. Chapman and Hall, London (1982)
Enea, M., Plaia, A.: Influence diagnostics for meta-analysis of individual patient data using generalized linear mixed models. In: Vicari, D., Okada, A., Ragozini, G., Weihs, C. (eds.) Analysis and Modeling of Complex Data in Behavioral and Social Sciences. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, New York (2014)
Fahrmeir, L., Tutz, G.: Multivariate Statistical Modelling Based on Generalized Linear Models. Springer, New York (1994)
Lemonte, A.J.: On the gradient statistic under model misspecification. Stat. Prob. Lett. 83, 390–398 (2013)
McCulloch, C.E.: Maximum likelihood algorithm for generalized linear mixed models: applications to clustered data. J. Am. Stat. Assoc. 92, 162–170 (1997)
McCulloch, C.E., Searle, S.R.: Generalized, Linear, and Mixed Models. Wiley, New York (2001)
Ouwens, M.J.N.M., Tan, F.E.S., Berger, M.P.F.: Local influence to detect influential data structures for generalized linear mixed models. Biometrics 57(4), 1166–1172 (2001)
R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna (2012) [ISBN 3-900051-07-0]
Terrell, G.R.: The gradient statistic. Comput. Sci. Stat. 34, 206–215 (2002)
Xiang, L., Tse, S.-K., Lee, A.H.: Influence diagnostics for generalized linear mixed models: applications to clustered data. Comput. Stat. Data Anal. 40, 759–774 (2002)
Xu, L., Lee, S., Poon, W.: Deletion measures for generalized linear mixed models. Comput. Stat. Data Anal. 51, 1131–1146 (2006)
Zhu, H., Lee, S., Wei, B., Zhou, J.: Case-deletion measures for models with incomplete data. Biometrika 88(3), 727–737 (2001)
Appendix
The following R [12] code performs cluster-level influence diagnostics on an object returned by glmer for binomial or Poisson random intercept models. Currently, the code works under lme4 version 0.999999-2. At the time of writing, package lme4 had been updated to version 1.0-5, but some bugs concerning the conditional variances of the random effects had not yet been fixed. Further, as explained by Enea and Plaia [6], the information matrix, which is needed to compute the diagnostics C_i and CD_i (Ci and CDi in the code), can be obtained from package glmmML [2], which uses the same estimation method and provides the same estimates as lme4. A more complete version of the code, allowing diagnostics at the observation level, for random intercept/slope models, and for specified parameter subsets, is not reported here because of space limits and can be requested from the authors.
influence.mer <- function(obj,H=NULL){
  # obj: fitted glmer object (binomial or Poisson random intercept model)
  # H:   information matrix for psi=(fixed effects, delta); needed for Ci and CDi
  options(warn=-1)
  parf <- obj@fixef
  nparf <- length(parf)
  # response: a single vector, or a two-column matrix whose rows sum to the totals m
  oneresp <- is.null(ncol(obj@frame[[1]]))
  Y <- (if (oneresp) obj@frame[[1]] else obj@frame[[1]][,1])
  m <- if(oneresp) rep(1,length(Y)) else rowSums(obj@frame[[1]])
  nobs <- as.vector(table(obj@flist[,ncol(obj@flist)]))
  iclus <- obj@flist[,ncol(obj@flist)]
  clus <- levels(iclus)
  nclus <- length(clus)
  logLik1 <- logLik(obj)[1]
  delta <- VarCorr(obj)[[1]] # random intercept variance
  names(delta) <- "delta"
  psi <- c(parf,delta)
  bi <- ranef(obj,postVar=TRUE)[[1]]
  # component for delta, built from the conditional variances and modes of the random effects
  Di <- c()
  for (i in 1:nclus) Di[i]<-(attributes(bi)$postVar[,,i]+bi[i,]^2)/(2*delta^2)
  E <- Y-fitted(obj)*m
  logLik2 <-c()
  offset <- if (length(obj@offset)>0) exp(obj@offset) else rep(1,length(Y))
  sDelta <- matrix(,nclus,nparf)
  Dpsi <- matrix(,nclus,length(psi))
  for (j in 1:nclus){
    yes <- (iclus==clus[j])
    # fixed-effects component contributed by cluster j
    sDelta[j,] <- crossprod(obj@X[yes,],E[yes])
    # refit without cluster j and store the change in the estimates
    newobj <- update(obj,data=obj@frame[!yes,])
    deltai <- VarCorr(newobj)[[1]]
    Dpsi[j,] <- psi-c(fixef(newobj),deltai)
    # full-data log-likelihood evaluated at the deleted-cluster estimates (no iterations)
    logLik2[j] <- logLik(update(obj,data=obj@frame,start=list(ST=newobj@ST,
      fixef=fixef(newobj)),control=list(maxFN=0,maxIter=0)))[1]
  }
  Delta <- cbind(sDelta,Di)
  DD <- Delta*Dpsi
  sGD <- 2*abs(DD)
  GD <- 2*abs(rowSums(DD))
  colnames(sGD) <- colnames(Delta) <- colnames(Dpsi) <- names(psi)
  Ci <- if (!is.null(H)) 2*diag(abs(Delta%*%solve(H)%*%t(Delta))) else NULL
  CDi <- if (!is.null(H)) diag(Dpsi%*%H%*%t(Dpsi)) else NULL
  return(list("GDi"=GD,"LDi"=2*abs(logLik1-logLik2),"Ci"=Ci,"CDi"=CDi))
}
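Reading the quantities directly off influence.mer, the four measures returned for cluster i can be summarized as follows. The notation is an assumption introduced here for illustration and is not taken from the chapter: \Delta_i is the i-th row of Delta, d_i the i-th row of Dpsi, \hat{\psi}_{(i)} the estimate obtained without cluster i, \ell the full-data log-likelihood, and H the matrix passed as argument H.

```latex
\begin{align*}
d_i  &= \hat{\psi} - \hat{\psi}_{(i)}, \\
GD_i &= 2\,\lvert \Delta_i^{\top} d_i \rvert, \\
LD_i &= 2\,\lvert \ell(\hat{\psi}) - \ell(\hat{\psi}_{(i)}) \rvert, \\
C_i  &= 2\,\lvert \Delta_i^{\top} H^{-1} \Delta_i \rvert, \\
CD_i &= d_i^{\top} H \, d_i .
\end{align*}
```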
library(lme4)
library(glmmML)
library(mvtnorm)
simul.pois <- function(j,n,param){ #create an artificial data set
  # param: two fixed effects followed by the four variance/covariance components
  pa <- as.vector(rmvnorm(j,c(0,0),matrix(param[3:6],2,2)))
  clus <- kronecker(1:j,rep(1,n))
  x <- rep((1:n)/n ,j)
  resp <- rpois(n*j,lambda=exp(param[1]+param[2]*x+cbind(kronecker(diag(j),
    rep(1,n)),kronecker(diag(j),(1:n/n)))%*%pa ))
  data.frame(clus,x,resp)
}
a <- c(1,0.5,0.5,1) #for variance/covariance components
dad <- simul.pois(j=10,n=30,param=c(1,-1,a))
m0 <- glmer(resp ~ x + (1|clus),data=dad, family=poisson, x=TRUE)
m0b <- glmmML(resp ~ x, cluster=clus,data=dad, family=poisson)
r0 <- influence.mer(obj=m0,H=solve(m0b$variance))
r01 <- r0
r01$Ci <- r01$Ci/2
r01$GDi <- r01$GDi/2 #GDi is the Gradient-like influence measure
r01 <- do.call("cbind",r01)
matplot(r01,lty=1:4,type="l",col=1:4,ylab="influence",xlab="cluster index")
legend("topright",c("GDi/2","LRi","Ci/2","CDi"),lty=1:4,col=1:4)
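Because influence.mer depends on the internals of lme4 version 0.999999-2, the same cluster-deletion logic can be illustrated with base R alone. The sketch below is a simplified analogue and not the authors' implementation. It assumes a Poisson GLM with no random effects, simulated data with hypothetical parameter values, and Spearman rank correlation as one possible summary of the concordance between the gradient-like measure and the likelihood displacement.

```r
# Simplified analogue of influence.mer(): Poisson GLM WITHOUT random effects
# (an assumption, so that the sketch runs with base R only).
set.seed(1)
J <- 10; n <- 30                                   # clusters, cluster size
dat <- data.frame(clus = factor(rep(1:J, each = n)),
                  x    = rep((1:n) / n, J))
dat$resp <- rpois(J * n, lambda = exp(1 - dat$x))  # hypothetical true model

fit  <- glm(resp ~ x, family = poisson, data = dat)
X    <- model.matrix(fit)
E    <- dat$resp - fitted(fit)                     # y - mu at the full-data MLE
beta <- coef(fit)

# Full-data Poisson log-likelihood evaluated at a coefficient vector b
loglik_at <- function(b) sum(dpois(dat$resp, exp(drop(X %*% b)), log = TRUE))

GD <- LD <- numeric(J)
for (j in seq_len(J)) {
  yes   <- dat$clus == levels(dat$clus)[j]
  # score contribution of cluster j (canonical log link: X'(y - mu))
  score <- drop(crossprod(X[yes, , drop = FALSE], E[yes]))
  # estimate obtained after deleting cluster j
  b_del <- coef(glm(resp ~ x, family = poisson, data = dat[!yes, ]))
  GD[j] <- 2 * abs(sum(score * (beta - b_del)))         # gradient-like measure
  LD[j] <- 2 * abs(loglik_at(beta) - loglik_at(b_del))  # likelihood displacement
}

# One possible concordance summary: Spearman rank correlation
concordance <- cor(GD, LD, method = "spearman")
print(round(cbind(GD, LD), 4))
print(concordance)
```

A rank correlation close to 1 would indicate that the two measures order the clusters in nearly the same way. Other summaries, such as agreement on the most influential clusters, could be substituted.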
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Enea, M., Plaia, A. (2015). The Performance of the Gradient-Like Influence Measure in Generalized Linear Mixed Models. In: Morlini, I., Minerva, T., Vichi, M. (eds) Advances in Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-17377-1_12
DOI: https://doi.org/10.1007/978-3-319-17377-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17376-4
Online ISBN: 978-3-319-17377-1
eBook Packages: Mathematics and Statistics (R0)