Abstract
Numerous alternative indices for test reliability have been proposed as being superior to Cronbach’s alpha. One such alternative is Guttman’s L4. This is calculated by dividing the items in a test into two halves such that the covariance between scores on the two halves is as high as possible. However, although simple to understand and intuitively appealing, the method can potentially be severely positively biased if the sample size is small or the number of items in the test is large.
To begin with, this paper compares a number of available algorithms for calculating L4. We then empirically evaluate the bias of L4 for 51 separate upper secondary school examinations taken in the UK in June 2012. For each of these tests we have evaluated the likely bias of L4 for a range of different sample sizes. The results show that the positive bias of L4 is likely to be small if the estimated reliability is larger than 0.85, if there are fewer than 25 items and if a sample size of more than 3,000 is available. A sample size of 1,000 may be sufficient if the estimate of L4 is above 0.9.
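The source of the positive bias can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names (`max_l4`, `cronbach_alpha`) and all settings (12 parallel items with inter-item correlation 0.2, 50 candidates, 20 replications) are assumptions chosen for illustration. It uses the definition L4 = 4cov(X,Y)/var(X+Y), maximised over every split of the items into halves X and Y. For this population the maximised coefficient is exactly 0.75, so sample values above that level reflect the maximisation capitalising on sampling error.

```r
# Illustrative sketch only: names and settings are assumptions, not the paper's.

# Maximised L4 for a covariance matrix, found by enumerating every split.
max_l4 <- function(cov_mat) {
  k <- ncol(cov_mat)
  # Fix the last item in half Y so each split is visited once: 2^(k-1) patterns.
  patterns <- 0:(2^(k - 1) - 1)
  splits <- sapply(0:(k - 2), function(b) (patterns %/% 2^b) %% 2)
  splits <- cbind(splits, 0)                 # rows = splits, 1 = item in half X
  # cov(X, Y) for every split at once: x' C (1 - x)
  cov_xy <- rowSums((splits %*% cov_mat) * (1 - splits))
  4 * max(cov_xy) / sum(cov_mat)
}

# Cronbach's alpha from a covariance matrix, for comparison.
cronbach_alpha <- function(cov_mat) {
  k <- ncol(cov_mat)
  k / (k - 1) * (1 - sum(diag(cov_mat)) / sum(cov_mat))
}

set.seed(1)
n_items <- 12    # assumed test length
rho     <- 0.2   # assumed population inter-item correlation
n_cand  <- 50    # assumed (small) sample size
n_reps  <- 20    # assumed number of replications

# Population covariance matrix of parallel items with unit variance.
pop_cov <- matrix(rho, n_items, n_items)
diag(pop_cov) <- 1
chol_pop <- chol(pop_cov)

# Draw small samples and compute alpha and maximised L4 in each.
sim <- t(replicate(n_reps, {
  scores <- matrix(rnorm(n_cand * n_items), n_cand) %*% chol_pop
  s <- cov(scores)
  c(alpha = cronbach_alpha(s), l4 = max_l4(s))
}))

pop_l4 <- max_l4(pop_cov)   # population value of the maximised coefficient
print(round(c(population = pop_l4, colMeans(sim)), 3))
```

In every replication the maximised L4 is at least as large as alpha, because with an even number of items alpha equals the average of the coefficient over all equal-sized splits. Comparing the two sample means with the population value gives a rough sense of how much of the gain over alpha comes from sampling error.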
Notes
1. Although most subsequent literature refers to this reliability index as “Guttman’s”, this same coefficient was presented in an earlier work by Rulon (1939). As such it is also sometimes referred to as the “Flanagan-Rulon” coefficient.
2. Hunt (2013).
3. Revelle (2013).
4. Although the functions for finding L4 were not introduced until 2009.
5.
6. Hadamard matrices are generated using the survey package published by Thomas Lumley and available from http://cran.r-project.org/web/packages/survey/index.html (Lumley 2004).
7. Whole question scores were analysed for the purposes of calculating reliability rather than items from the same question stem. This was to avoid the possibility of irrelevant associations between item scores within the same question spuriously inflating the reliability estimate.
8. The same analysis was also run with unstandardized item scores. The results were very similar.
9. This time without standardising item scores before beginning.
10. The intercept is referred to as the “additive coefficient” in the report by Verhelst.
References
Brennan R (2001) An essay on the history and future of reliability from the perspective of replications. J Educ Meas 38:295–317
Callender J, Osburn H (1977) A method for maximizing and cross-validating split-half reliability coefficients. Educ Psychol Meas 37:819–826
Cronbach L (1951) Coefficient alpha and the internal structure of tests. Psychometrika 16:297–334
Feldt L (1975) Estimation of the reliability of a test divided into two parts of unequal length. Psychometrika 40:557–561
Guttman L (1945) A basis for analysing test-retest reliability. Psychometrika 10:255–282
Hunt T (2013) Lambda4: collection of internal consistency reliability coefficients. R package version 3.0. http://CRAN.R-project.org/package=Lambda4
Lumley T (2004) Analysis of complex survey samples. J Statist Softw 9:1–19
Raju N (1977) A generalization of coefficient alpha. Psychometrika 42:549–565
Revelle W (2013) Psych: procedures for personality and psychological research. Northwestern University, Evanston. http://CRAN.R-project.org/package=psych
Revelle W, Zinbarg R (2009) Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika 74:145–154
Rulon P (1939) A simplified procedure for determining the reliability of a test by split-halves. Harv Educ Rev 9:99–103
Sijtsma K (2009) On the use, the misuse and the very limited usefulness of Cronbach’s alpha. Psychometrika 74:107–120
Ten Berge J, Socan G (2004) The greatest lower bound to the reliability of a test and the hypothesis of unidimensionality. Psychometrika 69:613–625
Verhelst N (2000) Estimating the reliability of a test from a single test administration. CITO, Arnhem. http://www.cito.com/en/research_and_development/psychometrics/~/media/cito_com/research_and_development/publications/cito_report98_2.ashx
Appendix: R Code to Find Best Split Using the “Start-Then-Improve” Algorithm
#Function to find best split half from a given starting split
MaxSplitHalf = function(data,xal){
  #data - matrix of item scores (row=candidates,column=items)
  #xal - vector of 0s and 1s specifying initial split (1 = item in half X)
  nite = ncol(data)
  cov1 = cov(data)
  v = diag(cov1)
  yal = 1-xal
  ones = rep(1,nite)
  covxy = t(xal)%*%cov1%*%yal
  #Code to examine all possible swaps
  #maxchg1 holds the best available gain in covxy; any positive start value enters the loop
  maxchg1 = 9
  while(maxchg1>0){
    #Calculate change for swapping item i in X with item j in Y;
    #This is equal to 2covxiyj+covxix+covyyj-vx-vy-covxiy-covxyj;
    covxiyj = cov1
    covxix = (cov1%*%xal)%*%t(ones)
    covyyj = ones%*%(yal%*%cov1)
    vx = v%*%t(ones)
    vy = t(vx)
    covxiy = (cov1%*%yal)%*%t(ones)
    covxyj = ones%*%(xal%*%cov1)
    result = 2*covxiyj+covxix+covyyj-vx-vy-covxiy-covxyj
    #Keep only cells with row item i in X and column item j in Y.
    #The formula is only valid for those cells; selecting any other cell
    #would leave the split unchanged and the loop would never end
    result[outer(xal,yal)==0] = 0
    #Add bits for swapping with no other item
    #(extra column: move item i from X to Y; extra row: move item j from Y to X)
    result = cbind(result,as.vector(cov1%*%xal-cov1%*%yal-v)*xal)
    result = rbind(result,c(as.vector(cov1%*%yal-cov1%*%xal-v)*yal,0))
    #find indices of maximum change
    maxchg = 0
    maxchgx = 0
    maxchgy = 0
    which1 = which(result==max(result),arr.ind=TRUE)[1,]
    if (result[which1[1],which1[2]]>0){
      maxchgx = which1[1]
      maxchgy = which1[2]
      maxchg = result[which1[1],which1[2]]
    }
    maxchg1 = maxchg
    #Apply the best move; an index of nite+1 means "no partner item"
    if (maxchgx>0 && maxchgx<(nite+1)) {xal[maxchgx]=0; yal[maxchgx]=1}
    if (maxchgy>0 && maxchgy<(nite+1)) {xal[maxchgy]=1; yal[maxchgy]=0}
    covxy = t(xal)%*%cov1%*%yal
  }
  #Coefficients for the final split
  guttman = 4*covxy/sum(cov1)
  #pites = proportion of items in half X
  pites = sum(xal)/nite
  raju = covxy/(sum(cov1)*pites*(1-pites))
  v1 = t(xal)%*%cov1%*%xal
  v2 = t(yal)%*%cov1%*%yal
  feldt = 4*covxy/(sum(cov1)-((v1-v2)/sqrt(sum(cov1)))^2)
  res = list(guttman=as.vector(guttman),
             raju=as.vector(raju),
             feldt=as.vector(feldt),
             xal=xal)
  return(res)
}

#Maximise L4 starting from odd/even and 12 splits from 12x12 Hadamard matrix
library(survey)
MaxSplitHalfHad12 = function(data){
  #data - matrix of item scores (row=candidates,column=items)
  #start with odd vs even
  nite = ncol(data)
  sequence = 1:nite
  xal = (sequence%%2)
  res1 = MaxSplitHalf(data,xal)
  #now try 12 further splits based on 12*12 Hadamard matrix
  #(each column of had is used directly as a 0/1 starting split)
  had = hadamard(11)
  #starting splits are padded with 0s if there are more than 12 items
  #and truncated if there are fewer
  nextra = max(nite-12,0)
  for (iz in 1:12){
    resrand = MaxSplitHalf(data,c(had[,iz],rep(0,nextra))[1:nite])
    if (resrand$guttman>res1$guttman){res1 = resrand}
  }
  return(res1)
}

#Maximise using exhaustive search
library(Lambda4)
MaxSplitExhaustive = function(data){
  #data - matrix of item scores (row=candidates,column=items)
  cov1 = cov(data)
  nite = ncol(data)
  #each row of mat1 is one candidate split;
  #the recoding assumes bin.combs returns values of -1 and 1
  mat1 = (bin.combs(nite)+1)/2
  res1 = list(guttman=0,xal=rep(-99,nite))
  for (jjz in 1:nrow(mat1)){
    xal = mat1[jjz,]
    gutt1 = 4*(t(xal)%*%cov1%*%(1-xal))/sum(cov1)
    resrand = list(guttman=as.vector(gutt1),xal=xal)
    if (resrand$guttman>res1$guttman){res1 = resrand}
  }
  return(res1)
}

#Examples of use (using data from the Lambda4 package)
data(Rosenberg)
MaxSplitHalf(Rosenberg,c(0,1,0,1,0,1,0,1,0,1))
MaxSplitHalfHad12(Rosenberg)
MaxSplitExhaustive(Rosenberg)
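
The swap formula quoted in the comments of MaxSplitHalf can be checked against direct recomputation. The following is a minimal sketch on simulated data. The helper names (`cov_xy`, `swap_change`) and the six-item data set are assumptions made for illustration and are not part of the published code. For an item i in half X and an item j in half Y, the formula reduces to 2c_ij - v_i - v_j + d_i - d_j, where d is the covariance of an item with half X minus its covariance with half Y.

```r
# Illustrative check only: helper names and simulated data are assumptions.
set.seed(42)
n_cand <- 200
# Six correlated items: independent noise plus a shared candidate effect.
scores <- matrix(rnorm(n_cand * 6), n_cand, 6) + rnorm(n_cand)

cov1 <- cov(scores)
v    <- diag(cov1)
xal  <- c(1, 0, 1, 0, 1, 0)   # 1 = item in half X
yal  <- 1 - xal

# Covariance between the two half scores for a given split indicator.
cov_xy <- function(x) as.numeric(t(x) %*% cov1 %*% (1 - x))

# Change in cov(X, Y) from swapping item i (in X) with item j (in Y),
# written in the same terms as the appendix formula.
swap_change <- function(i, j) {
  d <- as.vector(cov1 %*% xal - cov1 %*% yal)  # cov(X, item) - cov(Y, item)
  2 * cov1[i, j] - v[i] - v[j] + d[i] - d[j]
}

# Compare formula and direct recomputation for every valid (i, j) pair.
max_diff <- 0
for (i in which(xal == 1)) {
  for (j in which(xal == 0)) {
    x_new <- xal
    x_new[i] <- 0
    x_new[j] <- 1
    direct <- cov_xy(x_new) - cov_xy(xal)
    max_diff <- max(max_diff, abs(direct - swap_change(i, j)))
  }
}
print(max_diff < 1e-10)
```

The check only covers pairs with i in X and j in Y, which are the only pairs for which the formula describes an actual swap.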
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Benton, T. (2015). An Empirical Assessment of Guttman’s Lambda 4 Reliability Coefficient. In: Millsap, R., Bolt, D., van der Ark, L., Wang, WC. (eds) Quantitative Psychology Research. Springer Proceedings in Mathematics & Statistics, vol 89. Springer, Cham. https://doi.org/10.1007/978-3-319-07503-7_19
DOI: https://doi.org/10.1007/978-3-319-07503-7_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07502-0
Online ISBN: 978-3-319-07503-7
eBook Packages: Mathematics and Statistics (R0)