An Empirical Assessment of Guttman’s Lambda 4 Reliability Coefficient

Benton, Tom

doi:10.1007/978-3-319-07503-7_19

Part of the book series: Springer Proceedings in Mathematics & Statistics ((PROMS,volume 89))

Abstract

Numerous alternative indices for test reliability have been proposed as being superior to Cronbach’s alpha. One such alternative is Guttman’s L4. This is calculated by dividing the items in a test into two halves such that the covariance between scores on the two halves is as high as possible. However, although simple to understand and intuitively appealing, the method can potentially be severely positively biased if the sample size is small or the number of items in the test is large.
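In symbols, the quantity described above can be sketched as follows. The notation is an assumption introduced here for illustration: X and Y denote the total scores on the two halves, and the expression mirrors the line `guttman = 4*covxy/sum(cov1)` in the appendix code.

```latex
\lambda_4 \;=\; \max_{\text{splits }(X,\,Y)} \; \frac{4\,\operatorname{Cov}(X, Y)}{\operatorname{Var}(X + Y)}
```

Because X + Y is the total test score for every split, the denominator is the same for all splits, so maximising the coefficient amounts to maximising the covariance between the two halves.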

To begin with, this paper compares a number of available algorithms for calculating L4. We then empirically evaluate the bias of L4 for 51 separate upper secondary school examinations taken in the UK in June 2012. For each of these tests we evaluate the likely bias of L4 for a range of sample sizes. The results show that the positive bias of L4 is likely to be small if the estimated reliability is larger than 0.85, if there are fewer than 25 items and if a sample size of more than 3,000 is available. A sample size of 1,000 may be sufficient if the estimate of L4 is above 0.9.
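The source of the positive bias can be illustrated with a small simulation. This is a minimal sketch, not the procedure used in the paper: the helper `max_l4`, the choice of 8 items, and the sample sizes of 30 and 3,000 are all assumptions made for illustration. The items are simulated as independent, so the population covariance between any two halves is zero, yet the maximised sample value cannot fall below zero and will typically sit above it.

```r
# Sketch: positive bias of the maximised split-half coefficient.
# Assumption: items are independent normals, so the population
# covariance between any two halves is exactly zero.
set.seed(1)

# Largest value of 4*cov(X,Y)/var(X+Y) over every possible split
max_l4 <- function(scores) {
  cov1 <- cov(scores)
  n_items <- ncol(scores)
  # Every 0/1 assignment of items to half X (1) or half Y (0)
  splits <- as.matrix(expand.grid(rep(list(0:1), n_items)))
  # Covariance between halves for each split: x' C (1 - x)
  cov_xy <- rowSums((splits %*% cov1) * (1 - splits))
  4 * max(cov_xy) / sum(cov1)
}

n_items <- 8
small <- matrix(rnorm(30 * n_items), ncol = n_items)
large <- matrix(rnorm(3000 * n_items), ncol = n_items)

l4_small <- max_l4(small)
l4_large <- max_l4(large)
print(c(n30 = l4_small, n3000 = l4_large))
```

Comparing the two printed values gives a rough sense of how the maximised estimate behaves as the sample grows. The exhaustive enumeration is only practical for a handful of items, since the number of splits doubles with each item added.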

Notes

1. Although most subsequent literature refers to this reliability index as “Guttman’s”, this same coefficient was presented in an earlier work by Rulon (1939). As such it is also sometimes referred to as the “Flanagan-Rulon” coefficient.
2. Hunt (2013).
3. Revelle (2013).
4. Although the functions for finding L4 were not introduced until 2009.
5. The code in the appendix also applies to adjustments proposed by Raju (1977) and Feldt (1975) for cases where the split halves may be of unequal length.
6. Hadamard matrices are generated using the survey package published by Thomas Lumley and available from http://cran.r-project.org/web/packages/survey/index.html (Lumley 2004).
7. Whole question scores were analysed for the purposes of calculating reliability rather than items from the same question stem. This was to avoid the possibility of irrelevant associations between item scores within the same question spuriously inflating the reliability estimate.
8. The same analysis was also run with unstandardized item scores. The results were very similar.
9. This time without standardising item scores before beginning.
10. The intercept is referred to as the “additive coefficient” in the report by Verhelst.
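
The adjustments mentioned in note 5 can be written out as they appear in the appendix code (the lines assigning `raju` and `feldt`). The symbols are assumptions introduced here: X and Y are the half scores, T = X + Y is the total score, and p is the proportion of items placed in half X. This is a reading of the code, not a restatement checked against the original papers.

```latex
\text{Raju} \;=\; \frac{\operatorname{Cov}(X, Y)}{p\,(1 - p)\,\operatorname{Var}(T)},
\qquad
\text{Feldt} \;=\; \frac{4\,\operatorname{Cov}(X, Y)}{\operatorname{Var}(T) - \left(\dfrac{\operatorname{Var}(X) - \operatorname{Var}(Y)}{\sqrt{\operatorname{Var}(T)}}\right)^{2}}
```

When the halves contain the same number of items, p = 1/2 and the first expression reduces to 4 Cov(X, Y) / Var(T), the unadjusted split-half coefficient. The second reduces to the same value when the two halves have equal variance.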

References

Brennan R (2001) An essay on the history and future of reliability from the perspective of replications. J Educ Meas 38:295–317
Callender J, Osburn H (1977) A method for maximizing and cross-validating split-half reliability coefficients. Educ Psychol Meas 37:819–826
Cronbach L (1951) Coefficient alpha and the internal structure of tests. Psychometrika 16:297–334
Feldt L (1975) Estimation of the reliability of a test divided into two parts of unequal length. Psychometrika 40:557–561
Guttman L (1945) A basis for analysing test-retest reliability. Psychometrika 10:255–282
Hunt T (2013) Lambda4: collection of internal consistency reliability coefficients. R package version 3.0. http://CRAN.R-project.org/package=Lambda4
Lumley T (2004) Analysis of complex survey samples. J Statist Softw 9:1–19
Raju N (1977) A generalization of coefficient alpha. Psychometrika 42:549–565
Revelle W (2013) Psych: procedures for personality and psychological research. Northwestern University, Evanston. http://CRAN.R-project.org/package=psych
Revelle W, Zinbarg R (2009) Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika 74:145–154
Rulon P (1939) A simplified procedure for determining the reliability of a test by split-halves. Harv Educ Rev 9:99–103
Sijtsma K (2009) On the use, the misuse and the very limited usefulness of Cronbach’s alpha. Psychometrika 74:107–120
Ten Berge J, Socan G (2004) The greatest lower bound to the reliability of a test and the hypothesis of unidimensionality. Psychometrika 69:613–625
Verhelst N (2000) Estimating the reliability of a test from a single test administration. CITO, Arnhem. http://www.cito.com/en/research_and_development/psychometrics/~/media/cito_com/research_and_development/publications/cito_report98_2.ashx

Author information

Authors and Affiliations

Cambridge Assessment, 1 Hills Rd, Cambridge, CB1 2EU, UK
Tom Benton

Corresponding author

Correspondence to Tom Benton.

Editor information

Editors and Affiliations

Department of Psychology, Arizona State University, Tempe, Arizona, USA
Roger E. Millsap
Dept. of Educational Psychology, University of Wisconsin, Madison, USA
Daniel M. Bolt
University of Amsterdam, Amsterdam, The Netherlands
L. Andries van der Ark
Department of Psychological Studies, The Hong Kong Institute of Education, Hong Kong, Hong Kong SAR
Wen-Chung Wang

Appendix: R Code to Find Best Split Using the “Start-Then-Improve” Algorithm

#Function to find best split half from a given starting split
MaxSplitHalf = function(data,xal){
  #data - matrix of item scores (row=candidates,column=items)
  #xal - vector of 0s and 1s specifying initial split (1 = item in half X)
  nite = ncol(data)
  cov1 = cov(data)
  v = diag(cov1)
  yal = 1-xal
  ones = rep(1,nite)
  covxy = t(xal)%*%cov1%*%yal

  #Code to examine all possible swaps
  #maxchg1 starts at an arbitrary positive value so the loop runs at least once
  maxchg1 = 9
  while(maxchg1>0){
    #Calculate change in cov(X,Y) for swapping item i in X with item j in Y
    #This is equal to 2covxiyj+covxix+covyyj-vx-vy-covxiy-covxyj
    covxiyj = cov1
    covxix = (cov1%*%xal)%*%t(ones)
    covyyj = ones%*%(yal%*%cov1)
    vx = v%*%t(ones)
    vy = t(vx)
    covxiy = (cov1%*%yal)%*%t(ones)
    covxyj = ones%*%(xal%*%cov1)
    result = 2*covxiyj+covxix+covyyj-vx-vy-covxiy-covxyj
    #Keep only entries with row item i currently in X and column item j
    #currently in Y; the formula above is not a valid change otherwise
    result = result*outer(xal,yal)
    #Add bits for swapping with no other item
    #(extra column: move item i from X to Y; extra row: move item j from Y to X)
    result = cbind(result,as.vector(cov1%*%xal-cov1%*%yal-v)*xal)
    result = rbind(result,c(as.vector(cov1%*%yal-cov1%*%xal-v)*yal,0))
    #find indices of maximum change
    maxchg = 0
    maxchgx = 0
    maxchgy = 0
    which1 = which(result==max(result),arr.ind=TRUE)[1,]
    if (result[which1[1],which1[2]]>0){
      maxchgx = which1[1]
      maxchgy = which1[2]
      maxchg = result[which1[1],which1[2]]
    }
    maxchg1 = maxchg
    #Apply the best move; index nite+1 means no item moves in that direction
    if (maxchgx>0 && maxchgx<(nite+1)) {xal[maxchgx]=0}
    if (maxchgy>0 && maxchgy<(nite+1)) {xal[maxchgy]=1}
    if (maxchgx>0 && maxchgx<(nite+1)) {yal[maxchgx]=1}
    if (maxchgy>0 && maxchgy<(nite+1)) {yal[maxchgy]=0}
    covxy = t(xal)%*%cov1%*%yal
  }

  #Split-half coefficient and the adjustments for unequal half lengths
  guttman = 4*covxy/sum(cov1)
  pites = sum(xal)/nite
  raju = covxy/(sum(cov1)*pites*(1-pites))

  v1 = t(xal)%*%cov1%*%xal
  v2 = t(yal)%*%cov1%*%yal
  feldt = 4*covxy/(sum(cov1)-((v1-v2)/sqrt(sum(cov1)))^2)

  res = list(guttman=as.vector(guttman),
             raju=as.vector(raju),
             feldt=as.vector(feldt),
             xal=xal)
  return(res)
}

#Maximise L4 starting from odd/even and 12 splits from 12x12 Hadamard matrix
library(survey)
MaxSplitHalfHad12 = function(data){
  #data - matrix of item scores (row=candidates,column=items)
  #start with odd vs even
  nite = ncol(data)
  sequence = 1:nite
  xal = (sequence%%2)
  res1 = MaxSplitHalf(data,xal)
  #now try 12 further splits based on 12*12 Hadamard matrix
  had = hadamard(11)
  #any items beyond the 12th start in the second half (padded with 0s)
  nextra = max(nite-12,0)
  for (iz in 1:12){
    resrand = MaxSplitHalf(data,c(had[,iz],rep(0,nextra))[1:nite])
    if (resrand$guttman>res1$guttman){res1 = resrand}
  }
  return(res1)
}

#Maximise using exhaustive search
library(Lambda4)
MaxSplitExhaustive = function(data){
  #data - matrix of item scores (row=candidates,column=items)
  cov1 = cov(data)
  nite = ncol(data)
  #each row of mat1 is one candidate split, recoded to 0s and 1s
  mat1 = (bin.combs(nite)+1)/2
  res1 = list(guttman=0,xal=rep(-99,nite))
  for (jjz in seq_len(nrow(mat1))){
    xal = mat1[jjz,]
    gutt1 = as.vector(4*(t(xal)%*%cov1%*%(1-xal))/sum(cov1))
    resrand = list(guttman=gutt1,xal=xal)
    if (resrand$guttman>res1$guttman){res1 = resrand}
  }
  return(res1)
}

#Examples of use (using data from the Lambda4 package)
data(Rosenberg)
MaxSplitHalf(Rosenberg,c(0,1,0,1,0,1,0,1,0,1))
MaxSplitHalfHad12(Rosenberg)
MaxSplitExhaustive(Rosenberg)
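
The swap formula quoted in the comment inside MaxSplitHalf can be checked with a short derivation. The notation is an assumption introduced here: x_i is an item currently in half X, y_j is an item currently in half Y, and X' and Y' are the half scores after the two items trade places.

```latex
\begin{aligned}
X' &= X - x_i + y_j, \qquad Y' = Y - y_j + x_i,\\
\Delta_{ij} &= \operatorname{Cov}(X', Y') - \operatorname{Cov}(X, Y)\\
&= 2\operatorname{Cov}(x_i, y_j) + \operatorname{Cov}(x_i, X) + \operatorname{Cov}(Y, y_j)
 - \operatorname{Var}(x_i) - \operatorname{Var}(y_j)
 - \operatorname{Cov}(x_i, Y) - \operatorname{Cov}(X, y_j).
\end{aligned}
```

The seven terms correspond, in order, to `2*covxiyj`, `covxix`, `covyyj`, `vx`, `vy`, `covxiy` and `covxyj` in the code. Moving a single item x_i from X to Y without a partner changes the covariance by Cov(x_i, X) - Cov(x_i, Y) - Var(x_i), which is the extra column appended to `result`. Since the total score X + Y is unaffected by any of these moves, each step that increases the covariance also increases the split-half coefficient.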

Cite this paper

Benton, T. (2015). An Empirical Assessment of Guttman’s Lambda 4 Reliability Coefficient. In: Millsap, R., Bolt, D., van der Ark, L., Wang, WC. (eds) Quantitative Psychology Research. Springer Proceedings in Mathematics & Statistics, vol 89. Springer, Cham. https://doi.org/10.1007/978-3-319-07503-7_19

DOI: https://doi.org/10.1007/978-3-319-07503-7_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07502-0
Online ISBN: 978-3-319-07503-7
eBook Packages: Mathematics and Statistics (R0)
