Bootstrap-Calibrated Interval Estimates for Latent Variable Scores in Item Response Theory

Psychometrika

Abstract

In most item response theory applications, model parameters need to be first calibrated from sample data. Latent variable (LV) scores calculated using estimated parameters are thus subject to sampling error inherited from the calibration stage. In this article, we propose a resampling-based method, namely bootstrap calibration (BC), to reduce the impact of the carryover sampling error on the interval estimates of LV scores. BC modifies the quantile of the plug-in posterior, i.e., the posterior distribution of the LV evaluated at the estimated model parameters, to better match the corresponding quantile of the true posterior, i.e., the posterior distribution evaluated at the true model parameters, over repeated sampling of calibration data. Furthermore, to achieve better coverage of the fixed true LV score, we explore the use of BC in conjunction with Jeffreys’ prior. We investigate the finite-sample performance of BC via Monte Carlo simulations and apply it to two empirical data examples.

Notes

  1. We only consider response pattern scoring here; however, our discussions can be extended to IRT scoring based on summed scores.

  2. The intervals discussed in the current work may be labeled confidence intervals, credible intervals, or prediction intervals depending on the context and one’s philosophy toward statistical inference. For simplicity, we use the unified name “interval estimate/estimator.” The interpretation of an interval estimate should be determined by the inferential target and the definition of coverage probability.

  3. When Eq. 4 is viewed as a function of \(\varvec{\xi }\), it is typically referred to as the marginal likelihood.

  4. For notational succinctness, we use \(\alpha \) for coverage in the current work; by convention, however, \(1 - \alpha \) is typically used. As a result, the \([(1+\alpha )/2]\)th quantile in our notation is the same as the \((1-\alpha /2)\)th quantile in the conventional notation.

  5. When the parameter space is unbounded, the ML estimates can be infinite and the corresponding posterior cdf may have jumps. Such irregular cases typically occur with exponentially small probability and can be removed from the calculation of \(C(\cdot ;\varvec{\xi }_0)\) and \(C^{-1}(\cdot ;\varvec{\xi }_0)\).

  6. A direct calculation of the expected information is not viable because there are \(2^{36} \approx 6.87 \times 10^{10}\) response patterns. The observed information is a consistent estimator of the expected information.

  7. Fitting the full-rank 3PL model with unconstrained pseudo-guessing parameters in small samples proves to be challenging (e.g., Han, 2012). The constrained version, however, seldom caused any convergence issues in our simulation study, even when the sample size was only 250.

  8. \(N_0\) and the truncated ML estimator remain unchanged.
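
The arithmetic in Notes 4 and 6 can be checked directly. The following Python sketch is only an illustration: the function name `upper_quantile_level` is hypothetical and does not come from the article, and the count in Note 6 is read as the number of response patterns for 36 binary items, which is what \(2^{36}\) suggests.

```python
import math

def upper_quantile_level(coverage):
    """Upper quantile level of an equal-tailed interval with the given coverage.

    Note 4 writes the coverage itself as alpha, so the level is (1 + alpha) / 2.
    """
    return (1.0 + coverage) / 2.0

# Note 4: coverage 0.95 in the article's notation corresponds to
# alpha = 0.05 in the conventional notation, where the level is 1 - alpha / 2.
article_level = upper_quantile_level(0.95)
conventional_level = 1.0 - 0.05 / 2.0

# Note 6: number of distinct response patterns, assuming 36 binary items.
n_patterns = 2 ** 36

print(article_level, conventional_level, n_patterns)
```

Both conventions give the same quantile level, and the pattern count agrees with the approximation quoted in Note 6.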

References

  • Albert, J. H. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational and Behavioral Statistics, 17(3), 251–269.

  • Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques. Boca Raton, FL: CRC Press.

  • Barndorff-Nielsen, O. E., & Cox, D. R. (1996). Prediction and asymptotics. Bernoulli, 2(4), 319–340.

  • Bartholomew, D. J., & Knott, M. (1999). Latent variable models and factor analysis. London: Edward Arnold (Kendall’s Library of Statistics 7).

  • Beran, R. (1990). Calibrating prediction regions. Journal of the American Statistical Association, 85(411), 715–723.

  • Birch, M. W. (1964). A new proof of the Pearson-Fisher theorem. The Annals of Mathematical Statistics, 35(2), 817–824.

  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading, MA: Addison-Wesley.

  • Bishop, Y., Fienberg, S., & Holland, P. (1975). Discrete multivariate analysis: Theory and practice. Cambridge, MA: The MIT Press.

  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.

  • Bock, R. D., & Lieberman, M. (1970). Fitting a response model for \(n\) dichotomously scored items. Psychometrika, 35(2), 179–197.

  • Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6(4), 431–444.

  • Bolt, D. M. (2005). Limited and full-information IRT estimation. In A. Maydeu-Olivares & J. McArdle (Eds.), Contemporary psychometrics (pp. 27–71). New Jersey: Lawrence Erlbaum.

  • Brent, R. P. (1973). Some efficient algorithms for solving systems of nonlinear equations. SIAM Journal on Numerical Analysis, 10(2), 327–344.

  • Brown, L. D., Cai, T. T., & DasGupta, A. (2001). Interval estimation for a binomial proportion. Statistical Science, 16(2), 101–117.

  • Brown, L. D., Cai, T. T., & DasGupta, A. (2002). Confidence intervals for a binomial proportion and asymptotic expansions. Annals of Statistics, 30(1), 160–201.

  • Cai, L., Thissen, D., & du Toit, S. H. C. (2011). IRTPRO for Windows. Lincolnwood, IL: Scientific Software International.

  • Carroll, R. J., Ruppert, D., Stefanski, L. A., & Crainiceanu, C. M. (2006). Measurement error in nonlinear models: A modern perspective. Boca Raton, FL: CRC Press.

  • Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. Retrieved from http://www.jstatsoft.org/v48/i06/.

  • Chang, H.-H., & Stout, W. (1993). The asymptotic posterior normality of the latent trait in an IRT model. Psychometrika, 58(1), 37–52.

  • Cheng, Y., & Yuan, K.-H. (2010). The impact of fallible item parameter estimates on latent trait recovery. Psychometrika, 75(2), 280–291.

  • Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4), 493–507.

  • Cox, C. (1984). An elementary introduction to maximum likelihood estimation for multinomial models: Birch’s theorem and the delta method. The American Statistician, 38(4), 283–287.

  • Cox, D. R., & Snell, E. J. (1968). A general definition of residuals. Journal of the Royal Statistical Society: Series B (Methodological), 30(2), 248–275.

  • Curran, P. J., & Hussong, A. M. (2009). Integrative data analysis: The simultaneous analysis of multiple data sets. Psychological Methods, 14(2), 81–100.

  • Datta, G. S., & Mukerjee, R. (2004). Probability matching priors: Higher order asymptotics. New York: Springer.

  • Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. Boca Raton, FL: CRC Press.

  • Fonseca, G., Giummolè, F., & Vidoni, P. (2014). Calibrating predictive distributions. Journal of Statistical Computation and Simulation, 84(2), 373–383.

  • Haberman, S. J. (2006). Adaptive quadrature for item response models. Technical report no. 06-29, Educational Testing Service, Princeton, NJ.

  • Han, K. T. (2012). Fixing the c parameter in the three-parameter logistic model. Practical Assessment, Research & Evaluation. http://pareonline.net/getvn.asp?v=17&n=1

  • Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301), 13–30.

  • Houts, C. R., & Cai, L. (2013). flexMIRT user’s manual version 2: Flexible multilevel multidimensional item analysis and test scoring [Computer software manual]. Chapel Hill, NC: Vector Psychometric Group.

  • Irwin, D. E., Stucky, B., Langer, M. M., Thissen, D., DeWitt, E. M., Lai, J.-S., et al. (2010). An item response analysis of the pediatric PROMIS anxiety and depressive symptoms scales. Quality of Life Research, 19(4), 595–607.

  • Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences (Vol. 186, pp. 453–461).

  • Lazarsfeld, P. F. (1950). The logical and mathematical foundation of latent structure analysis. In S. A. Stouffer, L. Guttman, E. A. Suchman, P. F. Lazarsfeld, S. A. Star, & J. A. Clausen (Eds.), Measurement and prediction (pp. 362–412). New York: Wiley.

  • Le Cam, L., & Yang, G. L. (2000). Asymptotics in statistics: Some basic concepts (2nd ed.). New York: Springer.

  • Lehmann, E., & Casella, G. (1998). Theory of point estimation (2nd ed.). Berlin: Springer.

  • Liu, Y., & Hannig, J. (2016). Generalized fiducial inference for binary logistic item response models. Psychometrika, 81(2), 290–324.

  • Liu, Y., & Hannig, J. (2017). Generalized fiducial inference for logistic graded response models. Psychometrika. doi:10.1007/s11336-017-9554-0.

  • Magnus, B. E., Liu, Y., He, J., Quinn, H., Thissen, D., Gross, H. E., et al. (2016). Mode effects between computer self-administration and telephone interviewer-administration of the PROMIS pediatric measures, self-and proxy report. Quality of Life Research, 25, 1655–1665.

  • McDonald, R. P. (1981). The dimensionality of tests and items. British Journal of Mathematical and Statistical Psychology, 34(1), 100–117.

  • Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6), 1087–1092.

  • Mislevy, R. J., Wingersky, M., & Sheehan, K. M. (1993). Dealing with uncertainty about item parameters: Expected response functions. Technical report no. 94-28, Educational Testing Service, Princeton, NJ.

  • Muenks, K., Wigfield, A., Yang, J. S., & O’Neal, C. (2017). How true is grit? Assessing its relations to high school and college students’ personality characteristics, self-regulation, engagement, and achievement. Journal of Educational Psychology, 109(5), 599–620.

  • Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176.

  • Muthén, B. O. (2002). Beyond SEM: General latent variable modeling. Behaviormetrika, 29(1), 81–117.

  • Noel, Y., & Dauvier, B. (2007). A beta item response model for continuous bounded responses. Applied Psychological Measurement, 31(1), 47–73.

  • Patton, J. M., Cheng, Y., Yuan, K.-H., & Diao, Q. (2013). The influence of item calibration error on variable-length computerized adaptive testing. Applied Psychological Measurement, 37, 24–40.

  • Patton, J. M., Cheng, Y., Yuan, K.-H., & Diao, Q. (2014). Bootstrap standard errors for maximum likelihood ability estimates when item parameters are unknown. Educational and Psychological Measurement, 74(4), 697–712.

  • Patz, R. J., & Junker, B. W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2), 146–178.

  • R Core Team. (2016). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/.

  • Rizopoulos, D. (2006). ltm: An R package for latent variable modeling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25.

  • Rousseau, J. (2000). Coverage properties of one-sided intervals in the discrete case and application to matching priors. Annals of the Institute of Statistical Mathematics, 52(1), 28–42.

  • Rupp, A. A. (2013). A systematic review of the methodology for person fit research in item response theory: Lessons about generalizability of inferences from the design of simulation studies. Psychological Test and Assessment Modeling, 55(1), 3–38.

  • Samejima, F. (1969). Estimation of ability using a response pattern of graded scores. In Psychometrika monograph no. 17. Richmond, VA: Psychometric Society.

  • San Martín, E. (2016). Identification of item response theory models. In W. J. van der Linden (Ed.), Handbook of item response theory, volume two: Statistical tools (pp. 127–150). Boca Raton: CRC Press.

  • San Martín, E., & De Boeck, P. (2015). What do you mean by a difficult item? On the interpretation of the difficulty parameter in a Rasch model. In R. Millsap, D. Bolt, L. van der Ark, & W.-C. Wang (Eds.), Quantitative psychology research (pp. 1–14). Berlin: Springer.

  • San Martín, E., Rolin, J.-M., & Castro, L. M. (2013). Identification of the 1PL model with guessing parameter: Parametric and semi-parametric results. Psychometrika, 78(2), 341–379.

  • Schilling, S., & Bock, R. D. (2005). High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika, 70(3), 533–555.

  • Sireci, S. G., Thissen, D., & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28(3), 237–247.

  • Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling. Boca Raton, FL: Chapman & Hall. (Interdisciplinary Statistics Series).

  • Thissen, D., & Steinberg, L. (2009). Item response theory. In R. Millsap & A. Maydeu-Olivares (Eds.), The Sage handbook of quantitative methods in psychology (pp. 148–177). London: Sage Publications.

  • Thissen, D., & Wainer, H. (2001). Test scoring. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

  • Vidoni, P. (1998). A note on modified estimative prediction limits and distributions. Biometrika, 85(4), 949–953.

  • Vidoni, P. (2009). Improved prediction intervals and distribution functions. Scandinavian Journal of Statistics, 36(4), 735–748.

  • Welch, B., & Peers, H. (1963). On formulae for confidence points based on integrals of weighted likelihoods. Journal of the Royal Statistical Society: Series B (Methodological), 25(2), 318–329.

  • Woods, C. M., & Thissen, D. (2006). Item response theory with estimation of the latent population distribution using spline-based densities. Psychometrika, 71(2), 281–301.

  • Wood, R., Wilson, D. T., Gibbons, R. D., Schilling, S. G., Muraki, E., & Bock, R. D. (2003). TESTFACT 4 for Windows: Test scoring, item statistics, and full-information item factor analysis [Computer software]. Lincolnwood, IL: Scientific Software International.

  • Yang, J. S., Hansen, M., & Cai, L. (2012). Characterizing sources of uncertainty in item response theory scale scores. Educational and Psychological Measurement, 72(2), 264–290.

Acknowledgements

We are grateful to Dr. David Thissen from the Department of Psychology at the University of North Carolina at Chapel Hill for his advice and feedback on this paper.

Author information

Corresponding author

Correspondence to Yang Liu.

Appendix A. Theoretical Details

A.1. The ML Estimator

Let \(\mathbf{f}({\varvec{\xi }})\) be the vector of all response pattern probabilities (with elements \(f(\mathbf{y}_i;{\varvec{\xi }})\) defined in Eq. 4), and let \(\mathbf p\) denote the corresponding vector of observed proportions. Consider a calibration sample of n i.i.d. item response patterns \(\mathbf{Y}_1,\dots ,\mathbf{Y}_n\), each of which is generated from the unidimensional IRT model characterized by \(\mathbf{f}({\varvec{\xi }_0})\). Suppose that, in some neighborhood of \(\varvec{\xi }_0\), denoted \(U_0\), standard regularity conditions (Birch, 1964; Bishop et al., 1975; Cox, 1984) hold and \(\mathbf{f}({\varvec{\xi }})\) has continuous fourth partial derivatives. Then there exists a four times continuously differentiable function \({\varvec{\xi }}(\cdot )\) such that \({\varvec{\xi }}_0 = {\varvec{\xi }}(\mathbf{f}(\varvec{\xi }_0))\) and the ML estimator satisfies \(\hat{\varvec{\xi }}={\varvec{\xi }}(\mathbf{p})\) for all admissible probability vectors \(\mathbf{p}\in N_0\), where \(N_0\) is some neighborhood of \(\mathbf{f}({\varvec{\xi }_0})\). As n tends to infinity, \(\mathbf p\) concentrates on \(N_0\) exponentially fast: It follows from Hoeffding’s (1963) inequality that

$$\begin{aligned} P_{\varvec{\xi }_0}^\mathbf{Y}\{\mathbf{p}\notin N_0\}\le B\exp (-cn) \end{aligned}$$
(17)

for some positive constants B and c.
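
As a rough illustration of the exponential concentration in Eq. 17, the Python sketch below replaces the vector of response pattern proportions with a single binomial proportion and takes \(N_0\) to be an interval of half-width t around the true proportion. These simplifications, the values `p0 = 0.3` and `t = 0.1`, and all function names are assumptions made for the example. In this scalar case Hoeffding's inequality gives the bound with B = 2 and c = 2t².

```python
import math

def binom_pmf(k, n, p):
    """Binomial(n, p) probability mass at k, computed on the log scale."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def prob_outside(n, p0, half_width):
    """P(|p_hat - p0| >= half_width) for p_hat = X / n, X ~ Binomial(n, p0)."""
    # The small tolerance keeps boundary counts in the sum despite rounding.
    return sum(binom_pmf(k, n, p0)
               for k in range(n + 1)
               if abs(k / n - p0) >= half_width - 1e-12)

def hoeffding_bound(n, half_width):
    """Two-sided Hoeffding bound 2 * exp(-2 * n * t^2) for a mean of [0, 1] variables."""
    return 2.0 * math.exp(-2.0 * n * half_width ** 2)

p0, t = 0.3, 0.1  # hypothetical true proportion and neighborhood half-width
results = [(n, prob_outside(n, p0, t), hoeffding_bound(n, t))
           for n in (50, 100, 200, 400)]
for n, exact, bound in results:
    print(f"n={n:4d}  P(outside)={exact:.3e}  bound={bound:.3e}")
```

The exact probability of leaving the neighborhood stays below the bound at every sample size and shrinks rapidly as n grows, which is the behavior Eq. 17 asserts for the full proportion vector.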

The following equation, which resembles Beran’s (1990) Assumption A(4), can be verified by a Taylor series expansion argument similar to Lehmann and Casella (1998, p. 430):

$$\begin{aligned} E_{\varvec{\xi }_0}^\mathbf{Y}\left[ \left( \varvec{\xi }(\mathbf{p}) - \varvec{\xi }_0\right) ^\mathbf{r}{\mathbb I}\{\mathbf{p}\in N_0\}\right] = \sum ^2_{s = \lfloor (|\mathbf{r}| + 1)/2\rfloor }n^{-s} b_{\mathbf{r},s}({\varvec{\xi }}_0)+ o(n^{-2}),\quad |\mathbf{r}|=1,\dots ,4. \end{aligned}$$
(18)

In Eq. 18, the q-tuple of nonnegative integers \(\mathbf{r} = (r_1,\dots ,r_q){}^\top \) serves as a multi-index such that \(\varvec{\xi }^\mathbf{r} = \xi _1^{r_1}\cdots \xi _q^{r_q}\) for any q-dimensional vector \(\varvec{\xi } = (\xi _1,\dots ,\xi _q){}^\top \), and \(|\mathbf{r}| = r_1+\cdots +r_q\). \({\mathbb I}\{\cdot \}\) denotes the indicator function. The functions \(b_{\mathbf{r},s}({\varvec{\xi }}_0)\) for all \(\mathbf r\) and s are related to the partial derivatives of \(\varvec{\xi }(\cdot )\) and the moments of \(\mathbf p\) up to the fourth order. Equation 18 indicates that the first and second moments of the truncated estimation error \(\varvec{\xi }(\mathbf{p}) - \varvec{\xi }_0\) are of order \(O(n^{-1})\), and that the third and fourth moments are of order \(O(n^{-2})\).
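
The moment orders in Eq. 18 can be checked numerically in a one-parameter toy case. The sketch below is an assumption-laden illustration, not the article's IRT setting: it takes a single binomial proportion with \(\xi (p) = \log \{p/(1-p)\}\), uses the hypothetical neighborhood \(N_0 = [0.1, 0.5]\) around \(p_0 = 0.3\), and computes the truncated moments by exact summation. The limits `b1` and `b2` come from a standard delta-method expansion of the logit and are not taken from the article.

```python
import math

def binom_pmf(k, n, p):
    """Binomial(n, p) probability mass at k, computed on the log scale."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def logit(p):
    return math.log(p / (1.0 - p))

def truncated_moment(n, p0, r, lo=0.1, hi=0.5):
    """E[(logit(p_hat) - logit(p0))^r * 1{lo <= p_hat <= hi}] by exact summation."""
    xi0 = logit(p0)
    total = 0.0
    # k = 0 and k = n would give infinite logits; they lie outside [lo, hi] anyway.
    for k in range(1, n):
        p_hat = k / n
        if lo <= p_hat <= hi:
            total += (logit(p_hat) - xi0) ** r * binom_pmf(k, n, p0)
    return total

p0 = 0.3
b1 = (2.0 * p0 - 1.0) / (2.0 * p0 * (1.0 - p0))  # limit of n * first moment
b2 = 1.0 / (p0 * (1.0 - p0))                     # limit of n * second moment

# Scale moments 1 and 2 by n, and moments 3 and 4 by n^2, as Eq. 18 suggests.
scaled = {n: (n * truncated_moment(n, p0, 1),
              n * truncated_moment(n, p0, 2),
              n ** 2 * truncated_moment(n, p0, 3),
              n ** 2 * truncated_moment(n, p0, 4))
          for n in (200, 400, 800)}
for n, values in scaled.items():
    print(n, [round(v, 3) for v in values])
```

If the orders in Eq. 18 hold, each scaled column should settle toward a constant as n increases, with the first two columns approaching `b1` and `b2`.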

A.2. The Plug-in Method

Let the cdf \(G(\cdot ;{\varvec{\xi }})\) be continuous for all \(\varvec{\xi }\) in the closure of the parameter space. In particular, we assume that \(G(\theta ; \varvec{\xi })\) is strictly monotonic in \(\theta \) and four times continuously differentiable with respect to both \(\theta \) and \(\varvec{\xi }\) for all \(\theta \in \mathbb {R}\) and \(\varvec{\xi }\in U_0\). These assumptions imply that \(C(\alpha ; {\varvec{\xi }}, {\varvec{\xi }}_0)\) = \(G(G^{-1}(\alpha ;\varvec{\xi }); \varvec{\xi }_0)\) is always defined and has continuous fourth partial derivatives with respect to \(\varvec{\xi }\) for any \(\varvec{\xi }\in U_0\). Then, \(C(\alpha ; {\varvec{\xi }}, \varvec{\xi }_0)\) has the following Taylor series expansion at \(\varvec{\xi }=\varvec{\xi }_0\):

$$\begin{aligned} C(\alpha ; \varvec{\xi }, \varvec{\xi }_0) = \alpha + \sum _{|\mathbf{r}|=1}^3\frac{\partial ^{|\mathbf{r}|}C}{\partial \varvec{\xi }^\mathbf{r}}(\alpha , \varvec{\xi }_0, {\varvec{\xi }}_0)\frac{(\varvec{\xi } - \varvec{\xi }_0)^\mathbf{r}}{\mathbf{r}!} + \sum _{|\mathbf{r}|=4}\frac{\partial ^4C}{\partial \varvec{\xi }^\mathbf{r}}(\alpha , \bar{\varvec{\xi }}, \varvec{\xi }_0)\frac{(\varvec{\xi } - \varvec{\xi }_0)^\mathbf{r}}{\mathbf{r}!} \end{aligned}$$
(19)

in which \(\mathbf{r}! = r_1!\cdots r_q!\), and \(\bar{\varvec{\xi }}\) lies between \(\varvec{\xi }\) and \(\varvec{\xi }_0\). Note that \(C(\alpha ; \varvec{\xi }_0)\) = \(E_{\varvec{\xi }_0}^\mathbf{Y}[C(\alpha ; \hat{\varvec{\xi }}, \varvec{\xi }_0)]\) is essentially \(E_{\varvec{\xi }_0}^\mathbf{Y}\left[ C(\alpha ; \hat{\varvec{\xi }}, \varvec{\xi }_0){\mathbb I}\{\mathbf{p}\in N_0\}\right] \) plus an exponentially small term because of Eq. 17 and the boundedness of \(C(\alpha ; \varvec{\xi }, \varvec{\xi }_0)\). It follows from Eqs. 18 and 19 that

$$\begin{aligned} C(\alpha ; {\varvec{\xi }_0}) = \alpha + n^{-1}d(\alpha ,\varvec{\xi }_0) + O(n^{-2}), \end{aligned}$$
(20)

in which

$$\begin{aligned} d(\alpha ,\varvec{\xi }_0) = \sum _{|\mathbf{r}|=1}^2\frac{\partial ^{|\mathbf{r}|}C}{\partial \varvec{\xi }^\mathbf{r}}(\alpha ,\varvec{\xi }_0,\varvec{\xi }_0)\frac{b_{\mathbf{r},1}({\varvec{\xi }}_0)}{\mathbf{r}!}. \end{aligned}$$
(21)
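
Equations 20 and 21 can be illustrated with a toy model in which everything is available in closed form. The sketch below assumes, purely for illustration, that the posterior cdf is \(G(\theta ; \xi ) = \Phi (\theta - \xi )\) with a scalar parameter and that the estimator is distributed as \(N(\xi _0, 1/n)\); neither assumption comes from the article. Under these assumptions \(C(\alpha ; \xi , \xi _0) = \Phi (z_\alpha + \xi - \xi _0)\), the plug-in coverage is \(\Phi (z_\alpha /\sqrt{1 + 1/n})\), and Eq. 21 reduces to \(d = -z_\alpha \phi (z_\alpha )/2\) because the estimator is unbiased with variance 1/n.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def plug_in_coverage(alpha, n):
    """C(alpha; xi0) in the toy model: E[Phi(z_alpha + xi_hat - xi0)] with
    xi_hat ~ N(xi0, 1/n), which equals Phi(z_alpha / sqrt(1 + 1/n))."""
    z_alpha = STD_NORMAL.inv_cdf(alpha)
    return STD_NORMAL.cdf(z_alpha / (1.0 + 1.0 / n) ** 0.5)

def first_order_coefficient(alpha):
    """d(alpha, xi0) from Eq. 21 in the toy model: the first-moment term
    vanishes and the second-moment term gives -z_alpha * phi(z_alpha) / 2."""
    z_alpha = STD_NORMAL.inv_cdf(alpha)
    return -0.5 * z_alpha * STD_NORMAL.pdf(z_alpha)

alpha = 0.95
d = first_order_coefficient(alpha)
# Each row: n, exact plug-in coverage, first-order approximation alpha + d / n.
table = [(n, plug_in_coverage(alpha, n), alpha + d / n) for n in (25, 100, 400)]
for n, exact, approx in table:
    print(f"n={n:4d}  exact={exact:.6f}  alpha + d/n={approx:.6f}")
```

In this toy case the plug-in coverage falls short of the nominal level for upper quantiles, and the gap between the exact value and the first-order approximation shrinks at the \(O(n^{-2})\) rate stated in Eq. 20.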

A.3. Predictive Calibration

Analogously, expanding \(C(C^{-1}(\alpha ; {\varvec{\xi }_0}); \varvec{\xi }, \varvec{\xi }_0)\) at \(\varvec{\xi }=\varvec{\xi }_0\) and taking expectation yield

$$\begin{aligned} C^{-1}(\alpha ; {\varvec{\xi }_0}) = \alpha - n^{-1}d(\alpha ,\varvec{\xi }_0) + O(n^{-2}). \end{aligned}$$
(22)

The assumptions we made imply that Eq. 22 still holds with the same \(d(\cdot ,\cdot )\) if we replace \(\varvec{\xi }_0\) by any \(\varvec{\xi }\) in some neighborhood \(V_0\subset U_0\) (see Note 8), and that \(\sup _{{\varvec{\xi }}\in V_0}|C^{-1}(\alpha ; {\varvec{\xi }}) - \alpha + n^{-1}d(\alpha ,\varvec{\xi })|\) = \(O(n^{-2})\). Plugging in the expansion of \(d(\alpha ,\varvec{\xi })\) at \(\varvec{\xi }={\varvec{\xi }}_0\), we obtain

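$$\begin{aligned} C^{-1}(\alpha ; {\varvec{\xi }}) = \alpha - n^{-1}d(\alpha ,\varvec{\xi }_0) - n^{-1}({\varvec{\xi }} - {\varvec{\xi }}_0){}^\top \left[ \frac{\partial d}{\partial \varvec{\xi }}(\alpha ,\breve{\varvec{\xi }})\right] + O(n^{-2}), \end{aligned}$$
(23)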

in which \(\breve{\varvec{\xi }}\) falls between \(\varvec{\xi }\) and \(\varvec{\xi }_0\).

We now expand \(C( C^{-1}(\alpha ; \varvec{\xi }); {\varvec{\xi }}, \varvec{\xi }_0)\) with respect to the first argument at \( C^{-1}(\alpha ; {\varvec{\xi }}_0)\) and then further expand \(\frac{\partial C}{\partial \alpha }( C^{-1}(\alpha ; \varvec{\xi }_0), {\varvec{\xi }}, \varvec{\xi }_0)\) with respect to the second argument at \(\varvec{\xi }_0\):

$$\begin{aligned}&C( C^{-1}(\alpha ; \varvec{\xi }); \varvec{\xi }, {\varvec{\xi }}_0)\nonumber \\&= C( C^{-1}(\alpha ; {\varvec{\xi }}_0); {\varvec{\xi }}, {\varvec{\xi }}_0) + \frac{\partial C}{\partial \alpha }( C^{-1}(\alpha ; \varvec{\xi }_0), {\varvec{\xi }}, {\varvec{\xi }}_0)[ C^{-1}(\alpha ; {\varvec{\xi }}) - C^{-1}(\alpha ; \varvec{\xi }_0)]\nonumber \\&+ \frac{1}{2}\frac{\partial ^2 C}{\partial \alpha ^2}(\bar{\alpha }, {\varvec{\xi }}, {\varvec{\xi }}_0)[ C^{-1}(\alpha ; {\varvec{\xi }}) - C^{-1}(\alpha ; \varvec{\xi }_0)]^2\nonumber \\&= C( C^{-1}(\alpha ; {\varvec{\xi }}_0); {\varvec{\xi }}, {\varvec{\xi }}_0) + \biggr \{\frac{\partial C}{\partial \alpha }( C^{-1}(\alpha ; {\varvec{\xi }_0}), {\varvec{\xi }}_0, {\varvec{\xi }}_0) + ({\varvec{\xi }} - {\varvec{\xi }}_0){}^\top \left[ \frac{\partial ^2 C}{\partial \alpha \partial \varvec{\xi }{}}( C^{-1}(\alpha ; {\varvec{\xi }_0}), \tilde{\varvec{\xi }}, {\varvec{\xi }_0})\right] \biggr \}\nonumber \\&\cdot [ C^{-1}(\alpha ; {\varvec{\xi }}) - C^{-1}(\alpha ; \varvec{\xi }_0)] + \frac{1}{2}\frac{\partial ^2 C}{\partial \alpha ^2}(\bar{\alpha }, {\varvec{\xi }}, {\varvec{\xi }}_0)[ C^{-1}(\alpha ; {\varvec{\xi }}) - C^{-1}(\alpha ; \varvec{\xi }_0)]^2, \end{aligned}$$
(24)

in which \(\bar{\alpha }\) falls between \( C^{-1}(\alpha ; {\varvec{\xi }}_0)\) and \( C^{-1}(\alpha ; {\varvec{\xi }})\), and \(\tilde{\varvec{\xi }}\) falls between \(\varvec{\xi }_0\) and \(\varvec{\xi }\). By combining Eqs. 23 and 24 and taking expectations, we conclude that the calibrated predictive coverage matches \(\alpha \) up to an \(O(n^{-2})\) error term.
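
The calibration step can be sketched in the same kind of toy setting. The Python example below is a minimal sketch under stated assumptions, not the article's implementation: the posterior cdf is taken to be \(G(\theta ; \xi ) = \Phi (\theta - \xi )\), the estimator is redrawn directly from \(N(\hat{\xi }, 1/n)\) as a parametric bootstrap (in an IRT application each replicate would presumably involve refitting the model to resampled calibration data), and the adjusted level is found by bisection. The function name, the value `xi_hat = 0.4`, and the number of replicates are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def bootstrap_calibrated_level(alpha, xi_hat, n, n_boot=2000, seed=0):
    """Return the level a whose bootstrap-estimated C(a; xi_hat) equals alpha."""
    rng = random.Random(seed)
    # Parametric bootstrap: redraw the estimator as if xi_hat were the truth.
    xi_star = [rng.gauss(xi_hat, n ** -0.5) for _ in range(n_boot)]

    def estimated_coverage(a):
        # Average of C(a; xi*, xi_hat) = Phi(z_a + xi* - xi_hat) over replicates.
        z_a = STD_NORMAL.inv_cdf(a)
        return sum(STD_NORMAL.cdf(z_a + x - xi_hat) for x in xi_star) / n_boot

    # Bisection on a; estimated_coverage is increasing in a.
    lo, hi = 1e-6, 1.0 - 1e-6
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if estimated_coverage(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha, n = 0.95, 25
a_boot = bootstrap_calibrated_level(alpha, xi_hat=0.4, n=n)
# Closed-form solution of Phi(z_a / sqrt(1 + 1/n)) = alpha in the toy model.
a_exact = STD_NORMAL.cdf(STD_NORMAL.inv_cdf(alpha) * (1.0 + 1.0 / n) ** 0.5)
print(f"bootstrap level={a_boot:.4f}  closed form={a_exact:.4f}")
```

The calibrated interval would then use the `a_boot` quantile of the plug-in posterior in place of the nominal \(\alpha \) quantile. In this toy model the adjustment moves the level above \(\alpha \), in line with the sign of the \(n^{-1}\) term in Eq. 22.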

Cite this article

Liu, Y., Yang, J.S. Bootstrap-Calibrated Interval Estimates for Latent Variable Scores in Item Response Theory. Psychometrika 83, 333–354 (2018). https://doi.org/10.1007/s11336-017-9582-9

  • DOI: https://doi.org/10.1007/s11336-017-9582-9
