Increase of reliability by incorporating response time into the paired-comparison psychological measurement


Psychological measurement using the paired-comparison format has attracted much scholarly attention because the format is considered robust to systematic response bias. The Thurstonian D-diffusion item response theory model was recently proposed to incorporate response-time information into this format. Because reliability is a fundamental measurement property, this study used that model to conduct a preliminary investigation into how much reliability increases when response-time information is incorporated into paired-comparison psychological measurement. Under realistic conditions, our simulation revealed a practically relevant, though not very large, increase. A similar increase was found in our analysis of a real psychological dataset containing measurements of the Big Five traits. This study thus provides evidence supporting the collection and utilization of response time when conducting paired-comparison psychological measurement.
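The intuition behind the result can be sketched outside the paper's own model. In a diffusion-style account of a paired comparison, the drift rate reflects the latent preference difference between the two items, and a stronger drift produces both more consistent choices and faster responses, so response time carries information about the trait beyond the choice itself. The following toy Wiener-diffusion simulation (a generic illustration, not the authors' Thurstonian D-diffusion model; parameter values are arbitrary) demonstrates this:

```python
import math
import random

def simulate_diffusion(v, a=2.0, s=1.0, dt=0.001, rng=random):
    """One Wiener-diffusion trial: evidence starts at 0 and drifts at rate v
    between boundaries -a/2 and +a/2; returns (choice, response_time)."""
    x, t = 0.0, 0.0
    step_sd = s * math.sqrt(dt)  # Euler discretization of the noise
    while abs(x) < a / 2.0:
        x += v * dt + rng.gauss(0.0, step_sd)
        t += dt
    return (1 if x > 0 else 0), t

def summarize(v, n, seed):
    """Choice consistency and mean response time over n simulated trials."""
    rng = random.Random(seed)
    trials = [simulate_diffusion(v, rng=rng) for _ in range(n)]
    consistency = sum(c for c, _ in trials) / n
    mean_rt = sum(t for _, t in trials) / n
    return consistency, mean_rt

# A strong drift (clear latent preference) versus a weak one:
acc_hi, rt_hi = summarize(v=2.0, n=300, seed=1)
acc_lo, rt_lo = summarize(v=0.2, n=300, seed=2)
# Stronger drift gives more consistent choices AND shorter response times,
# so observed RTs are informative about the size of the latent difference.
```

Because response time covaries with the latent quantity being measured, conditioning on it sharpens the trait estimate, which is the mechanism behind the reliability gain studied here.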


Fig. 1
Fig. 2




Acknowledgements

This work was supported by JSPS KAKENHI Grant numbers 17H04787, 17J07674, and 19H00616.

Author information



Corresponding author

Correspondence to Kensuke Okada.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Kyosuke Bunji is now at Benesse Educational Research and Development Institute, Tokyo, Japan.

Communicated by Kentaro Kato.

About this article


Cite this article

Okada, K., Bunji, K. Increase of reliability by incorporating response time into the paired-comparison psychological measurement. Behaviormetrika 48, 169–177 (2021).



Keywords

  • Response time
  • Reliability
  • Paired-comparison
  • Thurstonian D-diffusion IRT