I Only Have One Rater Per Ratee, So What? The Impact of Clustered Performance Rating Data on Operational Validity Estimates

  • J. Kemp Ellington
  • Samuel T. McAbee
  • Ronald S. Landis
  • Alan D. Mead
Original Paper


Test validation is fundamental to industrial-organizational psychology and to many of the interventions that we use to improve organizational performance. Problems with performance ratings as criteria are well recognized, yet in nested designs it is difficult to determine the degree to which validity estimates are obscured by rater variance. Using a simulation methodology, we examined attenuation under different nested validation design characteristics. Results clearly illustrate the attenuating effect that dependencies in nested performance ratings can have on observed criterion-related validity coefficients, and indicate that attenuation decreases as the number of ratees per rater increases. We also examined observed intraclass correlation coefficients (ICC(1)) as a potential alternative basis for correcting observed validity estimates. If assumptions regarding between-rater variability are met, our results suggest that corrections using a local ICC(1) estimate may be a viable approach for disattenuating validity coefficients in nested rating designs. Our findings have practical implications for investigators using nested validation designs: gauging the potential severity of attenuation, designing data collection efforts to minimize attenuation, and applying post hoc corrections.
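The ICC(1)-based correction described above can be sketched as follows. This is an illustrative reading, not the authors' exact procedure: it estimates ICC(1) with the classical one-way ANOVA formula and then applies a standard disattenuation correction, treating the between-rater share of criterion variance as error uncorrelated with the predictor. The function names and the balanced-design shortcut for group size are our own assumptions.

```python
import math

def icc1(groups):
    """Estimate ICC(1) from ratings nested in raters (one-way ANOVA).

    groups: list of lists, one inner list of ratee scores per rater.
    Classical formula: ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW),
    with k the number of ratees per rater (average, as a balanced-design
    shortcut; this is an illustrative simplification).
    """
    n_groups = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-rater and within-rater sums of squares
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    msb = ssb / (n_groups - 1)
    msw = ssw / (n_total - n_groups)
    k = n_total / n_groups
    return (msb - msw) / (msb + (k - 1) * msw)

def disattenuate(r_observed, icc):
    """Correct an observed validity coefficient for rater variance,
    treating the between-rater proportion of criterion variance (ICC(1))
    as error: r_corrected = r_observed / sqrt(1 - ICC(1))."""
    return r_observed / math.sqrt(1.0 - icc)
```

For example, with three raters each rating three ratees, a large between-rater spread yields a high ICC(1), and the corrected coefficient rises accordingly; with ICC(1) = 0 the correction leaves the observed coefficient unchanged.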


Keywords: Validation research · Validity coefficient · Performance ratings · Rater variance



We would like to thank Paul Bliese for his feedback and advice on our simulation.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • J. Kemp Ellington (1; email author)
  • Samuel T. McAbee (2)
  • Ronald S. Landis (3)
  • Alan D. Mead (4)
  1. Department of Management, Walker College of Business, Appalachian State University, Boone, USA
  2. Department of Psychology, College of Arts and Sciences, Bowling Green State University, Bowling Green, USA
  3. Department of Psychology, Lewis College of Human Sciences, Illinois Institute of Technology, Chicago, USA
  4. Talent Algorithms Inc., Lockport, USA
