Abstract
In this chapter we draw on the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) as a general framework to discuss some key test standards that are highly relevant when constructing or selecting tests for psychosocial assessments in education. Specifically, we present basic principles associated with test score reliability and the validity of test score interpretations. We also elaborate on basic and more complex psychometric models and their importance for computing and understanding score reliability. In doing so, we show how results from confirmatory factor analysis can be used to compute McDonald’s omega (ω), a model-based estimate of score reliability. Finally, we discuss some of the main challenges facing the assessment of psychosocial constructs, with a focus on response sets and response styles as well as general problems with test-criterion correlations.
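The step from confirmatory factor analysis results to ω can be illustrated with a minimal sketch. It assumes the simplest case, a one-factor model with uncorrelated residuals and the factor variance fixed to 1, in which ω is commonly written as (Σλ)² / ((Σλ)² + Σθ), with λ the factor loadings and θ the residual variances. The function name `mcdonald_omega` and the loading values are hypothetical and are not taken from the chapter; more complex, hierarchical models require a different formula.

```python
from typing import Sequence


def mcdonald_omega(loadings: Sequence[float],
                   residual_variances: Sequence[float]) -> float:
    """Omega for a one-factor model with uncorrelated residuals.

    Assumes the factor variance is fixed to 1, so the variance of the
    sum score that is due to the common factor equals (sum of loadings)^2.
    """
    if len(loadings) == 0 or len(loadings) != len(residual_variances):
        raise ValueError("need one residual variance per loading")
    common_variance = sum(loadings) ** 2
    error_variance = sum(residual_variances)
    return common_variance / (common_variance + error_variance)


# Hypothetical standardized loadings for a four-item scale.
loadings = [0.70, 0.80, 0.60, 0.75]
# With standardized items, each residual variance is 1 - loading^2.
residuals = [1 - l ** 2 for l in loadings]

omega = mcdonald_omega(loadings, residuals)
print(f"omega = {omega:.3f}")
```

In practice the loadings and residual variances would come from a fitted confirmatory factor model, and the estimate is only as trustworthy as the fit of that model.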
Notes
1. The remainder of this chapter mainly revolves around tests and questionnaires. It is important to note, though, that the following quality issues also apply to interviews and behavior observations!
2. This section borrows from the article written by Brunner, Nagy, and Wilhelm (2012).
3. An in-depth discussion on how to compute reliability for more complex, hierarchical constructs can be found in Brunner et al. (2012).
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association, American Psychological Association, & National Council on Measurement in Education.
Bagozzi, R. P., & Edwards, J. R. (1998). A general approach for representing constructs in organizational research. Organizational Research Methods, 1, 45–87.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1–26.
Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency reliability. Psychometrika, 74, 137–143.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bolt, D. M., Lu, Y., & Kim, J.-S. (2014). Measurement and control of response styles using anchoring vignettes: A model-based approach. Psychological Methods. doi:10.1037/met0000016.
Bonett, D. G. (2003). Sample size requirements for testing and estimating coefficient alpha. Journal of Educational and Behavioral Statistics, 27, 335–340.
Boomsma, A., & Hoogland, J. J. (2001). The robustness of LISREL modeling revisited. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation models: Present and future. A Festschrift in honor of Karl Jöreskog (pp. 139–168). Lincolnwood, IL: Scientific Software International.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061–1071.
Bowers, K. S. (1973). Situationism in psychology—Analysis and a critique. Psychological Review, 80(5), 307–336.
Brogden, H. E., & Taylor, E. K. (1950). The theory and classification of criterion bias. Educational and Psychological Measurement, 10(2), 159–186.
Brown, A., & Maydeu-Olivares, A. (2011). Item response modeling of forced-choice questionnaires. Educational and Psychological Measurement, 71(3), 460–502.
Brunner, M., Nagy, G., & Wilhelm, O. (2012). A tutorial on hierarchically structured constructs. Journal of Personality, 80, 796–846. doi:10.1111/j.1467-6494.2011.00749.x.
Brunswik, E. (1955). Representative design and probabilistic theory in a functional psychology. Psychological Review, 62, 193–217.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–105.
Cattell, R. B. (1958). What is “objective” in “objective personality tests”? Journal of Counseling Psychology, 5, 285–289.
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98–104.
Cronbach, L. J. (1947). Test “reliability”: Its meaning and determination. Psychometrika, 12(1), 1–16.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Diener, E., Emmons, R. A., Larsen, R. J., & Griffin, S. (1985). The satisfaction with life scale. Journal of Personality Assessment, 49, 71–75.
Eid, M., Nussbeck, F., Geiser, C., Cole, D., Gollwitzer, M., & Lischetzke, T. (2008). Structural equation modeling of multitrait-multimethod data: Different models for different types of methods. Psychological Methods, 13(3), 230–253.
Ellingson, J. E. (2011). People fake only when they need to fake. In M. Ziegler, C. MacCann, & R. Roberts (Eds.), New perspectives on faking in personality assessment (pp. 19–33). New York: Oxford University Press.
Embretson, S. E., & Reise, S. P. (2000). Item response theory. New York: Psychology Press.
Fan, X. (2003). Two approaches for correcting correlation attenuation caused by measurement error: Implications for research practice. Educational and Psychological Measurement, 63, 915–930.
Gogol, K., Brunner, M., Goetz, T., Martin, R., Ugen, S., Keller, U., et al. (2014). “My Questionnaire is Too Long!” The assessments of motivational-affective constructs with three-item and single-item measures. Contemporary Educational Psychology, 39(3), 188–205.
Green, S. B., & Yang, Y. (2009). Reliability of summed item scores using structural equation modeling: An alternative to coefficient alpha. Psychometrika, 74, 155–167.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6, 430–450.
Heene, M., Hilbert, S., Draxler, C., Ziegler, M., & Bühner, M. (2011). Masking misfit in confirmatory factor analysis by increasing unique variances: A cautionary note on the usefulness of cutoff values of fit indices. Psychological Methods, 16, 319–336. doi:10.1037/a0024917.
Heene, M., Hilbert, S., Freudenthaler, H. H., & Bühner, M. (2012). Sensitivity of SEM fit indexes with respect to violations of uncorrelated errors. Structural Equation Modeling: A Multidisciplinary Journal, 19, 36–50.
Heggestad, E. D., George, E., & Reeve, C. L. (2006). Transient error in personality scores: Considering honest and faked responses. Personality and Individual Differences, 40, 1201–1211. doi:10.1016/j.paid.2005.10.014.
Hogan, J., & Roberts, B. (1996). Issues and non-issues in the fidelity-bandwidth trade-off. Journal of Organizational Behavior, 17(6), 627–637.
Hoyle, R. H., & Smith, G. T. (1994). Formulating clinical research hypotheses as structural equation models: A conceptual overview. Journal of Consulting and Clinical Psychology, 62, 429–440.
Jackson, D. N., & Messick, S. (1958). Content and style in personality-assessment. Psychological Bulletin, 55(4), 243–252.
King, G., Murray, C. J., Salomon, J. A., & Tandon, A. (2004). Enhancing the validity and cross-cultural comparability of measurement in survey research. The American Political Science Review, 98(1), 191–207.
Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50, 537–567.
Kruyen, P. M., Emons, W. H., & Sijtsma, K. (2012). Test length and decision quality in personnel selection: When is short too short? International Journal of Testing, 12(4), 321–344.
Kruyen, P. M., Emons, W. H. M., & Sijtsma, K. (2013a). Assessing individual change using short tests and questionnaires. Applied Psychological Measurement, 38(3), 201–216.
Kruyen, P. M., Emons, W. H. M., & Sijtsma, K. (2013b). Shortening the S-STAI: Consequences for research and clinical practice. Journal of Psychosomatic Research, 75(2), 167–172.
Kruyen, P. M., Emons, W. H., & Sijtsma, K. (2013c). On the shortcomings of shortened tests: A literature review. International Journal of Testing, 13(3), 223–248.
Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2(3), 151–160.
Kyllonen, P. C., & Bertling, J. P. (2013). Innovative questionnaire assessment methods to increase cross-country comparability. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis (pp. 277–285). Boca Raton: CRC Press.
Lindqvist, E., & Vestman, R. (2011). The labor market returns to cognitive and noncognitive ability: Evidence from the Swedish enlistment. American Economic Journal: Applied Economics, 3, 101–128. doi:10.1257/app.3.1.101.
Little, T. D., Cunningham, W. A., Shahar, G., & Widaman, K. F. (2002). To parcel or not to parcel: Exploring the question, weighing the merits. Structural Equation Modeling, 9(2), 151–173.
Little, T. D., Rhemtulla, M., Gibson, K., & Schoemann, A. M. (2013). Why the items versus parcels controversy needn’t be one. Psychological Methods, 18(3), 285–300. doi:10.1037/a0033266.
Loevinger, J. (1954). The attenuation paradox in test theory. Psychological Bulletin, 51, 493.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Lubke, G. H., & Muthén, B. O. (2005). Investigating population heterogeneity with factor mixture models. Psychological Methods, 10(1), 21–39.
MacCallum, R. C., & Austin, J. T. (2000). Applications of structural equation modeling in psychological research. Annual Review of Psychology, 51, 201–226.
MacCann, C. (2013). Instructed faking of the HEXACO reduces facet reliability and involves more Gc than Gf. Personality and Individual Differences, 55(7), 828–833.
MacCann, C., Ziegler, M., & Roberts, R. (2011). Faking in personality assessment: Reflections and recommendations. In M. Ziegler, C. MacCann, & R. Roberts (Eds.), New perspectives on faking in personality assessment (pp. 309–329). New York: Oxford University Press.
Marsh, H. W. (1987). The big-fish-little-pond effect on academic self-concept. Journal of Educational Psychology, 79, 280.
Marsh, H. W. (1996). Positive and negative global self-esteem: A substantively meaningful distinction or artifactors? Journal of Personality and Social Psychology, 70(4), 810–819.
Marsh, H. W., Hau, K. T., Balla, J. R., & Grayson, D. (1998). Is more ever too much? The number of indicators per factor in confirmatory factor analysis. Multivariate Behavioral Research, 33, 181–220.
McArdle, J. J. (2009). Latent variable modeling of differences and changes with longitudinal data. Annual Review of Psychology, 60, 577–605.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum Associates.
McDonald, R. P. (2010). Structural models and the art of approximation. Perspectives on Psychological Science, 5, 675–686.
Mellenbergh, G. J. (1996). Measurement precision in test score and item response models. Psychological Methods, 1, 293–299.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1), 156–166.
Miller, J. D., Gaughan, E. T., Maples, J., & Price, J. (2011). A comparison of agreeableness scores from the Big Five inventory and the NEO PI-R: Consequences for the study of narcissism and psychopathy. Assessment, 18(3), 335–339.
Mischel, W. (2004). Toward an integrative science of the person. Annual Review of Psychology, 55, 1–22.
Mõttus, R., Allik, J., Realo, A., Rossier, J., Zecca, G., Ah-Kion, J., et al. (2012). The effect of response style on self-reported conscientiousness across 20 countries. Personality and Social Psychology Bulletin, 38(11), 1423–1436.
Mussel, P. (2010). Epistemic curiosity and related constructs: Lacking evidence of discriminant validity. Personality and Individual Differences, 49(5), 506–510.
Muthén, B. O., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171–189.
Muthén, L. K., & Muthén, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9, 599–620.
Ones, D., & Viswesvaran, C. (1996). Bandwidth-fidelity dilemma in personality measurement for personnel selection. Journal of Organizational Behavior, 17(6), 609–626.
Pace, V. L., & Brannick, M. T. (2010). How similar are personality scales of the “same” construct? A meta-analytic investigation. Personality and Individual Differences, 49(7), 669–676.
Paulhus, D. L. (2002). Socially desirable responding: The evolution of a construct. In H. I. Braun, D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp. 49–69). Mahwah, NJ: Lawrence Erlbaum Associates.
Paulhus, D. L., & Williams, K. M. (2002). The dark triad of personality: Narcissism, Machiavellianism, and psychopathy. Journal of Research in Personality, 36, 556–563.
Paunonen, S. V., & Ashton, M. C. (2001). Big five factors and facets and the prediction of behavior. Journal of Personality and Social Psychology, 81(3), 524–539.
Preckel, F. (2014). Assessing need for cognition in early adolescence: Validation of a German adaption of the Cacioppo/Petty scale. European Journal of Psychological Assessment, 30, 65–72. doi:10.1027/1015-5759/a000170.
Rajaratnam, N., Cronbach, L., & Gleser, G. (1965). Generalizability of stratified-parallel tests. Psychometrika, 30(1), 39–56.
Rauthmann, J. F. (2012). You say the party is dull, I say it is lively: A componential approach to how situations are perceived to disentangle perceiver, situation, and perceiver × situation variance. Social Psychological and Personality Science, 3(5), 519–528.
Reips, U., & Funke, F. (2008). Interval-level measurement with visual analogue scales in internet-based research: VAS generator. Behavior Research Methods, 40, 699–704.
Rhemtulla, M., Brosseau-Liard, P. É., & Savalei, V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychological Methods, 17(3), 354–373. doi:10.1037/a0029315.
Roberts, B. W., Kuncel, N. R., Shiner, R., Caspi, A., & Goldberg, L. R. (2007). The power of personality: The comparative validity of personality traits, socioeconomic status, and cognitive ability for predicting important life outcomes. Perspectives on Psychological Science, 2, 313–345. doi:10.1111/j.1745-6916.2007.00047.x.
Rost, J., Carstensen, C. H., & von Davier, M. (1997). Applying the mixed Rasch model to personality questionnaires. In J. Rost & R. E. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences. New York: Waxmann.
Satorra, A. (1990). Robustness issues in structural equation modeling: A review of recent developments. Quality and Quantity, 24, 367–386.
Schumacher, J., Klaiberg, A., & Brähler, E. (Eds.). (2003). Diagnostische Verfahren zu Lebensqualität und Wohlbefinden [Diagnostic methods for assessing quality of life and subjective well-being]. Göttingen: Hogrefe.
Shavelson, R. J., & Webb, N. M. (2006). Generalizability theory. In J. L. Green, G. Camilli, & P. B. Elmore (Eds.), Handbook of complementary methods in education research (pp. 309–322). Mahwah, NJ: Lawrence Erlbaum Associates.
Slaney, K. L., & Maraun, M. D. (2008). A proposed framework for conducting data-based test analysis. Psychological Methods, 13, 376–390.
Spengler, M., Lüdtke, O., Martin, R., & Brunner, M. (2013). Personality is related to educational outcomes in late adolescence: Evidence from two large-scale achievement studies. Journal of Research in Personality, 47(5), 613–625. doi:10.1016/j.jrp.2013.05.008.
Stark, S., Chernyshenko, O. S., & Drasgow, F. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91, 1292.
Stark, S., Chernyshenko, O., & Drasgow, F. (2011). Constructing fake-resistant personality tests using item response theory: High-stakes personality testing with multidimensional pairwise preferences. In M. Ziegler, C. MacCann, & R. Roberts (Eds.), New perspectives on faking in personality assessment (pp. 214–239). New York: Oxford University Press.
Tett, R. P., & Burnett, D. D. (2003). A personality trait-based interactionist model of job performance. Journal of Applied Psychology, 88(3), 500–517.
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3, 4–69.
Walker, C. M. (2011). What’s the DIF? Why differential item functioning analyses are an important part of instrument development and validation. Journal of Psychoeducational Assessment, 29, 364–376. doi:10.1177/0734282911406666.
West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 56–75). Thousand Oaks, CA: Sage.
West, S. G., Taylor, A. B., & Wu, W. (2012). Model fit and model selection in structural equation modeling. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 209–231). New York: Guilford Press.
Wetzel, E., Böhnke, J. R., Carstensen, C. H., Ziegler, M., & Ostendorf, F. (2013). Do individual response styles matter? Journal of Individual Differences, 34(2), 69–81.
Wetzel, E., Carstensen, C. H., & Böhnke, J. R. (2013). Consistency of extreme response style and non-extreme response style across traits. Journal of Research in Personality, 47(2), 178–189.
Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12, 58–79.
Yang, Y., & Green, S. B. (2010). A note on structural equation modeling estimates of reliability. Structural Equation Modeling, 17, 66–81.
Zickar, M. J., Gibby, R. E., & Robie, C. (2004). Uncovering faking samples in applicant, incumbent, and experimental data sets: An application of mixed-model item response theory. Organizational Research Methods, 7(2), 168–190.
Zickar, M. J., & Sliter, K. A. (2011). Searching for unicorns: Item response theory-based solutions to the faking problem. In M. Ziegler, C. MacCann, & R. Roberts (Eds.), New perspectives on faking in personality assessment (pp. 113–130). New York: Oxford University Press.
Ziegler, M., Bensch, D., Maaß, U., Schult, V., Vogel, M., & Bühner, M. (2014). Big Five facets as predictor of job training performance: The role of specific job demands. Learning and Individual Differences, 29(1), 1–7.
Ziegler, M., Booth, T., & Bensch, D. (2013). Getting entangled in the nomological net. European Journal of Psychological Assessment, 29(3), 157–161.
Ziegler, M., & Bühner, M. (2009). Modeling socially desirable responding and its effects. Educational and Psychological Measurement, 69(4), 548–565.
Ziegler, M., Danay, E., Schölmerich, F., & Bühner, M. (2010). Predicting academic success with the Big 5 rated from different points of view: Self-rated, other rated and faked. European Journal of Personality, 24(4), 341–355.
Ziegler, M., & Kemper, C. (2013). Extreme response style and faking: Two sides of the same coin? In P. Winker, R. Porst, & N. Menold (Eds.), Interviewers’ deviations in surveys: Impact, reasons, detection and prevention (Schriften zur empirischen Wirtschaftsforschung) (pp. 221–237). Frankfurt a. M.: Peter Lang GmbH.
Ziegler, M., MacCann, C., & Roberts, R. R. (2011). Faking: Knowns, unknowns, and points of contention. In M. Ziegler, C. MacCann, & R. R. Roberts (Eds.), New perspectives on faking in personality assessment (pp. 3–16). New York: Oxford University Press.
Zinbarg, R. E., Yovel, I., Revelle, W., & McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale’s indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30, 121–144.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Ziegler, M., Brunner, M. (2016). Test Standards and Psychometric Modeling. In: Lipnevich, A., Preckel, F., Roberts, R. (eds) Psychosocial Skills and School Systems in the 21st Century. The Springer Series on Human Exceptionality. Springer, Cham. https://doi.org/10.1007/978-3-319-28606-8_2
DOI: https://doi.org/10.1007/978-3-319-28606-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28604-4
Online ISBN: 978-3-319-28606-8
eBook Packages: Behavioral Science and Psychology, Behavioral Science and Psychology (R0)