Encyclopedia of Personality and Individual Differences

Living Edition
| Editors: Virgil Zeigler-Hill, Todd K. Shackelford

Validity Scales

  • Jacob A. Finn
  • Joye C. Anestis
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-28099-8_958-1


Definition

Scales developed as a means of quantifying the quality of information provided by a respondent and determining the level of confidence the test-giver can have that the resulting substantive scale(s) will be meaningfully associated with the external correlates seen in the research literature.


Introduction

Objective personality assessment relies on the direct disclosure of information by the individual being assessed (i.e., self-report) or by others familiar with the individual (i.e., informant report). With face-valid questionnaire items, the assessment process becomes a meaningful communication between the test-giver and the test-taker, collecting valuable data in a structured and consistent manner across respondents. Despite the wealth of information provided via personality inventories, the predictive utility for individual cases can be undermined by intentional or unintentional biases in the respondent’s reporting. Validity scales were developed as a means of quantifying the quality of information provided by a respondent and determining the level of confidence the test-giver can have that the resulting substantive scale(s) will be meaningfully associated with the external correlates seen in the research literature. Validity scales are typically found in broadband measures of personality and psychopathology, such as the Minnesota Multiphasic Personality Inventory (MMPI) and the Personality Assessment Inventory (PAI) families of instruments. Three common areas of response distortion are non-content-based responding, content-based overreporting, and content-based underreporting.

Non-Content-Based Responding

Non-content-based responding is a broad category that includes any response pattern that is independent of the questionnaire’s item content. This can include non-responding, fixed or random responding, and extreme or “middle of the road” responding.


Non-Responding

Non-responding includes items left blank, as well as those marked with multiple options (e.g., both true and false), making the items unscorable. Although clinicians may speculate that these response patterns were driven by item content (e.g., not responding to substance use items during a forensic evaluation), it is ultimately unclear why these items were not responded to appropriately. Non-responding can be diffuse enough to result in the invalidation of an entire administration or specific enough to restrict the interpretation of individual scales. Although substantial non-responding impacts validity coefficients, even small percentages of unscorable items (e.g., 10%) can impact scale elevations (Dragon et al. 2012). Non-responding can also be present in projective personality tests, for example, when a respondent does not provide the necessary number of responses to a Rorschach administration.
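The extent of non-responding is straightforward to quantify as the proportion of unscorable items in a protocol. The Python sketch below is a minimal illustration: the true/false coding and the 10% flagging threshold are assumptions made for the example, not published cutoffs from any particular instrument's manual.

```python
def unscorable_rate(responses, valid_options=("T", "F")):
    """Fraction of items that are blank (None) or otherwise unscorable
    (e.g., both options marked) in a true/false protocol."""
    unscorable = sum(1 for r in responses if r not in valid_options)
    return unscorable / len(responses)

# Ten items: one left blank, one marked both true and false
answers = ["T", "F", None, "T", "TF", "F", "T", "F", "F", "T"]
rate = unscorable_rate(answers)          # 2 of 10 items unscorable -> 0.2
flagged = rate >= 0.10                   # illustrative 10% review threshold
```

In practice, instruments report this at the scale level as well as the protocol level, since non-responding concentrated on one scale's items can invalidate that scale while leaving others interpretable.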

Fixed and Random Responding

Fixed responding includes patterns of answering in a singular manner to items, regardless of the item content (i.e., acquiescent/yea-saying and non-acquiescent/nay-saying), typically resulting in an inconsistent or highly infrequent presentation. For example, someone who answers true to both “I typically feel sad” and “I typically feel happy” presents an incoherent narrative in his or her responses. Alternatively, someone responding false to a group of items including “I am younger than my parents” and “I need to breathe to live” may have responded to the items without considering the content. As such, it is unclear how useful the responses to other items will be.

Random responding similarly occurs when an individual’s response pattern reflects inconsistency or infrequency; however, this responding is not fixed to a single response option. For example, someone who answers true to “I typically feel sad” but answers false to “I often feel unhappy” may not have read the items or may have missed the “un” in “unhappy.” Regardless, the resulting narrative is incoherent and difficult to meaningfully use. Indeed, significant levels of fixed or random responding degrade the predictive utility of substantive scales (Handel et al. 2010).
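Inconsistency of this kind is typically detected by comparing responses to pairs of items with similar content: answering such pairs in opposite directions counts toward an inconsistency score. The sketch below illustrates only the general logic; the item names, pairs, and scoring are hypothetical and are not the actual item pool or scoring rules of any published scale such as VRIN-r.

```python
# Hypothetical pairs of similar-content items (not a real instrument's pairs)
SIMILAR_PAIRS = [
    ("feel_sad", "feel_unhappy"),
    ("enjoy_parties", "like_social_events"),
]

def inconsistency_score(responses, pairs):
    """Count similar-content item pairs answered in opposite directions.

    `responses` maps item name -> True/False answer. Higher scores suggest
    responding that is not driven by item content."""
    return sum(1 for a, b in pairs if responses[a] != responses[b])

resp = {"feel_sad": True, "feel_unhappy": False,      # contradictory pair
        "enjoy_parties": True, "like_social_events": True}
score = inconsistency_score(resp, SIMILAR_PAIRS)      # one inconsistent pair
```

Real inconsistency scales also include opposite-content pairs (where giving the *same* answer is inconsistent) and compare raw counts against normative distributions before flagging a protocol.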

Extreme and Midpoint Responding

Finally, extreme and midpoint responding are related to the respondent’s use of a polytomous rating scale, such as always true, somewhat true, somewhat false, and always false, irrespective of the item content. In extreme responding, a respondent overutilizes the severe options (e.g., the “always” options in the example above). In midpoint responding, the test-taker overutilizes the more neutral options (e.g., the “somewhat” options in the example above). Although the use of extreme and midpoint responses is not inherently problematic, the systematic use of particular options may add error variance to one’s scale scores. After all, rating scales are used to allow greater specificity in responding; with extreme and midpoint responding, however, the responses reflect not one’s experiences but a preference for or discomfort with a response option, which may not be relevant to the psychological construct of interest. Recent work has suggested that when using polytomous ratings rather than dichotomous responses, rating scales may account for an increase in reliability while offering no concomitant increase in validity, possibly due to extreme and midpoint responding issues (Finn et al. 2015).
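A simple way to gauge these response styles is to tally how often each class of option is used across an inventory. The sketch below assumes a 4-point scale coded 1–4 (1 = always false … 4 = always true); the option groupings and any interpretive threshold are illustrative assumptions, not published scoring rules.

```python
from collections import Counter

def response_style_rates(ratings, extreme=frozenset({1, 4}),
                         midpoint=frozenset({2, 3})):
    """Proportions of responses in the extreme vs. middle options of a
    4-point scale (1 = always false ... 4 = always true)."""
    counts = Counter(ratings)          # missing options count as zero
    n = len(ratings)
    extreme_rate = sum(counts[o] for o in extreme) / n
    midpoint_rate = sum(counts[o] for o in midpoint) / n
    return extreme_rate, midpoint_rate

ratings = [4, 4, 1, 4, 1, 1, 4, 4]     # heavy use of the "always" options
ext, mid = response_style_rates(ratings)
```

Here every response uses a severe option, so the extreme rate is 1.0 and the midpoint rate is 0.0; a content-driven respondent would typically show a mixture across the options.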


Content-Based Overreporting

Overreporting occurs when an individual describes himself or herself as having types of experiences, amounts of difficulties, or both that are uncommon within a particular population (e.g., general, psychiatric, or medical populations). Overreporting can be related to psychiatric symptoms and/or somatic and cognitive complaints, which has led to the development of unique validity scales for each type of difficulty, and can be intentional or unintentional (Ben-Porath 2003). Intentional overreporting can include the fabrication of symptoms and experiences for a secondary gain (i.e., malingering) or for the maintenance of the sick role (i.e., factitious disorder), as well as the conscious exaggeration of the severity of genuine symptoms. Unintentional overreporting can include an unconscious cry for help (e.g., previous invalidating environments acknowledged only severe difficulties) or a lack of normative awareness (e.g., believing one’s symptoms are extreme because others around oneself never experience psychological difficulties). Although both intentional and unintentional overreporting occur, overreporting validity scales typically are unable to determine intent, which creates some misconceptions. For example, although research on malingering examines scores on overreporting scales (because malingering is a type of overreporting), not all overreporting is malingering. Therefore, it is incorrect to assume that an individual who overreports psychopathology is malingering without corroborating contextual information. Additionally, the presence of overreporting is not evidence of the absence of true psychopathology. Indeed, individuals who malinger may have some kind of genuine mental health concern (Rogers 2008); however, the presence of overreporting makes the assessment of symptoms unproductive until reliable information can be obtained.
Ultimately, elevated overreporting scores indicate that the quality of self-report precludes the test-giver from ascertaining what symptoms are actually experienced and to what degree; the clinician must use additional information to construct hypotheses about intent.


Content-Based Underreporting

Underreporting occurs when an individual describes himself or herself as possessing a level of psychological adjustment, socially desirable attributes, or both that is unrealistic and uncommon within the general population. Similar to overreporting, underreporting can be intentional or unintentional (Ben-Porath 2003). Intentional underreporting includes conscious attempts at positive impression management, which can include denying the existence of problems that an individual is aware of or minimizing the severity of symptoms or their impact on functioning. Unintentional underreporting can include self-deception, where the individual denies symptoms because he or she is unaware of them or denies functional impairment because he or she lacks insight into the resulting difficulties (e.g., obliviousness to interpersonal friction with others). As with overreporting, underreporting validity scales typically are unable to speak to the intentionality of the respondent.

Debate About Validity Scales

An ongoing debate within the area of personality assessment involves the relative value of validity scales and the research supporting their use. Criticisms raised about validity scale research include the use of simulation studies with questionable generalizability to clinical settings and the limited use of moderator or suppressor effect research designs for testing the impact of response bias on substantive scale utility (McGrath et al. 2010). However, responses to these criticisms (e.g., Morey 2012; Rohling et al. 2011) defend the methodologies used, note the large clinical samples required for the suggested analyses, and highlight literature and findings overlooked by the critics of validity scales. Given the use of personality inventories in high-stakes assessment contexts, examination of the use and value of validity scales will likely continue.

Treatment Utility of Validity Scales

Validity scales may offer clinicians additional information beyond the quality of information provided by the respondent. Personality assessment researchers (e.g., Butcher and Perry 2008; Graham 2012) have speculated about the clinical utility of validity scales as measures of a patient’s level of engagement with the treatment process, viewing the approach to the questionnaire as a microcosm of the larger approach to healthcare services. A recent study tested these notions by examining the ability of validity scales to predict premature termination in an outpatient mental health clinic: fixed responding, overreporting of rare psychiatric symptoms, and lower levels of underreporting of psychological adjustment predicted patients’ unilaterally ending therapy ahead of schedule (Anestis et al. 2015). Continued exploration of the associations between response bias and treatment engagement and process variables may expand the use of validity scales.



References

  1. Anestis, J., Finn, J. A., Gottfried, E., Arbisi, P. A., & Joiner, T. (2015). Reading the road signs: The utility of the MMPI-2 restructured form validity scales in prediction of premature termination. Assessment, 22, 279–288.
  2. Ben-Porath, Y. S. (2003). Assessing personality and psychopathology with self-report inventories. In I. B. Weiner, J. R. Graham, & J. A. Naglieri (Eds.), Handbook of psychology: Assessment psychology (pp. 553–577). Hoboken: Wiley.
  3. Butcher, J. N., & Perry, J. N. (2008). Personality assessment in treatment planning: Use of the MMPI-2 and BTPI. New York: Oxford University Press.
  4. Dragon, W. R., Ben-Porath, Y. S., & Handel, R. W. (2012). Examining the impact of unscorable item responses on the validity and interpretability of MMPI-2/MMPI-2-RF restructured clinical (RC) scale scores. Assessment, 19, 101–113.
  5. Finn, J. A., Ben-Porath, Y. S., & Tellegen, A. (2015). Dichotomous versus polytomous response options for psychopathology assessment: Method or meaningful variance? Psychological Assessment, 27, 184–193.
  6. Graham, J. R. (2012). MMPI-2: Assessing personality and psychopathology (5th ed.). New York: Oxford University Press.
  7. Handel, R. W., Ben-Porath, Y. S., Tellegen, A., & Archer, R. P. (2010). Psychometric functioning of the MMPI-2-RF VRIN-r and TRIN-r scales with varying degrees of randomness, acquiescence, and counter-acquiescence. Psychological Assessment, 22, 87–95.
  8. McGrath, R. E., Mitchell, M., Kim, B. H., & Hough, L. (2010). Evidence for response bias as a source of error variance in applied assessment. Psychological Bulletin, 136, 450–470.
  9. Morey, L. C. (2012). Detection of response bias in applied assessment: Comment on McGrath et al. (2010). Psychological Injury and Law, 5, 153–161.
  10. Rogers, R. (2008). An introduction to response styles. In R. Rogers (Ed.), Clinical assessment of malingering and deception (pp. 3–13). New York: Guilford Press.
  11. Rohling, M. L., Larrabee, G. J., Greiffenstein, M. F., Ben-Porath, Y. S., Lees-Haley, P., Green, P., & Greve, K. W. (2011). A misleading review of response bias: Comment on McGrath, Mitchell, Kim, and Hough (2010). Psychological Bulletin, 137, 708–712.

Copyright information

© Springer International Publishing AG (outside the USA) 2017

Authors and Affiliations

  1. Minneapolis Veterans Affairs Health Care System, Minneapolis, USA
  2. The University of Southern Mississippi, Hattiesburg, USA

Section editors and affiliations

  • Bradley A. Green
  1. University of Southern Mississippi, Hattiesburg, USA