Peabody Picture Vocabulary Test
The PPVT is a test of receptive vocabulary – that is, it assesses the lexicon of words that a person can understand when he or she hears them. It is also designed to serve as a screening test for verbal ability. The test has a straightforward structure. Examinees see a page on an easel with four color pictures. For each item, the examiner says a word, and the examinee responds by selecting the one picture out of four that best illustrates that word’s meaning. Because the examinee points to the appropriate picture, the test requires no reading, writing, or expressive verbal language, and it can be used with nonreaders and those without fluent verbal abilities. The test is untimed and individually administered. In total, the PPVT contains 228 items, divided into 19 “sets” of 12 items each; an examinee completes all items within a set. The basal level is set when an examinee correctly responds to 11 or more items in a set; the ceiling is established as the set where an examinee makes eight or more errors. Basal and ceiling levels are thought to represent the levels of difficulty below which an examinee is expected to know all items and above which the examinee is predicted to fail most items, respectively.
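The basal and ceiling rules described above can be written as a short check on the number of errors in each 12-item set. The following Python sketch is illustrative only and is not taken from the test manual: the function names are hypothetical, and the choice to report the lowest-numbered set that meets each rule is an assumption made for the example.

```python
from typing import Mapping, Optional, Tuple

ITEMS_PER_SET = 12           # per the text: 19 sets of 12 items each
MIN_CORRECT_FOR_BASAL = 11   # 11 or more correct responses in a set
MIN_ERRORS_FOR_CEILING = 8   # eight or more errors in a set


def classify_set(errors: int) -> str:
    """Label one completed set as 'basal', 'ceiling', or 'neither'."""
    if not 0 <= errors <= ITEMS_PER_SET:
        raise ValueError(f"errors must be between 0 and {ITEMS_PER_SET}")
    if ITEMS_PER_SET - errors >= MIN_CORRECT_FOR_BASAL:
        return "basal"
    if errors >= MIN_ERRORS_FOR_CEILING:
        return "ceiling"
    return "neither"


def find_basal_and_ceiling(
    errors_by_set: Mapping[int, int],
) -> Tuple[Optional[int], Optional[int]]:
    """Return (basal_set, ceiling_set) for the administered sets.

    Assumption for illustration: the lowest-numbered qualifying set is
    reported for each rule; None means the rule was never met.
    """
    basal = ceiling = None
    for set_number in sorted(errors_by_set):
        label = classify_set(errors_by_set[set_number])
        if label == "basal" and basal is None:
            basal = set_number
        elif label == "ceiling" and ceiling is None:
            ceiling = set_number
    return basal, ceiling


# Hypothetical error counts for sets 5 through 8.
print(find_basal_and_ceiling({5: 0, 6: 2, 7: 5, 8: 9}))  # (5, 8)
```

In this hypothetical record, set 5 (no errors) meets the basal rule and set 8 (nine errors) meets the ceiling rule; sets 6 and 7 meet neither.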
The PPVT was conormed with the Expressive Vocabulary Test – Second Edition (EVT), which serves as a measure of what words a person can speak aloud after seeing a pictorial representation of the item; PPVT and EVT scores were correlated at r (3540) = 0.82 in the normative sample. PPVT scores are provided as standard scores with a mean of 100 and a standard deviation of 15. Grade- and age-equivalent scores can also be computed and can help with interpreting an examinee’s relative knowledge. However, such scores are not recommended for research use given discontinuities between months or grades; that is, there are meaningful differences in the rate of development over different age periods, or between grade levels, which makes the steps between levels noncontinuous. Furthermore, consumers of such scores may not always remember that, because an age or grade equivalent reflects an average score, roughly half of the examinees at a given age or grade level must by necessity score below it. Finally, users can also calculate a growth scale value, which tracks vocabulary over time.
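To make the standard-score metric concrete, the sketch below converts a standard score (mean 100, standard deviation 15, as stated above) into a z-score and an approximate percentile rank. It assumes that standard scores are normally distributed; that assumption, and the function names, are introduced here for illustration and are not drawn from the test manual, whose norm tables should be used in practice.

```python
import math

MEAN = 100.0  # standard-score mean stated in the text
SD = 15.0     # standard-score standard deviation stated in the text


def z_score(standard_score: float) -> float:
    """Distance from the mean in standard-deviation units."""
    return (standard_score - MEAN) / SD


def percentile_rank(standard_score: float) -> float:
    """Approximate percentile rank, ASSUMING a normal distribution."""
    z = z_score(standard_score)
    # Normal cumulative distribution function expressed via the error function.
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for score in (85, 100, 115):
    print(score, round(z_score(score), 2), round(percentile_rank(score), 1))
```

Under the normality assumption, a standard score of 100 falls at the 50th percentile, and scores of 85 and 115 (one standard deviation below and above the mean) fall at roughly the 16th and 84th percentiles.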
The first edition of the PPVT was published by Lloyd M. Dunn in 1959. Subsequent revisions were published by Lloyd and Leota Dunn in 1981 (PPVT-R) and in 1997 (PPVT-III), and by Douglas Dunn (son of Lloyd and Leota, Ph.D.) in 2007 (PPVT-4). The PPVT-4 is the current version. Lloyd Dunn, an expert in the fields of special education and child development, founded the Kennedy Center in 1965 and was central in efforts to train researchers in the field of mental retardation. His early work was instrumental in promoting educational mainstreaming and, more generally, addressing the educational needs of children whose abilities differ from the average. He served as the first director of the Institute on Mental Retardation and Intellectual Development, an institute of the Kennedy Center; developed the first doctoral program in special education at the Vanderbilt Peabody College of Education and Human Development; and developed a number of tools designed to assess language and cognitive skills in the service of educational intervention.
Recently, the PPVT-4 was incorporated into the NIH Toolbox – Cognition Battery (Gershon et al. 2014), based on its high test–retest reliability and strong construct validity when compared with relevant gold-standard measures. The NIH Toolbox provides a battery of very brief measures designed to assess core functions (cognitive, emotional, sensory, and motor processes) using an iPad, with the guidance of an experimenter.
Words for the PPVT were originally selected from the dictionary on the basis of their imageability; the target word and the three distractors all had to be amenable to representation via line drawings (originally in black and white; all items are now depicted in color and can be distinguished by color-blind individuals). Items were selected from the categories of body parts, emotions, foods, clothing, toys and recreation, and so on. Earlier versions of the test contained a high proportion of items depicting verbs, but the PPVT-4 contains a smaller proportion of these items because they were found to be disproportionately difficult for young children. All items in a given trial are balanced for detail and visual complexity. There are four training items, for which the examiner is permitted to give feedback. An examinee must correctly complete two of these items in order for testing to be valid. The training items permit the examiner to establish whether the examinee is capable of responding in a standard fashion.
The PPVT-4 was standardized on a sample of 3540 individuals ranging in age from 2 years 6 months to 90 years. Participants were representative of the general US population (according to census data) in terms of gender (male, female), ethnicity (Latino/Latina/Hispanic, African American, White, and “other,” a group comprising American Indians, Alaska Natives, Asian Americans, Pacific Islanders, and all other groups not otherwise classified), geographic region, socioeconomic status, and special education placement. Twelve percent of the sample had parents whose educational achievement was grade 11 or lower; 28% had parents who had achieved 12th grade or a GED; 31% had parents with 1–3 years of college; and 28% had parents with 4 or more years of college. To establish age norms, the sample consisted of 28 age groups, with approximately 100–200 cases in each group; for ages 2–6 years, intervals were 6 months to account for the rapidly changing vocabulary levels in young children. All other groups were stratified by year. Fifty-seven percent (n = 2003) of the sample also contributed to a grade-stratified sample, comprising 26 groups (fall and spring, kindergarten through twelfth grade).
In addition, some participants in the normative sample had developmental concerns: speech or language impairment (n = 178 for ages 5–15 years; n = 60 for ages 50–96 years), language delay (3–7 years, n = 63), intellectual disability (6–17 years, n = 70), reading disability (8–14 years, n = 71), ADHD (6–17 years, n = 91), hearing impairment (4–12 years, n = 99), and several low-incidence disabilities. ASD was not specifically targeted. Proportions of special populations were intended to match population percentages. Average scores for these subgroups relative to the larger reference group are provided in the technical manual; the differences range from a relatively small −5.6 points for individuals with speech impairment to a relatively large −29.7 points for hearing-impaired individuals with cochlear implants.
Two alternate forms are available, providing the option to repeat assessments more frequently. The reliability of these forms, which are identical in format and organization, with parallel but nonidentical words tested, was assessed at 0.94. This permits the PPVT to serve as an assessment of response to intervention and for other methods requiring multiple test administrations. Test-retest correlations range from 0.92 to 0.96.
Construct validity of the PPVT-4 was assessed using the Comprehensive Assessment of Spoken Language (CASL; Carrow-Woolfolk 1999). For ages 3–5 years (n = 68), CASL subtest scores correlated with the PPVT as follows: basic concepts, r = 0.50; antonyms, r = 0.41; and sentence completion, r = 0.54. For ages 8–12 years (n = 62), the correlations were as follows: synonyms, r = 0.65; antonyms, r = 0.78; sentence completion, r = 0.63; and lexical/semantic composite, r = 0.79. Correlations with core language scores from the Clinical Evaluation of Language Fundamentals – Fourth Edition were 0.73 (ages 5–8) and 0.72 (ages 9–12); the corresponding correlations for the receptive (r = 0.67 and 0.75 for the two age groups) and expressive (r = 0.75 and 0.86 for the two age groups) subscales ranged from 0.67 to 0.86. These moderate-to-high correlations indicate that the other oral language assessments measure a constellation of similar but nonidentical skills. The PPVT-4 is reliable, with alternate-form and test-retest reliability coefficients in the 0.90s range.
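The validity figures above are correlation coefficients between paired scores on two measures. As a minimal sketch of how such a coefficient is computed, the Python function below calculates a Pearson r from two lists of scores. The function name and the example scores are invented for illustration; they are not data from the PPVT-4 manual, and the manual's exact computational procedure is not described here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired standard scores for five examinees (not real data).
vocabulary_scores = [85, 92, 100, 108, 115]
comparison_scores = [80, 95, 98, 110, 112]
print(round(pearson_r(vocabulary_scores, comparison_scores), 2))
```

A value near 1 indicates that examinees are ranked almost identically by the two measures; moderate values, such as those reported for the CASL subtests, indicate overlapping but nonidentical skills.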
Research partially supports the utility of the PPVT as an assessment of general language abilities in autism spectrum disorder (ASD). A study of 44 children with autism, ages 4–14, found that standardized measures of language ability (PPVT, EVT, and Clinical Evaluation of Language Fundamentals) correlated well with measures drawn from spontaneous speech (mean length of utterance, index of productive syntax, and number of different word roots; Condouris et al. 2003). Furthermore, the assessments of vocabulary and semantic knowledge more generally were significantly correlated with assessments of grammatical knowledge. A meta-analysis of 133 publications in the field of ASD between 1999 and 2002 (Mottron 2004) found that a vocabulary measure (British Picture Vocabulary Scale, or BPVS; nearly identical in structure to the PPVT) had been used as a matching variable in 22% of publications, second only to overall IQ estimates using the Wechsler scales (47%). Thus, research in the field has acted in accord with the assumption that vocabulary skills, estimated with the PPVT, may be a useful proxy for general verbal language skills. Certainly, data from typical development indicate that vocabulary correlates highly with other language abilities and is the single best predictor of academic success for children starting school.
However, there is significant evidence that vocabulary assessed by the PPVT may be a strength for individuals with ASD relative to morphosyntax (Eigsti et al. 2007) or discourse-level comprehension (Asberg 2011). That is, PPVT scores may overestimate general verbal abilities. Supporting this proposal, Mottron (2004) reported that receptive vocabulary may be a particular area of strength in ASD, even compared to tests considered to tap into ASD-specific expertise, such as block design tasks. Thus, vocabulary skills, estimated with the PPVT, may overestimate the functional abilities of participants with ASD.
The PPVT-4 has several specific limitations. First, it assesses only vocabulary items that are imageable: primarily, concrete nouns and verbs. As noted above, verbs make up a reduced proportion of items, and function words (and, if) and grammatical markers (−ing, plural “s”) are absent from the test. Thus, if an individual has a specific deficit in morphosyntactic abilities, the PPVT will fail to identify this difficulty. Second, the PPVT is inadequate for assessing individuals who are not fluent in English. A Spanish version is available (Test de Vocabulario en Imágenes, TVIP; 1986) whose format is similar to the English version but with distinct items; because word frequencies differ dramatically across languages, it is not possible to simply translate items into another language (see an example of the challenge in a Danish study; Brynskov et al. 2017).
A third important limitation of the PPVT, specific to the case of ASD, is the presence of atypical vocabulary skills, such as the production of jargon words or echolalia (Eigsti et al. 2007). Children with ASD have been found to show some word-learning biases similar to those of typically developing children, in that they are able to map novel words onto novel objects, suggesting that their word learning is constrained by a “mutual exclusivity” bias that category labels apply to mutually exclusive objects (de Marchena et al. 2011). Furthermore, they are able to sort objects according to typical semantic categories. In contrast, research suggests that individuals with ASD produce less prototypical words, have difficulty in learning words that refer to mental states, and show differential priming effects (as reviewed in Eigsti et al. 2011); they also fail to show the developmentally typical shape bias (Tek et al. 2008). Because the PPVT is organized according to how vocabularies are structured in typically developing individuals, it may fail to detect differences in ASD. This lack of sensitivity is compounded by the absence of any overt semantic organization in the scoring procedure; it is not possible to report a particular deficit in acquisition of words in any particular semantic (or structural) language category.
One final topic bears explicit mention. Clinicians have occasionally reported better responses by individuals with ASD on the PPVT when assessed using facilitated communication strategies. However, studies explicitly comparing responses by examinees with the assistance of facilitators who could or could not see the task materials have conclusively demonstrated that any improvement in scores using assistance primarily reflects the facilitator’s knowledge rather than the examinee’s (Beck and Pirovano 1996).
References and Reading
- Carrow-Woolfolk, E. (1999). Comprehensive assessment of spoken language (CASL). Bloomington: Pearson Assessments.
- Dunn, L. M., & Dunn, D. M. (2007). Peabody picture vocabulary test (4th ed.). Circle Pines: American Guidance Service.
- Gershon, R. C., Cook, K. F., Mungas, D., Manly, J. J., Slotkin, J., Beaumont, J. L., & Weintraub, S. (2014). Language measures of the NIH toolbox cognition battery. Journal of the International Neuropsychological Society, 20(6), 642–651. doi: 10.1017/S1355617714000411.