Computerized testing; Computerized testing of creative potential; Creativity measurement; Divergent thinking; Measurement error; Psychometrics; Reliability; Standard definition of creativity; Validity
A number of tests provide reliable estimates of the potential for creativity. There is, however, no such thing as a test of creativity. That is because tests are by definition samples of behavior. Good tests sample a representative amount of behavior and thus can be used to estimate subsequent behavior in the same domain. Creativity, however, is not easy to sample. This is partly because it is often spontaneous and may depend on intrinsic interests, and partly because it is always original. Originality is a prerequisite for creativity, and original things are by definition difficult to predict. They are often surprising, unexpected, and not connected in any linear fashion to what came before. All of this makes it very difficult to obtain a representative sample for a test. Creativity tests also suffer from measurement error. This is true of all tests! That is why test scores must be treated as estimates. Finally, creativity is not unitary but is instead a complex. Any one facet of the complex can be assessed, but no single assessment covers everything there is to creativity.
To understand creativity testing and the concept of measurement error, something must be said about reliability and validity, and a definition of creativity must be proffered. Reliability and validity are, like representative sampling and measurement error, well-established concepts in psychometrics, which is the field devoted to the measurement of human behavior. Each of these concepts is explained below, and each is related to the definition of creativity. (An extensive discussion of these same concepts applied to creativity is available in the psychometrics chapter of the new creativity textbook [Runco, in press].) The standard definition of creativity is presented and explained, as well, along with its limitations. Most important here may be that, because creativity is a complex and is not unitary, it makes no sense to think about any one test being universally useful. A single test is only reasonable when there is one thing being measured, and that certainly is not true of creativity.
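The point that test scores are only estimates can be made concrete with classical test theory, the standard psychometric model of measurement error. This entry does not spell it out, so treat the notation below as an added illustration. An observed score X is modeled as a true score T plus random error E. Reliability is the share of observed-score variance that is true-score variance. The standard error of measurement (SEM) converts that share back into score units.

```latex
\begin{aligned}
X &= T + E, \qquad \operatorname{Cov}(T, E) = 0,\\
\sigma_X^2 &= \sigma_T^2 + \sigma_E^2,\\
\rho_{XX'} &= \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2},\\
\mathrm{SEM} &= \sigma_X \sqrt{1 - \rho_{XX'}}.
\end{aligned}
```

Take a hypothetical score standard deviation of 10 and a reliability of .80. Then SEM = 10 × √0.20 ≈ 4.5, so an observed score of 50 is better read as a band of roughly 45.5 to 54.5 (±1 SEM) than as a single point.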
The standard definition of creativity points to two things: originality and effectiveness. Thus all creative ideas, solutions, insights, and actions must be both original and effective. Each of these can be operationalized in more than one way. Originality may be apparent in novelty, unconventionality, or uniqueness. Effectiveness may take the form of fit or appropriateness. When creativity is expressed in an artistic domain, effectiveness may be aesthetic appeal. When creativity is used for problem solving, effectiveness is apparent when the problem is in fact solved, but solved in an original fashion. Originality alone is not sufficient for creativity, because some original things are just bizarre, useless, or irrelevant. There is some debate about the best definition of creativity, and other dimensions have been proposed (e.g., authenticity, surprise). Even so, the standard definition is widely used and not often questioned. It is a good working definition.
Then there are domain differences in creativity, which must be recognized when testing. They also make it clear that no one test could possibly be adequate for creativity. Domain differences have been recognized since the 1930s, when Catherine Patrick published a series of reports on scientists, poets, and other professional groups. Domains were also identified and clarified in the seminal work at the Institute of Personality Assessment and Research (IPAR) in the 1960s and 1970s. Researchers there closely examined architects, writers, and various other professional groups. Today a number of domains are recognized. The most common list includes language, music, mathematics, bodily-kinesthetic talent, and perhaps technology and social skill or leadership. There is some debate about the exact number of domains and the best way to define them. Still, the notion of domains is rarely questioned, and tests of creativity very frequently take domain into account.
The range of possible targets for testing is fairly well captured by the 4P framework offered by Melvin Rhodes in 1962 and widely used (with some extensions) ever since. Sometimes the interest is in the personality traits that contribute to creative behavior. Other times the focus is on the cognitive contributions, or even on the end result of the creative process. Rhodes described the creative person, creative process, creative product, and creative press. The last of these represents environmental influences on creativity. More recent descriptions of the 4Ps tend to label this category creative place instead of press. The best measures of the creative place assess both environmental supports for creativity and barriers to it. Many of these were designed to assess organizational settings, though there are also measures of the home or the school as potentially creative places.
Much of the older work on creativity focused on personality. This research quickly identified a group of traits, or “core characteristics,” of creativity, including autonomy, openness, independence, intrinsic motivation, and flexibility. Not surprisingly, it also quickly discovered variations from domain to domain. Creative mathematicians differed from architects, for example, and both differed from musicians. Additionally, personality research in general was slowed by evidence that behavior is not very stable across situations. People may have tendencies (e.g., toward introversion or extroversion), but these vary from setting to setting. A person can be introverted in one setting but extroverted in another. That evidence led to the State × Trait theory of personality, in which traits contribute to behavior but their expression depends on “states” (i.e., the immediate setting). This interaction makes creative personality somewhat difficult to measure. Personality assessments can provide useful information, but results must be interpreted in light of a possible dependence on immediate states or settings. Some of the more common personality measures include the California Psychological Inventory and the Adjective Check List, and there are several measures of the Big Five.
Research on, and assessments of, settings, cultures, and other “place” variables are quite important. That is in part because creativity flourishes when it is supported by the environment and can be stifled in the wrong settings. Businesses are particularly interested in assessments of settings because creativity is so obviously a prerequisite to innovation, and innovation gives them a competitive advantage. Beyond business settings, there are also tools to assess the home environment and schools. These tools tend to depend on ratings given by employees (in the case of businesses), students or teachers (in the case of schools), or children or parents (when the home environment is being assessed). The dimensions along which environments are rated are parallel across settings. Resources, valuation of creativity and originality, autonomy, and flexibility serve as positive indicators. Evaluation, criticism, micromanagement, and constraint are negative indicators. As an aside, it is good for rating instruments to contain both indicative (positive) and contraindicative (negative) items. That way a respondent cannot simply infer what is being measured and then stop reading the individual items. If items run in both positive (e.g., supports creativity) and negative (e.g., inhibits creativity) directions, a respondent needs to consider each item carefully. This too is a part of psychometric theory.
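The practice of mixing indicative and contraindicative items has a direct consequence for scoring. Negative items have to be reverse-scored before responses are combined. The sketch below shows one way to do this. The item names, the 5-point Likert scale, and the "same answer everywhere" check are hypothetical illustrations, not taken from any published instrument.

```python
# Hypothetical 5-point Likert scale: 1 = strongly disagree ... 5 = strongly agree.
LIKERT_MIN, LIKERT_MAX = 1, 5

# Hypothetical item keys. True = indicative (describes support for creativity);
# False = contraindicative (describes an inhibitor) and must be reverse-scored.
ITEM_KEYS = {
    "resources": True,
    "autonomy": True,
    "values_originality": True,
    "micromanagement": False,
    "constant_evaluation": False,
}


def reverse(score: int) -> int:
    """Flip a response so that a higher number always means a more supportive place."""
    return LIKERT_MIN + LIKERT_MAX - score


def place_score(responses: dict[str, int]) -> float:
    """Mean supportiveness across items, after reverse-scoring negative items."""
    keyed = [score if ITEM_KEYS[item] else reverse(score)
             for item, score in responses.items()]
    return sum(keyed) / len(keyed)


def same_answer_everywhere(responses: dict[str, int]) -> bool:
    """Flag identical raw answers to every item (e.g., all 5s).

    Because item directions are mixed, this usually means the respondent
    endorsed both supports and inhibitors, a hint the items were not read.
    """
    return len(set(responses.values())) == 1


careful = {"resources": 4, "autonomy": 5, "values_originality": 4,
           "micromanagement": 2, "constant_evaluation": 1}
careless = {item: 5 for item in ITEM_KEYS}

print(place_score(careful))                                      # 4.4
print(same_answer_everywhere(careless), place_score(careless))   # True 3.4
```

The careless respondent's raw answers look uniformly enthusiastic. After keying, however, they land near the middle of the scale, and the pattern itself is flagged for review.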
It may come as a surprise that products are often measured, but such assessments have one enormous advantage: products can be counted, which allows highly objective measurement. The drawback is that products may not say much about the people or processes that created them. Still, there are lessons to be learned from assessments of products, and several methodologies have been proposed and refined for use with products. The Consensual Assessment Technique (CAT) is the best-known example. It asks appropriate judges (e.g., people with experience in the same field as the products being evaluated) to rate the products. The usual dimensions are creativity, technical skill, and aesthetic appeal (or likeability). Judges are not given a definition of creativity. The assumption is that there is no need to articulate one, especially when the judges are experts in the field in question (e.g., professional artists). Judges must rate the products independently of one another, and they should compare all products in any one sample to one another. They should not compare a product (e.g., a collage) from one group of art students with collages from other groups. There are, then, no absolute standards in the ratings. Interestingly, the CAT, developed by Teresa Amabile, was designed to evaluate conditions. An example is a setting where people are allowed to follow their own intrinsic interests versus a setting where all efforts are evaluated and directed. Its intent was not to assess individual differences. CAT research is often misguided, at least when it focuses on individual differences and compares individuals. The primary concern with the CAT may be that several investigations have uncovered differences between groups of judges who were asked to rate the same sample of products. This raises the possibility that ratings from any one group of judges may not say much about what other groups think of the same products.
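The concern about different judge panels can be made concrete. The sketch below computes two statistics. The first is within-panel consistency, using Cronbach's alpha with each judge treated as an "item," which is one commonly used interjudge-reliability index. The second is the correlation between two panels' mean ratings of the same products. The ratings are invented, and the choice of these two statistics is an assumption for illustration. The entry does not prescribe them.

```python
from statistics import mean, pvariance

# ratings[j][p] = judge j's rating of product p (hypothetical 1-5 scale).


def cronbach_alpha(ratings: list[list[float]]) -> float:
    """Interjudge consistency within one panel, treating each judge as an item."""
    k = len(ratings)
    sum_judge_var = sum(pvariance(judge) for judge in ratings)
    totals = [sum(product) for product in zip(*ratings)]  # summed across judges
    return k / (k - 1) * (1 - sum_judge_var / pvariance(totals))


def panel_means(ratings: list[list[float]]) -> list[float]:
    """Mean rating of each product across one panel's judges."""
    return [mean(product) for product in zip(*ratings)]


def pearson(x: list[float], y: list[float]) -> float:
    """Correlation between two panels' mean ratings of the same products."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: two independent panels rating the same four collages
# from ONE class (products are compared only within this sample).
panel_a = [[4, 3, 5, 2], [5, 3, 4, 2], [4, 2, 5, 1]]
panel_b = [[2, 4, 3, 5], [3, 4, 2, 5], [2, 5, 3, 4]]

print(round(cronbach_alpha(panel_a), 2))  # 0.94
print(round(cronbach_alpha(panel_b), 2))  # 0.89
print(round(pearson(panel_means(panel_a), panel_means(panel_b)), 2))  # -0.96
```

In this contrived case, each panel agrees well internally, yet the two panels order the collages almost oppositely. This is the pattern that would make a single panel's verdict hard to generalize.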
Of the 4Ps, the process perspective on creativity has received the least psychometric research. That is no doubt because of the inherent difficulties. After all, any process is dynamic rather than static. Any attempt to measure it may require that a sample of the process be used for the assessment. A single sample, however, is only a snapshot and thus not really an assessment of process at all. That being said, there have been a handful of efforts to assess process. Beth Hennessey, for example, in an investigation reported in the 1994 Creativity Research Journal, modified the CAT so it could be used with process. Simplifying somewhat, she used software that captured not just the end result of some design work (by students) but also the steps (or preliminary designs) produced while the product was under construction. Hennessey found strong correlations between ratings of the end product and ratings of the process (or intermediate steps). She concluded that the CAT can be used to evaluate process. Dean Keith Simonton, in a 2007 article in the Creativity Research Journal, used a similar method to investigate Picasso's preliminary drawings for his famous painting Guernica. He compared two possibilities. A monotonic progression of steps would imply that Picasso was headed in one particular direction with the painting. A nonmonotonic or random progression would imply more experimentation, with not every step leading toward the final product. The ratings obtained by Simonton supported the nonmonotonic process.
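The monotonic/nonmonotonic contrast can be illustrated with a toy check. This is not a reproduction of Simonton's analysis. It assumes only that each sketch, in chronological order, has received a hypothetical rating of how closely it resembles the final work, and it asks whether those ratings ever move backward.

```python
def is_monotonic(closeness: list[float]) -> bool:
    """True if every sketch is rated at least as close to the final work as the one before."""
    return all(later >= earlier for earlier, later in zip(closeness, closeness[1:]))


def reversals(closeness: list[float]) -> int:
    """Number of steps that moved away from the final work."""
    return sum(1 for earlier, later in zip(closeness, closeness[1:]) if later < earlier)


# Hypothetical closeness ratings of successive sketches (higher = closer to final).
straight_path = [1.0, 2.0, 2.5, 4.0, 5.0]
wandering_path = [2.0, 1.0, 3.5, 2.5, 5.0]

print(is_monotonic(straight_path), reversals(straight_path))    # True 0
print(is_monotonic(wandering_path), reversals(wandering_path))  # False 2
```

A count of reversals is a crude summary. A real analysis would also need to account for rating error, since a small dip could reflect noisy judges rather than genuine experimentation.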
The third example of research assessing process is that of Gudmund Smith, from the 1990 Creativity Research Journal. Smith was interested in how interpretations are constructed. Various lines of work, including my own (see “Personal Creativity” in this volume), indicate that creative products may start with exactly this: an original interpretation. Smith displayed images to the participants in his research. At first the images were partial and incomplete, and they gradually became more and more complete. As expected, the more creative participants constructed interpretations of the objects earlier than the other participants did. Smith was able to reliably assess one creative process, but admittedly his methodology is not something that could be used outside an experimental laboratory.
There are other ways to categorize tests and assessments of creativity. Hocevar and Bachelor (1989), for instance, reviewed more than 100 tests and measures and divided them into eight categories: (a) divergent thinking tests, (b) personality inventories, (c) attitude and interest inventories, (d) ratings by teachers, supervisors, or peers, (e) eminence, (f) judgments about products, (g) self-reports of creative activities and achievements, and (h) biographical inventories. Clearly they were not interested just in tests but included all measurement options, including eminence. They placed some of the personality research summarized above (e.g., on architects at IPAR) in the eminence category. This follows from the fact that those architects were unambiguously creative, as evidenced by their productivity and reputations. Other work on eminence focuses on historical data. Often the number of lines devoted to an individual in a Who’s Who volume or an encyclopedia is taken as an index of eminence. While this is a kind of measurement, it does not depend on a test. The same can be said about biographical inventories: they provide data and thus involve measurement, but they do not involve testing. A distinction can thus be made between measuring actual performance and measuring creativity through inferences drawn from biographical data or productivity (eminence). A test of divergent thinking such as Alternative Uses exemplifies a test in the strict sense of the word. It asks an individual to generate original ideas in response to an open-ended task (“list alternative uses for a spoon”). Another important distinction is between measures that rely on ratings or observations and tests that are given to a particular examinee. Again, divergent thinking tasks exemplify tests, while data provided by teachers, peers, or supervisors exemplify measurement relying on ratings.
Inference is required for all measurement based on ratings, and care must be taken because ratings are open to errors arising from halo effects, faulty memory, socially desirable responding, and deception.
One last point about the framework offered by Hocevar and Bachelor (1989): they concluded that creative activity and accomplishment check lists (CAACs) may provide the most useful data of all possible tests and measures. CAACs certainly have advantages. They cover multiple domains, for example, and they can be used with almost any population, with the possible exception of preliterate children. They are reliable, and there is some evidence of validity. Ratings by students of their own creativity have, for example, been found to agree with ratings given independently by their mothers. This implies that the ratings are not inflated or overly biased by halo effects, faulty memory, or dishonesty. Most of the older research on CAACs relied on quantity scales. These simply indicate how frequently a person has been involved in various creative activities (e.g., “How often have you written a short story?” “…painted a picture?” “…designed a website?” “…designed an item of clothing?” “…cooked an original dish?”). More recent research has added a scale for the quality of creative activity and accomplishment. The quality scale asks specifically about socially recognized examples of the various activities and accomplishments. One finding stands out in the research using CAACs: again and again, students express more creativity outside of school than they do in school.
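The difference between the quantity and quality scales can be sketched as two simple scoring rules. The checklist items below come from the examples above, plus one added item ("written a poem"). The domain labels and the 0-3 frequency codes are hypothetical; real CAACs define their own items, domains, and response formats.

```python
from collections import defaultdict

# Hypothetical checklist items mapped to hypothetical domains.
ITEM_DOMAIN = {
    "written a short story": "writing",
    "written a poem": "writing",
    "painted a picture": "visual arts",
    "designed a website": "technology",
    "designed an item of clothing": "design",
    "cooked an original dish": "everyday",
}
# Hypothetical frequency codes: 0 = never, 1 = once or twice, 2 = 3-5 times, 3 = 6+ times.


def quantity_by_domain(frequency: dict[str, int]) -> dict[str, int]:
    """Quantity scale: sum of frequency codes within each domain."""
    totals: dict[str, int] = defaultdict(int)
    for item, code in frequency.items():
        totals[ITEM_DOMAIN[item]] += code
    return dict(totals)


def quality_count(recognized: dict[str, bool]) -> int:
    """Quality scale: number of activities with a socially recognized outcome
    (e.g., published, exhibited, awarded)."""
    return sum(recognized.values())


student = {
    "written a short story": 2, "written a poem": 1, "painted a picture": 1,
    "designed a website": 0, "designed an item of clothing": 3,
    "cooked an original dish": 1,
}
print(quantity_by_domain(student))
# {'writing': 3, 'visual arts': 1, 'technology': 0, 'design': 3, 'everyday': 1}
print(quality_count({"written a short story": True, "painted a picture": False}))  # 1
```

Keeping scores by domain, rather than only as a grand total, preserves the multi-domain advantage of CAACs described above.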
Then there are the tests of divergent thinking (DT), also covered in that 1989 review. It may sound odd that DT tests represent an entire category when only seven other categories were defined, but DT tests are probably the most commonly used measure of creative potential. DT tests, like some other tests, focus on actual performance. Such tests give the examinee a task, usually a problem, and the examinee must solve the problem or generate options and ideas. This kind of performance measure is quite different from, say, a CAAC or a personality inventory. Each of those is typically retrospective and may use either a True/False or a Likert format. DT tests require that examinees solve problems and produce ideas. The examinee is not merely reporting on the frequency of particular behaviors or the likelihood of displaying some trait.
DT tests are open-ended, which is what aligns them with the theory of DT. In this theory, ideas may result from convergent or divergent processes. Convergent processes are useful when there is one correct or conventional idea or solution. Convergent thinking would be useful with a question like, “What is the largest ocean in the world?” But the real world often presents more open-ended tasks and problems, where there is more than one possible approach. Think about problems at home, school, or work. These are usually less well defined and more open-ended. It is usually a good idea to think divergently, at least at first, in order to determine what options exist. In addition, DT is very useful for creativity because originality (a prerequisite for creativity) usually takes time. When presented with a problem, the first ideas a person considers are usually rote, drawn from memory, and not original. After those are depleted, and as time passes, the person starts to think of new and original options. A good amount of research has supported this view of “remote associates.”
DT tests sometimes contain realistic tasks. DT tests for students may ask about options when homework is forgotten, for example, or when another student is being distracting. Many DT tests are simpler and merely ask for one of the following:
- alternative uses of some object (e.g., “list as many uses for a rubber band as you can”)
- instances of some category (e.g., “list all of the strong things you can think of”)
- similarities (e.g., “how are a tomato and an apple alike?”)
- improvements to some product (“how can a chair be improved?”)

Some DT tests are figural or visual: the examinee is given an abstract line drawing and asked to list all of the things that figure could represent. There are quite a number of different DT tests. All of them require that examinees give multiple ideas, and this must be part of their instructions. Scoring reflects this multitude of ideas, and there are three typical scores. The first is ideational fluency, which is calculated by simply counting how many ideas are given. The second is ideational originality, which is calculated by determining the rarity of each idea within the sample. This is actually one huge advantage of DT tests: originality can be determined objectively, without judges. It can even be calculated by software that compiles all ideas given by a sample and determines which are rare (and therefore original). Computers can also determine the third DT score, ideational flexibility. This is the number of different conceptual categories an examinee uses when answering any one question. A young examinee might say “Superwoman, Superman, Spiderman, and the Hulk” when asked to “list strong things.” All of those fall into one category, superheroes, for a flexibility score of one. Another person might answer the same question by listing “Superwoman, superglue, gravity, and ants.” That person would earn a flexibility score of four, for four different categories. Computers can use semantic and associative networks when calculating flexibility scores.
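The three DT scores described above can be computed in a few lines, as in the minimal sketch below. Several choices are assumptions for illustration:
- Fluency counts distinct ideas, so exact repeats are dropped.
- An idea counts as original when no more than a `threshold` share of the sample gave it. The 5% default is only an illustrative choice.
- A small hand-made `CATEGORY` lookup (hypothetical) stands in for the semantic or associative network a real scoring program would use.

```python
from collections import Counter


def normalize(idea: str) -> str:
    """Treat 'The Hulk' and 'the hulk ' as the same response."""
    return idea.strip().lower()


def fluency(ideas: list[str]) -> int:
    """Number of ideas given (exact repeats by the same person counted once)."""
    return len({normalize(i) for i in ideas})


def sample_counts(all_responses: list[list[str]]) -> Counter:
    """How many examinees in the sample gave each idea."""
    counts: Counter = Counter()
    for ideas in all_responses:
        counts.update({normalize(i) for i in ideas})
    return counts


def originality(ideas: list[str], counts: Counter, n_examinees: int,
                threshold: float = 0.05) -> int:
    """Number of ideas given by no more than `threshold` of the sample."""
    return sum(1 for i in {normalize(x) for x in ideas}
               if counts[i] / n_examinees <= threshold)


# Hypothetical category lookup standing in for a semantic network.
CATEGORY = {
    "superwoman": "superheroes", "superman": "superheroes",
    "spiderman": "superheroes", "the hulk": "superheroes",
    "superglue": "adhesives", "gravity": "physical forces", "ants": "animals",
}


def flexibility(ideas: list[str]) -> int:
    """Number of distinct conceptual categories used.

    Ideas missing from the lookup count as their own category (an assumption).
    """
    return len({CATEGORY.get(normalize(i), "uncategorized: " + normalize(i))
                for i in ideas})


# Tiny hypothetical sample answering "list strong things".
responses = [
    ["Superwoman", "Superman", "Spiderman", "the Hulk"],
    ["Superwoman", "superglue", "gravity", "ants"],
    ["Superman", "the Hulk", "gravity"],
]
counts = sample_counts(responses)
for ideas in responses:
    # With only 3 examinees, threshold=0.34 means "given by just one person".
    print(fluency(ideas), originality(ideas, counts, len(responses), threshold=0.34),
          flexibility(ideas))
# 4 1 1
# 4 2 4
# 3 0 2
```

The originality score depends on the sample: the same idea can be rare in one group and common in another. Reporting the sample and threshold alongside the scores is therefore advisable.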
Divergent thinking is not synonymous with creative thinking, but DT tests do offer reliable and useful information. One 50-year longitudinal study found that DT tests given as early as 1958 yielded scores that predicted certain forms of creative activity and accomplishment 50 years later. Another study found that when all three indices of DT are taken into account, creative activity and accomplishment are predicted quite accurately. IQ and GPA may have low correlations with creative activity and accomplishment, usually below 0.30, whereas DT tests have predictive validity coefficients as high as 0.55. That being said, DT is probably best treated as one aspect of creativity. This fits entirely with the view presented earlier in this entry, namely that there is no one test of creativity.
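A predictive validity coefficient is a correlation between test scores and a later criterion. Squaring it gives the share of criterion variance the test accounts for, which makes the gap between 0.30 and 0.55 easier to appreciate. The sketch below is a plain Pearson correlation plus that squaring step. It is not the specific analysis used in the studies mentioned.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Validity coefficient: correlation between test scores and a later criterion."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def variance_explained(r: float) -> float:
    """Share of criterion variance accounted for by the predictor (r squared)."""
    return r * r


for r in (0.30, 0.55):
    print(f"r = {r:.2f} -> {variance_explained(r):.0%} of variance")
# r = 0.30 -> 9% of variance
# r = 0.55 -> 30% of variance
```

On this reading, a coefficient of 0.55 accounts for roughly three times as much criterion variance as one of 0.30. Even so, most of the variance is still left unexplained, which is consistent with treating DT as one aspect of creativity.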
There are many tests and assessments of creativity, too many to examine exhaustively in a short entry. The present entry has covered personality, attitude, and divergent thinking tests. It has also covered creative activity and accomplishment check lists and measures of products, places, and processes. Conventional psychometric standards were mentioned, including reliability and validity. Various concerns relevant to testing were also mentioned, including bias, halo effects, and the representativeness and generalization of results. The different tests and assessments vary in the degree to which they satisfy the requirements for reliability and validity. They also vary in the degree to which they avoid problems such as bias. The range of tests and measures reviewed herein is not comprehensive. Still, it shows that there is no single objective for all creativity tests and that the many expressions of creativity require a sizable arsenal of measurement options. To paraphrase Frank Barron, from his 1995 book No Rootless Flower, the measurement of creativity itself requires creativity on the part of the examiner.
- Barron F (1995). No rootless flower. Cresskill, NJ: Hampton Press.
- Hocevar D, Bachelor P (1989). A taxonomy and critique of measurements used in the study of creativity. In Glover JA, Ronning RR, Reynolds CR (Eds). Handbook of creativity. Perspectives on individual differences (pp. 53–75). New York, NY: Plenum Press.
- Runco MA (in press). Creativity: Theories and themes: Research, development, and practice (rev. ed.). San Diego, CA: Academic Press.