
Abstract

The principles of test validation are well established, but there has been many a recorded slip between accepted theory and acceptable validation studies, or simply an argument that the selection method is appropriate for the situation. This disconnect has legal consequences when the test is implicated in violations of equal employment opportunity laws and regulations. This chapter discusses more than 50 pitfalls in attempting to reach successful validation, illustrated by situations arising in federal court cases. The sections include general problems, as well as those likely to arise during job analysis, content validation strategy, criterion validation strategy, scoring strategy, use of background information, and generalization of validity across selection situations.

This chapter does not offer legal advice, nor does it represent the position of any agency of the US government.

Notes

  1.

    “Uniform” refers to the resolution of a situation in which different agencies had been using two different sets of guidelines.

  2.

    Federal agency regulations and formal guidance documents are published in the Federal Register. Final documents are categorized and retained in the Code of Federal Regulations (CFR). EEOC’s promulgation of UGESP can be found at 29 CFR 1607, which indicates title 29 (Labor), part 1607. Other signatories have the same UGESP text with different CFR title and part numbers. The CFR’s EEOC version prefixes “1607” to the section designations found in the Federal Register version. The Federal Register version contains some background material not in the CFR or some other sources. The Q&As are not in the CFR.

  3.

    See “Federal Courts” at http://www.uscourts.gov/FederalCourts.aspx for an overview of the federal court system. The site is maintained by the Administrative Office of the US Courts.

  4.

    The Seventh Circuit ruled in Adams (2014) that the procedure need not be neutral for an impact case. This is a testing case where summary judgment was affirmed for the employer, but where the appellate court corrected the legal theory applied by the lower court. The decision’s most notable effect likely was to close for future cases a potential loophole concerning subjective procedures influenced by unconscious bias. Without discriminatory intent, there could not be a treatment case; on the other hand, a biased procedure could not be neutral, so impact also did not fit. But prior to this, there were (and are) some cases filed as both treatment and impact; the parties and the courts sort out which theory applies as the case develops during evidentiary discovery.

  5.

    “Numerical disparity” is not defined by statute, and the courts have been correspondingly vague. “Our formulations, which have never been framed in terms of any rigid mathematical formula, have consistently stressed that statistical disparities must be sufficiently substantial that they raise such an inference of causation” (Watson 1988). Courts have picked up on “substantial” as the standard, but that does not provide any quantification.
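
Although the courts have not quantified “substantial,” UGESP itself offers the four-fifths (80 %) rule as an enforcement rule of thumb. The sketch below, with invented applicant counts, shows the basic impact-ratio arithmetic; it illustrates that rule of thumb only, not a legal standard for substantiality.

```python
# Minimal sketch of the UGESP "four-fifths" rule of thumb (illustrative numbers
# only; courts have not adopted this as the definition of a substantial disparity).

def impact_ratio(selected_focal, applicants_focal, selected_ref, applicants_ref):
    """Selection rates for the focal and reference groups, and their ratio."""
    focal_rate = selected_focal / applicants_focal
    ref_rate = selected_ref / applicants_ref
    return focal_rate, ref_rate, focal_rate / ref_rate

focal_rate, ref_rate, ratio = impact_ratio(24, 100, 40, 100)
print(f"focal rate {focal_rate:.2f}, reference rate {ref_rate:.2f}, ratio {ratio:.2f}")
# Here the ratio is 0.60, below the 0.80 benchmark, so the disparity would be flagged.
```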

  6.

    Whether these two phrases mean the same or different things divides the courts and commentators. “Business necessity” has acquired meaning as the necessity for a particular practice rather than alternatives. Q&A 36 and some commentators mention business necessity “for the safe and efficient operation of a business” as an employer’s defense apart from validation, and possibly apart from job relatedness.

  7.

    As with the note above, this is an area deliberately left vague by the Civil Rights Act of 1991. Some would say “comparably effective.” The 1991 amendments specified that the demonstration of alternatives “shall be in accordance with the law as it existed on June 4, 1989,” just prior to the Supreme Court’s Wards Cove (1989) decision which the amendments partially overturned.

  8.

    An alternative plaintiff argument is that the procedure could obviously be expected to have impact on a protected class, and so showing actual numbers is not necessary. A split appellate court disagreed in Lopez (2011). It might have been an ADA case, but the plaintiff argued theory according to Title VII; the court declined to consider whether the outcome would have been different had the plaintiff initially argued the ADA-specific language that makes unlawful those selection criteria that “tend to screen out” the disabled unless job related and consistent with business necessity.

  9.

    “Content validation” and “content-oriented validation strategy” as used here mean the same thing. Take “content validity” as validation demonstrated with content-oriented strategy. “Criterion validation” should be considered similarly.

  10.

    Schmidt (2012) defines a construct as “a variable which is defined in theoretical terms” and “a variable which is not defined directly in terms of empirical measurement operations but in terms of some particular theory” as part of an argument that cognitive ability is not a construct. This argument is important because UGESP distinguishes between knowledge, skill, and ability (collectively, KSAs), which are amenable to content strategy, and constructs, which are not. Sackett (2012) responded that all KSAs are constructs and that what validation strategy is appropriate is a separate issue. This would seem to be the simpler approach, UGESP notwithstanding.

  11.

    Some writers would see construct contamination as more associated with irrelevant, nonperformance aspects of a criterion. The focus is on what is or is not in the job domain; ultimately that is the standard for what is or is not contamination. This discussion follows a more general focus on the measurement, be it the predictor or the criterion. If it measures what it should not, then it is contaminated; if it does not measure all it should, then it is deficient.

  12.

    This is essentially the position taken by Sackett (2012). He works through three examples of verbal ability testing which involve different levels of inferential leap. All three tests are valid, but not all are amenable to content strategy.

  13.

    A somewhat traditional categorization has “task” as a distinct work activity carried out for a distinct purpose. Broader than a task is a “duty,” which is a large segment of work that could include many tasks (e.g., provide information to the public). Smaller than a task is an element (sometimes called an activity) which is the smallest unit of work above the time-and-motion level, e.g., removes a saw from a tool chest. UGESP does not define “work behavior.” Some writers have considered it bigger than a task but smaller than a duty.

  14.

    Goldstein et al. (1993) discuss clustering tasks for more efficient processing for SME ratings. That is different from the problem discussed here, combining a large and disparate set of activities into a single unit of behavior.

  15.

    In the “hard sciences,” a phenomenon is operationally defined in terms of how it is measured. In this sense, intelligence is operationally defined by whatever test is used to measure it. This is not what UGESP intended for content validity. See § 14 C (4).

  16.

    Practice varies widely on the number of KSAs. For example, there seems to be a tradition for public safety positions to go with long, comprehensive lists of KSAs. The other end of the spectrum is exemplified by work with assessment center dimensions, where some would argue that dimensions become redundant when there are ten or more. Of course, the specificity of the statements tends to be different.

  17.

    Consider what happens if there were a dozen reading, writing, and speaking KSAs that were combined in a communications cluster. The cluster links to several important tasks, each of which involves reading or writing or speaking. Which of the 12 original KSAs is important for which task, and how is test content to be allocated among these facets of communication? The converse problem appears in Guardians (1980). There were 42 tasks grouped into five clusters; from the five clusters, five KSAs were derived that were presumably relevant to some tasks in some clusters. But there was no indication of which KSA was linked to which specific task. “Only if the relationship of abilities to tasks is clearly set forth can there be confidence that the pertinent abilities have been selected for measurement.”
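
To make the allocation problem concrete, here is a minimal sketch with hypothetical SME linkage ratings; the KSAs, tasks, rating scale, and numbers are invented for illustration and are not taken from Guardians or any actual study.

```python
import numpy as np

# Hypothetical SME linkage ratings (rows = KSAs, columns = tasks), 0-3 scale.
# With KSA-to-task linkages, test content can be allocated in proportion to the
# important work each KSA supports; a single "communications" cluster rating
# could not support this allocation.
ksas = ["reading", "writing", "speaking"]
tasks = ["write reports", "brief the public", "review forms"]
task_importance = np.array([3.0, 2.0, 1.0])   # hypothetical SME importance ratings
linkage = np.array([[1, 0, 3],                # reading
                    [3, 0, 1],                # writing
                    [0, 3, 0]])               # speaking

weight = linkage @ task_importance            # importance-weighted linkage per KSA
share = weight / weight.sum()                 # proportion of test content per KSA
for ksa, s in zip(ksas, share):
    print(f"{ksa}: {s:.0%} of test content")
```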

  18.

    The court was careful not to endorse the technical adequacy of the validation, having determined that the plaintiffs had placed all their argumentative eggs in the legal basket and so had no right to argue technical deficiency at the appellate stage. For example, whether the test adequately covered supervisory aspects of the job might have been relevant, but not when plaintiffs had failed to raise the issue for trial. And the folks at the New York Department of Civil Service had surveyed fire departments not only in New York State but also in cities nationwide. The district and circuit courts thought that such thoroughness made it unlikely that Buffalo’s job was unique.

  19.

    Note that statistical significance does not provide an index of how large or meaningful a correlation is. It indicates only the probability that a correlation of the observed size would arise from chance factors alone if the true correlation were zero.
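
A simulated example of the point (the sample size and effect size are arbitrary assumptions): with enough cases, even a trivially small correlation is statistically significant.

```python
import numpy as np
from scipy import stats

# Simulated data: the true correlation is about .05, yet with n = 10,000 the
# result is "significant." Significance speaks to chance, not to magnitude.
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(size=n)
r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.2g}")   # tiny r, very small p
```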

  20.

    It is possible to have essentially the same validity in subgroups but have different regression lines. Regression lines can differ in slope or in intercept (the height of the lines where they cross the y-axis, where the criterion measure is on the y or vertical axis and the predictor measure is on the x or horizontal axis). Consider that the predictor test and the job performance criterion are both measured on 100-point scales. For group A, a test score of 40 predicts a performance score of 50. For group B, a test score of 40 predicts 55 on the performance scale. The test scores are identical, but the score in group B is “worth more” in terms of predicted performance. If selection is done with groups combined, there will be a common regression line that ignores this difference in score value by subgroup. Since 1991, making adjustments to scores according to subgroup could run into legal trouble if the subgroups are based on race or other protected class in Title VII.
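
A simulated sketch of the example above, with assumed slope, spread, and error values chosen so that a test score of 40 predicts about 50 in group A and about 55 in group B; the pooled line splits the difference.

```python
import numpy as np

# Same slope in both groups, intercepts 5 points apart; the pooled regression
# over-predicts group A and under-predicts group B at the same test score.
rng = np.random.default_rng(1)
n, slope, noise_sd = 500, 0.6, 8.0

test_a = rng.normal(50, 10, n)
perf_a = 26 + slope * test_a + rng.normal(0, noise_sd, n)   # A: 40 -> about 50
test_b = rng.normal(50, 10, n)
perf_b = 31 + slope * test_b + rng.normal(0, noise_sd, n)   # B: 40 -> about 55

def predicted_at(x, y, score):
    slope_hat, intercept_hat = np.polyfit(x, y, 1)          # degree-1 (linear) fit
    return intercept_hat + slope_hat * score

pooled_x = np.concatenate([test_a, test_b])
pooled_y = np.concatenate([perf_a, perf_b])
print(f"group A line at test = 40: {predicted_at(test_a, perf_a, 40):.1f}")
print(f"group B line at test = 40: {predicted_at(test_b, perf_b, 40):.1f}")
print(f"pooled line at test = 40:  {predicted_at(pooled_x, pooled_y, 40):.1f}")
```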

  21.

    This is still an area for some controversy, even regarding the appropriate statistical model for fairness. See Aguinis et al. (2010) for arguments that differential prediction with cognitive tests is not a dead issue, at least as indicated by computer simulations. See Roth et al. (2014) regarding differential validity with cognitive tests likely being due to range restriction.

  22.

    See LeBreton et al. (2014) and related commentary articles. Using arguably unrealistically low reliability estimates spuriously inflates the validity estimate, and the problem can be amplified in validity generalization studies.
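
The mechanics are visible in the classical disattenuation formula, where the corrected validity is the observed validity divided by the square root of the criterion reliability; the reliability values below are illustrative assumptions only.

```python
import math

# Correction for criterion unreliability: rho = r_observed / sqrt(r_yy).
# The lower the assumed criterion reliability, the larger the corrected validity.
r_obs = 0.25
for r_yy in (0.80, 0.52, 0.30):   # assumed criterion reliabilities
    rho = r_obs / math.sqrt(r_yy)
    print(f"assumed reliability {r_yy:.2f} -> corrected validity {rho:.2f}")
# 0.80 -> 0.28, 0.52 -> 0.35, 0.30 -> 0.46
```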

  23.

    See the next section regarding the Civil Rights Act of 1991 and Ricci (2009) regarding score modifications. The point of banding is to eliminate meaningless score distinctions. It is not to add or subtract points from scores based on demographics. The hoped-for outcome is less adverse impact, but accomplishing it depends on how the bands are constructed and on other factors that affect adverse impact, such as the number selected and the selection ratio. Part of the controversy involves how ties within a band are resolved, and which demographic groups, if any, benefit from that process.
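
One commonly described construction method, banding on the standard error of the difference between two scores (not necessarily the method at issue in any particular case), can be sketched as follows; the reliability and score spread are assumed values.

```python
import math

# SED banding sketch: scores within z * SD * sqrt(2) * sqrt(1 - reliability) of
# the top score are treated as not meaningfully different from it.
sd, reliability, z = 10.0, 0.90, 1.96       # illustrative assumptions
band_width = z * sd * math.sqrt(2) * math.sqrt(1 - reliability)
print(f"band width = {band_width:.1f} points")          # about 8.8 here

top_score = 92
print(f"scores from {top_score - band_width:.1f} to {top_score} fall in the top band")
```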

  24.

    The city did not specifically claim a validity problem until its final substantive brief to the Supreme Court. The Court’s majority concluded that the record did not support a reason for tossing the tests other than dissatisfaction with the demographics. The dissent focused on issues with the tests that had been asserted by the time of the Court’s ruling.

  25.

    Any decision point regarding who gets the job or advances in the selection process is open to challenge. That applies to the decision point where BQs are applied. If there is adverse impact on the BQ, it does not work to say that those failing the BQ were not bona fide candidates because they were “obviously” unqualified. The employer needs to justify the BQ.

  26.

    Statistical fairness has not been much of a practical concern. See, for example, Stark et al. (2004). In many situations, a study will not be feasible because there will not be enough subjects in different applicant groups. For cognitive tests, the research record indicates that fairness defined as lack of predictive bias is generally not an issue, but see the discussion on differential validity above for other tests.
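
Fairness operationalized as lack of predictive bias is typically examined with moderated regression, testing whether group membership adds intercept or slope differences to the test-criterion regression. Below is a simulated sketch (group sizes, coefficients, and error are assumed values); with small subgroup samples, the group and interaction terms simply lack power.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with no true bias; the C(group) and test:C(group) coefficients
# should be near zero, and with small subgroup n they are also imprecise.
rng = np.random.default_rng(2)
n = 200                                       # per group; small n limits power
df = pd.DataFrame({
    "test": rng.normal(50, 10, 2 * n),
    "group": np.repeat(["A", "B"], n),
})
df["perf"] = 20 + 0.6 * df["test"] + rng.normal(0, 8, 2 * n)

model = smf.ols("perf ~ test * C(group)", data=df).fit()
print(model.summary())                        # inspect the group and interaction terms
```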

  27.

    Schmidt and Hunter (2004) were leading supporters of the current theory for g in the workplace. See Dalliard (2013) for a technical but somewhat readable blog essay putting g in broader context and criticizing its critics. This is not a scientific, peer-reviewed piece, but it outlines the issues, introduces some of the key players, and has references to the academic literature for further reading. McDaniel and Banks (2010) have a brief overview of theory, and they discuss testing and legal issues. The chapter criticizes the enforcement agencies for failing to follow what VG research has shown.

  28.

    This is reinforced by another phenomenon, “positive manifold,” described below.

  29.

    What appears to have been meant by general ability tests can be distinguished from what tests of general cognitive ability purport to do, despite the similarity in wording and the fact that the same tests may figure in both categories. Remember that the issue in test validity is how the test is used.

  30.

    Q&A 81 seems to envision tests that pertain to the job as a whole, where construct validity is necessary because all job behaviors between two jobs are not the same. But as discussed previously, a test could reference only a common behavioral subset of the job domains, which makes for a transportability situation.

  31.

    Sackett (2012) illustrated that a verbal ability test pitched at a higher level than the job could still be valid. But there would need to be criterion evidence of this, and the degree of disparate impact and validity compared to a test more in line with the level required by the job is another matter.

References

  • Aamodt, M.G. (2015). Using employment checks in the employee selection process. In C. Hanvey, & K. Sady (Eds.), HR practitioners guide to legal issues in organizations. New York: Springer.

  • Aguinis, H. (2004). Test-score banding in human resource selection: Technical, legal, and societal issues. Westport: Praeger.

  • Aguinis, H., Culpepper, S. A., & Pierce, C. A. (2010). Revival of test bias research in preemployment testing. Journal of Applied Psychology, 95, 648–680.

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

  • Ashe, R. L, Jr. (2007). Recent developments in scored test case law. http://eeoc.gov/eeoc/meetings/archive/5-16-07/testcase_ashe.html.

  • Baldwin, B. (2014). MQs: An idea whose time has passed? HR Tests – Recruitment, assessment, and personnel selection. http://hrtests.blogspot.com/2014/02/mqs-idea-whose-time-has-passed.html.

  • Barrett, R. S. (1998). Challenging the myths of fair employment practices. Westport: Quorum Books.

  • Barron, L. G., & Rose, M. R. (2013). Relative validity of distinct spatial abilities: An example with implications for diversity. International Journal of Selection and Assessment, 21, 400–406.

  • Biddle, D. A. (2011). Adverse impact and test validation: A practitioner’s handbook (3rd ed.). West Conshohocken: Infinity.

  • Biddle, R. E., & Biddle, D. A. (2013). What public-sector employers need to know about promotional practices, procedures, and tests in public safety promotional processes: After Ricci v. Destefano. Public Personnel Management, 42, 151–190.

  • Biddle, D. A. & Nooren, P. M. (2006). Validity generalization vs. Title VII: Can employers successfully defend tests without conducting local validation studies? Labor Law Journal, 57, 216–237.

  • Brannick, M. T., & Levine, E. L. (2002). Job analysis: Methods, research, and applications for human resource management in the new millennium. Thousand Oaks: Sage.

  • Buster, M. A., Roth, P. L., & Bobko, P. (2005). A process for content validation of education and experience-based minimum qualifications: An approach resulting in federal court approval. Personnel Psychology, 58, 571–599.

  • Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.

  • Cascio, W. F., & Aguinis, H. (2005). Test development and use: New twists on old questions. Human Resources Management, 44, 219–235.

  • Cizek, G. J., & Bunch, M. B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks: Sage.

  • Dalliard. (2013) Is psychometric g a myth? Human Varieties. http://humanvarieties.org/2013/04/03/is-psychometric-g-a-myth/. Accessed 22 Dec 2014

  • Gibson, W. M., & Caplinger, J. A. (2007). Transportation of validation results. In S. M. McPhail (Ed.), Alternative validation strategies: Developing new and leveraging existing validity evidence. San Francisco: Jossey-Bass.

  • Goldstein, I. L., Zedeck, S., & Schneider, B. (1993). An exploration of the job analysis-content validity process. In N. Schmitt & W. C. Borman (Eds.), Personnel selection in organizations. San Francisco: Jossey-Bass.

  • Gutman, A., Koppes, L. L., & Vodanovich, S. J. (2011). EEO law and personnel practices (3rd ed.). New York: Taylor and Francis.

  • Hanvey, C., & Banks, C. (2015). Wage and hour litigation. In C. Hanvey & K. Sady (Eds.). HR practitioners guide to legal issues in organizations. New York: Springer.

  • Jacobs, R. & Nett, B. (2015) Validation issues in the public sector. In C. Hanvey & K. Sady (Eds.), HR practitioners guide to legal issues in organizations. New York: Springer.

  • Johnson, J. W., Steel, P., Scherbaum, C. A., Hoffman, C. C., Jeanneret, P. R., & Foster, J. (2010). Validation is like motor oil: Synthetic is better. Industrial and Organizational Psychology: Perspectives on Science and Practice, 3, 305–328

  • Johnson, M. A., & Jolly, J. P. (2000). Extending test validation results from one plant to another: Applications of transportability evidence. Journal of Behavioral and Applied Management, 1, 127–136.

  • Landy, F. J. (1986). Stamp collecting versus science. American Psychologist, 41, 1183–1192.

  • Loverde, M., & Lahti, K. (2015). Test validation strategies. In C. Hanvey & K. Sady (Eds.), HR practitioners guide to legal issues in organizations. New York: Springer.

  • LeBreton, J. M., Scherer, K. T., & James, L. R. (2014). Corrections for criterion reliability in validity generalization: A false prophet in a land of suspended judgment. Industrial and Organizational Psychology: Perspectives on Science and Practice (in press).

  • Malos, S. (2005). The importance of valid selection and performance appraisal: Do management practices figure in case law? In F. J. Landy (Ed.), Employment discrimination litigation: Behavioral, quantitative, and legal perspectives. San Francisco: Jossey-Bass.

  • McDaniel, M. A., & Banks, G. C. (2010). General cognitive ability. In J. C. Scott & D. H. Reynolds (Eds.), Handbook of workplace assessment. San Francisco: Jossey-Bass.

  • Morris, S. B. & Dunleavy, E. M. (2015). In C. Hanvey, & K. Sady (Eds.), HR practitioners guide to legal issues in organizations. New York: Springer.

  • Mueller, L., & Munson, L. (2015). Setting cut-scores. In C. Hanvey & K. Sady (Eds.), HR practitioners guide to legal issues in organizations. New York: Springer.

  • Murphy, K. R., Dzieweczynski, J. L., & Zhang, Y. (2009). Positive manifold limits the relevance of content-matching strategies for validating selection test batteries. Journal of Applied Psychology, 94, 1018–1031.

  • Office of Federal Contract Compliance Programs (OFCCP, n. d.) Internet applicant recordkeeping rule. http://www.dol.gov/ofccp/regs/compliance/faqs/iappfaqs.htm. Accessed 22 Dec 2014

  • O’Leary, R. S., Pulakos, E. D., & Linton, L. L. (2007). A method for developing content-valid competency-based minimum qualifications. 31st Annual IPMAAC Conference on Personnel Assessment. http://annex.ipacweb.org/library/conf/07/oleary.pdf. Accessed 22 Dec 2014

  • Roth, P. L., Le, H., Oh, I.-S., Van Iddekinge, C. H., Buster, M. A., Robbins, S. B., & Campion, M. A. (2014). Differential validity for cognitive ability tests in employment and educational settings: Not much more than range restriction? Journal of Applied Psychology, 99, 1–20.

  • Sackett, P. R. (2012). Cognitive tests, constructs, and content validity: A commentary on Schmidt (2012). International Journal of Selection and Assessment, 20, 24–27.

  • Schmidt, F. L. (2012). Cognitive tests used in selection can have content validity as well as criterion validity: A broader research review and implications for practice. International Journal of Selection and Assessment, 20, 1–13.

  • Schmidt, F. L., & Hunter, J. (2004). General mental ability in the world of work: occupational attainment and job performance. Journal of Personality and Social Psychology, 86, 162–173.

  • Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures (4th ed.). Bowling Green: Author.

  • Stark, S., Chernyshenko, O. S, & Drasgow, F. (2004). Examining the effects of differential item functioning and differential test functioning on selection decisions: When are statistically significant effects practically important? Journal of Applied Psychology, 89, 497–508.

  • Stewart, R., Stewart, A., Bruskiewicz, K., & Vincent, J. P. (2015). Clinical psychological testing for selection. In C. Hanvey & K. Sady (Eds.). HR practitioners guide to legal issues in organizations. New York: Springer.

  • Trattner, M. H. (1979). Synthetic validity and its application to the Uniform Guidelines validation requirements, Personnel Psychology, 35, 383–397.

  • U.S. Equal Employment Opportunity Commission [EEOC]. (2010). Employment tests and selection procedures. http://www.eeoc.gov/policy/docs/factemployment_procedures.html. Accessed 22 Dec 2014

  • U.S. Equal Employment Opportunity Commission [EEOC]. (2012a). Disparate impact and reasonable factors other than age under the Age Discrimination in Employment Act. Federal Register, 77, 19080–19095. http://www.gpo.gov/fdsys/pkg/FR-2012-03-30/pdf/2012-5896.pdf. Accessed 22 Dec 2014

  • U.S. Equal Employment Opportunity Commission [EEOC]. (2012b). Enforcement guidance on the consideration of arrest and conviction records in employment decisions under Title VII of the Civil Rights Act of 1964, as amended, 42 U.S.C. § 2000e et seq. Document Number 915.002. http://www.eeoc.gov/laws/guidance/arrest_conviction.cfm#VB5. Accessed 22 Dec 2014

  • U.S. Equal Employment Opportunity Commission [EEOC]. (n. d.) Questions and answers on EEOC final rule on disparate impact and “reasonable factors other than age” under the Age Discrimination in Employment Act of 1967. http://www.eeoc.gov/laws/regulations/adea_rfoa_qa_final_rule.cfm. Accessed 22 Dec 2014

  • U.S. Equal Employment Opportunity Commission [EEOC], Civil Service Commission, Department of Labor, & Department of Justice. (1978). Uniform Guidelines on Employee Selection Procedures (UGESP). Federal Register, 43, 38290–38309. Questions and answers on UGESP (Q & As) published in the Federal Register, 44, 11996–12009 (1979); 45, 29530–29531 (1980); 69, 10152–10158 (2004). The latter were not adopted. www.uniformguidelines.com. Accessed 22 Dec 2014

  • U.S. Office of Personnel Management. (n. d.) Cognitive ability tests. Assessment Decision Guide. http://apps.opm.gov/adt/Content.aspx?page=3-04. Accessed 22 Dec 2014

Federal Court Cases

  • Adams v. City of Indianapolis, No. 12-1874 (7th Cir. February 4, 2014).

  • Albemarle Paper Co. v. Moody, 422 U.S. 405 (1975).

  • EEOC v. Dial Corp., 469 F.3d 735 (8th Cir. 2006).

  • EEOC v. Ford Motor Co., et al., Civil Action No. 1:04-CV-00845 (U.S. Dist Ct.—S.D. Ohio) (filed Dec. 27, 2004).

  • EEOC v. Ford Motor Co., et al., Civil Action No. 1:07-CV-00703 (U.S. Dist Ct.–S.D. Ohio) (filed Aug. 23, 2007).

  • Griggs v. Duke Power Co., 401 U.S. 424 (1971).

  • Guardians of N.Y. v. Civil Service Commission, 630 F.2d 72 (2nd Cir. 1980).

  • Isabel v. City of Memphis, 404 F.3d 404 (6th Cir. 2005).

  • Lanning v. Southeast Pennsylvania Transportation Authority (SEPTA), 181 F.3d 478 (3rd Cir. 1999), cert. denied, 120 S. Ct. 970 (2000).

  • Lanning v. SEPTA, 308 F.3d 286 (3rd Cir. 2002).

  • Lopez v. Pacific Maritime Assoc., 657 F.3d 762 (9th Cir. 2011).

  • M.O.C.H.A. v. City of Buffalo, 689 F.3d 263 (2nd Cir. 2012).

  • New York City Transit Authority v. Beazer, 440 U.S. 568 (1979).

  • Ricci v. Destefano, 557 U.S. 557 (2009).

  • Wards Cove Packing Co. v. Atonio, 490 U.S. 642 (1989).

  • Watson v. Ft. Worth Bank & Trust, 487 U.S. 977 (1988).


Author information

Correspondence to Rich Tonowski, PhD.

Appendices

Recommended Readings

There is no escape from needing some familiarity with UGESP, the Standards, and the Principles. In addition, the following readings are recommended:

  • Guardians of N.Y. v. Civil Service Commission, 630 F.2d 72 (2nd Cir. 1980):

    • This is the decision for the case study. It remains an influential analysis by the judiciary on how technical and legal principles find application to content validity. Reading the decision to get a sense of how the court reasoned its way to a decision is informative.

  • Biddle, D. A. (2011). Adverse impact and test validation: A practitioner’s handbook (3rd ed.). West Conshohocken, PA: Infinity:

    • This book is subtitled “a practitioner’s handbook.” It walks through general testing and legal issues as well as discussing specific testing methods, such as written tests and structured interviews.

  • Gutman, A., Koppes, L. L., & Vodanovich, S. J. (2011). EEO law and personnel practices (3rd ed.). New York, NY: Taylor and Francis:

    • This book gives comprehensive coverage to a variety of legal issues.

Glossary

  • Bias: In measurement, systematic inaccuracy.

  • Common method variance: The tendency for people to score similarly on tests of the same type (e.g., multiple choice) due to similarities in test design, distinct from the tests’ measuring the same thing.

  • Compensatory scoring: A strategy where all (sometimes weighted) components of a testing procedure are combined to produce a total score; not mutually exclusive with multiple-hurdle scoring, since subjects could be required to pass each component, and then the component scores are combined.

  • Criterion-referenced test: A test designed to assess a construct (e.g., mathematical ability) at various levels of mastery; mastery at a given level generally implies mastery of all lower levels but does not by itself imply mastery of higher levels.

  • Differential item functioning (DIF): A form of bias where people from different groups (e.g., EEO protected class) with the same ability tend to differ in correctly answering a test item; differential test functioning is a related concept, and appears in discussions of test fairness.

  • Differential prediction: A situation where the single best equation relating test to the predicted criterion measure predicts differently for different groups.

  • Error: In test theory and statistics, random variation; it includes, but is not limited to, mistakes in recording test scores.

  • Fairness: An important value in employment selection that, because it is a value, is difficult to assign a universal definition; generally operationalized as lack of differential prediction, a concept that can be expressed statistically.

  • Meta-analysis: A generic term for statistical techniques to combine results across different studies.

  • Multiple hurdles scoring: A strategy where subjects must pass each separate component of a multipart testing procedure.

  • Multi-trait multi-method matrix: An analytic design used to compare correlations involving assessment of same/different constructs by same/different tests; developed by Campbell and Fiske (1959).

  • Power: In statistical analysis, the ability of a statistical procedure to detect a relationship or difference between groups, given that it actually exists; an adequate amount of data is a main determinant of power.

  • Rorschach test: A personality test that involves the subject’s interpretation of inkblots; a “projective” technique, in that the subject projects meaning on an essentially neutral stimulus, thus revealing aspects of personality.

  • Standardization: In statistics, converting each raw score to a score indicating its relative position above or below the mean score; useful in combining test components where the components have different spreads of scores that would affect the relative weights of the components in producing the combined score.

  • Subject matter expert (SME): Someone with direct knowledge of a job who can assist with test development, usually by providing information on work behaviors and competencies to do the work, sometimes by providing information for test content.

  • True score: In classical test theory, the unobserved hypothetical score that represents a test taker’s actual standing on the construct being measured; in practice, there are only observed scores that are subject to measurement error.

  • Weights: For test components, the amount each component counts toward total test score relative to the others; nominal weights are simply whatever weights are assigned to a component, e.g., 50 % for Part A, 30 % for Part B, and 20 % for Part C; effective weights take into account the spread of scores in the components, since parts with more spread in scores will have more influence on the final outcome compared to parts where everyone scored essentially the same (see the sketch following this glossary).
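
As a minimal sketch of the nominal versus effective weights distinction (with made-up score distributions, not data from the chapter), the example below shows a 50/50 nominal weighting in which the wider-spread component dominates the raw composite, and how standardizing the components first restores the intended balance.

```python
import numpy as np

# Two components with equal nominal weights but very different spreads.
rng = np.random.default_rng(3)
part_a = rng.normal(70, 15, 1000)    # wide spread
part_b = rng.normal(70, 3, 1000)     # narrow spread
w_a = w_b = 0.5                      # nominal weights

def z(x):
    return (x - x.mean()) / x.std()  # standardization

raw_composite = w_a * part_a + w_b * part_b
std_composite = w_a * z(part_a) + w_b * z(part_b)

# Correlation of each composite with each part shows which part actually drives it.
for name, comp in [("raw", raw_composite), ("standardized", std_composite)]:
    print(name,
          f"corr with A = {np.corrcoef(comp, part_a)[0, 1]:.2f},",
          f"corr with B = {np.corrcoef(comp, part_b)[0, 1]:.2f}")
```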

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Tonowski, R. (2015). Test Validation Pitfalls. In: Hanvey, C., Sady, K. (eds) Practitioner's Guide to Legal Issues in Organizations. Springer, Cham. https://doi.org/10.1007/978-3-319-11143-8_3
