Should Employers Rely on Local Validation Studies or Validity Generalization (VG) to Support the Use of Employment Tests in Title VII Situations?

DOI: 10.1177/009102601003900402
Date: 01 December 2010
Published date: 01 December 2010
Subject Matter: Article

By Daniel A. Biddle, PhD
Since the landmark U.S. Supreme Court case Griggs v. Duke Power (1971), employers have been subject to challenge by plaintiffs or government enforcement agencies when they use employment tests that have adverse impact. In such situations the 1991 Civil Rights Act requires employers to demonstrate that the test is “…job related for the position in question and consistent with business necessity.” Employers typically mount such a defense by addressing the Uniform Guidelines on Employee Selection Procedures (the federal guidelines adopted in 1978 for the express purpose of Title VII enforcement) along with relevant professional standards (Joint Standards, 1999; SIOP Principles, 2003). To meet these requirements, employers can choose among several viable validation strategies, ranging from conducting a local criterion-related validation study to relying on validity evidence from other studies of similar positions, employers, and tests (i.e., Validity Generalization). This article reviews the strengths and limitations of both strategies in a Title VII context and ultimately encourages employers to choose the local validation strategy whenever it is technically feasible, for a variety of reasons discussed below.
There are two main reasons employers conduct validation studies on pre-employment tests: (1) to assess whether the test is an effective tool for the desired situation (i.e., for the target job or group of jobs, how the test should be used in conjunction with other tests, what cutoffs may be appropriate, etc.), and (2) for defensibility, in both low-stakes situations (challenges brought by individual applicants) and high-stakes situations (challenges under Title VII and government audits).
Choices for validation strategies include both local techniques that evaluate and/or evidence connections between the test and the target position, and more global strategies that evaluate how tests seem to generalize across various settings and positions (as with Validity Generalization, or “VG”). Both federal (the Uniform Guidelines¹) and professional (the Joint Standards² and SIOP Principles³) standards permit techniques that investigate the local validity of the test as well as more global techniques (e.g., transportability studies that attempt to connect existing validity evidence with the local situation).
Public Personnel Management Volume 39 No. 4 Winter 2010
With such choices available to practitioners, which techniques are likely to produce the most accurate and defensible results: the local validity techniques or the more global ones? What have the courts said about either technique in Title VII situations where an employer must demonstrate validity to justify its testing practices? This review addresses these and related questions. While a number of test validation techniques are available to practitioners (e.g., content validity, construct validity), this discussion is limited to two: local criterion-related validity and VG.⁴
Overview of Local Criterion-Related Validity
A local criterion-related validity study is conducted by statistically correlating test scores with some measure of job performance (typically supervisor ratings or performance evaluation scores). Following conventional practice in the social sciences, validity can be claimed if the correlation between test scores and the job performance measure (i.e., the criterion) has a corresponding probability value (p-value) less than .05, which indicates that the correlation is a “beyond chance” occurrence. This type of validity study is typically conducted for tests that measure abstract traits (e.g., some types of cognitive ability, personality) that may not have obvious connections to the job (in contrast with content validity, which seeks to demonstrate a more rational connection between the test and the job for traits that are more concrete in nature).
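As an illustration of the correlation-and-significance check just described, the Python sketch below computes a Pearson correlation between hypothetical test scores and supervisor ratings, along with an approximate two-tailed p-value. The p-value uses Fisher's z transformation with a normal approximation rather than the exact t-test a statistics package would report; the function names and data are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 4:
        raise ValueError("need two sequences of equal length >= 4")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Approximate two-tailed p-value for H0: rho = 0 via Fisher's z.

    atanh(r) * sqrt(n - 3) is treated as standard normal under H0.
    This is a large-sample approximation, not the exact t-test.
    """
    if abs(r) >= 1:
        return 0.0  # perfect correlation: atanh is undefined, p is effectively 0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical test scores and supervisor ratings (illustrative only).
test_scores = [52, 61, 47, 70, 58, 66, 49, 73, 55, 63, 68, 45]
ratings = [3.1, 3.6, 2.9, 4.2, 3.0, 3.9, 3.3, 4.0, 3.2, 3.4, 4.1, 2.7]

r = pearson_r(test_scores, ratings)
p = approx_p_value(r, len(test_scores))
print(f"r = {r:.3f}, approx. p = {p:.4f}, significant at .05: {p < .05}")
```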
The steps necessary to conduct this type of validation study are straightforward. Under a predictive model, the researcher administers the test to applicants and then correlates their test scores with a subsequent measure of job performance. Under a concurrent model, the test is given to current job incumbents, and their scores are correlated with job performance measures collected at about the same time. Under either model, high reliability for both the test and the job performance measures is key to ensuring that the results are accurate and dependable.
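The role of reliability can be made concrete with the classical test theory attenuation formula, under which the observed correlation equals the true-score correlation multiplied by the square root of the product of the test and criterion reliabilities. The Python sketch below (function name and values hypothetical) shows how much a hypothetical true validity of .40 shrinks as either measure becomes less reliable.

```python
import math

def attenuated_validity(true_r, test_reliability, criterion_reliability):
    """Expected observed correlation given a true-score correlation.

    Spearman's attenuation formula: r_obs = rho * sqrt(r_xx * r_yy).
    """
    for rel in (test_reliability, criterion_reliability):
        if not 0 < rel <= 1:
            raise ValueError("reliabilities must be in (0, 1]")
    return true_r * math.sqrt(test_reliability * criterion_reliability)

# Hypothetical: a true-score validity of .40 observed through measures
# of differing reliability (test reliability, criterion reliability).
for rxx, ryy in [(0.90, 0.90), (0.90, 0.60), (0.70, 0.50)]:
    observed = attenuated_validity(0.40, rxx, ryy)
    print(f"r_xx={rxx:.2f}, r_yy={ryy:.2f} -> expected observed r = {observed:.3f}")
```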
Having an adequate sample size to maximize statistical power is also important when conducting a local study. Statistical power refers to the probability that a study will find a statistically significant result if the effect exists in the target population. Validity studies with large sample sizes (e.g., 300+ subjects) have high statistical power, and those with small samples have low statistical power. For example, assume that a researcher wanted to find out whether a certain test had a validity coefficient of .25 or higher, and test and job performance data were available for only 80 incumbents in the target position. In this situation, the researcher would have about a 73% chance (i.e., 73% power) of finding such a coefficient (if it existed to be found). With twice the sample size (160 subjects), power increases to about 94%, which gives the researcher a very high likelihood of determining whether the test is valid for the target position.
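The power figures above can be approximated with Fisher's z transformation. The article does not state the significance level or whether the test is one- or two-tailed; the Python sketch below assumes a one-tailed test at alpha = .05, which yields values close to those quoted (roughly .72 for 80 subjects and .94 for 160). Exact power calculations based on the t distribution, as done by dedicated power software, will differ slightly. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(rho, n, alpha=0.05, one_tailed=True):
    """Approximate power to detect a population correlation rho with n cases.

    Fisher's z: atanh(r) is roughly normal with SE = 1 / sqrt(n - 3).
    For two-tailed tests, the tiny chance of significance in the wrong
    direction is ignored.
    """
    std_normal = NormalDist()
    tail = alpha if one_tailed else alpha / 2
    z_crit = std_normal.inv_cdf(1 - tail)           # critical value under H0
    z_effect = math.atanh(rho) * math.sqrt(n - 3)   # expected z under H1
    return std_normal.cdf(z_effect - z_crit)

# Reproduce the scenario in the text: rho = .25 with 80 vs. 160 subjects.
for n in (80, 160):
    print(n, round(approx_power(0.25, n), 3))
```

Setting `one_tailed=False` gives the (lower) power of a two-tailed test for the same sample sizes.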
Overview of Validity Generalization
VG studies rely on a research technique called meta-analysis, which combines the results of several similar research studies to form general theories about relationships between similar variables across different situations. As early as 1977, Schmidt & Hunter⁵ applied meta-analytic techniques to the field of personnel testing and framed the approach as VG. Before that time, meta-analysis was very rare in the personnel testing and psychological literature,⁶ but it has since come into widespread use in the academic field.
The purpose of conducting VG studies in the personnel field is to evaluate the effectiveness (i.e., validity) of a particular type of personnel test (e.g., personality, integrity, conscientiousness) and to describe what the findings mean in a broader sense.⁷ Practically speaking, VG studies are conducted by compiling several related local criterion-related validity studies into an aggregate analysis to determine the overall effectiveness of the test(s) included in the study for the jobs and settings involved. VG studies also apply various statistical corrections (e.g., for sampling error, range restriction, and criterion unreliability) designed to allow the researcher to estimate what the overall operational validity of the test(s) would be if the observed validities were not hampered by these suppressors.
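A minimal sketch of this kind of aggregation and correction, written in Python, follows the general logic of Hunter and Schmidt's “bare-bones” meta-analysis: a sample-size-weighted mean validity, the observed variance of validities across studies, and the variance expected from sampling error alone. It then applies a Thorndike Case II range-restriction correction and a correction for criterion unreliability to estimate operational validity. The study data, the range-restriction ratio, and the criterion reliability are hypothetical, and published VG procedures often rely on artifact distributions and differ in the order in which corrections are applied.

```python
import math
from dataclasses import dataclass

@dataclass
class Study:
    n: int    # sample size of one local validation study
    r: float  # observed (uncorrected) validity coefficient

def bare_bones(studies):
    """Sample-size-weighted mean r, observed variance, sampling-error variance."""
    total_n = sum(s.n for s in studies)
    mean_r = sum(s.n * s.r for s in studies) / total_n
    var_obs = sum(s.n * (s.r - mean_r) ** 2 for s in studies) / total_n
    avg_n = total_n / len(studies)
    var_err = (1 - mean_r ** 2) ** 2 / (avg_n - 1)  # expected from sampling error
    return mean_r, var_obs, var_err

def correct_range_restriction(r, u):
    """Thorndike Case II; u = unrestricted SD / restricted SD of the test."""
    return u * r / math.sqrt(1 + r ** 2 * (u ** 2 - 1))

def correct_criterion_unreliability(r, ryy):
    """Disattenuate for criterion unreliability only (test unreliability kept)."""
    return r / math.sqrt(ryy)

# Hypothetical input studies and artifact values (illustrative only).
studies = [Study(68, 0.22), Study(120, 0.31), Study(45, 0.12),
           Study(210, 0.27), Study(90, 0.35)]
U_RATIO = 1.5                  # assumed applicant SD / incumbent SD on the test
CRITERION_RELIABILITY = 0.60   # assumed reliability of supervisor ratings

mean_r, var_obs, var_err = bare_bones(studies)
residual_sd = math.sqrt(max(var_obs - var_err, 0.0))
operational = correct_criterion_unreliability(
    correct_range_restriction(mean_r, U_RATIO), CRITERION_RELIABILITY)

print(f"mean observed r = {mean_r:.3f}")
print(f"observed var = {var_obs:.5f}, sampling-error var = {var_err:.5f}")
print(f"residual SD of r = {residual_sd:.3f}")
print(f"estimated operational validity = {operational:.3f}")
```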
Some researchers who conduct VG studies apply the “75 Percent Rule” to determine whether validity can be generalized beyond the VG study to other situations. Under the 75 Percent Rule, if at least 75 percent of the variance in the observed validities (in the VG study) can be accounted for by the correctable statistical artifacts (i.e., sampling error, criterion unreliability, predictor unreliability, and range restriction on the predictor), then the true variance between validities is assumed to be zero, because uncorrected artifacts would likely account for the remaining 25 percent of variance. VG studies in which at least 75 percent of the variance is explained by these correctable artifacts are said to generalize to other settings outside those included in the study.
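The arithmetic behind the 75 Percent Rule can be sketched for the simplest case, in which sampling error is the only artifact counted (a full application would also include variance due to criterion unreliability, predictor unreliability, and range restriction). The Python example below uses hypothetical study sets and hypothetical function names; capping the percentage at 100 is a presentational choice, since with few studies the expected sampling-error variance can exceed the observed variance.

```python
def percent_variance_explained(ns, rs):
    """Percent of observed variance in r attributable to sampling error alone."""
    if len(ns) != len(rs) or len(ns) < 2:
        raise ValueError("need at least two studies with matching n and r")
    total_n = sum(ns)
    mean_r = sum(n * r for n, r in zip(ns, rs)) / total_n
    var_obs = sum(n * (r - mean_r) ** 2 for n, r in zip(ns, rs)) / total_n
    var_err = (1 - mean_r ** 2) ** 2 / (total_n / len(ns) - 1)
    if var_obs == 0:
        return 100.0  # no observed variance left to explain
    return min(100.0, 100.0 * var_err / var_obs)  # cap at 100 for reporting

def meets_75_percent_rule(percent_explained, threshold=75.0):
    """True if artifacts explain at least `threshold` percent of the variance."""
    return percent_explained >= threshold

# Hypothetical sets of validity studies (sample sizes, observed r values).
consistent = percent_variance_explained([60, 85, 70, 110], [0.24, 0.30, 0.19, 0.27])
divergent = percent_variance_explained([500, 500, 500], [0.00, 0.30, 0.60])
for label, pct in [("consistent", consistent), ("divergent", divergent)]:
    print(f"{label}: {pct:.0f}% explained; generalizes: {meets_75_percent_rule(pct)}")
```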
Another, more contemporary tool used in VG research is the credibility interval, which some researchers use to determine the extent to which validity can be generalized beyond the VG study. The credibility interval is an estimate of the variability of individual correlations across studies and informs the researcher of the percentage of correlations in the study that are “not likely to be zero.” For example, an 80% credibility interval whose lower bound is above zero indicates that about 90% of the individual correlations in the VG study are estimated to be greater than zero, since only 10% of the distribution falls below the interval's lower bound.⁸
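The Python sketch below computes such an interval from a mean corrected validity and the standard deviation of true validities, and separately estimates the share of the distribution lying above zero. It assumes, as is conventional in this literature, that true validities are normally distributed; the function names and the mean and standard deviation values are hypothetical.

```python
from statistics import NormalDist

def credibility_interval(mean_rho, sd_rho, level=0.80):
    """Two-sided credibility interval for a normal distribution of true validities."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.28 for an 80% interval
    return mean_rho - z * sd_rho, mean_rho + z * sd_rho

def share_above_zero(mean_rho, sd_rho):
    """Estimated proportion of true validities greater than zero."""
    if sd_rho == 0:
        return 1.0 if mean_rho > 0 else 0.0  # no variability: all or none
    return 1.0 - NormalDist(mean_rho, sd_rho).cdf(0.0)

# Hypothetical VG results: mean corrected validity and its residual SD.
lower, upper = credibility_interval(0.30, 0.15)
print(f"80% credibility interval: {lower:.3f} to {upper:.3f}")
print(f"excludes zero: {lower > 0}; share above zero: {share_above_zero(0.30, 0.15):.3f}")
```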
One of the major limitations of “corrected” VG studies (discussed in more depth below) is that there is no guarantee that employers would find the level of validity promised by the result of a VG study if a study were performed in a new local setting. This is primarily because every new situation involves a host of situational factors that may drastically affect the validity of a test. In addition, typical VG studies have a number of limitations that may further reduce their relevance and reliability when evaluating test validity in new situations (see discussion below). Nevertheless, VG studies offer useful insights into the strength of the relationship between the test and job performance in the studies included in the VG analysis and can be immensely useful in personnel research.
Federal and Professional Requirements Surrounding Validity Generalization

Because there is a high degree of overlap and agreement between the Uniform
Guidelines and the professional standards regarding the basics involved in conducting
and interpreting local criterion-related validity...
