Measuring Constructs in Family Science: How Can Item Response Theory Improve Precision and Validity?

DOI: http://doi.org/10.1111/jomf.12157
Published: 01 February 2015
Author: Rachel A. Gordon, University of Illinois at Chicago
Journal of Marriage and Family 77 (February 2015): 147–176
This article provides family scientists with
an understanding of contemporary measure-
ment perspectives and the ways in which item
response theory (IRT) can be used to develop
measures with desired evidence of precision and
validity for research uses. The article offers a
nontechnical introduction to some key features
of IRT, including its orientation toward locat-
ing items along an underlying dimension and
toward estimating precision of measurement
for persons with different levels of that same
construct. It also offers a didactic example
of how the approach can be used to refine
conceptualization and operationalization of
constructs in the family sciences, using data
from the National Longitudinal Survey of Youth
1979 (n=2,732). Three basic models are con-
sidered: (a) the Rasch and (b) two-parameter
logistic models for dichotomous items and (c)
the Rating Scale Model for multicategory items.
Throughout, the author highlights the potential
for researchers to elevate measurement to a
level on par with theorizing and testing about
relationships among constructs.
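The three models named above can be summarized by their response functions. The following sketch (plain Python written for this summary, not code from the article; all parameter values are made up for illustration) shows the Rasch and two-parameter logistic functions for dichotomous items and the Rating Scale Model's category probabilities for a multicategory item.

```python
import math

def rasch_p(theta, b):
    """Rasch model: P(endorse) depends only on the gap between
    person level theta and item location b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def twopl_p(theta, a, b):
    """Two-parameter logistic (2PL): adds a discrimination a that
    controls how sharply the curve rises near b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def rating_scale_p(theta, b, taus):
    """Rating Scale Model: category probabilities for a polytomous
    item with location b and shared category thresholds taus.
    Returns one probability per category 0..len(taus)."""
    # Cumulative sums of (theta - b - tau_k) give the category logits.
    logits = [0.0]
    cum = 0.0
    for tau in taus:
        cum += theta - b - tau
        logits.append(cum)
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For a person located exactly at an item (theta = b), both dichotomous models give a 0.5 probability; raising the 2PL discrimination steepens the curve around that point, which is what makes an item more informative there.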
Department of Sociology and Institute of Government and Public Affairs, University of Illinois at Chicago, 815 West Van Buren St., Suite 525, Chicago, IL 60607 (ragordon@uic.edu).
Key Words: measurement, methods, theory.
Constructs are fundamental ingredients of
family theories, forming the building blocks
of research questions and hypotheses (White
& Klein, 2008). An essential component of
quantitative research is the operationalization
of such concepts, many of which are difficult to
observe. As a consequence, the reliability and
validity of instruments are central concerns of
family researchers. Although most family sci-
entists would likely agree with these statements,
reviews continue to periodically lament prob-
lems with the denition and operationalization
of key constructs in the eld. In this article I
consider the ways in which item response theory
(IRT) can help scholars determine whether they
have precisely and validly assessed constructs of
interest. A focus on IRT is important given that
many scholars are trained primarily in classical
test theory (CTT) approaches. As I discuss, IRT
is not a silver bullet: It cannot solve all mea-
surement challenges in the field, and in many
cases CTT can be leveraged to meet similar
goals. IRT does, however, feature some aspects
of measurement that are less apparent in the
ways that applied scholars often use traditional
CTT approaches and, as such, has the potential
to help scholars rethink how they approach the
task of dening good measures.
I begin by providing a rationale for this
review, highlighting key recent publications. I
then discuss overarching principles of measure-
ment that encompass IRT and CTT. Doing so
is meant to broadly frame IRT within contem-
porary perspectives about sound measurement. I
emphasize three contemporary orientations: (a)
unied validity, (b) conceptual frames, and (c)
Journal of Marriage and Family 77 (February 2015): 147–176 147
DOI:10.1111/jomf.12157
148 Journal of Marriage and Family
local validation. I then offer a nontechnical intro-
duction to some key features of IRT and offer
references to direct readers who want to learn
more. Next, I provide an empirical example,
demonstrating some of the ways that IRT can
offer new insights. I aim to highlight the poten-
tial for researchers to elevate measurement to a
level on par with theorizing about and testing of
relationships among constructs. I end by encour-
aging publications of theoretical and empirical
research on measures in mainstream family sci-
ences journals.
W C T T
Before getting into specics of measurement
principles and IRT approaches, it is helpful to
lay out a rationale for the importance of this
topic. As already noted, most family scientists
would likely agree that good quantitative science
requires reliable and valid measures. Indeed,
most Method sections in journals like the Jour-
nal of Marriage and Family (JMF) include some
information to make the case that the measures
are good assessments of their underlying con-
structs (e.g., pointing back to scale developers’
publications, reporting internal consistency esti-
mates for the sample in hand).
A challenge to the eld, however,is that scale
development and validation are not as integrated
into the published literature as are studies that
relate scale scores to one another with regres-
sion models. When unpublished, an instrument’s
development and renement do not benet from
the critical peer-review component of the sci-
entific process. Even when published, measure-
ment articles more often appear in specialized
methods journals than mainstream, substantively
oriented journals, making them less visible to
family scientists and somewhat divorced from
theory development. As a consequence, mea-
surement risks becoming a side exercise rather
than a central and substantial component of a
research project. Likely contributing to the sepa-
ration of measurement from mainstream science
is the fact that psychometric theory has advanced
rapidly in recent decades, with new models that
can seem quite different from (and more techni-
cal than) familiar CTT approaches. Indeed, the
limited training on measurement built into many
disciplines’ graduate programs exacerbates the
challenge to scientists of staying abreast of this
advancing eld (Aiken, West, & Millsap, 2008;
Aiken, West, Sechrest, & Reno, 1990).
Perhaps it is not surprising, then, that
Blinkhorn (1997) lamented that psychomet-
ric tools such as factor analysis are often mere
data-reduction techniques, and internal consis-
tency reliability estimates tend to be used in a
perfunctory way. Writings in other fields suggest
this issue is not unique to particular disciplines.
Several decades ago, Schwab (1980) noted that
theoretical advances in organizational studies
were hampered “because investigators have not
accorded measurement the same deference as
substantive theory (and) as a consequence, sub-
stantive conclusions have been generated that
may not be warranted” (p. 34). In the field of
criminology, Piquero, Macintosh, and Hickman
(2002, p. 521) concluded that “researchers have
become too complacent and uncritical about the
measurement of their key dependent variable,
[self-reported delinquency].” And, in a review
of articles published in top criminology jour-
nals between 2007 and 2008, Sweeten (2012)
identied only ve of 130 studies that used IRT
approaches.
Reviewing articles published since 2000, I
likewise identied a limited but growing set of
exemplary studies, including in prominent fam-
ily science journals (e.g., Bridges et al., 2012;
Browning, 2002; Browning & Burrington,
2006; Fincham & Rogge, 2010; Funk & Rogge,
2007; Krishnakumar, Buehler, & Barber, 2004).
Several studies have demonstrated the poten-
tial ways in which IRT can be used to create
shortened versions of scales that are nearly as
informative as longer versions (Cole, Rabin,
Smith, & Kaufman, 2004; DeWalt et al., 2013;
Piquero et al., 2002); another showed how
IRT suggested that fewer response categories
were needed than in an original scale (Osgood,
McMorris, & Potenza, 2002). Reducing the
number of items and categories in these ways,
while maintaining precision of measurement,
can reduce cost and response burden in studies.
Several studies also highlighted the extent to
which items were well targeted at the popu-
lations of interest. In many cases, items were
concentrated in one pole of the construct (with
little or much of the construct), with fewer items
at the other pole (Osgood et al., 2002; Piquero
et al., 2002). IRT models helped show how esti-
mates based on these measures were imprecise
for people in the regions of the construct lacking
items, leading to reduced power for detecting
associations. Tests of differential item func-
tioning also showed that in some cases items
