Assessment for Monitoring of Education Systems: The U.S. Example

Erin M. Fahle, Benjamin R. Shear, and Kenneth A. Shores

The ANNALS of the American Academy of Political and Social Science, 683, May 2019
DOI: 10.1177/0002716219841014
Subject matter: Assessments for System Monitoring
Standardized tests are regularly used as education system monitoring tools to compare the average performance of students living in different states or belonging to different subgroups (e.g., defined by race/ethnicity, sex, or parental income) and to track their progress over time. This article describes some uses and design features of tests in system monitoring contexts. We provide the example of the National Assessment of Educational Progress (NAEP), the only large-scale system monitoring test in the United States. The availability of NAEP data, in turn, has facilitated the construction of the Stanford Education Data Archive (SEDA), a publicly available database that can be used to describe patterns of achievement for nearly all school districts in the United States. Here, we discuss progress in and challenges to the use of standardized tests as system monitoring tools.

Keywords: system monitoring; standardized testing; National Assessment of Educational Progress
Erin M. Fahle is an assistant professor in the Department of Administrative and Instructional Leadership at the St. John’s University School of Education. Her research focuses on describing and explaining variation in educational opportunity across the United States and has appeared in Educational Researcher and AERA Open.

Benjamin R. Shear is an assistant professor in the Research and Evaluation Methodology program in the School of Education at the University of Colorado Boulder. His research focuses on the uses of educational tests and applied statistical issues in educational measurement and psychometrics, particularly those relevant to validity and validation.

Kenneth A. Shores is an assistant professor in Human Development and Family Studies at Pennsylvania State University. He has published in such journals as American Journal of Sociology and Education Finance and Policy on topics of racial/ethnic test score inequality and the effects of school finance reform.

Correspondence: fahlee@stjohns.edu

NOTE: The authors thank Jack Buckley, Ed Haertel, Andrew Ho, and Sean Reardon for helpful comments, as well as two of the volume editors, Amy Berman and Michael J. Feuer, and participants of the NAEd/AAPSS ANNALS Assessment Workshop. All errors are our own. Authors are listed in alphabetical order.

The U.S. public education system has the ambitious goal of providing a high-quality education to approximately 50 million students enrolled in K–12 schools every year (McFarland et al. 2017). Maintaining this system is a large public expense: spending on K–12 education has remained at roughly 4.0 percent of gross domestic product (GDP), or $707 billion as of this writing, since 1965 (Snyder et al. 2018). Monitoring the education system requires tracking a wide array of
information about the inputs and outcomes of the system. Standardized tests are
regularly used as education system monitoring tools to compare the average per-
formance of students living in different states or belonging to different subgroups
(e.g., defined by race/ethnicity, sex, or parental income) and to track their pro-
gress over time. Although there are many valued goals and outcomes of the
education system that cannot be measured with standardized tests, test scores
can (and do) serve as useful indicators of whether the education system is achiev-
ing the goal of having all students learn relevant academic content and skills.
In this article, we discuss progress in and challenges to the use of standardized
tests as system monitoring tools. We use the example of the National Assessment
of Educational Progress (NAEP), the only large-scale system monitoring test in
the United States, and describe how the availability of NAEP data has facilitated
the construction of the Stanford Education Data Archive (SEDA), a unique,
publicly available database that can be used to describe patterns of achievement
among nearly all public school districts in the United States.
We first explain how test results can be used for system monitoring. These
uses are different from classroom-based assessments (providing educators with
information about their own students’ performance; see Shepard, this volume)
and testing for accountability (which ties test score performance to explicit consequences; see Hanushek, this volume; Loeb and Byun, this volume). After
briefly discussing how the NAEP has evolved, we highlight design features that
make the NAEP well-suited for system monitoring. Finally, we describe a new
development in system monitoring, in which data from multiple testing programs
are combined to yield more detailed information about the education system,
and provide the example of SEDA.
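
To make that preview concrete, the following is a minimal sketch, in Python, of the general idea behind such linking: re-expressing district means from different state tests on a common NAEP-like scale. The function, the variable names, and the simple linear-linking rule are our illustration under simplified assumptions, not SEDA's actual estimation procedure.

# Illustrative sketch (not SEDA's actual procedure): place district means
# from different state tests onto a common NAEP-like scale by linear linking.

def link_to_naep(district_mean, state_mean, state_sd,
                 naep_state_mean, naep_state_sd):
    """Re-express a district mean from its state test scale in NAEP units.

    Standardize the district mean within its state's score distribution,
    then map that z-score onto the NAEP scale using the state's NAEP mean
    and standard deviation. All inputs here are hypothetical.
    """
    z = (district_mean - state_mean) / state_sd
    return naep_state_mean + z * naep_state_sd

# Two districts tested on different state assessments become comparable
# once both are expressed in NAEP units.
district_a = link_to_naep(315.0, 300.0, 25.0,
                          naep_state_mean=240.0, naep_state_sd=30.0)
district_b = link_to_naep(52.0, 50.0, 8.0,
                          naep_state_mean=235.0, naep_state_sd=28.0)
print(round(district_a, 1), round(district_b, 1))  # 258.0 242.0
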
Standardized Tests for System Monitoring
The education system has many important goals, which include supporting all
students in their learning of varied subject matter and academic skills (e.g., math-
ematics, reading, and science) and developing in them an array of important
social and behavioral capacities (e.g., civic engagement, effort, and ambition).
When used with other indicators of educational inputs and outcomes, carefully
designed standardized tests can be effective tools to measure the knowledge and
skills that students have gained. Other commonly used indicators of educational
opportunity include per-pupil expenditures, teacher qualifications, and high school
graduation rates; test scores complement these indicators by providing more
direct information about what students know and can do. On the other hand,
unlike indicators that are based on directly observable quantities, test scores
often represent unobservable constructs; therefore, scores are indirect indica-
tors, or approximations, of the underlying quantities of interest. Designing a test
for system monitoring, then, requires decisions about how to select and define
the constructs, and how to design a test or tests to measure them. Ideally, these
decisions will align with the overall goals of the education system by emphasizing
the content and skills deemed most important.
As with any measurement of educational performance, test scores become
valuable indicators not when they are generated but when they are used and
interpreted to provide insight into the education system. Professional assess-
ment standards require that test uses and interpretations be explicitly stated and
evaluated using theoretical and empirical evidence in a process known as valida-
tion (American Educational Research Association [AERA], American Psychological
Association [APA], and National Council on Measurement in Education [NCME]
2014; Kane 2013). Validation begins by specifying an interpretation and use
argument—or IUA (Kane 2013)—that delineates the intended test uses and
interpretations, as well as key assumptions or claims underlying them. A complete
IUA will also incorporate a “theory of action” that articulates both how these uses
and interpretations are expected to lead to desired outcomes and what unintended
consequences of test use may need to be considered (Cronbach 1988; Messick
1995; Haertel 2013; Kane 2013; Shepard 1993; Bennett 2010; Linn 1989). After
specifying an IUA, the next step of validation is to draw on theory and evidence to
evaluate whether the interpretations and uses specified in the IUA are warranted.
For example, experts may gather evidence to show that the content matches
specified standards and difficulty level, that all students are able to demonstrate
their knowledge equally (i.e., a lack of bias), or that decisions based on the test
scores lead to more desirable outcomes than could otherwise be achieved. Validity
thus refers to the degree to which this theory and evidence support proposed
interpretations and uses of test results articulated in the IUA (AERA, APA, and
NCME 2014), and is a property of score interpretations and uses, rather than an
inherent property of the test or testing procedure.
There are many possible uses of test score data for system monitoring. We
consider three, drawing on the framework developed by Haertel (2013). First,
when reported publicly, test score data can be used to influence public opinion
or shape the dialogue around the public education system. Second, test scores
can be used to better understand how the education system is functioning or to
direct resources, by identifying regions, organizations, or groups that are per-
forming above or below expectations. Third, scores can be used as an outcome
measure, that is, the dependent variable, in program evaluations. They can
inform comparisons among different educational strategies or policies and help
to identify practices that increase educational achievement and equity (for the
general principles of education modeled as a production function, see Hanushek 2007 or Murnane et al. 2000).
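
As a compact illustration of the production-function framing mentioned above (the notation is ours, not a formula reproduced from Hanushek 2007 or Murnane et al. 2000), achievement can be written as the output of schooling and non-schooling inputs:

A_{it} = f(S_{it}, F_{it}) + \varepsilon_{it}

where A_{it} is the measured achievement (test score) of student i at time t, S_{it} denotes cumulative schooling inputs (e.g., teachers, per-pupil spending, curriculum), F_{it} denotes family and community inputs, and \varepsilon_{it} captures unmeasured influences and measurement error. In the third use described above, program evaluation asks how changes in particular components of S_{it} shift A_{it}, holding other inputs fixed.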

These uses require making distinct (but often overlapping) inferences and inter-
pretations. At a minimum,...
