Assessment for Monitoring of Education Systems: International Comparisons

DOI: 10.1177/0002716219843804
Published: 01 May 2019
Subject Matter: Assessments for System Monitoring
The Annals of the American Academy of Political and Social Science, vol. 683 (May 2019)
Over the last two decades, with the increase in both numbers of participating jurisdictions and media attention, international large-scale assessments (ILSAs) have come to play a more salient role in global education policies than they once did. This has led to calls for greater transparency with regard to instrument development and closer scrutiny of the use of instruments in education policy. We begin with a brief review of the history of ILSAs and describe the requirements and constraints that shape ILSA design, implementation, and analysis. We then evaluate the rationales for employing ILSA results for different purposes, ranging from those we argue are most appropriate (comparative description) to least appropriate (causal inference). We cite examples of ILSA usage from different countries, with particular attention to the widespread misinterpretations and misuses of country rankings based on average scores on an assessment (e.g., literacy or numeracy). Looking forward, we offer suggestions on how to enhance the constructive roles that ILSAs play in informing education policy.

Keywords: international large-scale assessments; cross-country comparisons; PISA; TIMSS; PIRLS; validity; comparability; league tables
By HENRY I. BRAUN and JUDITH D. SINGER

Henry I. Braun is Boisi Professor of Education & Public Policy at Boston College. He is a member of the National Academy of Education. Recent journal publications include “How Long Is the Shadow? The Relationships of Family Background to Selected Adult Outcomes” and “Testing International Education Assessments: Rankings Get Headlines, But Can Mislead” (with Judith D. Singer).

Judith D. Singer is senior vice provost for faculty development and diversity and James Bryant Conant Professor of Education at Harvard University. Her scholarship focuses on improving the quantitative methods used in social, educational, and behavioral research. Among her numerous publications is Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence (Oxford University Press, 2003).

NOTE: The authors would like to thank Michael Feuer and Amy Berman for helpful comments and suggestions.

Correspondence: braunh@bc.edu

Countries jealously guard their sovereignty, reacting negatively to suggestions that they should follow the lead of—or take direction from—other countries’ policies. Education offers an interesting counterexample, as attested by the prominent roles that international large-scale assessments (ILSAs) play in today’s policy debates.1 Countries use ILSAs to
benchmark their performance against peers and to glean policy insights from the
“high-flyers”—countries that consistently achieve high rankings or have shown
rapid improvements. Education policy-makers are increasingly receptive to the advice, prescriptions, and policy recommendations that analysts offer
based on their own interpretations of ILSA data (e.g., Barber and Mourshed
2007; Mourshed, Krawitz, and Dorn 2017).2 At the same time, there are legiti-
mate concerns regarding the possible misuses of ILSA results (Lockheed and
Wagemaker 2013).
ILSAs were not always so influential. When first fielded in the 1960s and
1970s, the small number of countries participating did so primarily to “monitor”
their students’ achievement in the sense of “observing,” “checking,” or describing
the “efficiency” of their school systems (Postlethwaite 1967). This once passive monitoring or describing has become more activist. In the United States, the trend toward increased influence accelerated with the landmark Nation at Risk report (Gardner et al. 1983), which warned that if our educational condition had
been caused by foreign nations it would have been seen as an “act of war” (p. 5).
Many countries have experienced similar bouts of increased preoccupation with
performance on international comparisons, leading to new terms, such as “PISA
shock” (e.g., Davoli and Entorf 2018), to characterize the effects of disappointing
performance on public and professional judgment. Media coverage has skyrock-
eted, especially reports presenting countries’ rankings as “league tables” (a term
borrowed from the British practice of ranking sports teams within leagues).
Legislators, policy-makers, policy analysts at think tanks and foundations, edu-
cational researchers, educators, the business community, and the public-at-large
are among the many stakeholders with interest in ILSA results today, in the
united States and elsewhere. Pundits from all these perspectives can be (all too)
quick to offer kitchen-table solutions to their country’s perceived education woes,
based on their interpretations of the results. When dissenting voices criticize
such interpretations, their complaints are often attributed to political or ideologi-
cal biases rather than to concerns over the quality and interpretation of the data.
How did these patterns of use evolve? What are the consequences, both posi-
tive and negative? What might the future hold? We begin with a brief history of
ILSAs, focusing especially on the ways their history is linked with, and differs from, that of national assessments (see Fahle, Shear, and Shores, this volume). We then
address some major technical issues, followed by a review of uses and misuses of
ILSA data, drawing attention in particular to flaws in the translation of the results
into rankings. We conclude with recommendations for enhancing ILSAs,
particularly as they transition from conventional paper-and-pencil formats to digitally based assessments (DBA). Although our perspective is mostly U.S.-centric, we offer examples of the increased influence of ILSAs globally.
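The flaws in rank-based reporting noted above are easy to demonstrate statistically. The simulation below is a sketch we add for illustration only: the country labels, true mean scores, and the common standard error are hypothetical stand-ins, not values from any actual ILSA.

```python
import random

# Hypothetical countries whose true mean scores differ by only a few points,
# each estimated with a typical sampling standard error of about 3 points.
TRUE_MEANS = {"A": 500.0, "B": 498.0, "C": 497.0, "D": 495.0, "E": 494.0}
STD_ERROR = 3.0
REPLICATIONS = 10_000

def simulated_league_table(rng: random.Random) -> list[str]:
    """Rank countries by one draw of their estimated mean scores."""
    estimates = {c: rng.gauss(mu, STD_ERROR) for c, mu in TRUE_MEANS.items()}
    return sorted(estimates, key=estimates.get, reverse=True)

rng = random.Random(2019)
rank_counts = {c: [0] * len(TRUE_MEANS) for c in TRUE_MEANS}
for _ in range(REPLICATIONS):
    for rank, country in enumerate(simulated_league_table(rng)):
        rank_counts[country][rank] += 1

# Share of replications in which each country lands at each rank.
for country, counts in rank_counts.items():
    shares = "  ".join(f"#{i + 1}: {n / REPLICATIONS:4.0%}" for i, n in enumerate(counts))
    print(f"Country {country}  {shares}")
```

Even though the hypothetical true means span only six points, no country holds a fixed position across replications; a country's headline rank can shift by two or three places on sampling noise alone, which is precisely the objection to reporting overlapping score distributions as an ordered league table.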
A Brief History
An ILSA is a coordinated, international effort to gather credible, comparable
evidence of achievement in various cognitive domains, as well as additional infor-
mation intended to enhance the interpretability of the findings. Typically, the
domains are a subset of reading, mathematics, science, and problem-solving.
ILSAs also include questionnaires that elicit background and other contextual
information. Although target populations differ, each ILSA claims to draw
nationally representative samples of students (by age or grade) or adults (by age).
Today, the three best-known ILSAs—Trends in International Mathematics and Science Study (TIMSS), Programme for International Student Assessment (PISA), and Progress in International Reading Literacy Study (PIRLS)—focus on in-school
student populations. Other ILSAs assess cognitive skills, educational attainment,
and the labor market experiences of adults, for example, the Programme for the
International Assessment of Adult Competencies (PIAAC). There are several
regional ILSAs and others with a specific thematic focus that are administered on
an irregular schedule (e.g., the International Civic and Citizenship Education Study, the International Computer and Information Literacy Study, and the Teaching and Learning International Survey).
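A concrete note on the sampling claim above: because students are selected with unequal probabilities (small strata are typically oversampled), national statistics must be computed with sampling weights rather than simple averages. The sketch below, with invented records, shows only the basic weighted-mean step; operational ILSA analyses add machinery we omit here, such as plausible values and replicate weights.

```python
# Minimal sketch of a survey-weighted national mean; all records are invented.
# A student's weight is (proportional to) the inverse of their selection probability.
students = [
    {"score": 512.0, "weight": 80.0},   # from an oversampled small stratum
    {"score": 478.0, "weight": 310.0},  # represents many similar students
    {"score": 495.0, "weight": 150.0},
    {"score": 530.0, "weight": 95.0},
]

total_weight = sum(s["weight"] for s in students)
weighted_mean = sum(s["score"] * s["weight"] for s in students) / total_weight
unweighted_mean = sum(s["score"] for s in students) / len(students)

print(f"weighted mean:   {weighted_mean:.1f}")   # ~494.1
print(f"unweighted mean: {unweighted_mean:.1f}") # 503.8
```

The gap between the two estimates shows why "nationally representative" is a property of the design and the analysis together: ignoring the weights in this toy example would overstate the national mean by nearly ten points.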
The increased interest and level of participation in ILSAs can be linked in part to governments’ growing realization of the importance of “human capital” for competing successfully in the global, technology-infused economy (see also
Hanushek, this volume). Human capital is shorthand for a broad set of cognitive
skills, knowledge, competence for collaboration and teamwork, motivation, per-
sistence, reliability, and self-discipline (Becker 1964; Kirsch and Braun 2016).
Education is regarded as the principal tool for building human capital and pre-
paring students for responsible citizenship.
The first ILSA, conducted in 1964, was ambitiously dubbed the First
International Mathematics Study (FIMS), implying hopefully (and presciently)
that there would be more to come (see Feuer 2011). FIMS researchers drew
probability samples of 13-year-old students in twelve countries to ascertain the
feasibility of obtaining comparable samples and constructing a family of assess-
ments that could yield credible comparisons of educational achievement across
nations. The success of the pilot, despite numerous logistical challenges, led the
International Association for the Evaluation of Educational Achievement (IEA),
founded in 1958, to field assessments in mathematics and science, culminating in
the establishment of TIMSS and, later, PIRLS (Schmidt et al. 2018).
Coincidentally, in the mid-1960s, Harold (“Doc”) Howe, then–U.S.
Commissioner of Education, became convinced that school input data (including
teacher-student ratios and per-pupil spending) were inadequate to tell how our
education system was doing. Howe argued that the hodgepodge of state and local

78
tHE ANNALS OF tHE AMErICAN ACADEMy
testing was not useful and that systematically collected measures of output (i.e.,
student achievement) were a necessary complement. Publication of the landmark
Equality of Educational Opportunity Report (Coleman 1966) reinforced the shift
toward measurement of outcomes. For that report, Coleman and his associates
surveyed tens of thousands of households and used the...
