Four Parameters of Interest in the Evaluation of Social Programs.


James Heckman [*]

Justin L. Tobias [+]

Edward Vytlacil [++]

This paper reviews four treatment parameters that have become commonly used in the program evaluation literature: the average treatment effect, the effect of treatment on the treated, the local average treatment effect, and the marginal treatment effect. We derive simply computed closed-form expressions for these treatment parameters in a latent variable framework with Gaussian error terms. These parameters can be estimated using nothing more than output from a standard two-step procedure. We also briefly describe recent work that seeks to go beyond mean effects and estimate the distributions associated with various outcome gains. The techniques presented in the paper are applied to estimate the return to some form of college education for various populations using data from the National Longitudinal Survey of Youth.

  1. Introduction

    The problem of evaluating the effectiveness of a social program or a "treatment" is a central problem in social science and medicine. The problem of selection bias could arise in any evaluation. Individuals observed participating in a program or receiving treatment often possess different characteristics than an average person. Evaluating the economic return to a program requires accounting for the nonrandom assignment of individuals into the treated and untreated states.

    One popular approach for dealing with selection bias, introduced in Gronau (1974) and Heckman (1974, 1976), is to specify a latent index model that relates the rule for assigning individuals to treatment to the potential treatment outcomes. The latent index has the interpretation of the expected net utility derived from receiving treatment; individuals participate in a program if net utility is positive (or nonnegative) and do not participate if net utility is negative. This approach is based on assumptions about error distributions and allows for dependence between the errors in outcome and choice equations. Although computationally convenient, this approach has been criticized for its reliance on distributional assumptions and lack of robustness to departures from normality (Goldberger 1983; Paarsch 1984; and later work by Glynn, Laird, and Rubin 1986), although the empirical relevance of this criticism is far from clear (Heckman 2001).

    In response to these criticisms, recent analysts have adopted a more robust approach and have attempted to identify and estimate various treatment parameters without imposing strong distributional assumptions (see, for example, the local average treatment effect [LATE] analysis of Imbens and Angrist 1994). Although these methods are free of parametric distributional assumptions, they typically estimate only one treatment parameter and are quite limited in the range of policy questions they can answer (Heckman and Vytlacil 2000a). Further, the assumptions imposed in LATE analysis are actually equivalent to those required to specify a nonparametric selection model (Vytlacil 2002).

    This paper uses a latent variable framework to unite the recent treatment effect literature with the classical selection bias literature. We obtain simple closed-form expressions for four treatment parameters of interest: the average treatment effect (ATE), the effect of treatment on the treated (TT), LATE (Imbens and Angrist 1994), and the marginal treatment effect (MTE; Bjorklund and Moffitt 1987; Heckman 1997; Heckman and Vytlacil 1999, 2000a, b) for the "textbook" Gaussian selection model. Our impression is that despite recent advances in nonparametric and semiparametric estimation of these parameters, many practitioners will continue to use the two-step estimator of Heckman (1976) when confronted with selection bias, and thus it is beneficial to clearly describe simple methods for estimating these parameters in the textbook selection model. For others, these expressions may be used as a starting point to illustrate the empirical importance of selection bias. Throughout this paper, we review other recent work that has relaxed the distributional requirements of this textbook model.

    In addition to presenting mean effects, we also discuss how one might approach estimation of the distributions associated with these parameters of interest. The extension to the distributions of outcome gains is not immediate, nor without difficulty, since the distributions of interest depend on the unidentified cross-potential outcome correlation parameter. We briefly mention several approaches for estimating these distributions, and provide the reader with references for further information on this topic.

    The plan of this paper is as follows. In the next section, we present a general model of potential outcomes, and define and interpret the various treatment parameters within it. In section 3, expressions for these parameters are derived under the assumption of trivariate normality. In section 4, we briefly discuss how one might approach estimation of the distributions associated with various outcome gains, and thus extend the analysis of mean effects. Section 5 applies the mean effect analysis to estimate various average gains in postschooling earnings resulting from the receipt of some form of college education. Using data from the National Longitudinal Survey of Youth (NLSY), we present point estimates of ATE, TT, LATE, and MTE. The paper concludes with a summary in section 6.

  2. Treatment Parameters in a Canonical Model

    Consider a model of potential outcomes:

    $$Y^1 = X\beta^1 + U^1, \qquad Y^0 = X\beta^0 + U^0, \qquad D^* = Z\theta + U^D \tag{1}$$

    The first two equations denote outcome equations in two possible "states" or "sectors" (college or noncollege in the application of section 5). Without loss of generality, we assume that the first state indexed by the "1" superscript represents the treated state and the "0" superscript denotes the untreated state. Each agent is observed in only one state, so that either $Y^1$ or $Y^0$ is observed for each person, but the pair $(Y^1, Y^0)$ is never observed for any given person. What we would like to recover is information about various expected gains from the receipt of treatment, where the gain is denoted by $\delta \equiv Y^1 - Y^0$.

    Let $D(Z)$ denote the observed treatment decision, where $D(Z) = 1$ denotes receipt of treatment and $D(Z) = 0$ denotes nonreceipt. The variable $D^*$ is a latent variable that generates $D(Z)$ according to a threshold crossing rule,

    $$D(Z) = 1[D^*(Z) \geq 0] = 1[Z\theta + U^D \geq 0] \tag{2}$$

    where $1[A]$ is the indicator function that takes the value 1 if the event $A$ is true and the value 0 otherwise. In an extension of the Roy (1951) model, $D^* = Y^1 - Y^0 - C$, where $C$ represents the cost of participating in the treated state, so that agents choose to receive treatment if the gain from participating in the program minus costs is nonnegative. We also define the following counterfactual choice variables. For any $z$ that is a potential realization of $Z$, we define the variable $D(z) = 1[z\theta + U^D \geq 0]$. $D(z)$ indicates whether or not the individual would have received treatment had her value of $Z$ been externally set to $z$, holding her unobserved $U^D$ constant. We require an exclusion restriction and denote by $Z_k$ some element of $Z$ that is not contained in $X$. By varying $Z_k$, we can manipulate an individual's probability of receiving treatment without affecting the potential outcomes. Finally, we assume $(U^D, U^1, U^0)$ is independent of $X$ and $Z$.
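    The counterfactual choice variables can be made concrete with a small simulation. The sketch below applies the threshold rule of Equation 2 to two externally set values of $Z_k$ while holding each person's $U^D$ fixed. All numerical values (`theta`, `z_low`, `z_high`) and the function name `counterfactual_choice` are hypothetical, and $U^D$ is assumed to be standard normal purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical values, chosen only for illustration (not from the paper).
theta = np.array([0.2, 0.8])   # coefficients on (constant, Z_k), theta_k > 0
u_d = rng.standard_normal(n)   # unobserved U^D, assumed N(0, 1), fixed per person


def counterfactual_choice(z, theta, u_d):
    """Return D(z) = 1[z @ theta + U^D >= 0] for one externally set vector z."""
    return (z @ theta + u_d >= 0).astype(int)


z_low = np.array([1.0, -0.5])  # Z_k set to -0.5 for everyone
z_high = np.array([1.0, 0.5])  # Z_k set to +0.5 for everyone

d_low = counterfactual_choice(z_low, theta, u_d)
d_high = counterfactual_choice(z_high, theta, u_d)

share_low = d_low.mean()               # fraction treated when Z_k = -0.5
share_high = d_high.mean()             # fraction treated when Z_k = +0.5
switchers = (d_high - d_low).mean()    # fraction induced into treatment

print(f"P(D(z_low) = 1)  ~ {share_low:.3f}")
print(f"P(D(z_high) = 1) ~ {share_high:.3f}")
print(f"share switching into treatment ~ {switchers:.3f}")
```

    Because $U^D$ is held constant and the coefficient on $Z_k$ is positive in this sketch, raising $Z_k$ can only move individuals into treatment, never out of it. The potential outcomes are untouched because $Z_k$ does not appear in $X$.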

    Letting Y denote observed earnings,

    $$Y = D Y^1 + (1 - D) Y^0. \tag{3}$$

    This model has been called the switching regression model of Quandt (1972), Rubin's model (Rubin 1978), or the Roy model of income distribution (Roy 1951; Heckman and Honore 1990). [1] To illustrate how a model of this type can be applied to evaluate an interesting policy question, consider the problem of estimating the return to a college education. In this case, $Y$ represents log earnings, $Y^1$ denotes the log earnings of college graduates, and $Y^0$ denotes the log earnings of those not selecting into higher education. The latent index maps people into either the "college" (or treated) state or the "no-college" (or untreated) state. To estimate the return to college, we might estimate the expected college log wage premium for given characteristics $X$ (i.e., $E[Y^1 - Y^0 \mid X]$). [2] In general, given the model described by Equations 1 and 2, we would like to have methods for estimating various average gains to program participation. In this paper, we examine four such treatment parameters, which measure possibly different average gains to the receipt of treatment. These four parameters are ATE, TT, LATE, and MTE. [3]
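    To see why selection matters in this model, it can help to simulate Equations 1 through 3 directly. The following is a minimal sketch, not the paper's estimation procedure: every parameter value (`beta1`, `beta0`, `theta`, `cov`) is hypothetical, and the errors are assumed jointly normal with mean zero, as in the textbook Gaussian selection model. In a simulation both potential outcomes are known, so the average gain can be compared with the naive difference in observed means.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical parameter values, for illustration only.
beta1 = np.array([1.0, 0.5])        # treated-state coefficients on (1, x)
beta0 = np.array([0.6, 0.3])        # untreated-state coefficients on (1, x)
theta = np.array([0.0, 1.0, 1.0])   # choice coefficients on (1, x, Z_k)

# Assumed covariance of (U^D, U^1, U^0); Var(U^D) is normalized to 1.
cov = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.2],
                [0.0, 0.2, 1.0]])

x = rng.standard_normal(n)
z_k = rng.standard_normal(n)        # excluded variable: in Z but not in X
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), x, z_k])

u = rng.multivariate_normal(np.zeros(3), cov, size=n)
u_d, u1, u0 = u[:, 0], u[:, 1], u[:, 2]

y1 = X @ beta1 + u1                      # potential outcome if treated
y0 = X @ beta0 + u0                      # potential outcome if untreated
d = (Z @ theta + u_d >= 0).astype(int)   # threshold-crossing rule
y = d * y1 + (1 - d) * y0                # only one outcome is observed

# Known here only because the data are simulated.
true_ate = np.mean(y1 - y0)

# Naive contrast of observed means; it mixes differences in X across
# groups with selection on the unobservables (U^D correlated with U^1).
naive_gap = y[d == 1].mean() - y[d == 0].mean()

print(f"average gain in simulation: {true_ate:.3f}")
print(f"naive treated-untreated gap: {naive_gap:.3f}")
```

    Under these assumed values the naive gap overstates the average gain, because those who select into treatment differ from the rest both in $X$ and in their unobserved $U^1$.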

    ATE is defined as the expected gain from participating in the program for a randomly chosen individual. As before, we let $\delta \equiv Y^1 - Y^0$ denote the gain from program participation, and note that the average treatment effect conditional on $X = x$ can be expressed as:

    $$\mathrm{ATE}(x) = E(\delta \mid X = x) = x(\beta^1 - \beta^0).$$
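    The intermediate steps behind this expression can be written out as follows. The derivation uses the outcome equations in Equation 1 and the independence of $(U^D, U^1, U^0)$ from $X$; it additionally assumes that $U^1$ and $U^0$ have mean zero, a standard normalization that is not stated explicitly in the text above.

```latex
\begin{align*}
\mathrm{ATE}(x)
  &= E(Y^1 - Y^0 \mid X = x) \\
  &= x(\beta^1 - \beta^0) + E(U^1 - U^0 \mid X = x) \\
  &= x(\beta^1 - \beta^0) + E(U^1 - U^0)
     && \text{(independence of the errors from } X\text{)} \\
  &= x(\beta^1 - \beta^0)
     && \text{(assumed zero-mean errors)}.
\end{align*}
```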

    The average treatment effect evaluated at the random variable X is ATE(X), which defines the treatment parameter as a function of the characteristics X. We can obtain unconditional estimates by integrating...
