Identification problems in the social sciences and everyday life.

Author: Manski, Charles F.
Position: Association Lecture
  1. Introduction

    The Reflection Problem

    Here is an identification problem from everyday life: Suppose that you observe the almost simultaneous movements of a person and of his image in a mirror. Does the mirror image cause the person's movements, does the image reflect the person's movements, or do the person and image move together in response to a common external stimulus? Empirical observations alone cannot answer this question. Even if you were able to observe innumerable instances in which persons and their mirror images move together, you would not be able to logically deduce the process at work. To reach a conclusion requires that you understand something of optics and of human behavior.

    A like inferential problem, which I have called the reflection problem (Manski 1993a), arises if you try to interpret the common observation that individuals belonging to the same group tend to behave similarly. Two hypotheses often advanced to explain this phenomenon are endogenous effects, wherein the propensity of an individual to behave in some way varies with the prevalence of that behavior in the group; and correlated effects, wherein individuals in the same group tend to behave similarly because they face similar environments and have similar individual characteristics.

    Similar behavior within groups could stem from endogenous effects (e.g., group members could experience pressure to conform to group norms) or group similarities might reflect correlated effects (e.g., persons with similar characteristics might choose to associate with one another). Empirical observations of the behavior of individuals in groups, even innumerable such observations, cannot per se distinguish between these hypotheses. To draw conclusions requires that empirical evidence be combined with sufficiently strong maintained assumptions about the nature of individual behavior and social interactions.

    Why might you care whether observed patterns of behavior are generated by endogenous effects, by correlated effects, or in some other way? A good practical reason is that different processes have differing implications for public policy. For example, understanding how students interact in classrooms is critical to the evaluation of many aspects of educational policy, from ability tracking to class size standards to racial integration programs.

    Suppose that, unable to interpret observed patterns of behavior, you seek the expert advice of two social scientists. One, perhaps a sociologist, asserts that pressure to conform to group norms makes the individuals in a group tend to behave similarly. The other, perhaps an economist, asserts that persons with similar characteristics choose to associate with one another. Both assertions are consistent with the empirical evidence. The data alone cannot reveal whether one assertion or the other is correct. Perhaps both are. This is an identification problem.
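
    The observational equivalence of the two experts' stories can be sketched numerically. The example below assumes a stylized linear specification in which a group's mean outcome m_g satisfies m_g = alpha + beta * m_g + delta_g; this functional form, the function name, and the parameter names alpha, beta, and delta_g are illustrative assumptions, not something stated in the text above. Here beta stands in for the strength of an endogenous effect and delta_g for a group-level term capturing correlated effects.

```python
import math


def equilibrium_group_means(alpha, beta, delta_by_group):
    """Solve m_g = alpha + beta * m_g + delta_g for each group mean m_g.

    beta is a hypothetical endogenous-effect strength (must differ from 1);
    delta_g is a hypothetical group-level term standing in for correlated effects.
    """
    if beta == 1:
        raise ValueError("beta must differ from 1 for a unique solution")
    return [(alpha + delta) / (1 - beta) for delta in delta_by_group]


# Story A (the "sociologist"): conformity matters (beta = 0.5),
# and groups differ only modestly in their environments.
means_endogenous = equilibrium_group_means(
    alpha=1.0, beta=0.5, delta_by_group=[0.0, 1.0, 2.0]
)

# Story B (the "economist"): no conformity at all (beta = 0),
# and groups differ more strongly in characteristics and environment.
means_correlated = equilibrium_group_means(
    alpha=0.0, beta=0.0, delta_by_group=[2.0, 4.0, 6.0]
)

# Both stories imply exactly the same observable pattern of group means.
same_observable_pattern = all(
    math.isclose(a, b) for a, b in zip(means_endogenous, means_correlated)
)
print(means_endogenous, means_correlated, same_observable_pattern)
```

    In this sketch both parameterizations yield the same group means, so data on group behavior alone, however abundant, could not tell the two stories apart. Only an added assumption, such as a restriction on beta or on how delta_g varies across groups, would separate them.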

    Identification and Statistical Inference

    Identification problems are problems of deductive logic. The conclusions that a researcher can logically draw are determined by the assumptions and data that are brought to bear. The available data about human behavior are typically limited, and the range of plausible assumptions is wide. So researchers who analyze the same data under different maintained assumptions may, and often do, reach different logically valid conclusions.

    Empirical researchers often ask econometricians for assistance in "solving" identification problems. This is asking too much. What econometricians can usefully do is to clarify what conclusions can and cannot logically be drawn given empirically relevant combinations of assumptions and data.

    For more than a century, methodological research in the social sciences has made productive use of probability and statistics. One supposes that the empirical problem is to infer some feature of a population described by a probability distribution and that the available data are observations extracted from the population by some sampling process. One combines the data with assumptions about the population and the sampling process to draw statistical conclusions about the population feature of interest.

    Working within this familiar framework, econometricians have found it useful to separate inference into statistical and identification components. Studies of identification determine the conclusions that could be drawn if a researcher were able to observe a data sample of unlimited size. Statistical inference seeks to characterize how sampling variability affects the conclusions that can be drawn from samples of limited size.

    Identification and statistical inference are sufficiently distinct for it to be fruitful to study them separately. The usefulness of separating the identification and statistical components of inference has long been recognized. Koopmans (1949, p. 132) put it this way in the article that introduced the term identification into the literature:

    In our discussion we have used the phrase "a parameter that can be determined from a sufficient number of observations." We shall now define this concept more sharply, and give it the name identifiability of a parameter. Instead of reasoning, as before, from "a sufficiently large number of observations" we shall base our discussion on a hypothetical knowledge of the probability distribution of the observations, as defined more fully below. It is clear that exact knowledge of this probability distribution cannot be derived from any finite number of observations. Such knowledge is the limit approachable but not attainable by extended observation. By hypothesizing nevertheless the full availability of such knowledge, we obtain a clear separation between problems of statistical inference arising from the variability of finite samples, and problems of identification in which we explore the limits to which inference even from an infinite number of observations is suspect.
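
    The separation can be illustrated with a small numerical sketch. The setup is an assumption chosen for illustration and is not taken from the passage above: an outcome known to lie between 0 and 1 is observed for only a fraction of the population, and nothing is assumed about the unobserved remainder. The function names and the numbers used are hypothetical.

```python
import math


def worst_case_bounds(p_observed, mean_observed):
    """Bounds on the population mean of an outcome in [0, 1].

    Assumes the population quantities p_observed (share with observed
    outcomes) and mean_observed (mean among them) are known exactly,
    and assumes nothing about the unobserved outcomes.
    """
    lower = mean_observed * p_observed + 0.0 * (1 - p_observed)
    upper = mean_observed * p_observed + 1.0 * (1 - p_observed)
    return lower, upper


def standard_error_of_mean(std_dev, n):
    """Usual standard error of a sample mean from n independent draws."""
    return std_dev / math.sqrt(n)


# Identification: what could be learned with unlimited data.
lower, upper = worst_case_bounds(p_observed=0.75, mean_observed=0.5)
identification_width = upper - lower

# Statistical inference: how precision improves with sample size.
se_small_sample = standard_error_of_mean(std_dev=0.5, n=100)
se_large_sample = standard_error_of_mean(std_dev=0.5, n=10_000)

print(lower, upper, identification_width)
print(se_small_sample, se_large_sample)
```

    Under these assumptions the width of the bounds equals the share of unobserved outcomes and does not depend on the sample size, whereas the standard error shrinks as the sample grows. The first is an identification matter; the second is a matter of statistical inference.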

    My Research Program

    I have been concerned with identification problems throughout my career. My early research concerned the problem of inference on people's preferences from observations of the choices that they make. Economists are fond of saying that choice behavior "reveals preferences." In fact, observation of the action that a person chooses only reveals that this action is weakly preferred to all other feasible actions. It does not reveal how the person ranks nonchosen actions relative to one another. Discrete choice analysis, as it is practiced in econometrics, combines data on choices with assumptions about the decision rules that individuals use when making the choices that researchers observe. The concern of my early research was to determine what can be learned about preferences given data on choices and relatively weak assumptions about the decision rules that people use. This is an identification problem.
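
    The limited content of a single observed choice can be made concrete with a short enumeration. The sketch below assumes, for simplicity, three hypothetical actions and strict rankings only (no ties); the function name and the action labels are illustrative.

```python
from itertools import permutations


def rankings_consistent_with_choice(feasible_actions, chosen_action):
    """List every strict ranking (best first) that puts the chosen action on top.

    Ties are ruled out here for simplicity, which is an assumption of
    this sketch rather than of the text.
    """
    return [
        ranking
        for ranking in permutations(feasible_actions)
        if ranking[0] == chosen_action
    ]


# The researcher observes that action "a" is chosen from {a, b, c}.
consistent = rankings_consistent_with_choice(["a", "b", "c"], "a")
for ranking in consistent:
    print(" > ".join(ranking))
```

    Two rankings survive, one placing b above c and one placing c above b, so the observed choice says nothing about how the nonchosen actions compare with one another. Learning more requires either additional choice data or assumptions about decision rules.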

    Over time, I have come to think that, although statistical problems contribute to the difficulty of empirical research, identification is the more fundamental problem of the social sciences. In what follows, I first describe the broad themes of a research program that I began in the late 1980s and continue today. I next show how these themes have played out in my analysis of the selection problem, a fundamental and pervasive identification problem. I then examine how the selection problem manifests itself in the econometric analysis of market demand.

  2. Broad Themes

    My 1995 book, Identification Problems in the Social Sciences (Manski 1995), puts forward four broad, related themes. They are as follows.

    Begin with the Data Alone

    The prevalent approach to empirical research in the social sciences begins by maintaining assumptions that are strong enough to identify quantities of interest and to yield statistically precise point estimates of these quantities. Concerns about the credibility of assumptions are commonly addressed through the performance of specification tests and/or sensitivity analysis. Concerns about credibility may also be addressed...
