The Promise of Linked Historical Census Data

Author: Eriksson, Katherine
Position: Research Summaries

Individual records from the 1950 US Census were publicly released on April 1, 2022. Economic historians had been waiting for this day for 10 years. This data source, like the individual-level data from earlier censuses, makes it possible to locate the information reported by a specific person.

I found the records for my grandparents along with those for my mother, who was born in December 1949. They lived in rural Lincoln County, Kentucky. My grandfather, Bernard Camenisch, born in Kentucky to a Swiss father, worked 92 hours the previous week as a dairy farmer. A decade earlier, in the 1940 Census, he was living with his father, also a farmer; he had worked 60 hours in the week before answering that census. My grandmother Dorothy was a "sample line respondent," and so answered questions asked of only one in five individuals.

My research program, with a range of coauthors, uses publicly available census data with names and other identifying information to create large panel datasets. This research follows men, who are easier than women to track from one census to the next because they did not typically change their surnames at marriage, across decades in the US and other countries. These linked datasets enable us to answer a range of questions about the impact of early-life shocks on adult outcomes. For example, what was the effect of closing schools during the 1918 flu pandemic on children's later-life outcomes? What did the arrival in Southern counties of the boll weevil, a cotton-boll-eating beetle, do to children's school enrollment and, ultimately, their educational attainment? What effect did Emancipation, a huge negative shock to the wealth of slave-holding families, have on the later-life economic standing of their children? How does migration feature in individual adjustments to environmental or immigration shocks?

Creating Linked Datasets

The digitization of the 1950 Census--its transformation from scanned images to a machine-readable database--is ongoing. This process was only completed in the past decade for US decadal censuses from 1850 through 1940. Researchers can access names, birthplaces, ages, occupations, and many other rich variables for every person enumerated in a specific census. Methods to link individuals across any combination of censuses rely on the fact that name, birth year, and birthplace do not change, for men at least, across decades.
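As a rough illustration of how fixed characteristics can drive a link, here is a minimal Python sketch of exact linking. Each record is reduced to a key of standardized first name, last name, implied birth year (census year minus reported age), and birthplace, and an earlier-census record is linked only when exactly one record in each census shares that key. The `Record` fields, the `standardize` helper, and the uniqueness rule are hypothetical choices made for illustration; this is not presented as the procedure used in the research described here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One person as enumerated in a single census (hypothetical fields)."""
    record_id: str
    first_name: str
    last_name: str
    age: int
    birthplace: str
    census_year: int

    @property
    def birth_year(self) -> int:
        # Approximate: census year minus reported age.
        return self.census_year - self.age


def standardize(name: str) -> str:
    """Lowercase and keep only letters, so 'Camenisch,' and 'CAMENISCH' agree."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def link_key(r: Record) -> tuple[str, str, int, str]:
    """The characteristics assumed not to change between censuses."""
    return (standardize(r.first_name), standardize(r.last_name),
            r.birth_year, r.birthplace.strip().lower())


def exact_links(earlier: list[Record], later: list[Record]) -> dict[str, str]:
    """Map earlier-census ids to later-census ids for keys that are
    unique in both censuses; ambiguous keys are left unlinked."""
    earlier_by_key = defaultdict(list)
    later_by_key = defaultdict(list)
    for r in earlier:
        earlier_by_key[link_key(r)].append(r)
    for r in later:
        later_by_key[link_key(r)].append(r)

    links = {}
    for key, candidates in earlier_by_key.items():
        matches = later_by_key.get(key, [])
        if len(candidates) == 1 and len(matches) == 1:
            links[candidates[0].record_id] = matches[0].record_id
    return links


# Usage with made-up records: same person, enumerated in 1930 and 1940.
earlier = [Record("a1", "John", "Smith", 20, "Ohio", 1930)]
later = [Record("b1", "JOHN", "Smith", 30, "Ohio", 1940)]
print(exact_links(earlier, later))  # {'a1': 'b1'}
```

Requiring uniqueness on both sides trades coverage for accuracy: if two John Smiths born in Ohio in 1910 appear in the later census, the sketch links neither rather than risk a false match.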

Any linking method that uses these fixed characteristics to match observations across time faces some challenges. First, by the time the data reaches researchers, the same name is often spelled differently in different censuses. The name could have been written incorrectly in the original source, the handwriting may be difficult to read, or there could be a simple transcription error. My grandfather's first name is listed as "Benard" in 1940. That spelling is incorrect, but the handwriting is difficult to read. Second, not everyone remembers or knows their age. Particularly in a period when many people did not have birth certificates or had not gone to school for more than a few years, ages tend to be "heaped": individuals are more likely to report ages that are multiples of five or 10. Lastly, sometimes there are...
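The first two problems explain why a rule demanding exact agreement on name and birth year would miss many true matches. One way to relax it is sketched below in Python, again as an illustrative assumption rather than the author's procedure: first and last names are scored with the standard library's `difflib.SequenceMatcher` (linking work often uses other string measures, such as Jaro-Winkler), and implied birth years may differ by up to two years, so that a misread name or a heaped age does not by itself break a link. The `Person` type, the 0.8 similarity threshold, and the two-year window are hypothetical parameters.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical minimal record: name, implied birth year, birthplace."""
    record_id: str
    first_name: str
    last_name: str
    birth_year: int
    birthplace: str


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the lowercased strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_link(person: Person, candidates: list[Person],
               min_similarity: float = 0.8,
               max_year_gap: int = 2) -> Optional[str]:
    """Return the id of the single plausible match in `candidates`, or None.

    A candidate is plausible if the birthplace agrees exactly, the implied
    birth years differ by at most `max_year_gap` (tolerating heaped ages),
    and both first and last names are at least `min_similarity` similar
    (tolerating misspellings). If more than one candidate is plausible,
    the case is treated as ambiguous and left unlinked.
    """
    plausible = [
        c for c in candidates
        if c.birthplace.lower() == person.birthplace.lower()
        and abs(c.birth_year - person.birth_year) <= max_year_gap
        and name_similarity(c.first_name, person.first_name) >= min_similarity
        and name_similarity(c.last_name, person.last_name) >= min_similarity
    ]
    return plausible[0].record_id if len(plausible) == 1 else None


# Usage with made-up records: "Jon" born 1902 in one census, "John" whose
# age was heaped to 40 in 1940 (implied birth year 1900) in the next.
p = Person("a1", "Jon", "Smith", 1902, "Ohio")
candidates = [Person("b1", "John", "Smith", 1900, "Ohio"),
              Person("b2", "John", "Smith", 1890, "Ohio")]
print(fuzzy_link(p, candidates))  # b1
```

With these settings, `name_similarity("Benard", "Bernard")` is about 0.92, so a misspelling like the one in the 1940 record would not by itself block a link. Where to set the threshold and the window is an empirical trade-off between missed links and false ones that this sketch does not settle.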
