The possibility of global data sets: an interview with Kalev Leetaru.


Kalev Leetaru is the creator of GDELT (Global Database of Events, Language, and Tone), a comprehensive, open source data set that aggregates news media to track political events and protests throughout the world, as well as the people, organizations, themes, and emotions underlying them. He just finished a term as the 2013-2014 Yahoo Fellow at the School of Foreign Service at Georgetown University and is working to expand the GDELT project. Leetaru provided the Journal with insight regarding the origins of the database, some of its complexities, new developments, and his vision for how GDELT can forecast future uprisings and political violence to help those affected. (1)

Journal of International Affairs: How did GDELT get started? What was the vision behind it?

Kalev Leetaru: The project grew out of my interest in forecasting political and social movements and in the associated latent dimensions of information--things like emotion and thematic undertones. This includes how people absorb the world around them, and how that is changing their understanding of and interactions with the world. I did a piece in 2011, called Culturomics 2.0, and what it showed is that you can use the tone--positive to negative--of global media to give some pretty powerful forecasts of country collapse. I wanted to take it further and start forecasting protests and all different types of violence, but there simply were not data sets out there that gave you these things. There were data sets that listed countries that had collapsed, but there were no data sets, for example, providing a list of all the labor protests worldwide. So this was really the catalyst of the project; there were lots of event data sets available, but each one of them was limited in scope and functionality. They each focused on a few event categories, a single area of the world, or a short period of time. Data sets for many parts of the world were altogether nonexistent. For example, there were very few data sets on Latin America. So the idea was, 'How do we create a database of all these different events that allows us to perform useful forecasting?'

Journal: Can you discuss how you decided to use news media to track events? What were your responses to some of the criticisms of this decision?

Leetaru: News media is the only extant cross-national historical data set. Government statistics are highly uneven and often partial. Some countries may have richly detailed statistics, while other countries have very sparse statistical records. Government data, likewise, tends to contain various degrees of bias. In India, for example, there were massive...
