Text analysis: a simple "big data" tool for local government.

Author: Reitano, Vincent

Text analysis is an intuitive and low-cost tool that can quickly analyze data. Simple algorithms display the readability of a given text passage, helping governments determine how easily staff, officials, and the general public can access their documents. A limited analysis of budget documents from major U.S. cities shows that comprehensive annual financial reports (CAFRs) are often at or above a college reading level, while popular annual financial reports (PAFRs) range from an 8th to 12th grade level.

This article develops suggestions to make documents more accessible and readable, and provides a case study for text readability measurement used by the City of Dubuque, Iowa, in a performance-measurement context.


Text analysis is a broad term that is defined in a variety of ways. It generally refers to a set of procedures that analyze written text and produce scores that capture different dimensions of the text, such as readability. Text analysis examines the structure and length of sentences and words through classification schemes such as an automated count of multisyllabic words and a numeric measure of the text's grade level. A paragraph with many multisyllabic and technical words and a high grade level is challenging for the general population to read, so it may be necessary to change the wording and structure to make interpretation consistent across segments of the population.
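To make the idea of an automated count concrete, here is a minimal sketch in Python. It is an illustration only, not the method used in the analysis described in this article. The function names (count_syllables, text_profile) are hypothetical, the syllable counter is a rough vowel-group heuristic that will miscount some English words, and the threshold of three or more syllables for a "multisyllabic" word is an assumption.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting groups of consecutive vowels (rough heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (as in "make") usually adds no syllable;
    # words ending in "le" (as in "table") are left alone.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_profile(text: str) -> dict:
    """Return simple structural counts for a passage of text."""
    # Treat runs of ., ! or ? as sentence boundaries (abbreviations will be miscounted).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Assumption: "multisyllabic" means three or more estimated syllables.
    multisyllabic = [w for w in words if count_syllables(w) >= 3]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "multisyllabic_words": len(multisyllabic),
    }


if __name__ == "__main__":
    print(text_profile("The cat sat. The municipality responded."))
```

Longer sentences and a larger share of multisyllabic words both point toward text that is harder for a general audience to read.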

A common measure is the Flesch Reading Ease score, which rates text on a scale from 0 to 100, with 0 being very difficult to read and 100 being very easy. It can be run on most types of text, from newspaper articles to technical reports. When it was published in 1949, creator Rudolf Flesch estimated that fewer than 5 percent of all U.S. adults could read at a college level, while 93 percent could read at a 5th grade level. (1) These figures have likely shifted, but the average adult still reads at approximately a 6th to 8th grade level. Microsoft Word can compute the Flesch Reading Ease score for any written document. For example, the score for this article is 35.9, with a grade level of 13.8.
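For readers who want to see how such a score is produced, the sketch below applies the standard published formulas for Flesch Reading Ease and the Flesch-Kincaid grade level. It assumes that the "grade level" quoted above is the Flesch-Kincaid measure. The function names are hypothetical, and the syllable counter is a rough vowel-group heuristic, so the results will not match Microsoft Word exactly.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting groups of consecutive vowels (rough heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a trailing silent "e", but keep endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_scores(text: str) -> tuple[float, float]:
    """Return (reading_ease, grade_level) for a passage of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    # Flesch Reading Ease: higher means easier to read.
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Flesch-Kincaid grade level: approximate U.S. school grade.
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level


if __name__ == "__main__":
    ease, grade = flesch_scores("The cat sat on the mat.")
    print(f"Reading ease: {ease:.1f}, grade level: {grade:.1f}")
```

Both formulas depend only on average sentence length and average syllables per word, which is why shorter sentences and plainer words raise the score. For very simple or very dense passages, the raw result can fall outside the 0 to 100 range.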

There are many other types of measures and many programs that quickly automate readability measures, as well as more advanced measures such as sentiment analysis, an algorithm that determines if the writing contains different sentiments (e.g., positive or negative). These measures are highly complex and require an extensive understanding...
