In the process of reviewing literature related to assessments in medical and health education, we came across a recent article by Jamieson in Medical Education that attempts to outline some of the (alleged) abuses of Likert scales and suggests how researchers can overcome some of these methodological pitfalls and limitations (1). However, many of the ideas advanced in the Jamieson article about Likert "scales," as well as a great many of the articles it cites (2-6), are themselves common misunderstandings, misconceptions, conceptual errors, myths and "urban legends" about Likert scales and their characteristics and qualities. These errors have been propagated and perpetuated across decades for a variety of reasons, including a lack of first-hand familiarity with and understanding of primary sources (i.e., Likert's actual writings) and of the various definitive primary empirical studies done by Likert and others (see below). In this respect, Jamieson is no different from the dozens of sources, spanning a twenty-year period, that she cites in her article about "Likert scales." Further, this problem is not confined to the field of medicine and medical education, as the majority of the articles that are the source and propagators of many of the most important errors and misunderstandings currently extant concerning "Likert scales" are from psychology, education and the field of psychometrics in the fifties and early sixties (7-13). These "root of current urban legend" articles, moreover, are more than just "historical curiosities" to anyone who has actually read Likert in the original or constructed and empirically developed a "Likert scale" according to his theoretical model and writings (14, 15).
This article, therefore, addresses this important problem and a number of persistent misconceptions, misunderstandings, factual and empirical errors, myths and untruths about Likert scales and their characteristics and properties, in the hope of helping researchers and practitioners understand the various factors, complexities, specifications and nuances that must be considered whenever any given measurement scale (or response format) is used, developed, or analyzed. Further, one of the central points of this article for medical and allied health educators and researchers (as well as those in other fields) is that the serious educational scholar and researcher must bring to educational measurements of all kinds the same level of skill, ability, theory, and rigor that goes into all scientific and biomedical measurements, as the principles of scientific measurement (the "heart of science") are the same in virtually all domains (16).
Scales versus Response Formats: One of the primary confusions in the Jamieson article centers on the use of the word scale (versus response format). Clearly, the author, like a large number of the sources she cites, is referring in her discussion to a response format as opposed to a (measurement) scale (see below), yet no distinction whatsoever is made between the two, as if such a distinction were either unimportant or nonexistent. Both suppositions could not be further from measurement theory, from the truth, or from Likert's actual and original writings on these matters (14). This particular point is so central to accurately understanding a Likert scale (and other scales and psychometric principles as well) that it serves as the bedrock and the conceptual, theoretical and empirical baseline from which to address and discuss a number of key misunderstandings, urban legends and research myths.
Distinguishing between a scale and a response format is not always easy or straightforward, because it first requires some linguistic analysis and close attention to the meanings of words and terms and to the contexts in which they are used. This point is important and needs to be understood, as both "measurement" and "statistics" are areas of poor, careless, ambiguous, confusing, and misleading language usage, as well as areas of profuse and unthoughtful usage of a wide variety of professional slang. Of particular importance is the fact that the terms scale and response format in the domains of measurement and statistics are like the word "interval" in these same two domains, which has several different specific meanings; namely, interval scale, data interval (obviously different from scale), confidence interval, and so on, as "interval" is a generic concept that is used, defined and particularized in many important and different ways in both of these domains. The key here is that the word "interval" has a qualifying term (adjective) in each of these instances. The problem, however, is that the absence (or implied presence) of the appropriate qualifying terms in a given context can create many confusions, misunderstandings, and errors of various kinds. There are many such terms, words and concepts in educational, psychological, and sociological measurement. Further, linguistic sloppiness or carelessness and slang (or "techie") usage of such words and terms by people doing work in these domains (and most particularly the alleged "specialists") is one of the major sources of difficulty, leading to multiple confusions, misunderstandings, errors, myths and urban legends, particularly for novices or those new to a particular sub-area of the field (a prime example of these points is the term "logit regression").
To clarify this problem further and elucidate its many facets, consider the following linguistic and conceptual problem. The (fictitious) 20-item Box personality test (which is a scale) has a binary response format (what the majority of professionals today would carelessly and inaccurately call a "scale," which, as will be seen below, has nothing to do with its being binary). The sloppy and incorrect language (and thus meaning and conceptualization) that one typically encounters for this example (and from measurement and psychometric professionals, who are often the worst offenders) is "the Box personality scale has a binary response scale," where the term binary response scale connotatively refers to a particular data type (i.e., a nominal [data] scale). This impoverished "techie slang speak" (TSS) is a careless, colloquial (and connotative) usage of the word scale to mean data type. So we now have three different usages and meanings of the word "scale" in one sentence; namely, the (real) measurement scale (the Box test or instrument), the scalar properties of the response format (or lack thereof), and the data type (often also confusingly called the "measurement scale" of the data). Also, it should be noted that (truly) nominal data are held not to be a scale at all because they have no underlying continuum, so the errors and carelessness are further compounded if we do not define the binary categories of the "scale" (i.e., response format). If the binary categories are "agree" and "disagree," as opposed to "yes" and "no" or "true" and "false," then we have a severely truncated ordinal response format (and data type) as opposed to a nominal response format (and data type), and there is an underlying continuum even though the response format is binary.
Note well and clearly, please, that the adjective "ordinal" in the previous sentence is a scalar property of the item response format (and not of the 20-item instrument, which is the real scale), and that this ordinal characteristic is actually something more than just a property of the item response format alone, as will be seen more specifically below.
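The separation between the instrument (the scale), the item response format, and the coded data can be made concrete with a minimal Python sketch of the fictitious Box test. Everything in it is an assumption made for illustration only: the names `RESPONSE_FORMAT`, `CODING`, `score_item` and `scale_score`, the 0/1 coding of "disagree" and "agree," the sample respondent, and the choice to obtain an instrument-level score by summing the coded items. It is not offered as Likert's or anyone else's actual procedure.

```python
from typing import Sequence

# Response format: the device that captures each answer (binary here).
RESPONSE_FORMAT = ("disagree", "agree")

# Coding rule (hypothetical): maps each captured category to a number.
# The ordering disagree < agree is what makes this binary format a
# severely truncated ordinal format rather than a nominal one.
CODING = {"disagree": 0, "agree": 1}

N_ITEMS = 20  # the fictitious Box test has 20 items


def score_item(response: str) -> int:
    """Code one captured response; reject anything outside the format."""
    if response not in RESPONSE_FORMAT:
        raise ValueError(f"response must be one of {RESPONSE_FORMAT}")
    return CODING[response]


def scale_score(responses: Sequence[str]) -> int:
    """Instrument-level score: here, simply the sum of the coded items
    (an assumption for illustration)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses")
    return sum(score_item(r) for r in responses)


# A made-up respondent: agrees with 13 items and disagrees with 7.
responses = ["agree"] * 13 + ["disagree"] * 7
total = scale_score(responses)
print(total)  # 13
```

In this sketch each item can take only two values, while the instrument-level score can take any of 21 values (0 through 20). The word "binary" therefore describes the response format, not the 20-item instrument.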
Language, at every level, is therefore critically important. It should also be clearly noted that just about every intelligence and achievement test reduces the item response format used to binary form (correct and incorrect), yet both kinds of test are considered to be, and are treated statistically as, interval scales. This starkly contradicts, logically and empirically, most of what Jamieson and the many "authorities" she cites have to say about response formats, scale types, and the do's and don'ts of statistically analyzing and interpreting them. All of these points also emphasize and illustrate that there are critical conceptual and operational differences between a response format (an information capture protocol or device) and a response format scoring (meaningful coding) procedure or protocol, which is used to transform the information captured into an element or unit of an interpretive system and, hopefully, a theory of some kind. Some response formats fuse these two item components (the capture and the scoring/coding of the information) and have each subject (i.e., respondent) do their own "coding" of the (covert) information the subject is processing or experiencing in real time, thus reducing researcher burden and costs. Other response formats do not fuse the capture and coding components of the generic three-component standard "item" model: the stem (stimulus, question, etc.), the response capture procedure or device, and the transformation of the response into meaning units (coding, scoring, etc.). So "scalar" properties at the item level tend to be properties of the data transformation (coding/scoring) component of the standard item model rather than of the response format or stem components, although scalar properties may also be built into these components for a variety of reasons, some practical and some theoretical.
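As a rough illustration of the three-component item model just described, the following Python sketch keeps the stem, the response capture, and the coding step as separate parts of an item. The class `Item`, the function `researcher_rubric`, the item wordings, and the simulated respondent answers are all hypothetical and chosen only to contrast a fused format (the respondent supplies an already coded value) with a non-fused one (the researcher codes a free-text answer afterwards).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Item:
    stem: str                       # stimulus or question
    capture: Callable[[str], str]   # response capture procedure/device
    code: Callable[[str], int]      # transformation into meaning units

    def administer(self) -> int:
        raw = self.capture(self.stem)   # information capture
        return self.code(raw)           # coding/scoring


def researcher_rubric(text: str) -> int:
    """Hypothetical rubric applied by the researcher after capture:
    1 for agreement, 0 otherwise."""
    lowered = text.lower()
    return 1 if "agree" in lowered and "disagree" not in lowered else 0


# Fused format: the (simulated) respondent picks an already coded
# category, so the coding step only converts the mark to a number.
fused_item = Item(
    stem="I enjoy working in groups. (0 = disagree, 1 = agree)",
    capture=lambda stem: "1",
    code=int,
)

# Non-fused format: a free-text answer is captured first and coded
# later by the researcher's rubric.
open_item = Item(
    stem="How do you feel about working in groups?",
    capture=lambda stem: "I disagree, groups slow me down.",
    code=researcher_rubric,
)

fused_score = fused_item.administer()
open_score = open_item.administer()
print(fused_score, open_score)  # 1 0
```

Keeping `code` separate from `capture` shows where an item's scalar properties enter in this sketch: swapping the coding function changes the numbers produced without changing what was captured.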
So in this model of an item, an item in a patient examination protocol or scale would be: "open please" (stem), insert thermometer, wait appropriate amount of time, visually observe digital value (information capture), say "you're not sick today my boy, back to work with you" (coded and interpreted response). It should be...