In the process of conducting a meta-analysis regarding the effectiveness of field instruction in social work (Holden, Barker, Rosenberg, Kuppens, & Ferrel, 2011), our research group encountered an issue that has not been widely discussed in the social work literature. This issue has interesting implications for the use of kappa (κ) in many areas of research beyond systematic reviews and meta-analyses. This research note grew out of our struggle with that issue.
Early in the meta-analytic process, after searching 25 databases, we compiled a set of 1,680 records to review (Holden et al., 2011). Two team members independently reviewed this set of records, each making an exclude/include decision. Next, the level of agreement was determined (even though a vote for inclusion of a record by either team member resulted in the record going into the final set for full review). The results of this process are displayed in Table 1. Although the percentage of agreement was 95.5 (94.8 + 0.7), the associated κ value was .22. Recall that the value of κ "is one when perfect agreement occurs, zero when observed agreement equals chance agreement, and less than zero when observed agreement is less than chance agreement. Therefore, [kappa] is interpreted as agreement in excess of chance agreement" (Orme & Gillespie, 1986, p. 166).
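The gap between a high percentage of agreement and a low kappa can be made concrete with a short calculation. The Python sketch below is illustrative only: the cell counts are hypothetical and are not the values reported in Table 1, and the function name `cohen_kappa` is an assumed name rather than part of any library. It computes the observed agreement, the agreement expected by chance from the raters' marginal proportions, and kappa as the agreement in excess of chance.

```python
def cohen_kappa(table):
    """Return (observed agreement, chance agreement, kappa) for a square
    agreement table of counts (rows: rater A, columns: rater B)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: share of records on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Marginal proportions for each rater.
    row_m = [sum(table[i]) / n for i in range(k)]
    col_m = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Chance agreement: sum of products of matching marginals.
    p_e = sum(row_m[i] * col_m[i] for i in range(k))
    # Note: undefined (division by zero) when p_e equals 1.
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


# Hypothetical counts (NOT the data in Table 1), ordered exclude, include.
# Nearly all records fall in the "both exclude" cell.
table = [
    [950, 20],  # rater A: exclude
    [25, 5],    # rater A: include
]
p_o, p_e, kappa = cohen_kappa(table)
print(f"observed agreement = {p_o:.3f}")
print(f"chance agreement   = {p_e:.3f}")
print(f"kappa              = {kappa:.3f}")
```

Because almost every record is excluded by both raters in this hypothetical table, chance agreement is itself very high, which leaves little room for agreement in excess of chance and pulls kappa down even though the raters agree on more than 95% of records.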
How should one interpret these findings that at first glance appear "discrepant"? That is, how does a researcher resolve the gratifying finding of agreement in nearly 96% of the coding decisions when that finding is accompanied by a "low" κ value? In the course of answering this question, we wondered if this issue was well documented among social work scholars. A review of the indexes of a convenience sample of textbooks covering research methods, statistics, or both in social work failed to uncover a single discussion of this topic (Blanksby & Barber, 2006; Bloom, Fischer, & Orme, 2009; Drake & Jonson-Reid, 2008; Grinnell & Unrau, 2008; Montcalm & Royse, 2002; Royse, Thyer, & Padgett, 2010; Rubin, 2007, 2008; Rubin & Babbie, 2007; Unrau, Gabor, & Grinnell, 2007; Weinbach & Grinnell, 2004; York, 2009).
A keyword search of Social Work Abstracts using "kappa" produced 16 hits. Review of these titles revealed that all focused on the use of kappa (or weighted kappa) in specific assessments of interrater or intermeasure reliability, and only one briefly mentioned the issues discussed here (Fisher, Evans, Muller, & Lombard, 2004). That article was written by a group of authors that did not appear to include social workers, and it appeared in a non-social work journal (Journal Citation Reports, 2008). Although we did not intend this to be a comprehensive search, we used two additional strategies because of Social Work Abstracts' performance deficits (for example, Holden, Barker, Covert-Vail, Rosenberg, & Cohen, 2008, 2009; Shek, 2008).
First, a scan of records obtained from a search of Social Services Abstracts did not produce any additional articles related to our focus. Second, we approached both traditional invisible colleges (that is, other scholars working in the area) and electronic invisible colleges (that is, professional Listservs) to further explore the level of familiarity with this kappa interpretation issue (Cooper, 2010). A mailing to the MSWEDUCATION Listserv resulted in one response that noted awareness among experienced qualitative researchers of alternative measures to kappa (personal communication with D. Fitch, assistant professor, School of Social Work, University of Missouri, Columbia, March 13, 2010). Queries directed to a group of research colleagues also lent support to the conclusion that this issue is not well known in social work. Two of these contacts (personal communication with S. Kirk, professor, Social Welfare, University of California, Berkeley, January 26, 2010; personal communication with J. Orme, professor, College of Social Work, University of Tennessee, Knoxville, January 26, 2010) did lead us to the most detailed discussions of problems with kappa in the social work literature (Combs-Orme & Orme, 1986; Kirk & Kutchins, 1992; Orme, 1986). Although coverage of the issues associated with kappa has been limited in social work, the issues have been addressed in other substantive fields and particularly in medicine (for example, Feinstein & Cicchetti, 1990; Lantz & Nebenzahl, 1996). A brief overview of Cohen's κ is presented first, followed by a discussion of the kappa paradoxes and possible alternative agreement measures.
COHEN'S KAPPA (κ)
In 1960, Cohen introduced the κ statistic as a chance-corrected measure of interrater agreement for nominal score data to overcome the drawback associated with simply computing the observed proportion of agreement among raters. The...