The amount of academic research on the Student Evaluation of Teaching (SET) is staggering. Perhaps the fascination with SET stems from the introspective nature of the issue: academics researching themselves. While previous research has grappled with the notion of teaching effectiveness and teaching score validity, we shift the focus to university governance. Thus, we set ourselves apart from the managerialist perspective on higher education. Managerialism deals with the technological problem of optimizing the operations of the organization within a given set of rules, while university governance analyzes the very choice of rules that govern the allocation of power and control among financial claimants. Unlike managerialism, university governance does not claim to be a value-free exercise; it approaches decision-making from an economic and political perspective.
The emergence of modern student evaluation of teaching can be traced back to the 1960s. During that period, SET was used on an experimental basis, i.e., voluntarily. In the early 1980s it became the mainstay of academic practice in North America (Centra 1993; Wachtel 1998; Murray 2005; Lohman 2006). Today, the main justification for its use stems from the belief that SET is measuring teaching effectiveness. Many universities in the United States and Canada use teaching scores (to various extents) as a basis for tenure, promotion, re-appointment, and resource allocation.
Interestingly, the impetus gained by SET in the 1960s came in the wake of the civil rights movement. Back then, the most vocal group was the students, who saw SET as a conduit for making their voice heard in the affairs of the university. The faculty seized upon the opportunity to diversify the performance criteria on which tenure and promotion were based. And administrators probably sensed that the use of SET would provide an aura of accountability and legitimacy. Universities are entrusted with millions of dollars of public and private money, and it was important to show that the money was well spent. From its inception, SET mollified students, taxpayers, and private donors.
We argue that the notion of teaching effectiveness--invoked today to rationalize the use of SET--has no verifiable empirical content, and therefore the question of teaching score validity is misguided. Current research shows that, at most, teaching scores reveal the extent to which the professor is able to connect to the students' cultural beliefs and live up to their expectations. This expectation-fulfilling mindset, however, is not a meaningful approach. When used for administrative purposes, SET leads to collusion between students and teachers, and generates negative economic externalities. Why then are SETs still used on such a grand scale? The main thesis of this paper is that university administrators nurture teaching scores because they represent an enabling myth (Dugger 1989). SET legitimizes managerial claims to increasing control over the affairs of the university.
We organize our paper as follows: in the next two sections we discuss the empirical content of teaching scores. Then we discuss the conflicts of interest that plague the use of SETs. We analyze the relationship among teaching scores, the rising managerial elite, and university governance in the latter sections, and suggestions and concluding comments are presented at the end of the article.
In the Eye of the Beholder
The central issue in the student evaluation of teaching is the notion of teaching effectiveness. The ability of the students to gauge the quality of the instruction process rests on the argument that the wisdom of crowds is more dependable than the wisdom of elites (Surowiecki 2004). About a century ago, Galton (1907) noted that any large crowd is better than any single individual at guessing the dressed weight of an ox. From here, Galton inferred that democratic judgments are more trustworthy than otherwise believed. Unfortunately, the wisdom of crowds argument, in spite of its merit, cannot be readily applied to teaching. For one, the dressed weight of an ox is a relatively straightforward notion that can be easily verified by an independent third party. Teaching effectiveness, however, is only an abstract construct, subject to various interpretations. In addition, the crowd participating in the weight-measuring contest was not subject to any conflict of interest. That is, the evaluation had no significant impact on their welfare. Each individual was an objective participant motivated only by the desire to test his/her guessing skill.
Thus, we contend that the question of teaching effectiveness represents a red herring. It makes for a very good illustration of Quine's (1951) critique of empiricism. Teaching effectiveness is not an empirical statement that has a verifiable truth value; and teaching scores are not measuring any objective and independently verifiable trait of the teacher's classroom behavior. Teaching effectiveness is merely an abstract construct whose notional content varies according to the questionnaire used to measure it. By default, SET has become the operational definition of teaching effectiveness. Once embraced as acceptable, this approach never allows us to challenge the question of relevance, implying that teaching scores are always measuring what they are supposed to measure. Obviously, an axiomatic statement can never be proven wrong. Arguably, teaching effectiveness belongs more to the metaphysical world, for it does not meet Popper's (1972) criterion of demarcation.
What does SET really gauge? McKeachie (1979), Gigliotti (1987), Koermer and Petelle (1991), and Perry et al. (1979) suggest student expectations as a recurring motive. Students are conditioned to anticipate a certain type of classroom experience. Just like Wall Street investors, they react negatively when their expectations are not met (Gigliotti 1987). When the classroom experience is consistent with their expectations, the resulting teaching scores will be high (Koermer and Petelle 1991).
Student expectations can be associated with the pre-existing tradition and culture of each university. Hoffman and Kremer (1980) find that instructors who tune into students' attitudes and culture are rewarded with higher teaching scores. Shevelin et al. (2000) show that personal charisma is the single most important instructor characteristic influencing teaching scores. More disturbingly, Ambady and Rosenthal (1993) find that consensual judgments of instructors' nonverbal behavior based on a very brief silent video--under 30 seconds--significantly predicted end-of-semester teaching scores.
If we are to accept the notion that teaching effectiveness is defined as a predictable classroom experience, consistent with student expectations, and delivered by a cool teacher, then SET is indeed a measure of teaching effectiveness. This is, however, uninsightful. It does not say much about the type and amount of learning that takes place. It does not point to any obvious solutions to improving one's teaching performance. The only safe approach is to be cool (according to local norms) and avoid startling the students. However, as explained later, this expectation-fulfilling mindset would entail conflicts of interest and erosion in the quality of academic standards. At this point, we might perhaps restate the main question. Is there any portion of the university's economic output that could be gauged by the student evaluation of teaching?
The Economics of Higher Education
Universities provide a mix of public and private goods. If we want to design an appropriate metric of performance, we must first understand in greater depth why and how the university creates value. We should inquire whether there is any connection between the economic paradigm fostered by the university and the evaluative capability of SET. To our knowledge, we are the first to propose this approach, which we claim as an original contribution of our paper.
Paulsen and Feldman (1995) use a widely popular system to describe the activities of the university. The system explains the nature of faculty work by adopting four functional categories: teaching, service, research, and academic citizenship. Paulsen and Feldman focus on how things are done. Here, we add another classification of our own that focuses on what is being done. We argue that the economic impact of universities is manifested in three discernible areas:
(i) Creation and dissemination of knowledge
(ii) Investment in human capital
(iii) Granting of degrees
Our approach emphasizes the indissoluble mix of private and public goods produced by higher education. It also draws attention to the challenge of measuring its economic output. Our system highlights why managers are shifting the focus from creation of knowledge and investment in human capital to granting of degrees. Selling degrees is more readily measurable, and thus can more easily justify managerial discretion. Finally, our system makes it easier to understand how negative externalities are generated.
(i) The creation and dissemination of knowledge takes place through teaching and research. Research produces objective knowledge, a concept coined by Popper (1972). Teaching creates subjective knowledge, which can best be understood as a "state of consciousness or individual disposition to behave or react" (Popper 1972, 108). Thus, once a student grasps the concept of relativity, new subjective knowledge has been created. The student's insight need not be exactly the same as the teacher's. It is this very difference in insights that eventually allows humans to challenge conventional wisdom and formulate new hypotheses. Human learning always creates new subjective knowledge because every individual has a unique way of internalizing even the most mundane aspects of our world.
Several economists and sociologists, such as...