[Published originally in the May 2003 edition of Computing Research News, Vol. 15/No. 3, pp. 2, 10.]
Expanding the Pipeline
Are Student Evaluations of Teaching Fair?
By Faith E. Fich
Anonymous student evaluations of teaching are widely used at universities throughout North America for tenure and promotion decisions, determination of yearly salary increases, and the choice of teaching award recipients. Their purpose is to fairly evaluate the teaching quality of faculty members and help them improve their teaching. Yet, the perception of many faculty members, including me, is that the use of student evaluations of teaching achieves neither of these goals.
Over the past few years, I have talked to many faculty members in science and engineering about student evaluations, including department chairs, deans, and directors of a number of university teaching centers. I was also a member of a committee set up by the dean of the Faculty of Arts and Science at the University of Toronto to address the issue of bias in evaluation of women science professors' teaching.
It is outside the scope of this article to give a survey of the relevant literature. However, I'll mention some of the evidence that has convinced me of the unfairness of student evaluations of teaching as they are often used. More importantly, I will discuss implications for how the results of student evaluations should be used and present a few general recommendations for evaluating teaching fairly and effectively.
I want to begin with one carefully controlled experiment that I found very compelling. Sinclair and Kunda administered a test of 10 open-ended questions to approximately 50 male students. Each student was given feedback on his performance, randomly chosen from among four prerecorded videos. There were two evaluators, one male and one female, and two scripts, one praising the student's performance and one criticizing it. After receiving their feedback, each student was asked to rate his evaluator.
The results of this experiment are summarized by the title of their paper, "Motivated Stereotyping of Women: She's Fine if She Praised Me but Incompetent if She Criticized Me." More precisely, among ratings by students who had been given positive feedback, the two evaluators were rated roughly the same. However, among ratings by students who had been given negative feedback, the female evaluator was rated significantly lower than the male evaluator.
To isolate the cause of the difference in ratings, the experiment had a second part. In it, the test answers given by each student and the evaluation he had been given were shown to an observer, another male student who had not taken the test. Each observer was also asked to rate the evaluator. Among the observers, the ratings that evaluators received were not correlated with their gender.
Sinclair and Kunda's interpretation is that when people are criticized, they unconsciously use negative stereotypes about the criticizer to discount the validity of the criticism, as a way of maintaining self-esteem. An implication is that women professors who have high standards or who teach courses that students find difficult may well be victims of bias. Sinclair and Kunda obtained similar results in a study of racial bias.
Another interesting experiment was performed by Kaschak. A set of 25 male students and 25 female students were asked to rate professors, given descriptions of the professors and their teaching methods. Half the professors were listed as female, the other half as male. A second set of 25 male students and 25 female students were given the same descriptions, with the genders of the professors switched. Although the gender of the professor did not affect the ratings by female students, the male students rated the female professors lower.
Other Factors Affecting Student Evaluations
There is a vast body of literature about student evaluations of teaching, containing many conflicting conclusions. The problem is that there are many variables unrelated to the quality of teaching that may affect evaluations and that interact in complex ways. Furthermore, most of this work consists of statistical analyses, where factors that are significant for a small segment of the population, for example, women computer science professors, can be insignificant in the aggregate data.
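The dilution effect described above can be made concrete with a small sketch. All of the numbers here are invented for illustration and are not taken from any study; the function name `rating_gaps` and its parameters are likewise hypothetical. The sketch assumes that a penalty applies only to a small subgroup and that everyone else receives the same baseline rating.

```python
# Hypothetical illustration: every number below is invented, not drawn from any study.

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def rating_gaps(n_affected, n_other_in_group, n_comparison, base_rating, penalty):
    """Return (gap within the affected subgroup, gap in the aggregate data).

    n_affected        -- size of the small subgroup whose ratings are lowered
    n_other_in_group  -- rest of the larger group, assumed unaffected
    n_comparison      -- size of the comparison group, assumed unaffected
    """
    affected = [base_rating - penalty] * n_affected
    others = [base_rating] * n_other_in_group
    comparison = [base_rating] * n_comparison

    subgroup_gap = mean(comparison) - mean(affected)
    aggregate_gap = mean(comparison) - mean(affected + others)
    return subgroup_gap, aggregate_gap


if __name__ == "__main__":
    # Assumed scenario: 10 affected professors among 400 in the larger group,
    # each rated half a point lower on a 5-point scale.
    subgroup_gap, aggregate_gap = rating_gaps(10, 390, 600, 4.0, 0.5)
    print(f"gap within subgroup: {subgroup_gap:.4f}")
    print(f"gap in aggregate:    {aggregate_gap:.4f}")
```

Under these assumed numbers, a half-point penalty for the subgroup shrinks to a difference of about one hundredth of a point once it is averaged over the larger group, which is small enough to be lost in the noise of a typical analysis.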
Nevertheless, the bulk of the research does show that certain factors unrelated to teaching quality do affect students' evaluations of teaching. Students in higher-level courses tend to rate professors more favorably than students in lower-level courses [1, 4]. The same is true for students taking elective courses as compared with students taking required courses [1, 4]. Both of these outcomes may be related to the students' greater interest in the course material. There is also evidence that small class size and leniency of grading [2, 4] lead to better ratings.
Faculty who penalize students for committing plagiarism may receive unfairly low ratings from those students. In computer science courses, it is relatively easy for students to copy pieces of code from one another, and there is sophisticated software to detect plagiarism. Thus, this factor may have a greater effect on evaluations in our discipline than in others.
How to Use Student Evaluations
In light of the many factors unrelated to the quality of teaching that can affect the results, it is important to recognize the limitations of student evaluations of teaching when they are used.
Only compare results from similar courses.
Avoid general subjective items.
Be careful with new courses.
Results of student evaluations have low precision.
Eliminate inappropriate student comments.
Use multiple forms of evaluation.
More detailed assessment should be done periodically.
Have a transparent teaching evaluation process.
Why should departments care about improving their teaching evaluation process? If it has been done the same way for a long time and there haven't been major problems, why should it change? One reason is that even a slightly biased process can, over time, lead to substantial inequities in salary. Another reason is that when faculty members have the perception that they are being unfairly evaluated, they feel unappreciated. This can affect their morale, the effort they are willing to put towards teaching, and their desire to stay in their department. Thus, improving the teaching evaluation process might improve retention as well as teaching.
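The claim that a slight bias can compound into a substantial salary inequity is simple arithmetic, sketched below. The starting salary, the raise percentages, and the time span are assumptions chosen only for illustration, and the function names are hypothetical. The sketch assumes that each year's raise is a fixed percentage of the previous year's salary.

```python
# Hypothetical illustration: the salary and raise figures are invented assumptions.

def salary_after(start, annual_raise, years):
    """Salary after compounding a fixed percentage raise for a number of years."""
    return start * (1 + annual_raise) ** years


def cumulative_gap(start, fair_raise, biased_raise, years):
    """Total earnings lost over the period by the person receiving the lower raise."""
    total = 0.0
    for year in range(1, years + 1):
        total += salary_after(start, fair_raise, year) - salary_after(start, biased_raise, year)
    return total


if __name__ == "__main__":
    start = 100_000          # assumed starting salary
    fair, biased = 0.030, 0.028   # assumed raises: 3.0% versus 2.8% per year
    years = 25

    annual_gap = salary_after(start, fair, years) - salary_after(start, biased, years)
    print(f"difference in annual salary after {years} years: {annual_gap:,.0f}")
    print(f"cumulative difference in earnings: {cumulative_gap(start, fair, biased, years):,.0f}")
```

With these assumed figures, a difference of only two tenths of a percentage point in the yearly raise grows into a gap of several thousand in annual salary after 25 years, and a considerably larger gap in total earnings.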
Faith E. Fich (email@example.com) is a Professor in the Department of Computer Science at the University of Toronto.
Anthony Greenwald and Gerald Gillmore, Grading Leniency Is a Removable Contaminant of Student Ratings, American Psychologist, 52(11), 1997, pp. 1209-1217.
Ellyn Kaschak, Sex Bias in Student Evaluations of College Professors, Psychology of Women Quarterly, 2(3), 1978, pp. 235-242.
Ian Neath, How to Improve Your Teaching Evaluations Without Improving Your Teaching, Psychological Reports, 78, 1996, pp. 1363-1372.
Lisa Sinclair and Ziva Kunda, Reactions to a Black Professional: Motivated Inhibition and Activation of Conflicting Stereotypes, Journal of Personality and Social Psychology, 77(5), 1999, pp. 885-904.
Lisa Sinclair and Ziva Kunda, Motivated Stereotyping of Women: She's Fine if She Praised Me but Incompetent if She Criticized Me, Personality and Social Psychology Bulletin, 26(11), 2000, pp. 1329-1342.