Saturday, March 27, 2010

Research on Student Evaluation of Teaching

Perhaps no practice in higher education pushes veteran faculty to cynicism and younger faculty to frustration more than SET—student evaluation of teaching. If you have ever received SETs that left you angry, scratching your head, or laughing at the irony of it all; if you have ever wished there were other ways to evaluate your teaching; if you have ever wondered about the reliability and validity of the SET process, you are not alone.

Although over thirty journal articles went into the preparation of this essay, that number represents only about 1% of all that have been published on the subject. According to Al-Isa & Suleiman (2007), 2988 journal articles on SET in higher education appeared in professional journals from 1990 to 2005. Furthermore, the ones published 30 years ago address the same concerns as the ones written in the last few years. As many of the articles echoed, faculty members routinely question the practice of SET.

Until I came to DSC, my experience with SET at five other institutions led me to conclude that SET was used to weed out the really poor teachers, but not to reward the better ones. Here I found out that I would need to earn a certain level of SET score to achieve my professional goals. I also learned it is possible to raise one’s SET scores. Few would disagree; the questions that puzzle and frustrate are whether raising one’s SET levels (1) constitutes pandering to the students and/or (2) reflects in any way that one’s teaching is actually getting better and whether the students have actually learned anything more.

Further questions revolve around whether SETs are the best way to evaluate teaching (other than being the cheapest and fastest); whether the forms are reliable and, if so, reliable for what (generally, they focus on teacher behaviors, not teacher efficacy); and whether students are really wise or aware enough to evaluate teaching in the first place.

Below I have attempted to summarize some of the research and bring to the surface the concerns the research addresses, or at least the concerns which motivate the research.

One of the most difficult questions to research is the correlation of student evaluation scores with real learning in the classroom. Jon Nussbaum researched communication styles of professors, specifically communication professors, at West Virginia University in the late ‘80s. He concluded that certain communication styles, most notably a “dramatic” one, create more affinity for a teacher, which leads to higher evaluation scores, and that this affinity leads to higher likelihood that the students would view their learning positively and change their behavior. However, he did not find that the higher affinity (popularity) resulted in more cognitive learning. After a certain level of affinity was reached, the amount of cognitive learning seemed to go down.

Richmond, Gorham, and McCroskey (1987) found the same in terms of immediacy: low immediacy correlated to low amounts of learning, moderate immediacy to moderate amounts of learning, but high immediacy did not get past the level of moderate amounts of learning, leading them to wonder if there is such a thing as too much immediacy. (Immediacy is discussed more fully below.)

But Nussbaum admits what we all suspect. If students are self-reporting on what they learned, that may not reflect what they actually learn, only what they think they have learned. It is extremely difficult to connect real learning to the scores on teacher evaluations; our methods are too “crudely measured” and the matter too “complex,” it is often argued in the literature. All anonymity would have to cease, and much of the value of SET is linked to its anonymity.

Some studies tried to get around this obstacle by focusing on perceptions of “value added” to the students’ cognitive learning rather than “raw amounts” of learning, or by assessing students’ performance in later classes. However, the concern that perception of learning has little relation to reality of learning remains.

Of course, the question of correlation assumes the forms themselves ask the right questions in the right way. And of course, not everyone agrees on that point.

Another concern is that instructors will make a class easier in order to please students into giving them higher evaluations. The research conclusions are mixed on this point. Hessler and Humphreys explain,

Centra (2003) discovered that even after student outcomes of learning were controlled, expected grades generally did not affect students' evaluations of their instructors. In fact, particularly in the natural sciences, students who expected an "A" in the course rated the instructors consistently lower. In addition, the low rating of courses was due to students' perception of coursework as too elementary or too difficult. Courses rated as "just right" in difficulty level received the highest course evaluations. (2008, p. 187)

Along the same line, Yunker and Yunker (2003) found that students who had a highly rated professor for one accounting course actually did less well in the next accounting course than students who had a less popular teacher. In contrast, Ikegulu and Burham (2001) concluded that students' expectation of their course grades significantly affected the ratings of their instructors. The lower the expected course grade, the less favorable the faculty evaluation.

Therefore, the perception that popular teachers are easier teachers remains, and as long as some instructors get lower scores, it will probably continue. “In brief, many university teachers believe that lenient grading produces higher SET scores and they tend to act on this belief” (Pounder, 2007, p. 185).

Related to the concern about “dumbing down” is that of discipline-specific issues in teacher evaluations. Although SET research has been done in specific fields, cross-disciplinary research and the application of SET findings from one discipline to another are not predominant themes in the literature. In 1982 Doyle wrote, “It seems most unlikely that any one set of characteristics will apply with equal force to teaching of all kinds of material to all kinds of students under all kinds of circumstance. . . . To try to prepare such a list entails substantial risk” (p. 27, qtd. in Stake and Cisneros-Cohernour, 2000, p. 68).

Further related to the concern about pandering to students is that faculty might be dissuaded from using innovative procedures or teaching methods because of students’ reactions. Many students expect the teachers to do most of the work in the classroom. Most teaching and learning experts advocate challenging that expectation and changing the practice based on it; that is, the writers advise that instructors should move from straight lecture to more learning-centered models. Those approaches don’t make it easier on the teacher, but young students may perceive them as cop-outs for the instructors, or we may simply suspect the students do, keeping us from changing to methods that make students responsible for their own learning.

A third major concern in the literature, one not really solved but one that probably motivates the biggest part of the research, is the use of the SETs for tenure and promotion. Most institutions utilize them significantly to make such decisions. Perhaps many faculty members, when hired, do not understand the place these evaluations will have in the overall process of promotion at those particular institutions, leading to quite a bit of resentment after the fact. Some faculty members truly fear the SET process. Theoretically, the forms, which began on a widespread basis in the ‘70s but go back to the ‘30s at a few large universities such as Purdue, should be used for improvement of teaching, not punitively. But many professors believe otherwise.

A fourth concern addressed in the research is the value that students put on the SETs, and how that value translates to effort and care in completing them. Whenever many of us conduct a SET for a colleague, we preface it with remarks about how important the process is to the institution. Does the message get across? Do the students really believe us, or are they so surveyed in this generation that it’s just another exercise in opinion-giving?

A fifth, and not unrelated, concern has to do with the preconceptions students bring into the classroom and how those affect the evaluation process. One word that pops up often in SET research is “immediacy,” or, as one source calls it, “an instructor's ‘warmth-inducing’ behavior.” In fact, the research on “warmth-inducing behaviors” is either the most probative or the most frustrating, depending on one’s perspective. Research indicates that students expect these personal qualities, and sometimes at a very high level. Chonko, Tanner, and Davis (2002) surveyed business students and found that the following percentages of students expressed these expectations:
Interesting 11.9%
Helps students 11.6%
Communicates well 10.7%
Easy to talk to 10.3%
Good personality 7.9%
Kind 6.0%
Understanding 4.7%
Interested in subject 4.0%
Knowledgeable 3.4%
Challenging 2.7%
Enthusiastic 2.7%
Fair 2.5%
Loves to teach 1.9%
Sense of humor 1.5%
Wants students to learn 1.4%
Easy-going teaching style 1.2%
Experienced 1.1%
Organized 1.1%
Open-minded 1.1%
Other 8.7%
Items in the “other” category include making class fun, listening, admitting expenses, not belittling students, doesn’t like to hear self talk, dynamic, easy, high energy, gives walks, intelligent, reliable, respectable, teaches at a reasonable pace, well-rounded, does not make things hard. (p. 272)

Even in relation to teaching methods, student expectations may be off-kilter with our own. Kember, Jenkins, and Ng (2004) studied the perception of teaching methods based on the students’ orientation toward learning: students who were self-determining in their learning and viewed it as transformative vs. those who viewed learning as reproduction and teacher-based. Students judged their teachers not on the basis of their methods, but on the basis of what the students preferred. As many other studies indicated, expertise in one’s field ranked fairly low, but personal characteristics and what might be considered “communication style” characteristics ranked more highly.

What is meant by communication style? Norton defined it as “the way one verbally and paraverbally interacts to signal how literal meaning should be taken, interpreted, filtered, or understood” (1978). Communication style can be seen as part of “selling the product” and “setting it up” for students: not only how you present the teaching of the class’s materials and skills, but also how you set up or “frame” the SET process. Many times the instructor is performing the behaviors listed on the form, but the students aren’t noticing. However, we can make them notice.

This need to prepare the students, at least a little, for the SET process is also borne out in the literature, which proposes that some students just don’t understand the questions. Their reading ability and general maturity level preclude their being able to complete the forms adequately. On top of the reading ability, the SET process is not always sensitive to cultural concerns. Does a 40-year-old Latino student view the form the same way a middle-class, white 18-year-old does? What about a student from an even more traditional cultural background? Does a female student complete the form with the same frame of reference as a male student?

Furthermore, are the gender, race, age, accent, and fashion-consciousness of the instructor immaterial to the process? Centra and Gaubatz (2000), among many others, argue strongly for gender and ethnic bias in the SET process. Does a senior approach it the way a freshman does? And are there significant generational differences in how the SETs are perceived and completed, for example, between how boomers and Gen Xers did compared to how Millennials do?

Other researchers concern themselves more about how to improve the process, either by communicating more clearly with students about the forms, by changing the forms, by controlling the timing of administration, or by using formative and not just summative assessment. Formative assessment involves asking the students for feedback earlier in the semester than at the official evaluation time.

Much of this advice is based not on hard data so much as on the writer’s conjecture. The thinking goes, “If I get negative evaluations, maybe it was because of when I gave them, so next time I’ll change the timing.” And does anyone really know if a teacher who uses midterm evaluations of or feedback on their teaching really gets higher SET scores? Can we know, given the multiplicity of factors involved? And does the use of midterm evaluations work because the instructor improves or because the students perceive an instructor who uses midterm feedback mechanisms as more immediate?

So, what do we do with this information? In some cases, the research supports our intuitions, experience, and prejudices about SET; in other cases, it debunks them. As mentioned before, the advice on SET is not based in research as much as it could or should be, but writers make the suggestions nonetheless. And this article will follow the same pattern, in the knowledge that an easy course does not automatically mean high scores, that students are often uninformed about SET and its goals and even the meaning of the questions, and that the expectations of the students when I walk in the classroom are sometimes wildly different from mine.

First, how should the faculty member respond to and use the forms? I have to admit to frustration with student comments and their inconsistency. Everyone reading this has had the same experience. We’ve all read those student comments that intimate the responsibility for their earning a college degree is largely ours, not theirs. It is probably best to look for trends over a couple of semesters; otherwise an instructor will become even more frustrated trying to change based on any one semester’s comments. But what really matters is separating the wheat from the chaff. If it’s a stray comment about your personality (or theirs) or a complaint about the fact that the class is required, we just have to develop a tough skin. If it’s about pedagogical practice (too much PowerPoint day after day, for instance, or unclear tests, or regular but unexpected changes to the syllabus), or if it’s a repeated comment, that’s something to consider.

It’s pretty clear that few faculty really like SET. But is there realistically any other way to evaluate teaching, other than more classroom observations? So my parting shots on strategies for improving SET scores revolve around making the best of the situation.

1. Teach to the test. It will help not only you but other instructors who will now have students prepared for the questions they will be answering. For example, I found students were writing that I didn’t let them ask questions when I knew I did. Now I draw their attention to it early in and throughout the semester: “You might evaluate me during this class, and the form will ask if I (fill in the blank). Well, I’m doing that right now.”
2. Timing is everything. Do your best to administer your forms as far away as possible from a major test or the return of a major paper. Should you do it at the beginning or the end of class? Students will not be motivated to be thoughtful at the end of class, and will rush to leave, so the beginning might be better. (Of course, this suggestion depends on the availability of colleagues.) Also, administer the forms as late in the semester as possible so that those who are going to fail or drop out are not there. Sometimes procrastination is helpful.
3. Pick a colleague to administer the form who you know will be positive and give a nice little introduction.
4. If the form doesn’t ask what you want to know, do your own in addition. SALG (Student Assessment of Learning Gains) is a useful online tool for finding out what students are learning, or not, and why. And you can use midterm (or earlier) evaluations or feedback, being sure you utilize the feedback in class and point it out to the students. Ignoring feedback after asking for it will only hurt your immediacy scores. (Linda Nilson suggested a simple form with the three words: Stop ____, Keep ____, and Start ____ .)
5. Give out snacks or chocolate the week and class before. Nothing spells immediacy like Little Debbies and Hershey’s Kisses. I’m kidding, of course, but not in essence. Immediacy is important, but of course it means more than sweets. Immediacy is communicated verbally and nonverbally, but the nonverbal controls the reception of the verbal strategies. Mehrabian (1967, 1971), the guru of nonverbal communication, said it is demonstrated by nonverbal behaviors of approach (forward body leaning, purposeful gestures, eye contact) that lead to a perception of warmth, friendliness, and liking. Kearney and Plax (1992) concluded that immediacy trumps many aspects of the classroom, such as how the instructor might try to get the students to comply with certain policies of the classroom or certain challenges of the material.

Why does immediacy work? There are two theories, according to McCroskey and Richmond (1992): (1) Arousal comes from the immediacy; the arousal leads to more attention to the learning task, which leads to more openness and thus more learning and memory; this theory relates to the cognitive realm. On the other hand, (2) immediacy stimulates more motivation to learn in the student (largely because of identification and affinity) and thus leads to more learning; this theory relates to the affective realm. Which one is right? Does it matter? Immediacy works.

That could be a frustrating conclusion, especially for us Type-A personalities who want to get the material covered and move on, or for those who feel that the students are too needy, want a mother figure, and should just buck up and buckle down. But it’s really a liberating idea, when you think about it, and it is what your kindergarten teacher told you: nice matters, even in SET.


Al-Isa, A. & Suleiman, H. (2007). Student evaluations of teaching: Perceptions and biasing factors. Quality Assurance in Education, 15(3), 302-317.

Centra, J. A. (2003). Will teachers receive higher student evaluations by giving higher grades and less course work? Research in Higher Education, 44, 495-518.

Centra, J. A. & Gaubatz, N. B. (2000). Is there gender bias in student evaluation of teaching? Journal of Higher Education, 71(1), p. 17 ff.

Chonko, L. B., Tanner, J. F. & Davis, R. (2002, May/June). What Are They Thinking? Students’ Expectations and Self-Assessments. Journal of Education for Business, pp. 271-281.

Hessler, K. & Humphreys, J. (April 2008). Student evaluations: Advice for novice faculty. Journal of Nursing Education, 47(4), 187.

Ikegulu, T.N., & Burham, W.A. (2001). Gender roles, final course grades, and faculty evaluation. Research and Teaching in Developmental Education, 17(2), 53-65.

Kearney, P. & Plax, T. G. (1992). Student resistance to control. In Power in the Classroom: Communication, Control, and Concern. Edited by Richmond, V. P. and McCroskey, J. C. Hillsdale, New Jersey: Lawrence Erlbaum Associates. Pp. 85-100.

Kember, D., Jenkins, W., & Ng, K. C. (March 2004). Adult students’ perceptions of good teaching as a function of their conceptions of learning, Part 2: Implications for the evaluation of teaching. Studies in Continuing Education, 26(1), pp. 81-97.

McCroskey, J. C. & Richmond, V. P. (1992). Increasing Teacher Influence Through Immediacy. In Power in the Classroom: Communication, Control, and Concern. Edited by Richmond, V. P. and McCroskey, J. C. Hillsdale, New Jersey: Lawrence Erlbaum Associates. Pp. 101-119.

Nussbaum, J. F. (1992). Communicator style and teacher influence. In Power in the Classroom: Communication, Control, and Concern. Edited by Richmond, V. P. and McCroskey, J. C. Hillsdale, New Jersey: Lawrence Erlbaum Associates. Pp. 145-158.

Stake, R. E. & Cisneros-Cohernour, E. J. (Fall 2000). Situational evaluation of teaching on campus. In Evaluating Teaching in Higher Education. Jossey-Bass. Pp. 51-72.

Plax, T. G. & Kearney, P. (1992). Teacher power in the classroom: Defining and advancing a program of research. In Power in the Classroom: Communication, Control, and Concern. Edited by Richmond, V. P. and McCroskey, J. C. Hillsdale, New Jersey: Lawrence Erlbaum Associates. Pp. 67-84.

Pounder, J. S. (2007). Is student evaluation of teaching worthwhile? Quality Assurance in Education, 15(2), pp. 178-191.

Yunker, P. J. & Yunker, J. A. (July/August 2003). Are student evaluations of teaching valid? Evidence from an analytic business core course. Journal of Education for Business, pp. 313-317.
