Single-blind, double-gendered study reveals sex bias in student evaluations

This is brilliant! (Well, more like a LOLsob). From Amanda Marcotte at Slate:

One of the problems with simply assuming that sexism drives the tendency of students to giving higher ratings to men than women [in students’ course evaluations] is that students are evaluating professors as a whole, making it hard to separate the impact of gender from other factors, like teaching style and coursework. But North Carolina researcher Lillian MacNell, along with co-authors Dr. Adam Driscoll and Dr. Andrea Hunt, found a way to blind students to the actual gender of instructors by focusing on online course studies. The researchers took two online course instructors, one male and one female, and gave them two classes to teach. Each professor presented as his or her own gender to one class and the opposite to the other.

The results were astonishing. Students gave professors they thought were male much higher student evaluations across the board than they did professors they thought were female, regardless of what gender the professors actually were. When they told students they were men, both the male and female professors got a bump in ratings. When they told the students they were women, they took a hit in ratings. Because everything else was the same about them, this difference has to be the result of gender bias.

Yes, that’s right:  the presumptively “male” professors got better scores on all twelve questions than the presumptively “female” professors, although they were the same proffies being rated twice, once as a “man” and once as a “woman.”  Now, stuff that in your tenure and promotion files and smoke it!

Seriously, we need to consider our student and peer evaluations of teaching in light of this important information.  If you are a tenured faculty member and you see or hear of a woman being slagged for not having strong enough teaching evaluations, bring this information to your Tenure and Promotion committee meetings.  If you are a woman being hassled by colleagues about your teaching evaluations, show this study to them and ask them to look at the evaluations of the men and women in your department as a whole and compare the scores and comments.

Whoever said online education was good for nuthin’?  (Not me!  Well, not much.  Not too often.)  Discuss!

27 thoughts on “Single-blind, double-gendered study reveals sex bias in student evaluations”

  1. I have been making this point in department meetings and salary meetings for years. I really believe we need to trash the whole system. Did you read the recent Italian study that demonstrated students gave lower evaluations to classes where they learned more? This especially held true for 100-level classes. The more demands placed on the students, the worse they ranked the course, even if the courses had a positive impact on their performance in subsequent classes in the discipline. Between structural biases (not just sex, but also race, nationality, gender-conformity, etc.) and inverse correlations to teaching effectiveness I have no idea why anyone gives any credit at all to course evaluations. Grrrrr.


  2. My sense is that evaluations were initially a kind of PR move in the 1970s to let students think their interests and concerns were important to the universities. (I don’t think they asked students their opinion before the 1960s anyway–can any old-timers or historians of education enlighten us?) Evals can offer some helpful information in the case of a professor who (for example) is a drunk, or an addict, or is just not showing up for class, but they are not much use on their own in evaluating teacher performance without peer evaluations by other faculty.

    I think they’d be much more actually useful without the bubble sheets & scores. Just ask the students to reflect on their learning in a narrative form, and ask what they think might improve *student learning*–students are sometimes very thoughtful about what works for them. But asking for a thumbs-up or thumbs-down kind of rating is stupid and yields equally stupid data.


  3. On the plus side, hiring and T&P committees that see low evals for men should think twice about their teaching skills (ha!). I do think evals are important and can play useful roles, but not in their current state. I have my students do a final reflection on the course (with a prompt that structures it) and I find those useful. I also find feedback on specific things such as readings and assignments useful (some drivel but also some helpful comments). Qualitative feedback ftw.


  4. This is fascinating, especially in light of the fact that it’s a widely-accepted “best practice” among instructional designers (some of whom are excellent, and many of whom, well, don’t get me started) and other online ed experts to inject as much of your personality as possible into an online class: picture, short bio, the occasional personal anecdote, etc., etc. Maybe we should all be pretending to be men of whatever age and interests students find most helpful as teachers, on the theory that if they think they’re getting a good educational experience, they’ll actually learn more (this idea is, of course, closely related to some of the questionable assumptions underlying the use of student evals in regular classes).

    Actually, one of the things I find intriguing about teaching online is that I sometimes don’t know for some time, or ever, what the gender of a significant portion of my online students is (not because they’re hiding their gender, but because we’ve got a pretty multicultural student body, enough so that I don’t automatically associate a gender with some of their names, and in fact some of them come from cultures where name and gender don’t track as much as in the US and its traditional feeder cultures). I do tend to form impressions from things they say (both in short intro bios and in posts, emails, etc. — as much tone as content, though both play a role), but I’d say that I’m wrong a good 1/3 of the time (I do one-on-one conferences toward the end of the term, so usually end up meeting each student, at least via Skype, though some choose an email exchange because of schedule/time-zone problems, or simply disappear before the end of the term). It may be relevant that I’m usually teaching a writing-for-scientists course, so my own cultural assumptions about who’s likely to go into engineering, various branches of medicine, etc. likely play a role. I don’t *think* my biases and assumptions are doing my students any harm, but it’s an interesting question (which, of course, also carries over to the regular classroom).


  5. A few years ago we (a SLAC of a mere 1800 students) went to on-line evaluations with lots of room for free-form comments. Many comments are in fact quite thoughtful, but I have found as a department chair and member of our tenure and promotion committee that even students who take the process seriously simply do not know enough about either the subject being taught or the process of teaching to make very helpful comments. They can catch basic problems (instructor’s disorganized, does not return papers quickly, etc) but they can’t get beyond the basic level. Which is not surprising — they’re students!

    The male = “real professor” bias is appalling, although not a total shock. I have also noticed over the years that age automatically improves one’s evaluations — my evals improved noticeably right after I got tenure, I think because the students assumed that a tenured prof must by definition be better, even though I hadn’t had time to change anything! Now I have mostly gray hair, which I swear really does affect student perceptions.


  6. Word on the age observations, NB! In academia, as opposed to “the industry” filming on the steps of the Huntington as I write this minute, gray hair and wrinkles can work for us.

    CC, I thought this study might interest you. I think you should virtually transvest in one of your sections just to see what happens!


  7. LOLSOB indeed. I once asked several of my colleagues (the ones known for being fairly demanding) whether they, too, had seen the word “intimidating” regularly on their evals. The women, almost every one, saw it at least once a semester. The men just gave me puzzled stares.

    Privilege: you gots it.


  9. @Historiann: that would be interesting, but would take some cooperation from the tech folks, since our LMS incorporates a profile picture (and, of course, I’ve got a faculty profile on the dept webpage, and a rate my professor page, and all the other accoutrements of the modern academic. I’d probably have to pretend to be my brother or my father or something).

    The other interesting experiment would be to create a deliberately gender-ambiguous instructor “presence.” I’m guessing that such an instructor would, sadly, receive ratings lower than either identifiably-male or identifiably-female instructors.


  10. I think you might be right that gender ambiguity might be read more negatively than a clearly sexed instructor. I think that women who conform to gender stereotypes get better evals than women who challenge them, either in looks or behavior (or both). Also, gender ambiguity might be read as the instructor having a particular political position, and possibly on the subject ze teaches in some cases, which might make the students even more uncomfortable!


  11. Unfortunately, I don’t see the dependence on numerical teaching scores going away. A department chair in my college told the other chairs that a good goal would be for all faculty to have teaching scores in the highest quartile. She seemed oblivious to the fact that this cannot happen, any more than triangles can have four sides.


  12. Notorious, I’ve only been called intimidating once in an eval and that was way back in grad school when I was a TA.

    I thank my lucky stars and my union that we do not have standardized teaching evaluations. The administration would so desperately like to have them, but the union would just kick up a hornets’ nest. It’s nice to have this study as further evidence to bolster the case against them.

    That said, I do end-of-semester teaching evaluations for every class. In new classes, or classes I haven’t taught in a while, I do a mid-semester evaluation as well. My artisanal, hand-crafted student teaching evaluation asks specific questions about class materials, assignments, and the amount of time the student spent preparing for class in an average week.

    Some of the questions are Likert or ranking format. For example, I have them rate the books and other class readings. Other questions are long form and ask for specific examples. I ask them to tell me about their worst experience in the class. As Historiann and Northern Barbarian pointed out, these can catch basic problems and even elicit some thoughtful critique about the books and assignments.

    I can honestly say that I have made changes to my classes, almost every semester, based on the student evaluations and my own notes from how the semester went. So it is a useful endeavor, as long as you get to roll your own evaluation.


  13. The gender bias does not surprise me at all. I see it every semester in how the students (male and female) treat my female colleagues in the department and the college as a whole.


  14. I once directed a program that was required of all students at a large public university–thus, we had many, many sections, quite a few of them taught by adjuncts, and I had to make the decision of which adjuncts to put on full-time rather than course-by-course basis. I did a lot of classroom observation. But we also used student evaluations–not a standard bubble-in form but one specifically designed for this class.

    The most useful question on the evaluation form, in my view, was one that didn’t ask about the instructor at all. Instead, it said “Which reading in this course did you find the most interesting and why?” The more of your students who can articulate why they found a particular text compelling, the better teacher you likely are.


  15. At my current institution, all students are sent a standard email which asks them to comment on all their courses and teachers (at whatever level) each semester. They have a combination of generic questions where they are asked to rank their satisfaction on that topic out of 7. They rank in whole numbers (so, 1, 2, etc). They then have two open text questions – what did you like; what could be improved. Whether they fill it out is entirely voluntary and getting much above 30% response rate is unusual and predictably it tends to be the lovers and the haters, not average Joe in the middle.

    We then get our results by email (all anonymous) and can compare our rankings to the average in the department, the faculty and the university. What I have learned from this is that the university average (which is always the lowest of these three as Humss subjects generally are pretty consistently ranked as good teachers) is pretty good at 5.77 out of 7, with our Faculty at around 5.86. Many scores in my department are above 6 out of 7. Now unless we believe that students are going to start giving us full marks for everything (seems unlikely), it seems to me that it’s pretty hard to show improvement year on year. And most of it comes down to how many people actually decide to answer the questions. For one of mine which had a low answer rate, 1 student’s opinion ended up being worth 5% of the mark, which, when we’re talking about such tiny numbers, can shift things quite a lot. So I genuinely don’t know what they tell us, unless you are really tanking it in one area or compared to everybody else. Despite this, it’s really very competitive as a measure of who is the BEST teacher. Seriously.

    The written comments tend not to be very helpful either. It tends to be a bit like Goldilocks – there was too much reading, the reading was just right, there was not enough reading; there was too much assessment, the assessment was just right etc. So it’s pretty hard to take anything from that. They’re usually pretty brief and can sometimes be good for the ego. I’ve been told I’ve had a nice laugh and that people enjoy my accent, but I don’t generally get comments on appearance. Nobody has been overtly sexist, but one particularly negative set of comments this year whiffed a bit in that the rhetoric was particularly troll-like – that person thought I was too ‘leftist’ (a thing apparently) and that my male TA would have done a better job than me.


  16. My mother was a university prof, and, no, there were no student evaluations in the 1950s and 60s. I’m not sure when they became widespread, although by the early 80s I know I was filling them out at a Major East Coast school.

    Interestingly, the students there didn’t feel the official multiple choice version told them much. (What does a 3.51 out of 5 rating for “assignments” actually mean?) So they had their own informal booklet, mostly for the huge classes, compiled from a few students’ impressions who’d taken the course the previous semester.

    The official evaluations were useless. The unofficial one was so accurate you could use it with confidence to decide which sections and classes to sign up for.

    As a scientist who’s used statistics (what feels like) forever, I’ve often thought there was a huge cautionary story there that needs all kinds of attention it never got.


  17. The business culture of “measurable outcomes” and “best practices” means it is highly unlikely we will ever be rid of this so-called evaluation system.


  18. My gender presentation has veered more towards soft butch in the last few years, and I’m instantly readable as a white dyke to anyone with even a nodding familiarity with US cultural codes. But, although I know both sexism and homophobia to be present in every class I teach, my numeric evals have actually gone up slightly. Honestly, I think I’m accessing white male privilege. It’s true that I’m also visibly older: grey hair seems to be a signifier of authority in the classroom. It’s also true that I’m probably more comfortable in my own skin since I’ve brought my gender expression more in line with my identity, and this might give me slightly more self-confidence in the classroom. But I swear, white male privilege is so entrenched that students will even grant it to women if they present as masculine-of-centre (as long as they are seen as cis and not trans). I think the only way for women with more feminine presentation to be read as instantly authoritative is to be two generations older than the students – then I think students project a grandmaternal-substitute thing onto them. And these intersections of sex, gender, and authority are much more complicated for people of colour whatever their gender.


  19. The university where I’m currently adjuncting doesn’t release your evaluations unless 60% of the class completes the survey, so I’ve never seen any results. Thank FSM, too, because I am not an easy grader and I need this job.


  20. Now I’d like to see the evaluations after the professor showed students this study, noting that the evidence shows that students won’t be able to evaluate the professor’s skills beyond their own gender expectations of her/him. The students at my University see themselves as devoted to social justice. So, I know many of them would take this as a challenge (to evaluate beyond – or in spite of – gender biases). The major challenge to this endeavor is the lack of a control group that would help determine if students embraced their biases or tried to see beyond them.


  21. Male online teaching avatars? An “It’s Pat” from SNL ambiguous avatar to ‘keep ’em guessing?’ We are working on a student-evaluation-of-learning revision of our student evaluation of teaching. The questions were informed by but we rewrote them to begin “In my experience” or “I found the lectures to be…” Can’t report back yet as they have yet to be approved/torn apart by the faculty governance structure at my institution.

