A very long time ago, I wrote a column for InsideHigherEd.com arguing that assessment isn’t just makework; done right, it can provide useful information to departments and other interested parties about what students actually learn. By coincidence, Horace wrote a long post about his frustration with assessing the humanities the next day, which I promised to respond to, but never have done. I’m trying to rectify that here, spurred in part by this article in Academe, which I think is a wrong-headed form of preaching to the choir.
I’ll concede James Berger’s point that writing collective mission statements doesn’t do anyone any good, but the banality of mission statements doesn’t actually tell us anything about assessment. More than this, it sounds arrogant when those of us in the humanities assert that–uniquely in the university!–our content and methods are so special that it is impossible to evaluate them honestly. (Remember Peter Mac Grawler! At this point, it also would be fair to ask how on earth one ever grades . . . ) It’s–at least as a matter of persuasion–a grave error to argue that, unlike the humanities, the rest of the university is administered and staffed by knaves in the pocket of global late capitalism. Even if that were true–and certainly there’s much that’s craven about higher education–to make that argument in those terms is to alienate the very people with whom we should be making common cause. (For a more sympathetic take on Berger’s essay, see this post at Short-Circuit Signs. Shawn Huston’s post is reasonable, but I would just say that the moment when faculty could simply say, “hey, we’re not teachers, we’re primarily researchers” and expect state legislatures and tuition-payers to tug a forelock is long gone–if it ever existed.)
Some object to assessment in the humanities because it seems like an add-on: After all, we already grade student work. But course grades tell us both too much and too little. First, they measure too much! Some professors grade on effort, some on revision, some on a curve, others on an absolute scale, some penalize for absenteeism, some if you wear a hat to class–a lot goes into a course grade. Also, in contexts where faculty are free to devise their own grading schemes–a situation which isn’t universal!–different faculty teaching the same course, or in the same area, might well emphasize different things in their classes, and grade students accordingly. Considered as an educational experience, this is a feature, not a bug; however, it *also* means that aggregate final grades don’t tell us much about what it is students are learning in a particular department.
What’s needed, then, is a system that aggregates comparable data across a department. Such data has many uses: It can spur curricular reform (Crikey! Our students need a [theory course / research course / year-long composition sequence / senior capstone]!). It can disabuse faculty of their assumptions about what students learn where (“Oh, I thought students were getting concept X in their first year.” “Probably the problem with composition is that it’s taught by part-timers.”). It can demonstrate to interested parties–deans, provosts, accreditors, donors, legislators, parents, prospective majors, or job candidates–that the department is, in fact, good at what it does (or it could be GREAT if that additional faculty line became available). It could be used by a union in making arguments about the comparable status of tenure-line and contingent faculty. The possible uses of such information are almost infinite. It’s incredibly frustrating to hear smart people willingly give up useful information about what goes on in their department!
To keep this post above the level of a rant, I thought I might mention a pilot program that’s going on in our department. I’m not on the department’s assessment committee, though I am on a university-wide assessment committee. The process I describe below was not my idea; if I recall correctly, it was presented to us by our university’s ingenious new assessment guru. (And a quondam Miltonist!! How awesome is that?) Obviously I’m speaking only for myself–but you knew that, right? This is a blog . . . .
Say that your department can identify three or four learning outcomes initially, even in the humanities. Students should be able to craft an analytic thesis; read a text’s language carefully, using quotations or other forms of textual evidence; make an argument about form; make an argument about the relationship, however vexed, between a text and its various contexts; and maybe some others.
Imagine then crafting a rubric that described–on a 3- or 5-point scale–what it would mean to achieve each outcome in ways that meet, exceed, or fail to meet expectations. The trick is that the rubric would need to be department-sanctioned.
The process then asks faculty to take 20 seconds, after they’ve graded a particular assignment (perhaps the last major paper in a semester), to also complete the rubric for each paper. The rubric might not match the assigned grade exactly, because each faculty member might have perfectly good reasons for emphasizing other factors more heavily. That’s fine.
For the price of 20 seconds per paper, and some computing and administrative costs (which will probably be borne by the assessment office), you actually get some very interesting information. And what’s helpful about it is that the information comes in a format that’s comparable and relatively neutral. Assuming your assessment office has a decent stats package, they will be able to aggregate and disaggregate the data according to almost any factor you can imagine. (“Does student performance on crafting a thesis improve if they take the 200-level comp elective? By how much?” “Do students improve their ability to read literarily after they’ve had the intro to the major course?”) The questions you can put to the data are limited only by your imagination.
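For the curious, here is roughly what "disaggregating" looks like in practice. This is only a sketch in plain Python, not what our assessment office actually runs: the records, the field names (`took_comp_elective`, `thesis`, `close_reading`), and the 3-point scale are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rubric records, one per graded paper. Scores assume a
# 3-point scale: 1 = fails to meet, 2 = meets, 3 = exceeds expectations.
records = [
    {"took_comp_elective": True,  "thesis": 3, "close_reading": 2},
    {"took_comp_elective": True,  "thesis": 2, "close_reading": 3},
    {"took_comp_elective": False, "thesis": 1, "close_reading": 2},
    {"took_comp_elective": False, "thesis": 2, "close_reading": 2},
]


def mean_score_by_group(rows, outcome, factor):
    """Average one rubric outcome separately for each value of a factor."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor]].append(row[outcome])
    return {value: mean(scores) for value, scores in groups.items()}


# "Does performance on crafting a thesis differ for students who took
# the 200-level comp elective?"
thesis_means = mean_score_by_group(records, "thesis", "took_comp_elective")
print(thesis_means)
```

A gap between the two group averages is a reason to look closer, not proof that the elective caused the improvement; a real analysis would need far more papers and some attention to who chooses to take the elective in the first place.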
Note what this doesn’t require: It doesn’t require yielding any ground on the complexity and subtlety of literary analysis. It doesn’t mean substituting a multiple-choice test for a paper. It probably does require some work on a department’s part to achieve inter-rater reliability on the rubric. (We practiced this at a department meeting a week or two ago . . . let’s just say that right now our reliability is very low. [Though I’m delighted to say that A and I had quite similar scores, which is good for domestic tranquility!])
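If "inter-rater reliability" sounds mysterious, it needn't. One common way to put a number on it for two readers is Cohen's kappa, which compares how often they actually agree with how often they would agree by chance. I don't know which measure our assessment office will use, so treat this as an illustrative sketch; the scores and the names `rater_a` and `rater_b` are invented.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of papers on which two raters gave the identical score."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa for two raters: agreement corrected for chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same category
    # if each scored independently at their own observed rates.
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores from two readers on the same six papers (3-point scale).
rater_a = [1, 2, 3, 2, 1, 3]
rater_b = [1, 2, 2, 2, 1, 3]

print(percent_agreement(rater_a, rater_b))
print(cohens_kappa(rater_a, rater_b))
```

A kappa near 1 means the two readers apply the rubric almost identically; a kappa near 0 means they agree no more often than chance would predict, which is about where a department starts before it has practiced together.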
This isn’t a perfect system, to be sure, and obviously it’s just a pilot (though I think the assessment guru has used this model elsewhere), but it does suggest that, with some creative thinking, it ought to be possible to figure out a way to avoid the Scylla of Peter Mac Grawler and the Charybdis of sheer statistical fetishism. I think it would be nice if people just took our word for it about teaching, or decided that, because we publish research for an audience of other scholars, our teaching must be ok, but I don’t think that we live in such a world. More than this, one of the things that’s frustrating about teaching (as opposed to research, or even blogging) is that so much of it is hidden–from other students (who turn to ratemyprofessor.com in desperation), from our departmental or university-wide colleagues, and from interested outside parties. Done reasonably, assessment can help give this work the attention it merits.