Kicking the Leg Out From the Table: On Contrived Controls in HCI Systems Research
Recently I’ve noticed a troublesome trend in HCI systems papers. Before I explain exactly what the problem is, though, let’s do a thought experiment.
Consider this table. This table has four legs. To run a controlled study on the user experience of this table, we kick out one of the legs. Which version of the table will users prefer?
Obviously the complete table. But part of the effect isn’t just the table’s performance (say the table still functions with three legs); it’s that users sense something is missing. This is especially true in a within-subjects study. Users are predisposed to prefer a system with more features over one with fewer, when those features had an obvious purpose to begin with. (One or two users will, undoubtedly, not care about the difference.)
“What are you talking about?” you may ask. “No one would construct a table and then deliberately hamper its usefulness to ‘prove’ its worth!”
Unfortunately, I’ve noticed that some HCI systems research today does exactly that: construct a table, knock out one of the legs for the control, and “prove,” with quantitative evidence, that users like the four-legged table more. (The point is not to single out individual authors here, but to describe a general trend and why it’s problematic.)
Consider why running a “controlled” study against the three-legged table appeals to authors and reviewers alike. First, authors can claim they’re following an expected method. That makes reviewers happy; after all, you’ve run a “controlled” usability study, which “everyone knows” you need to do [1]. Second, how can one disagree with the flurry of quantitative scores, small p-values, and asterisks*** attesting to the significance of the fourth leg? Reviewers’ science-itch is scratched: one can present neat bar charts comparing conditions, which rhetorically function to say “look, science!” Not to mention those tidy conclusions: “Users preferred the fourth leg,” the authors write, “on a number of dimensions.”
There are some important nuances to the situation I’m concerned about. First, the table must be a new design, not a pre-existing, established table that we’re improving on (in which case we’re adding cabinets or a glass top, not hampering the original design). Second, the main contribution argued for must be the table itself, not the answer to a scientific question posed through the table (in the latter case, we’ve built a prototype just to run a study answering specific questions about behavior; in the former, we’ve designed a table and are then looking to contrive a control to prove its effectiveness).
What do we in HCI lose by kicking the leg out of a perfectly fine table? The research focus shifts from important questions about how people use tables, or how tables could be designed, to how four legs are “so much better” than three: stability, aesthetic appeal, and so on. Well, we already knew that going in. What did we learn about how people use tables, or in what contexts they use them? No clue. If such questions are addressed at all, they appear after the quantitative evidence, buried in the results.
Going forward, when you read or review systems papers, ask yourself: have these authors kicked the leg out from under their table, to “prove” its value? If they have, be skeptical: how justified was that choice?
~Ian Arawjo
[1] Saul Greenberg and Bill Buxton. “Usability Evaluation Considered Harmful (Some of the Time).” Proceedings of ACM CHI 2008.
Bio
Ian Arawjo is a Postdoctoral Fellow at Harvard University working with Professor Elena Glassman in the Harvard HCI group. In January 2024, he will be an Assistant Professor of HCI at the University of Montreal, where he will conduct research at the intersection of AI, programming, and HCI. He holds a Ph.D. in Information Science from Cornell University, where he was advised by Professor Tapan Parikh. His dissertation work studied the intersection of programming and culture.