EvalGen: Helping Developers Create LLM Evals Aligned to Their Preferences (Pinned)
Eval maxim: To grade outputs, people need to externalize and define their evaluation criteria; however, the process of grading outputs…
May 14
What AI engineers can learn from qualitative research methods in HCI
Meet inductive coding and grounded theory, the new bread-and-butter of LLMOps
Jan 9
AstroBot Must Win Game of the Year, or the Games Industry is Doomed
Why are two-thirds of GOTY contenders RPGs?
Nov 28, 2024
LLM Wrapper Papers are Hurting HCI Research
Naming the problem and deciding what to do about it
Jun 6, 2024
No, ChatGPT does not have seasonal affective disorder
A Perilous Tale of Why We Check Assumptions Before Running Statistical Tests
Dec 12, 2023
Kicking the Leg Out From the Table: On Contrived Controls in HCI Systems Research
Consider this table. This table has four legs. To run a controlled study on the user experience of this table, we kick out one of the legs.
Oct 4, 2023
Evaluate LLMs, right in your browser. Share your experiments with others.
No installation or login required
Jul 5, 2023
Introducing ChainForge: A visual programming environment for prompt engineering
What prompt should I use? What model should I use? At what settings?
May 23, 2023
On Notational Programming for Notebook Environments
When we speak of “writing” code, we almost always imagine typing it. But will we always?
Oct 15, 2021