This assignment is due on Tuesday, April 30, 2024 before 01:45PM.

Project Milestone 4: Human Evaluation

Now that you’ve created an outline for your presentation, which may still be missing experimental results, it’s time to fill in some of those results! Since many of you are building interactive experiences for your final projects, it’s important to think about how to design evaluations for those experiences and how to analyze the data you collect.

For this Milestone, we are asking you to think about what metrics you want to evaluate your system on and design an interface to perform this evaluation. We’ll use the last two class meetings to participate in other groups’ evaluations, so that each group gets a chance to collect human feedback.

This milestone is due before class on Tuesday, 4/30 (the last day of class). However, we will also use class on Thursday, 4/25 for human evaluation, so your evaluation interface should be ready to use by then.

To be specific, for this Milestone, you should:

  • Design a human evaluation for your interactive system. This involves not just building the system and choosing metrics to examine, but also thinking about which research questions those metrics answer.
  • Create an interface for conducting your human evaluation, to be used in class on 4/25 and 4/30.
  • Write detailed instructions for your classmates to participate in your evaluation.
  • Write up your methods in a submission for this milestone.
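
If your interface is a small script or web app, one low-effort design is to append each evaluator's judgment as a row in a CSV file and validate ratings as they arrive, so you don't discover malformed data after class. The sketch below is one possible approach, not a required format. It assumes a 1 to 5 Likert scale and uses hypothetical field names (`evaluator_id`, `item_id`, `metric`, `rating`):

```python
import csv
import os
from dataclasses import dataclass, asdict

# Hypothetical schema: one row per (evaluator, item, metric) judgment.
FIELDS = ["evaluator_id", "item_id", "metric", "rating"]
LIKERT_MIN, LIKERT_MAX = 1, 5  # assumed 1-5 Likert scale


@dataclass
class Judgment:
    evaluator_id: str
    item_id: str
    metric: str
    rating: int


def record_judgment(path: str, judgment: Judgment) -> None:
    """Validate one judgment and append it to a CSV log, writing a header for a new file."""
    if not LIKERT_MIN <= judgment.rating <= LIKERT_MAX:
        raise ValueError(
            f"rating must be in [{LIKERT_MIN}, {LIKERT_MAX}], got {judgment.rating}"
        )
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(judgment))


def load_judgments(path: str) -> list[Judgment]:
    """Read the CSV log back into Judgment objects (ratings parsed as ints)."""
    with open(path, newline="") as f:
        return [
            Judgment(row["evaluator_id"], row["item_id"], row["metric"], int(row["rating"]))
            for row in csv.DictReader(f)
        ]
```

Appending one row per judgment means you keep partial data if an evaluator leaves mid-session, and rejecting out-of-range ratings at write time surfaces interface bugs while your classmates are still in the room.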

Writeup Sections

The writeup should have the following sections. Much of the content will also be helpful when writing up your final report, so you might want to write it in the same format as your report.

Title & Author Names.

  1. Evaluation Metrics: You should have begun thinking about metrics in the previous milestone. In this milestone, you should explain why you chose these metrics, what parts of your interactive experience they test, and how they help answer the research question(s) you put forth.
  2. Collection Methodology: How do you plan to collect these metrics? This is a good section to include screenshots of your evaluation interface. If you’re using additional quantitative metrics that don’t require a human judge, include how you plan to collect them here too. You should go into specific detail about your system.
  3. Instructions to Evaluators: For a human evaluation, what do the evaluators need to do? Give them specific, detailed instructions for what to look for and how to interact with your interface. These should be written with your classmates as the audience in mind: they won’t necessarily be familiar with your system.
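
Once ratings are collected, you will typically report a per-metric summary. When several classmates judge the same items, you will also want some measure of how consistently they agree. The sketch below assumes each judgment is a hypothetical `(evaluator, item, metric, rating)` tuple. It reports the mean and sample standard deviation per metric, plus Cohen's kappa for one pair of evaluators. Cohen's kappa is only one agreement statistic. It treats labels as unordered categories and covers exactly two raters, so for ordinal Likert data or more raters you may prefer a weighted kappa or Krippendorff's alpha.

```python
from collections import Counter, defaultdict
from statistics import mean, stdev


def summarize(ratings: list[tuple[str, str, str, int]]) -> dict[str, tuple[float, float]]:
    """Return {metric: (mean, sample std dev)} over (evaluator, item, metric, rating) tuples."""
    by_metric: dict[str, list[int]] = defaultdict(list)
    for _evaluator, _item, metric, rating in ratings:
        by_metric[metric].append(rating)
    return {
        m: (float(mean(vals)), stdev(vals) if len(vals) > 1 else 0.0)
        for m, vals in by_metric.items()
    }


def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa for two evaluators who labeled the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    # Observed agreement: fraction of items where both evaluators chose the same label
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both pick the same label if each labels independently
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical label throughout; kappa is undefined, treat as perfect
    return (observed - expected) / (1 - expected)
```

For example, `cohens_kappa([1, 2, 3, 3], [1, 2, 3, 2])` gives 7/11 (about 0.64): the evaluators agree on 3 of 4 items, but 5/16 agreement would be expected by chance alone.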

Information like this is an important part of a research paper’s methodology, and conferences like ACL require authors to include it in their submissions (see section D of the Responsible NLP Checklist).

Grading

  • Metrics - 2 points
  • Methodology - 2 points
  • Instructions - 2 points
  • Interface - 2 points
  • In-class participation - 2 points

What to Submit

Submit the following to Gradescope:

  • milestone4.pdf, which contains your Milestone 4 writeup. To make grading easier, your writeup should include section headers corresponding to each of the numbered sections listed above.
  • A zip file containing the code for your evaluation interface, if any. The code should include a README describing its high-level layout, any setup instructions, and how to run the interface.