Conversations

Browse the 200 human-AI conversations the benchmark was scored against. Each conversation includes the participant's pre- and post-chat PANAS (Positive and Negative Affect Schedule) scores, per-turn emotion tags, and the model's responses. Download the dataset on GitHub →