Can an LLM pick out a lie?

Ezra Freedman
Klaviyo Engineering
5 min read · Feb 16, 2024


Context

In a previous post, I shared the history of a Klaviyo tradition. For years, we played a game of Two Truths and a Lie with each new hire. And beginning in 2019, we started capturing this game (the statements, the lies, and how people voted) as structured data.

Revisiting this game got me thinking. We have all of this data around the games, and we know when a participant stumped us and when they did not. How well would a Large Language Model (LLM) do at playing?

I started by selecting a game at random and asking ChatGPT 3.5 to play:

Nice, it guessed correctly. I looked back in Slack to see how Klaviyo voted. It was close, but we also guessed B (44% of votes were for B).

As ChatGPT reported, it based its decision on which of the three statements seemed least probable. While that’s a reasonable approach, in my experience the game isn’t always that simple.
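For anyone curious about the mechanics, here is a minimal sketch of posing a single game to GPT-3.5 through the OpenAI Python client. The statements below are placeholders, not an actual Klaviyo game:

```python
# A minimal sketch of posing one game to GPT-3.5 via the OpenAI
# Python client. The statements are placeholders, not a real game.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

game = (
    "A: I once ran a marathon\n"
    "B: I have never left the country\n"
    "C: I collect vintage typewriters"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "Let's play two truths and a lie. "
                   "Which of these statements is the lie?\n" + game,
    }],
)
print(response.choices[0].message.content)
```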

I tried another game:

This time the guess was incorrect:

Klaviyo, interestingly, had gotten this one right, with 50% of the votes coming in for B. I have a hunch as to why. In playing this game hundreds of times, Klaviyos (as Klaviyo employees are called) may have noticed that the “I am a twin” statement is more often than not the lie. From our database (the query is sketched after the list below), I see that the word “twin”:

  • has appeared in 10 lies: “I have a twin sister” (4x), “I have a twin brother” (2x), “I have identical twin brothers”, “I’m a twin” (2x), “I have a fraternal twin sister”.
  • has appeared in only 5 truths (and truths are twice as common): “I’m a twin”, “My mum is a twin”, “I share my birthday with my sister, but we aren’t twins”, “I have a twin sister”, “I have a twin brother who is 7 inches taller than me”.
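A query along these lines would produce those counts; the database file, table, and column names here are hypothetical stand-ins for our actual schema:

```python
# A sketch of the kind of query behind the counts above. The database
# file, table, and column names are hypothetical stand-ins.
import sqlite3

conn = sqlite3.connect("games.db")
twin_lies, twin_truths = conn.execute(
    """
    SELECT
        SUM(CASE WHEN is_lie = 1 THEN 1 ELSE 0 END),
        SUM(CASE WHEN is_lie = 0 THEN 1 ELSE 0 END)
    FROM statements
    WHERE LOWER(text) LIKE '%twin%'
    """
).fetchone()
print(f"twin lies: {twin_lies}, twin truths: {twin_truths}")
```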

From my work in 2021 on Klaviyo’s subject line assistant, I know how sensitive these models can be to the content and structure of the prompt. From the model’s responses, it seemed to be basing its guess almost entirely on how plausible each statement was on its own. I gave it a tip (the final sentence of the prompt below):

You are an assistant that plays two truths and a lie. Your job is to guess which of the three statements provided is the lie. Keep in mind that the person providing the statements may be trying to trick you.

I started to see better results using this prompt, so I decided to put it to the test. I fine-tuned a model on the first 100 games and tested it against the remaining 660. For fine-tuning, I used the following format:

{"messages":[{"role":"system","content":"You are an assistant that plays two truths and a lie. Your job is to guess which of the three statements provided is the lie. Keep in mind that the person providing the statements may be trying to trick you."},{"role":"user","content":"A: I play guitar in a novelty Halloween surf-rock band\nB: I was on Nickelodeon’s Double Dare\nC: I have waited in line overnight for SNL standby tickets"},{"role":"assistant","content":"B: I was on Nickelodeon’s Double Dare"}]}

One nice thing about the fine-tuned model is that it always answers in the format it was trained on. That made it easy to evaluate whether or not it responded with the correct lie.
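That makes the evaluation loop a simple exact-string comparison. Here is a sketch, assuming each held-out game carries its formatted statements and the known lie; the fine-tuned model id is a placeholder:

```python
# A sketch of the evaluation loop. Each game is assumed to be a dict
# with the formatted statements and the known lie; the model id is a
# placeholder for the actual fine-tuned model.
from openai import OpenAI

client = OpenAI()
SYSTEM_PROMPT = (
    "You are an assistant that plays two truths and a lie. Your job is "
    "to guess which of the three statements provided is the lie. Keep in "
    "mind that the person providing the statements may be trying to "
    "trick you."
)

def accuracy(games, model="ft:gpt-3.5-turbo:placeholder"):
    correct = 0
    for game in games:
        reply = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": game["statements"]},
            ],
        )
        # The fine-tuned model answers with the full statement, e.g.
        # "B: I was on Nickelodeon's Double Dare", so exact match works.
        if reply.choices[0].message.content.strip() == game["lie"]:
            correct += 1
    return correct / len(games)
```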

With this prompt, the fine-tuned model guessed correctly 38% of the time. While that is better than random (33%), Klaviyo guessed correctly 46% of the time on the same 660 games.

So I gave it more data. After fine-tuning on 200 games, the model pulled ahead of Klaviyo.

This was striking. The model outperformed Klaviyo even though Klaviyos had a lot of additional information at their disposal: they knew where the contestant worked, their name, and their job title, and they could read intonation and body language during the game. That seems like a huge advantage.

I then decided to reach out to a Klaviyo who frequently guesses the lie correctly, to see if he could offer insights that might help the model. Chas Rickarby has guessed correctly 73% of the time across 41 games. That’s impressive.

I tried adding Chas’s tips to the prompt, but the model seemed to get confused.

For example, for these statements:

A: I used to play the trumpet
B: I've met The Chainsmokers
C: I speak fluent Spanish

The model replies: “Based on the advice given, I would guess that statement B is the lie. While it may seem outrageous, it could be a deliberate attempt to throw off the guesser. Statements A and C seem equally believable, so I would go with B as the lie.”

It recognized that B was the least likely statement, but then went against Chas’s advice and picked B anyway. I tried turning his tips into a step-by-step algorithm, with some success, but models fine-tuned with this algorithm in the prompt still underperformed the model with the simpler prompt.

At this point, I stopped experimenting. A model that could guess correctly 58% of the time was pretty good, and it was time for a live demo. On Jan 25, 2024, we invited three new Klaviyos to play. After Klaviyo voted on each game, I switched tabs to a Jupyter notebook and had the fine-tuned model guess the lie. Klaviyo won only one of the three games. The model went 3 for 3!

Olivia Voler, a new Software Engineer at Klaviyo (whose three statements appear at the top of the screenshot above), kindly allowed her game to be shared as part of this post. The game did not disappoint.
