Creating Samantha from “Her” by Fine-tuning GPT-3 on the Movie Script
--
If you’ve ever talked to ChatGPT, you know how boring it can be.
Even with a carefully crafted prompt, it won’t be as accurate as a model fine-tuned on a specific question/answer dataset.
If you’ve seen the film “Her” with Joaquin Phoenix and Scarlett Johansson, you know the kind of futuristic AI assistant I’m excited about.
The things Samantha can do in the film are quite fascinating, but probably mainly because she can access Theodore’s computer, read emails, make phone calls and access the internet.
Fine-tuning
ChatGPT is great but you can take things further by fine-tuning the GPT-3 model with some training data.
You don’t need a lot of data but it needs to be in a very specific format:
[{
"prompt": "USER: Hey, how are you?###",
"completion": "ASSISTANT: I'm good thank you!END"
}]
The prompt and completion pair is essentially an example that shows the model how to respond to certain inputs.
The important thing to note here is that later interactions with the assistant won’t need to match it exactly; it is just an example from which the model can learn a style.
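To make the format concrete, here is a minimal Python sketch that builds one such pair and prints it as JSON. The helper name `make_example` and the two constants are my own, made up for illustration; the "###" and "END" markers are simply copied from the example above. Depending on the fine-tuning tooling you use, you may need to write one JSON object per line (JSONL) instead of a single array, so treat the output step as an assumption.

```python
import json

# Markers copied from the example above; the constant names are hypothetical.
PROMPT_END = "###"      # marks where the prompt ends
COMPLETION_END = "END"  # marks where the completion ends


def make_example(user_text: str, assistant_text: str) -> dict:
    """Build one prompt/completion pair in the format shown above."""
    return {
        "prompt": f"USER: {user_text}{PROMPT_END}",
        "completion": f"ASSISTANT: {assistant_text}{COMPLETION_END}",
    }


examples = [make_example("Hey, how are you?", "I'm good thank you!")]

# Print the whole list as a JSON array, mirroring the snippet above.
print(json.dumps(examples, indent=2))
```

Keeping the markers in one place makes it harder to end up with some pairs that are missing the separator or the end token.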
Using the movie script
Conversations in movies are perfectly suited to be used directly for fine-tuning because the prompt/completion can simply be a question/answer interaction between the film characters.
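As a rough sketch of that idea, the Python snippet below turns a list of (speaker, line) tuples into prompt/completion pairs whenever one character's line is directly followed by the other's reply. The function name `pair_dialogue`, the tuple layout, and the mapping of Theodore to USER and Samantha to ASSISTANT are assumptions for illustration, and the sample lines are placeholders rather than quotes from the script.

```python
def pair_dialogue(lines, user="THEODORE", assistant="SAMANTHA"):
    """Pair each user line with the assistant line that directly follows it.

    `lines` is a list of (speaker, text) tuples in script order.
    Lines without a direct reply from the other character are skipped.
    """
    pairs = []
    for (speaker_a, text_a), (speaker_b, text_b) in zip(lines, lines[1:]):
        if speaker_a == user and speaker_b == assistant:
            pairs.append({
                "prompt": f"USER: {text_a}###",
                "completion": f"ASSISTANT: {text_b}END",
            })
    return pairs


# Placeholder dialogue, NOT taken from the actual script.
dialogue = [
    ("THEODORE", "Hey, how are you?"),
    ("SAMANTHA", "I'm good thank you!"),
    ("SAMANTHA", "What about you?"),
    ("THEODORE", "Fine."),
]

pairs = pair_dialogue(dialogue)
print(pairs)
```

Only direct question/answer exchanges become training examples here; longer monologues or scenes with other characters would need extra handling.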
I searched online and immediately found the script for “Her” on a website:
I downloaded the text and spent around 30 minutes removing everything unrelated, leaving only the conversations between Theodore and Samantha.
I then went ahead and formatted it into the necessary "prompt" and "completion" pairs: