OpenAI API Generated Video Game Dialog With Real-Time Text-to-Speech
Watching the recent progress in AI has been so fascinating that I wondered if it would be possible to use the OpenAI API to generate dialog for a video game.
I’ve been working on an FPS with procedurally generated levels, so AI-generated dialog seemed like a logical fit. After some hacking, I got it to work!
NPC dialog lines are generated on the fly, different every time, facilitated by a custom prompt for each line in our Unity dialog editor.
Instead of writing the dialog directly, you tell the AI what kind of possibility space to write in, and give it some background on the character, the setting, and the particulars of what’s going on. It’s sort of like prepping a kid for an improvisational play.
You can also tweak the AI’s temperature parameter, which controls how random the generated output is.
How it’s done
I’m using OpenAI’s Text-Davinci-003 model to generate the results. Each prompt is sent over the internet to OpenAI’s API, which returns a response from the model that attempts to follow the prompt’s instructions. Generating this response from the pre-trained model is called inference. Some results are better than others.
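Here’s a minimal Python sketch of what one such request could look like. The game itself runs in Unity, so this is not the actual code; it only illustrates the shape of a call to OpenAI’s completions endpoint. The function names, the default temperature of 0.9, and the choice to build the request without sending it are all assumptions made for illustration.

```python
import json
import urllib.request

COMPLETIONS_URL = "https://api.openai.com/v1/completions"


def build_completion_request(prompt, api_key, temperature=0.9, max_tokens=100):
    """Build (but do not send) the HTTP request for one dialog line.

    The temperature default is a guess; max_tokens caps the reply length.
    """
    payload = {
        "model": "text-davinci-003",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        COMPLETIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_line(response_body):
    """Pull the generated text out of a completions-style JSON response."""
    data = json.loads(response_body)
    return data["choices"][0]["text"].strip()
```

Passing the request to `urllib.request.urlopen` and feeding the response body to `extract_line` would complete the round trip.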
For instance, here’s the prompt I’m currently using for our character Big Brain’s first line of dialog:
"Please provide a dialog line for a satirical science fiction game. No formatting. You are Big Brain, ruler and overseer of this domain. You talk like a snotty commander. The player is here in your room of the Capital City to see you at your request, from a long ways away.. You are wondering what he is thinking. The player is a small kill drone and is completely your underling. You feign concern for his welfare, but he is here to do your bidding. You are not to ask him what you can do for him, but instead enlighten him on what he must do for you. You are going to send him on a mission. Please continue in one or two very short sentences: "
And in response, I get something back like:
"Welcome, drone. I have a task for you. Listen carefully."
Or, on a different run:
"Welcome, kill drone. I have an assignment for you. Look no further for purpose or direction, for I have it all stored away in my extraordinary brain."
At first, this may seem like a lot of prompt for such a curt reply, but it pays to be specific when instructing the AI what to print back at you. If you don’t tell it not to ask the player a question, the AI will happily go ahead and ask one, which doesn’t always make sense in the one-way conversations of our specific game.
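One way to keep prompts this specific without retyping the boilerplate is to assemble them from parts. The sketch below is hypothetical (the `LinePrompt` class and its field names are not from our dialog editor); it just shows the genre, persona, situation, and explicit “don’t do this” rules being joined into one prompt that ends with the same continuation cue.

```python
from dataclasses import dataclass, field


@dataclass
class LinePrompt:
    """Hypothetical container for the pieces of one dialog prompt."""

    genre: str
    persona: str
    situation: str
    rules: list = field(default_factory=list)  # explicit dos and don'ts
    length_hint: str = "one or two very short sentences"

    def render(self):
        sentences = [
            f"Please provide a dialog line for {self.genre}. No formatting.",
            self.persona,
            self.situation,
            *self.rules,
            f"Please continue in {self.length_hint}:",
        ]
        # Join non-empty pieces and keep the trailing space after the colon.
        return " ".join(s.strip() for s in sentences if s.strip()) + " "
```

Keeping the rules in their own list makes it easy to add a new prohibition when the AI does something that doesn’t fit the scene.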
There’s a bit of latency in the response from OpenAI, so the game asynchronously fetches all its lines at the beginning of the scene. For characters who speak audibly, like Big Brain, I send each dialog string to Google’s Cloud Text-to-Speech as needed, and then apply some live processing on the voice audio I get back.
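The prefetching idea can be sketched like this in Python. The game does this in Unity, so treat it as an illustration only; `prefetch_lines` and the `fetch_line` callable are assumed names, and `fetch_line` stands in for whatever actually calls the API.

```python
import asyncio


async def prefetch_lines(prompts, fetch_line):
    """Fetch every dialog line for a scene concurrently.

    prompts: dict mapping a line id to its prompt text.
    fetch_line: async callable that takes a prompt and returns generated text.
    """
    ids = list(prompts)
    results = await asyncio.gather(
        *(fetch_line(prompts[i]) for i in ids), return_exceptions=True
    )
    lines = {}
    for line_id, result in zip(ids, results):
        # A failed request should not block the scene; store None so the
        # caller can fall back to a hand-written line.
        lines[line_id] = None if isinstance(result, BaseException) else result
    return lines
```

Because all requests are in flight at once, the wait at scene start is roughly the latency of the slowest single request rather than the sum of all of them.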
It could get even cooler if I fed into the prompt some state information from the game, so it can surprise you with observations gleaned from your interactivity with the systems. It’s still early days with this kind of AI-generated dialog stuff, but I thought it was a cool milestone.
The cost
When sending prompts to the AI, you specify the maximum number of tokens you want back. A token accounts for a bit less than a word on average. For these lines of dialog for Big Brain, I’m asking for up to 100 tokens per line. OpenAI is currently charging 2 cents per thousand tokens for the Text-Davinci-003 model, counting both the prompt and the response.
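To make the arithmetic concrete, here’s a small sketch. It assumes billing counts both prompt and response tokens at the 2 cents per thousand rate quoted above; the 200-token prompt size in the usage note is a rough guess, not a measured value.

```python
PRICE_PER_1K_TOKENS = 0.02  # USD, the Text-Davinci-003 rate quoted above


def estimate_cost(prompt_tokens, completion_tokens, price_per_1k=PRICE_PER_1K_TOKENS):
    """Estimate the cost in USD of one request."""
    return (prompt_tokens + completion_tokens) / 1000 * price_per_1k
```

For example, `estimate_cost(prompt_tokens=200, completion_tokens=100)` works out to roughly $0.006 per line, so a long prompt can cost more than the reply it produces.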
Here’s what that looked like in dollars to develop and test this scene:
So it cost about $3.50 to develop this demo. That’s too expensive to deploy in a live game without a token limit, or perhaps without caching the most common AI responses.
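Caching could look something like the sketch below. It’s one possible design, not something the game does today: the `LineCache` class is hypothetical, and `generate` stands in for the function that calls the API. It pays for a handful of variants per prompt and then reuses them at random, so lines still vary a little without every playthrough costing money.

```python
import random


class LineCache:
    """Keep up to `max_variants` generated lines per prompt, then reuse them."""

    def __init__(self, generate, max_variants=5, rng=None):
        self._generate = generate          # callable: prompt -> generated line
        self._max_variants = max_variants
        self._rng = rng or random.Random()
        self._lines = {}                   # prompt -> list of cached lines
        self.api_calls = 0                 # how many times we actually paid

    def get(self, prompt):
        variants = self._lines.setdefault(prompt, [])
        if len(variants) < self._max_variants:
            # Still collecting variants: generate a fresh line and cache it.
            self.api_calls += 1
            variants.append(self._generate(prompt))
            return variants[-1]
        # Cache is full: reuse a stored line instead of calling the API.
        return self._rng.choice(variants)
```

The trade-off is that dialog stops being different every time once the cache fills, so `max_variants` controls how much novelty you’re willing to pay for.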
Quality of response
My prompts are pretty naive, with simple references. My guess is that refining or rewriting the prompts to use weirder and more specific references could help in improving the results I get back.
This process is something I refer to as ‘AI Whispering’. I’m a novice at it, but I believe the potential is wide open, especially as AI models continue to get better.
Future directions
Procedurally crafting prompts from within the game is an obvious next step, providing each prompt with more background information gleaned from various game state variables.
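As a rough illustration of that idea, the sketch below turns a few game state variables into plain sentences and slips them into an existing prompt. The state keys (`deaths`, `health`, `favorite_weapon`) and the thresholds are invented for the example and don’t correspond to real variables in our game.

```python
def describe_state(state):
    """Turn a few (hypothetical) game state variables into plain sentences."""
    notes = []
    if state.get("deaths", 0) > 3:
        notes.append(f"The player has died {state['deaths']} times on this level.")
    if state.get("health", 100) < 25:
        notes.append("The player is badly damaged.")
    if state.get("favorite_weapon"):
        notes.append(f"The player mostly uses the {state['favorite_weapon']}.")
    return " ".join(notes)


def with_state(base_prompt, state):
    """Insert state observations before the final 'Please continue' cue."""
    context = describe_state(state)
    if not context:
        return base_prompt
    head, cue, tail = base_prompt.rpartition("Please continue")
    if not cue:
        # No cue found: just append the observations to the end.
        return f"{base_prompt} {context}"
    return f"{head}{context} {cue}{tail}"
```

Placing the observations before the final instruction keeps the length cue as the last thing the model reads.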
Another obvious step would be to allow the player to talk back. Experiments like AI Dungeon have shown what wild adventures GPT-3-based interactive games can go on. But the flexibility of direct interaction with GPT-3 means it can easily veer off into situations that traditional game logic cannot currently cope with.
One solution would be to allow the AI to control the game from a specialized set of messages it learns in the prompt. You could let the AI do stage direction for your scene, or control the movements and actions of characters. Things like that.
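Here’s a minimal sketch of how that might work, assuming the prompt teaches the AI to embed bracketed tags such as `[PLAY_ANIM:glare]` in its reply. The tag syntax and the action names are made up for illustration. The game only acts on tags from a whitelist, so anything unexpected is dropped instead of reaching game logic.

```python
import re

# Hypothetical set of stage directions the game knows how to carry out.
ALLOWED_ACTIONS = {"MOVE_TO", "PLAY_ANIM", "OPEN_DOOR"}

# Matches [NAME] or [NAME:argument].
TAG = re.compile(r"\[([A-Z_]+)(?::([^\]]*))?\]")


def parse_reply(reply):
    """Split an AI reply into spoken text and a list of whitelisted actions."""
    actions = []
    for name, arg in TAG.findall(reply):
        if name in ALLOWED_ACTIONS:
            actions.append((name, arg))
    # Remove every tag (allowed or not) and collapse leftover whitespace.
    spoken = " ".join(TAG.sub(" ", reply).split())
    return spoken, actions
```

The spoken text can then go to text-to-speech while the action list is handed to the scene, which keeps the AI’s freedom inside bounds the game can cope with.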
It might be wise for AAA studios to create their own Large Language Models so they can conjure real-time procedural dialog without outsourcing to an external API like OpenAI’s.
Another interesting development is the announcement of LAION-AI’s Open Assistant, an open source Large Language Model project that aims to democratize direct access to a ChatGPT-style LLM.
It’s still early days for this kind of technique, and I’m excited to see what unfolds as it becomes more commonplace and better explored.
If you enjoyed this post, feel free to say hi on Twitter or Mastodon and ask any questions you might have.