The non-player characters (NPCs) you encounter in video games might be skilled at fighting or giving you valuable items, but nobody has ever accused them of being clever conversationalists. Some simply grunt or utter two-word replies. Even the more elaborate NPCs often only possess a few preprogrammed lines of dialog, each recorded by a human voice actor, leaving the computer characters liable to repeat themselves if you talk to them for more than a minute.
AI is upgrading NPCs’ conversational skills. Since the release of OpenAI’s ChatGPT, developers have been modifying existing games to include NPCs that fuse GPT with text-to-speech technology to generate original, audible speech. Some of these so-called mods even take players on brand-new storylines, including one in the Grand Theft Auto V universe where you play as a police officer in the Los Santos Police Department (LSPD).
Voice AI could be a major evolution in game design. You can catch a glimpse of it in a recently released demo from Replica Studios: a modified version of the game Matrix Awakens in which you can use your own microphone to converse with NPCs on city streets. The demo offers a clunky yet fascinating preview of how voice AI could soon make video games far more immersive, allowing game studios to scale up the social interactability of virtual worlds in a way that would’ve been practically impossible with human voice actors.
To give NPCs conversational abilities, Replica’s Smart NPCs system fuses ChatGPT with text-to-speech technology. The software, which runs on Unreal Engine 5, also automatically syncs the NPC characters’ lip and body movements with their speech. On the manual side, developers can customize each character to have a unique voice, backstory, and set of emotional dispositions and motivations — a process that determines not only what the characters say to you but also to each other.
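The architecture described above can be sketched in a few lines: a designer-authored character profile gets folded into the instruction the language model sees before each player utterance, and the model's reply would then be handed to a text-to-speech voice. This is a hypothetical illustration, not Replica's actual API; the class names, fields, and prompt wording are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a Smart-NPC-style setup. Every name and field
# here is an assumption for illustration, not Replica's real interface.
@dataclass
class NPCProfile:
    name: str
    voice_id: str                 # which text-to-speech voice to use
    backstory: str
    motivations: list = field(default_factory=list)

def build_system_prompt(npc: NPCProfile) -> str:
    """Fold the designer-authored profile into a system prompt, so the
    character's backstory and motivations shape every generated reply."""
    motives = "; ".join(npc.motivations)
    return (
        f"You are {npc.name}, a character in a video game. "
        f"Backstory: {npc.backstory} "
        f"Your motivations: {motives}. "
        "Stay in character and never mention that you are an AI."
    )

chelsie = NPCProfile(
    name="Chelsie",
    voice_id="voice-female-03",   # hypothetical TTS voice identifier
    backstory="A weary city resident who distrusts strangers.",
    motivations=["avoid trouble", "find meaning in a confusing world"],
)

prompt = build_system_prompt(chelsie)
print(prompt)
```

In a full pipeline, the text the model generates from this prompt would be sent to the text-to-speech system identified by `voice_id`, with lip and body animation synced to the resulting audio.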
When I tried the demo, I didn’t hear NPCs conversing among themselves, but I did have a few conversations that offer a glimpse of the current quality of the technology. Every character I spoke to was willing to have some kind of conversation, and they even recalled topics we had discussed when I circled back to them a couple of minutes later.
But not all NPCs seemed excited to talk. On a street corner, I asked a woman what her name was and she replied, “My name is Chelsie, and I come from a world of disappointment.” I asked her if she knew she was in a computer game. “I don’t know what I am or where I am. Nothing makes sense anymore.”
Fair enough: She is trapped in the Matrix, after all. I tried to see whether the characters had been instructed to avoid breaking the fourth wall by never talking about the famous movie series on which the game is based. A few characters implied they had no knowledge of the films. But then I asked one character to name a science fiction movie from 1999: She offered up The Matrix, saying it’s one of her all-time favorites.
To test the limits of the characters’ knowledge of themselves, I asked a man dressed in a grey suit what color his clothes were. He said he was wearing a rainbow shirt. When I told him that was wrong, he said he must have been daydreaming.
Replica’s Matrix Awakens project was a demo, so it’s likely that future games using AI voice systems will be able to fine-tune NPCs to make their speech align more closely with the reality of the game world. Still, these hiccups raise questions about “jailbreaking” NPCs in future games: Considering that jailbreakers have already gotten ChatGPT and GPT-4 to speak in ways their developers never intended, it remains an open question whether creative gamers will be able to get NPCs to say — or even do — similarly surprising things.
A new era of indie games: In terms of building massive, socially immersive worlds, AI voice systems may soon enable independent game studios to more easily compete with the industry’s big names. Games made by major companies (often called AAA studios) can feature tens of thousands of lines of dialog. For example, Rockstar Games’ Red Dead Redemption 2, which had an estimated budget upward of $500 million, contains 500,000 lines of recorded dialog spoken by more than 1,000 voice actors.
Using AI, independent studios could affordably hit that kind of quantity. Quality is a separate issue. Professional human voice actors still outperform synthetic speech systems in terms of sounding natural, emotive, and compelling. But gamers might not care much about the realism of NPC speech. After all, independent games are becoming increasingly popular even though they generally lack the kind of hyperrealistic graphics seen in AAA games.
To get the best of both worlds, developers might also opt for a mix of AI-generated and voice-actor dialog, as Replica Studios wrote in a blog post:
“We think one way of shipping games with 10x or 100x more voice acting would be through a combination of AI voices with generative AI models like ChatGPT so that some % of the ‘voice acting’ is done autonomously, by NPCs responding directly to player actions and commands in the game.”
Nonlinear narratives: The biggest evolution that voice AI could bring to games — whether indie or AAA — arguably lies in story. Today’s games often use sophisticated tricks to give players the illusion that they’re controlling their fate, such as karma systems that track your behavior and give you a reputation that influences how NPCs interact with you. These tricks can change the storyline and trigger certain lines of NPC dialog, but in reality, this “branched storytelling” strategy only ever lets players select from a small batch of predetermined fates.
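The karma mechanic described above can be sketched in miniature: player actions adjust a reputation score, and NPC dialog branches on it, always drawing from a small batch of predetermined lines. The action names, thresholds, and responses below are illustrative, not taken from any real game.

```python
# Minimal sketch of a karma system: actions shift a reputation score,
# and NPC dialog branches on the result. All values are illustrative.
ACTION_KARMA = {"help_stranger": 10, "steal": -15, "donate": 5}

def apply_action(karma: int, action: str) -> int:
    """Update the player's reputation after an action."""
    return karma + ACTION_KARMA.get(action, 0)

def npc_greeting(karma: int) -> str:
    # A small batch of predetermined responses -- exactly the
    # "branched storytelling" limitation the article describes.
    if karma >= 20:
        return "Good to see you, friend. The town speaks well of you."
    if karma <= -20:
        return "Stay back. I know what you've done."
    return "Hello, traveler."

karma = 0
for action in ["help_stranger", "donate", "help_stranger"]:
    karma = apply_action(karma, action)

print(npc_greeting(karma))  # karma is 25, so the friendly line fires
```

However elaborate the thresholds, the player still only ever selects from the handful of lines the developers wrote in advance — which is the ceiling that generative dialog could lift.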
AI could someday make game storylines far less linear, potentially freeing players to create narratives the developers never dreamed of.