Video game voice actors are fearing that the ability for generative AI to replicate their voices may cost them work and, more fundamentally, control of their own voice.
Provided you can get the model to land at a place where it's replying like a well-constructed character and not, well, an AI model (hopefully through the input and effort of a talented and well-supported writing team), I don't see a future where this tech doesn't end up here. Games are always striving for some sense of realism (some veer away from reality, but the driving force of the games industry is in that direction), and while I don't think absolute realism is a healthy goal, realistically realized characters leave a lot of room for unique and incredible games. Obviously bespoke writing is still the heart and soul of a good narrative, but there are some areas it can't really cover, and I think tech like this is great for charting those uncharted waters.
Given that, the voice actors whose recordings train these models (for moment-to-moment interactions and other content that can't easily be written in advance when a game's systems call for it) really NEED to be properly compensated; otherwise it sets an incredibly unhealthy precedent for the industry. The speed at which this sort of AI is developing, outstripping proper precedent (legal and professional), is much scarier to me than some sort of, like, AI-overlord type future.
I think the foreseeable future will give us a hybrid solution where a writing team creates most of the content (dialogue for the main story and important side quests, character backstory, distinctive mannerisms) and AI fills in the rest.
One of the main problems with branching narratives is that they make writing and recording dialogue very expensive. The upcoming Baldur's Gate 3 has something like 170 hours of cutscenes, and players will see less than 10% of that in a single playthrough. Not to mention hundreds of thousands of lines of dialogue. Developers have to find techniques to reuse as much as possible, which leads to situations where the ending consists of a loosely connected list of applicable scene snippets. Now imagine that AI can fill in the gaps between those snippets to make them feel like a single continuous sequence.
AI can also fill in events that the developers could never anticipate. Imagine killing a random blacksmith in Skyrim. With current technology, NPCs would either not react at all or give a generic "killing innocents is bad" line. How awesome would it be if the game automatically generated a prompt from the basic facts (NPC refuses to give discount, player kills NPC, NPC was a blacksmith, player steals dead NPC's wares, wares are needed for a side quest, …) and then used that to provide not only companion dialogue but also possible replies for the player? If this happens multiple times, maybe the companion will mention it in other situations or confront the player when they're alone. Imagine if, during a long walk through the wilderness, your companions started talking about what happened over the last few days.
With a fully AI-generated character, this would all become very generic and unnatural but if every character can extrapolate from a few hundred handwritten lines to match their tone, this could actually work.
I think one of their biggest concerns is that the majority of the work will be done by the AI and then a significantly smaller team of random writers will be hired on a very short term contract to merely check the work over and dot the t’s and cross the I’s.
I'm actually not too concerned about that. Yes, companies will try it because it saves money. But that will have a serious impact on quality, and I still have hope that players will finally learn to just not buy a bad product. Sure, the bigger publishers will be able to sell on brand recognition alone for a while, but not forever. This year we've seen a lot of unfinished games, and at least reviewers are starting to notice. The difference is that bugs can be fixed to recover from a bad launch. Bad content, not so much.
I think the NPC follower in Skyrim is a great example of where it fits: being able to ask your follower "which way to Riften?" is kind of neat, as is giving them commands like "go stand by the rock and attack the wolf when it gets close". For actual dialogue, though, take Johnny in Cyberpunk or Liara in Mass Effect; there's no way AI can stand in for voice actors there.
I’m thinking shop owners, random NPCs, etc. Little bits of dialogue.