You can’t use Meta’s Voicebox AI – it’s too dangerous

A caged robot with microphone.
(Image credit: Laptop Mag / Rael Hornby (Base image generated by Bing))

Meta released MusicGen, an open-source AI text-to-music generator, to the public this week, allowing the world at large to make musical mayhem in 12-second installments to their heart’s content. Now, Meta has introduced Voicebox, the most powerful AI text-to-speech generation software we’ve seen to date. So powerful, in fact, that you can’t have it – because you can’t be trusted to have it.

Meta did their homework on this one: they know that throwing this software out into the world would cause nothing but mayhem. Not an hour would pass before the internet was flooded with voice clips made by ne’er-do-wells, putting the most vitriolic things possible in the mouths of others. No. A tool of this magnitude should be used with incredible responsibility. Locked away tight and used by only the most trusted and reliable of society.

Which is why Mark Zuckerberg wants to use it to make NPCs in the Metaverse sound cool.

Meta AI Voicebox text-to-speech AI generator logo

(Image credit: Meta)

What is Meta’s Voicebox? 

Voicebox is a state-of-the-art AI model for not just speech generation but also tasks on recorded speech, such as editing, sampling and restyling. The multipurpose generative AI tool is something of a jack of all trades, suited to both converting text to human speech and editing the results. It can remove unwanted noises from recordings, reduce background static, and sample and modify existing recordings across six different languages.

While Voicebox, like many generative AI tools, was trained with over 50,000 hours of recorded speech (and transcripts) from public domain audiobooks, Meta have developed a new approach that learns directly from raw audio and an accompanying transcription. This allows Voicebox to better recognise the samples fed into it and to alter specific parts of a recording without having to regenerate the entire clip.

All of which boils down to high-quality audio samples that are genuinely representative of how people actually talk to one another in the real world, with Meta ensuring a diverse sampling of speech so that the same principle applies accurately to other languages. The results are impressive too, and Meta hosts a selection of them in their recent blog post. I’m not even kidding when I tell you I have a suspicion that Zuckerberg’s voiceover might actually be a product of the tool itself.

Meta believes that one day this technology will be vital in helping creators and content producers edit audio tracks, allowing the visually impaired to hear written messages from friends (in their voices), and allowing people to speak any foreign language in their own voice. That’s right, Mark Zuckerberg just oversaw the invention of the Babel fish.

And you can’t have it.

Sadly, this isn’t one of the tools Meta feels comfortable about handing out so freely to the public at large. While Meta researchers have developed a “highly effective classifier that can distinguish between authentic speech and audio generated with Voicebox,” the team still feels that there is a “potential for misuse and unintended harm.” No kidding.

While Meta don’t wish to share the final product, they have revealed the steps they took to get there. They believe the most ethical resolution is to announce publicly that they possess this technology and understand the risks and potential harms it poses, while continuing to work on tools that can tell real audio from generated audio.

Microsoft Twitter chatbot Tay

Microsoft Tay: A stark reminder of how quickly people can abuse AI tools when given the opportunity. (Image credit: Laptop Mag / Rael Hornby)

And you know what? Hats off to Meta on this one. It is the most ethical thing to do in that situation. While some would say that the most ethical thing to do would be to never develop it in the first place, it’s good to know that Meta are spending their resources on mitigating the damage such a tool could cause if misused. And it’s far better to announce it publicly than to one day be exposed as hoarding this technology, leaving the most suspicious among us to wonder what Meta may have been using it for after all that time in the shadows.

The big Meta AI push is an interesting one to observe, with a genuine diversity of goals being explored all at once. 

Rael Hornby
Content Editor

Rael Hornby, potentially influenced by far too many LucasArts titles at an early age, once thought he’d grow up to be a mighty pirate. However, after several interventions with close friends and family members, you’re now much more likely to see his name attached to the bylines of tech articles. While not maintaining a double life as an aspiring writer by day and indie game dev by night, you’ll find him sat in a corner somewhere muttering to himself about microtransactions or hunting down promising indie games on Twitter.