You can’t use Meta’s Voicebox AI – it’s too dangerous

(Image credit: Laptop Mag / Rael Hornby (Base image generated by Bing))

Meta released MusicGen, an AI text-to-music generator, as open source this week, allowing the world at large to make musical mayhem in 12-second installments to their heart’s content. Now, Meta has introduced Voicebox, the most powerful AI text-to-speech generation software we’ve seen to date. So powerful, in fact, that you can’t have it – because you can’t be trusted to have it.

Meta did their homework on this one. They know that throwing this software out into the world would cause nothing but mayhem. Not an hour would pass before the internet was flooded with voice clips made by ne’er-do-wells, putting the most vitriolic things possible into the mouths of others. No. A tool of this magnitude should be used with incredible responsibility. Locked away tight and used by only the most trusted and reliable of society.

Which is why Mark Zuckerberg wants to use it to make NPCs in the Metaverse sound cool.

Meta AI Voicebox text-to-speech AI generator logo

(Image credit: Meta)

What is Meta’s Voicebox?

Voicebox is a state-of-the-art AI model for not just speech generation but speech recording tasks, such as editing, sampling and restyling. The multipurpose generative AI tool is something of a jack of all trades, suited to both converting text to human speech and editing the results. It can remove unwanted noises from recordings, reduce background static, and sample and modify existing recordings across six different languages.

While Voicebox, like many generative AI tools, was trained on over 50,000 hours of recorded speech (and transcripts from public domain audiobooks), Meta have developed a new approach that learns directly from raw audio and an accompanying transcription. This allows Voicebox to better recognise samples fed into it, and to alter specific parts of a recording without having to regenerate the entire clip.

Introducing Voicebox, a new breakthrough generative speech system based on Flow Matching, a new method proposed by Meta AI. It can synthesize speech across six languages, perform noise removal, edit content, transfer audio style & more. More details on this work & examples ⬇️ (June 16, 2023)

The upshot is high-quality audio samples that are genuinely representative of how people actually talk to one another in the real world, with Meta ensuring a diverse sampling of speech so that the same principle applies accurately to other languages. The results are impressive too, with Meta hosting a selection of them on their recent blog post. I’m not even kidding when I tell you I have a suspicion that Zuckerberg’s voiceover might actually be a product of the tool itself.

Meta believes that one day this technology will be vital in helping creators and content producers edit audio tracks, allowing the visually impaired to hear written messages from friends (in their voices), and letting people speak any foreign language in their own voice. That’s right, Mark Zuckerberg just oversaw the invention of the Babel fish.

And you can't have it.

Sadly, this isn’t one of the tools Meta feels comfortable about handing out so freely to the public at large. While Meta researchers have developed a “highly effective classifier that can distinguish between authentic speech and audio generated with Voicebox,” the team still feels that there is a “potential for misuse and unintended harm.” No kidding.

While Meta don’t wish to share the final product, they have revealed the steps they took to get there. They believe the most ethical resolution is to publicly announce that they possess this technology and understand the risks and potential harms it poses, while working on tools to tell real audio from generated audio.

Microsoft Tay: A stark reminder of how quickly people can abuse AI tools when given the opportunity. (Image credit: Laptop Mag / Rael Hornby)

And you know what? Hats off to Meta on this one. It is the most ethical thing to do in that situation. While some would say that the most ethical thing to do would be to never develop it in the first place, it’s good to know that Meta are spending their resources on mitigating the damage such a tool could cause if misused. And it’s far better to announce it publicly than to one day be exposed as hoarding this technology, leaving the most suspicious among us to wonder what Meta may have been using it for after all that time in the shadows.

The big Meta AI push is an interesting one to observe, with a genuine diversity of goals being explored all at once.

Rael Hornby, potentially influenced by far too many LucasArts titles at an early age, once thought he’d grow up to be a mighty pirate. However, after several interventions with close friends and family members, you’re now much more likely to see his name attached to the bylines of tech articles. When not maintaining a double life as an aspiring writer by day and indie game dev by night, you’ll find him sat in a corner somewhere muttering to himself about microtransactions or hunting down promising indie games on Twitter.