Descript launched their podcast studio app.
As I wrote in DT #18, Descript is a great example of a productized AI company:
Descript takes an audio file (like a podcast or conference talk recording) as input and transcribes it using machine learning. Then, it lets you edit the transcript and audio in synchrony, automatically moving audio clips around as you cut, paste, and shuffle around bits of text.
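To make the idea of editing audio through its transcript concrete, here is a minimal sketch in Python. It is not Descript's implementation. It assumes the transcription step yields word-level timestamps, and the names `Word` and `keep_segments` are hypothetical, introduced only for illustration. Deleting words from the text then reduces to working out which spans of audio to keep.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word with its (assumed) position in the audio."""
    text: str
    start: float  # seconds from the start of the recording
    end: float


def keep_segments(words, deleted_indices):
    """Return merged (start, end) audio spans left after deleting some words."""
    segments = []
    for i, word in enumerate(words):
        if i in deleted_indices:
            continue
        if segments and i > 0 and (i - 1) not in deleted_indices:
            # The previous word was kept too, so extend the current span.
            segments[-1] = (segments[-1][0], word.end)
        else:
            # First kept word, or first word after a cut: start a new span.
            segments.append((word.start, word.end))
    return segments


words = [
    Word("so", 0.0, 0.3),
    Word("um", 0.3, 0.6),
    Word("welcome", 0.6, 1.1),
    Word("back", 1.1, 1.5),
]

# Deleting "um" in the text leaves two audio spans to splice together.
print(keep_segments(words, {1}))
```

A real editor would also have to handle pastes and reordering, crossfades at the cut points, and multiple speaker tracks, none of which this sketch attempts.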
The team has now launched a multitrack podcast production app using this same technology. As they put it, it’s “the version of Descript we’ve dreamed of since conceiving of the company.” The podcast studio allows you to edit multiple speakers’ audio tracks by editing the transcribed text of what they said; Descript takes care of splicing and syncing all the audio.
It also comes with some crazy new (beta) functionality called Overdub. The feature lets you replace a few words of a transcript and then uses your newly inserted text to generate an audio version of what you typed in your own voice.
Sounds amazing! But also dangerous—what if someone has a recording of your voice? Can they just make a convincing audio clip of you saying whatever they want? Nope. Lyrebird, the team behind the feature, has built in safeguards to prevent that from happening:
Invariably, to first experience Overdub is to experience wunderschrecken—a simultaneous feeling of wonder and dread. Rest assured, you can only use Overdub on your own voice. We built this feature to save you the tedium of re-recording/splicing time every time you make an editorial change, not as a way to make deep fakes.
The Lyrebird team deserves credit for figuring this out: to train a voice model, you need to record yourself speaking randomly generated sentences, which prevents others from using pre-existing recordings to create a model of your voice.