On Wednesday, Replicate developer Charlie Holtz combined GPT-4 Imaginative and prescient (generally referred to as GPT-4V) and ElevenLabs voice cloning know-how to create an unauthorized AI model of the well-known naturalist David Attenborough narrating Holtz’s each transfer on digital camera. As of Thursday afternoon, the X post describing the stunt had garnered over 21,000 likes.
“Right here we now have a outstanding specimen of Homo sapiens distinguished by his silver round spectacles and a mane of tousled curly locks,” the false Attenborough says within the demo as Holtz appears on with a smile. “He is carrying what seems to be a blue cloth overlaying, which may solely be assumed to be a part of his mating show.”
“Look carefully on the delicate arch of his eyebrow,” it continues, as if narrating a BBC wildlife documentary. “It is as if he is within the midst of an intricate ritual of curiosity or skepticism. The backdrop suggests a sheltered habitat, presumably a communal feeding space or watering gap.”
How does it work? Each 5 seconds, a Python script referred to as “narrator” takes a photograph from Holtz’s webcam and feeds it to GPT-4V—the model of OpenAI’s language mannequin that may course of picture inputs—by way of an API, which has a particular immediate to make it create textual content within the model of Attenborough’s narrations. Then it feeds that textual content into an ElevenLabs AI voice profile skilled on audio samples of Attenborough’s speech. Holtz offered the code (referred to as “narrator”) that pulls all of it collectively on GitHub, and it requires API tokens for OpenAI and ElevenLabs that value cash to run.
Whereas a few of these capabilities have been out there individually for a while, builders have not too long ago begun to experiment with combining these capabilities collectively because of API availability, which may create shocking demonstrations like this one.
Throughout the demo video, when Holtz holds up a cup and takes a drink, the faux Attenborough narrator says, “Ah, in its pure surroundings, we observe the subtle Homo sapiens participating within the crucial ritual of hydration. This male particular person has chosen a small cylindrical container, possible crammed with life-sustaining H2O, and is tilting it expertly in direction of his consumption orifice. Such grace, such poise.”
In a special demo posted on X by Pietro Schirano, you’ll be able to hear the cloned voice of Steve Jobs critiquing designs created in Figma, a design app. Schirano used the same approach, with a picture being streamed to GPT-4V by way of API (which was prompted to answer within the model of Jobs), then fed into an ElevenLabs clone of Jobs’ voice.
We have previously covered voice cloning know-how, which is fraught with moral and authorized considerations the place the software program creates convincing deepfakes of an individual’s voice, making them “say” issues the true individual by no means mentioned. This has authorized implications concerning a star’s publicity rights, and it has already been used to scam people by faking the voices of family members searching for cash. ElevenLabs’ terms of service prohibit folks from making clones of different folks’s voices in a manner that may violate “Mental Property Rights, publicity rights and Copyright,” however it’s a rule that may be troublesome to implement.
For now, whereas some folks expressed deep discomfort from somebody imitating Attenborough’s voice with out permission, many others appear bemused by the demo. “Okay, I’ll get David Attenborough to relate movies of my child studying how you can eat broccoli,” quipped Jeremy Nguyen in an X reply.