It's impossible to "spoof" someone's voice using audio alone.
Anonymous in /c/UnpopularOpinion
I was reading about voice spoofing, the new buzzword, but I really don't think it's possible to "spoof" someone's voice using audio alone. What this really means is that deepfakes are getting good enough that you can convincingly make a model speak in someone's voice and say whatever you want it to say. That's not the same as being able to take all of a person's audio and generate 100% new, unique speech from it. It's like they're calling the tweaking of a StyleGAN-style voice model "spoofing" when it isn't really that; it's just a better voice model. Real voice spoofing would mean taking that voice model and deriving parameters that are indistinguishable from the real person, and it just doesn't seem like the tech is at that point. Real voice spoofing would mean making a convincing voice model out of a single audio clip with 10 seconds of speech.

Edit: I stumbled upon a research paper on voice spoofing which makes clear that what they call voice spoofing is really voice transformation. They took one person's speech and ran it through a voice transformer using another person's voice model, and the result is almost indistinguishable from the real person. They call it voice spoofing; I call it voice transformation. The terms get used interchangeably, but imo they should never be equated.
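To make the distinction I'm drawing concrete, here's a rough Python sketch of the two pipelines as I understand them. None of these functions come from a real library; they're hypothetical stand-ins just to show the difference in what each one needs as input.

```python
# Hypothetical sketch of the two pipelines that get lumped together as "voice spoofing".
# Every function here is a made-up stand-in, not a real library API.

import numpy as np


def encode_speaker(reference_audio: np.ndarray) -> np.ndarray:
    """Stand-in for a speaker encoder: maps audio to a fixed-size voice embedding."""
    return np.zeros(256)  # dummy embedding


def voice_conversion(source_speech: np.ndarray, target_embedding: np.ndarray) -> np.ndarray:
    """"Voice transformation": keep the words and timing of source_speech,
    re-render them with the target speaker's voice characteristics."""
    return source_speech  # dummy: a real system would resynthesize the waveform


def voice_cloning_tts(text: str, target_embedding: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    """What I'd call real spoofing: generate brand-new speech from text alone,
    conditioned only on a short reference clip of the target."""
    return np.zeros(sample_rate)  # dummy: one second of silence


# Voice transformation (what the paper I found actually does):
attacker_speech = np.random.randn(16000 * 5)    # an attacker records themselves speaking
reference_clip = np.random.randn(16000 * 10)    # ~10 s of the target's voice
target = encode_speaker(reference_clip)
converted = voice_conversion(attacker_speech, target)  # attacker's words, target's voice

# Voice cloning from text (what real spoofing would look like):
synthesized = voice_cloning_tts("Anything you want them to say", target)
```

The point is that the first pipeline still needs a real human performing the speech and only swaps the voice on top of it, while the second would need nothing but text and a short clip of the target. Only the second matches what people imagine when they hear "spoofing".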