Researchers in London have shown that modern speech synthesis has reached a level where even attentive listeners cannot reliably tell whether they are hearing a human or artificial intelligence.
Researchers from Queen Mary University of London and University College London have found that it is becoming increasingly difficult for people to distinguish between real voices and voices created by artificial intelligence. The results were published in the journal PLoS One.
In the experiment, participants listened to 80 voice samples: half were recordings of real people, and the other half were generated by neural networks. Listeners then had to judge which of the voices belonged to humans.
It turned out that 58% of listeners mistook the deepfake voices for genuine ones, while only 62% of participants identified the authentic human voices as real; the study found no statistically significant difference in how the two voice types were perceived.
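For readers curious why two rates this close can fail to differ statistically, the sketch below runs a standard two-proportion z-test. The paper's actual per-condition sample sizes and statistical test are not given in this article, so the counts used here are purely hypothetical, chosen only to match the reported percentages.

```python
# Minimal sketch: two-proportion z-test on hypothetical counts.
# Assumption: roughly 100 judgments per voice type, matching the
# reported 58% and 62% rates; the study's real numbers may differ.
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: the two proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                      # pooled proportion under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal tail probability
    return z, p_value

z, p = two_proportion_z_test(58, 100, 62, 100)
print(f"z = {z:.2f}, p = {p:.3f}")
# With these assumed counts, p is roughly 0.56, far above the usual 0.05
# threshold, which is consistent with "no statistically significant difference".
```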
The scientists note that modern speech synthesis algorithms have reached a level where even trained listeners cannot confidently distinguish between real and generated voices. This not only opens up new technological possibilities but also raises serious concerns in the areas of security, copyright, and ethics.
According to the researchers, voice deepfakes could be used by malicious actors to bypass authentication systems, conduct phone scams, or manipulate people’s trust.