Scientists from Stanford University, the Max Planck Institute for Informatics, Princeton University, and Adobe Research have presented software that lets users edit the transcribed text of a speech to add, delete, or change the words a person says in a video.
In the demonstration, which shows the original clips followed by the program's output, the change is plain to see: one participant's line, "I love the smell of napalm in the morning," becomes, after the software is applied, "I love the smell of French toast in the morning."
This new program, which has not yet been commercialized, is one of the innovations of deepfake technology, which has brought the Mona Lisa, Marilyn Monroe, Salvador Dalí, and Albert Einstein back to life on screen. For some time, however, the same technology has been making it ever harder to tell what is fake on the internet, and it poses a serious ethical problem now that it has already been used to discredit prominent public figures.
Nancy Pelosi, Speaker of the House of Representatives, was one of them. Donald Trump's lawyer Rudy Giuliani shared on Twitter a doctored video of the Democrat in which she appeared incoherent, stumbling over her words, and apparently drunk.
The clip was viewed by more than 2.4 million people before Facebook removed it from its original source, the Politics WatchDog account, for violating its terms and conditions.
Another deepfake video was released by Jordan Peele, the Oscar-winning director and screenwriter, who imitated the voice of former U.S. president Barack Obama calling Trump a "complete idiot" and superimposed it on an original video, to raise awareness of the political risks this trend could pose.
To create these videos, the software isolates the phonemes the person utters in the speech and matches them with visemes, the facial expressions that correspond to each sound. It then builds a 3D model of the lower half of the face from the original video.
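The phoneme-to-viseme matching described above can be pictured as a simple lookup from speech sounds to mouth shapes. The sketch below is purely illustrative: the table, class names, and function are hypothetical simplifications, not the researchers' actual implementation, which operates on audio and video data rather than text labels.

```python
# Illustrative sketch of phoneme-to-viseme mapping (hypothetical, not the
# researchers' code). Real systems group ~40 phonemes into roughly a
# dozen viseme classes; this toy table covers only a handful.
PHONEME_TO_VISEME = {
    "p": "bilabial",    "b": "bilabial",   "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "iy": "spread",     "ih": "spread",
    "aa": "open",       "ae": "open",
    "uw": "rounded",    "ow": "rounded",
}

def visemes_for(phonemes):
    """Map a phoneme sequence to the viseme sequence that would drive
    the mouth shapes of a 3D face model; unknown sounds fall back to a
    neutral shape."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

# "toast", crudely transcribed as t-ow-s-t:
print(visemes_for(["t", "ow", "s", "t"]))
# ['neutral', 'rounded', 'neutral', 'neutral']
```

In the real pipeline, each viseme in the edited sentence is then used to select and blend matching frames of the speaker's lower face in the 3D model, which is why the result stays lip-synced to the new words.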
The fake videos produced were shown to a group of 138 volunteers, 60 percent of whom believed they were real. By comparison, 80 percent judged the unedited clips to be genuine, although the researchers cautioned that the participants knew the purpose of the experiment, which may have biased their answers.
So far, the algorithms only work on close-ups, that is, videos framing just the person's head and shoulders, and they require about 40 minutes of audiovisual data. The tone of the person's voice cannot be changed, and there can be no visual obstruction in the recording, a hand passing in front of the face, for example, or the program may stop working.