Unity between audio and visual cues key to detecting deepfakes

Monash University

As deepfake videos become increasingly difficult to detect, advanced artificial intelligence (AI) is being applied to uncover the disharmony between the audio in a video and the visuals of the person speaking.

In a collaboration between Monash University and the Indian Institute of Technology Ropar, researchers have trained machine learning algorithms to detect deepfake videos based on the dissimilarity in patterns between the audio and visual cues.
The algorithm breaks the video into segments and analyses each one to produce a ‘dissonance score’ based on the disharmony it has detected between the audio and visuals, such as unnatural facial and lip movements or a lag in the audio.
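The segment-by-segment scoring described above can be illustrated with a minimal sketch. It is not the researchers' implementation: it assumes each segment has already been reduced to an audio feature vector and a visual feature vector of equal length, and it uses Euclidean distance as a stand-in measure of dissonance, since this release does not specify the actual measure. The function names, the threshold and the toy feature values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def segment_dissonance(audio_feat: Sequence[float],
                       visual_feat: Sequence[float]) -> float:
    """Distance between one segment's audio and visual features.

    Euclidean distance is an assumption made for illustration only.
    """
    if len(audio_feat) != len(visual_feat):
        raise ValueError("audio and visual features must have equal length")
    return math.sqrt(sum((a - v) ** 2 for a, v in zip(audio_feat, visual_feat)))


def score_video(audio_feats: Sequence[Sequence[float]],
                visual_feats: Sequence[Sequence[float]],
                threshold: float) -> Tuple[float, List[int]]:
    """Return the mean dissonance over all segments and the indices of
    segments whose dissonance exceeds the (hypothetical) threshold."""
    if len(audio_feats) != len(visual_feats):
        raise ValueError("need one audio and one visual feature per segment")
    scores = [segment_dissonance(a, v)
              for a, v in zip(audio_feats, visual_feats)]
    if not scores:
        raise ValueError("video has no segments")
    video_score = sum(scores) / len(scores)
    suspicious = [i for i, s in enumerate(scores) if s > threshold]
    return video_score, suspicious


# Toy example: three segments; only the last one has mismatched features.
audio = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
visual = [[0.0, 1.0], [1.0, 0.0], [3.5, 4.5]]
video_score, suspicious = score_video(audio, visual, threshold=1.0)
print(video_score, suspicious)
```

In this sketch a high mean score would suggest a manipulated video, while the list of flagged indices points to where in the video the audio and visuals disagree most.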
A deepfake is a video that has been manipulated to show someone saying or doing something that never happened. As technology becomes more advanced, the lines between reality and fake news are becoming increasingly blurred, leading to the spread of misinformation on crucial issues like the current COVID-19 pandemic, and the upcoming US election.
As authorities and tech companies struggle to keep up with advances in deepfakes, this research offers a potential way to identify manipulated videos circulating on the internet.
Project Lead, Dr Abhinav Dhall from the Faculty of Information Technology (IT) at Monash University, says the dual deepfake detection approach is essential to overcoming misinformation online.
“The machine learning method we’ve developed is applying a detection technique similar to watching a foreign film with overlaid audio that is not in sync with the lip movements. This disharmony between the audio and the visual leads the viewer to notice that the video isn’t quite right, which is what we’re mimicking with the machine learning algorithm,” he said.
“By producing a ‘dissonance score’, the algorithm detects if something isn’t quite right in a video and then identifies the exact part of a video that has been manipulated. The machine learning algorithm independently learns from these discriminative features, further advancing its ability to detect future deepfakes.”
Initial research experiments on existing deepfake datasets of over 18,000 videos have shown that this approach outperforms other advanced deepfake detection methods and can correctly distinguish between real and manipulated videos with a success rate of 91.5%.
Associate Professor Ramanathan Subramanian, from the Indian Institute of Technology Ropar, explains the urgent need for advanced deepfake detection methodologies.
“Deepfakes are becoming an increasingly major concern worldwide. They build upon the problems created by fake news and pose a huge potential threat to democracy. With the upcoming US election, AI-generated images, audio and video are increasingly affecting our ability to separate fact from fiction in the political sphere. A reliable deepfake detection algorithm is needed now more than ever,” said Associate Professor Subramanian.
The research paper was presented at the 2020 ACM Multimedia Conference.