
The Impact of Deepfakes

An interesting application of machine learning and artificial intelligence that has become easily accessible in recent years is deepfakes: alterations of the visual and audio elements of a video that produce realistic, and sometimes indiscernible, new videos, often overlaying a new persona onto a scene. Deepfakes have become very popular in internet culture through memes that take the images of celebrities and media figures and make them sing the song “Baka Mitai”.

I’m sure most of you have seen one of these memes if you browse social media. When I first saw them, I became interested in how it was possible to make a historical figure who has been deceased for decades sing a song from a game made in the 21st century.

Image depicting the First Order Motion Model, which can make deepfakes from a single image and keypoints. (https://aliaksandrsiarohin.github.io/first-order-model-website/)

Upon looking deeper into this subject, I found that it has many possible applications, several of which are negative, such as identity and financial fraud, making this a dangerous technique that requires monitoring and limitations. This new wave of deepfakes uses a technique published in a 2019 paper that requires only a single source image: a dense motion network determines the changes that need to be made to the image, and a generation network creates the new video.
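To give a rough sense of how those two networks fit together, here is a minimal, conceptual sketch in PyTorch. The class names (DenseMotionNet, GeneratorNet), layer sizes, and keypoint counts are my own placeholders for illustration, not the actual modules from the paper’s code release, and the real model is far more sophisticated.

```python
# Conceptual sketch only: placeholder names and sizes, not the paper's code.
import torch
import torch.nn as nn

class DenseMotionNet(nn.Module):
    """Predicts a dense flow field (where each pixel should move)
    from keypoints detected in the source image and a driving frame."""
    def __init__(self, num_keypoints=10):
        super().__init__()
        # each keypoint contributes (x, y) for both source and driving frame
        self.net = nn.Sequential(
            nn.Linear(num_keypoints * 4, 128),
            nn.ReLU(),
            nn.Linear(128, 64 * 64 * 2),  # a coarse 64x64 flow field (dx, dy)
        )

    def forward(self, src_kp, drv_kp):
        x = torch.cat([src_kp.flatten(1), drv_kp.flatten(1)], dim=1)
        return self.net(x).view(-1, 2, 64, 64)

class GeneratorNet(nn.Module):
    """Warps the source image with the predicted flow and refines the
    result to produce the output frame (heavily simplified)."""
    def __init__(self):
        super().__init__()
        self.refine = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, source_image, flow):
        # grid_sample expects a sampling grid in [-1, 1]; build it from the flow
        n, _, h, w = flow.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
        )
        base = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
        grid = base + flow.permute(0, 2, 3, 1)
        warped = nn.functional.grid_sample(
            nn.functional.interpolate(source_image, size=(h, w)),
            grid, align_corners=False,
        )
        return self.refine(warped)

# one driving frame -> one output frame; a full video is just this in a loop
motion_net, generator = DenseMotionNet(), GeneratorNet()
source = torch.rand(1, 3, 256, 256)   # the single still image
src_kp = torch.rand(1, 10, 2)         # keypoints in the source image
drv_kp = torch.rand(1, 10, 2)         # keypoints in one driving video frame
frame = generator(source, motion_net(src_kp, drv_kp))
print(frame.shape)                    # torch.Size([1, 3, 64, 64])
```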

These networks are built by encoding and decoding layers of keypoints, in what is called an autoencoder, and they can be represented in the same way as the graphs of networks we have been learning about in class.

Figure 4. Autoencoder: a DNN architecture commonly used for generating deepfakes. From “Deepfakes: Trick or treat?”, Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/Autoencoder-A-DNN-architecture-commonly-used-for-generating-deepfakes_fig2_338144721 [accessed 2 Oct 2020]
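As a concrete illustration of that encode/decode structure, here is a minimal autoencoder sketch in PyTorch. It is just a toy example of my own; the networks actually used for deepfakes are convolutional and far larger, but the idea of compressing an input down to a small code and reconstructing it is the same.

```python
# Toy autoencoder: encode to a small latent code, decode back to the input.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # encoder compresses the input down to a small latent code
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # decoder reconstructs the input from that code
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyAutoencoder()
x = torch.rand(8, 784)                       # a batch of flattened images
loss = nn.functional.mse_loss(model(x), x)   # train it to reproduce its input
print(loss.item())
```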

You can see how these networks can be represented as a directed graph, with keypoints as the nodes and edges as the connections between keypoints in adjacent keypoint layers of the network. This is a simplified picture, but it captures what the dense motion network and generation network are doing: by encoding and decoding the keypoints in the graph, they identify the keypoints that need to be modified to produce the desired motion on the input image. Another interesting thing I noticed about this graph is that it is a directed acyclic graph (DAG), which makes sense, as the algorithm should never need to go back to a previous frame of the video it has already modified.
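To make the graph view concrete, here is a small toy example (my own, not from the paper) that stores layered keypoints as a directed graph and checks that it is acyclic using Kahn’s topological-sort algorithm: if every node can be visited in topological order, the graph contains no cycles.

```python
# Toy layered keypoint graph, checked for acyclicity with Kahn's algorithm.
from collections import deque

# nodes are keypoints; edges only go from one layer to the next
edges = {
    "in_1": ["hid_1", "hid_2"],
    "in_2": ["hid_1", "hid_2"],
    "hid_1": ["out_1"],
    "hid_2": ["out_1"],
    "out_1": [],
}

def is_dag(graph):
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for t in graph[node]:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    # if every node was visited, no cycle exists
    return visited == len(graph)

print(is_dag(edges))  # True
```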

It is interesting how such a complex application can be abstracted down to the point where anyone can use it, and how it has ended up being used for something as mundane as internet humour, despite the much more sinister purposes it could be, and likely already has been, put to. As this technology continues to advance, it could cause many security issues, but I don’t think that should stop us from developing it further and spreading this knowledge through dumb memes. I encourage you to try it for yourself; it is extremely easy to experiment with, and you might learn something new, as I have only very briefly outlined the basics of how it works.

Article: https://www.technologyreview.com/2020/08/28/1007746/ai-deepfakes-memes/

About the Algorithm:

https://aliaksandrsiarohin.github.io/first-order-model-website/
