
NVIDIA Maxine – using neural networks to resolve bandwidth issues in video calls

With the recent pandemic, much of daily life has moved online. Businesses, schools, and governments all rely on some form of video call to replace face-to-face communication. However, this is not a perfect solution, because video calls depend heavily on bandwidth. When bandwidth is limited, the video stutters or degrades and the call experience suffers.

NVIDIA Maxine, as described on its website, is a platform SDK that lets video conferencing developers build AI-powered features that run in the cloud. It uses neural networks to predict facial features from a small set of reference points. This technology is similar to image recognition, where neural networks are also used. Maxine first sends an initial image to base its calculations on, then sends only key points around the eyes, nose, and mouth, which are used to adjust that initial image. A visual demo can be seen in this video:

Visual demo of NVIDIA Maxine
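
To get a feel for why sending key points instead of full frames saves bandwidth, here is a rough back-of-the-envelope sketch. Every number in it (a 1280x720 frame, 68 key points, 4 bytes per coordinate) is an assumption chosen for illustration, not a figure published by NVIDIA. The comparison is also against uncompressed frames, so the gap versus a real video codec would be much smaller.

```python
# Rough payload comparison. All sizes below are illustrative assumptions,
# not values taken from NVIDIA Maxine.

def raw_frame_bytes(width: int, height: int, channels: int = 3) -> int:
    """Bytes for one uncompressed frame at 8 bits per channel."""
    return width * height * channels


def keypoint_frame_bytes(num_keypoints: int, bytes_per_coord: int = 4) -> int:
    """Bytes for one frame sent only as (x, y) key points."""
    return num_keypoints * 2 * bytes_per_coord


raw = raw_frame_bytes(1280, 720)   # assumed 720p frame
kp = keypoint_frame_bytes(68)      # assumed 68 facial key points
ratio = raw / kp

print(f"raw frame: {raw} bytes")
print(f"key points: {kp} bytes")
print(f"raw frame is about {ratio:.0f}x larger")
```

The initial reference image still has to be sent once, so the saving applies to the frames that follow it.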

As discussed in the paper by Xie et al., image recognition can be explored with randomly wired neural networks. These are built from random graph models similar to those we learned in CSCC46, such as the Erdős-Rényi (ER) model, the Watts-Strogatz (WS) model, and the Kleinberg model. All of them use some form of randomness to populate the edges of the graph. In the paper, the authors used the ER and WS models to generate network wirings and compared the resulting image recognition performance, as shown below.

Figure 3
Figure 4
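
As a reminder of how the ER model populates edges, here is a minimal sketch using only the Python standard library. The function name `erdos_renyi` and its parameters are made up for this post; this is the textbook G(n, p) construction and not the code used by Xie et al.

```python
import itertools
import random


def erdos_renyi(n, p, seed=None):
    """Return the edge set of a G(n, p) graph on nodes 0..n-1.

    Each of the n*(n-1)/2 possible edges is included independently
    with probability p.
    """
    rng = random.Random(seed)
    return {
        (u, v)
        for u, v in itertools.combinations(range(n), 2)
        if rng.random() < p
    }


edges = erdos_renyi(8, 0.3, seed=1)
print(len(edges), "edges out of", 8 * 7 // 2, "possible")
```

With `p=0` the graph has no edges and with `p=1` it is complete; values in between trade sparsity against connectivity.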

Surprisingly, the study found that a network with no random rewiring performed worse at image recognition. This was tested using the WS model with P=0, meaning the probability that an edge is rewired is zero. The results can be seen in Figure 3, where the configuration without randomness performed worse than those with randomness.
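
To make the P=0 case concrete, here is a small sketch of the WS construction in plain Python. The function name `watts_strogatz` and its parameters `n`, `k`, `p` are chosen for this post; it follows the standard textbook procedure (ring lattice, then rewire each edge with probability p) and is not the authors' implementation.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Return an adjacency dict for a Watts-Strogatz graph.

    Start from a ring of n nodes where each node is joined to its k
    nearest neighbours (k/2 on each side), then rewire each lattice
    edge with probability p to a uniformly chosen new endpoint.
    """
    if k % 2 != 0 or not 0 < k < n:
        raise ValueError("k must be even and satisfy 0 < k < n")
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    # Build the regular ring lattice.
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    # Visit each lattice edge once and rewire it with probability p.
    for j in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + j) % n
            if rng.random() < p:
                candidates = [w for w in range(n)
                              if w != u and w not in adj[u]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj


lattice = watts_strogatz(10, 4, 0.0, seed=0)   # P=0: nothing is rewired
rewired = watts_strogatz(10, 4, 0.5, seed=0)   # P=0.5: about half the edges move
print(sorted(len(nbrs) for nbrs in lattice.values()))
print(sorted(len(nbrs) for nbrs in rewired.values()))
```

With P=0 every node keeps exactly k neighbours, so the wiring is completely regular. Rewiring keeps the total number of edges the same but makes the degrees uneven.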

Relating this to NVIDIA Maxine, the results for randomly wired networks on image recognition can provide a foundation for predicting facial movements. Furthermore, we can measure the damage that removing a node or an edge introduces, which helps in understanding the importance of articulation points and bridge edges. The study reports the results of this removal process on its models.
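
As a graph-theory illustration of node and edge removal, the sketch below finds articulation points and bridges by brute force: remove one node or one edge, then check whether the rest of the graph is still connected. Note the substitution: the paper measures damage as a change in recognition accuracy, while this sketch only uses connectivity as a stand-in. The helper names and the example graph are made up for this post, and the input graph is assumed to be connected.

```python
from collections import deque


def is_connected(adj, skip_node=None, skip_edge=None):
    """Breadth-first search that ignores one node or one edge."""
    nodes = [u for u in adj if u != skip_node]
    if not nodes:
        return True
    banned = set(skip_edge) if skip_edge else None
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v == skip_node or v in seen:
                continue
            if banned is not None and {u, v} == banned:
                continue
            seen.add(v)
            queue.append(v)
    return len(seen) == len(nodes)


def articulation_points(adj):
    """Nodes whose removal disconnects the remaining graph."""
    return {u for u in adj if not is_connected(adj, skip_node=u)}


def bridges(adj):
    """Edges whose removal disconnects the graph."""
    edges = {(min(u, v), max(u, v)) for u in adj for v in adj[u]}
    return {e for e in edges if not is_connected(adj, skip_edge=e)}


# Example: two triangles joined by the single edge (2, 3).
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print("articulation points:", sorted(articulation_points(graph)))
print("bridges:", sorted(bridges(graph)))
```

The brute-force check reruns a search for every node and edge, which is fine for small graphs like the ones in the course; a depth-first search with low-link values would be the usual choice for larger ones.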

In conclusion, neural networks can ease bandwidth issues by reducing the amount of data sent over the Internet. To do this, they use image processing and image recognition techniques to predict frames from a small number of reference points. With NVIDIA Maxine, we may see more online video calls in the future without bandwidth problems.

Sources:

  • https://www.theverge.com/2020/10/5/21502003/nvidia-ai-videoconferencing-maxine-platform-face-gaze-alignment-gans-compression-resolution
  • https://openaccess.thecvf.com/content_ICCV_2019/papers/Xie_Exploring_Randomly_Wired_Neural_Networks_for_Image_Recognition_ICCV_2019_paper.pdf
  • https://www.youtube.com/watch?v=eFK7Iy8enqM
