December is hardly a good time for big announcements, since most people switch to "let's pick this up after the holidays" mode, but judging by this rich collection of machine learning news, work was in full swing. So, with a slight delay, here is the twelfth issue of the digest, covering the most important things that happened in ML at the end of 2020.
MuZero
DeepMind unexpectedly published a paper about MuZero, an algorithm that can play both classic board games such as chess, shogi and Go, and Atari video games such as Pac-Man.
MuZero does not try to model the entire environment, only the aspects that matter for the agent's decision-making. The algorithm keeps collecting information about the current and previous states of the game, and in this way learns the constraints and rewards. For example, the model works out that in chess the goal is to checkmate, and in Pac-Man it is to eat the yellow dots.
There is another important advantage: MuZero can reuse the learned model to improve its planning instead of collecting new data from the environment. For example, in Atari games with a complex, changing environment, the algorithm used the learned model 90% of the time to re-plan what should have been done in past episodes.
Why it matters. Essentially, MuZero is a general-purpose approach that could be applied to complex real-world problems that are hard to reduce to simple rules. DeepMind offers an analogy: the new approach is like a person who decides on a cloudy day to take an umbrella to stay dry, whereas previous approaches would try to model the order in which the raindrops fall.
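To make the idea of planning inside a learned model more concrete, here is a minimal sketch. It is not DeepMind's implementation: the real MuZero trains neural networks and plans with Monte Carlo tree search, while this toy swaps in hand-written stand-ins for the learned functions and a brute-force lookahead. The names representation, dynamics, value, plan, ACTIONS and GOAL are all hypothetical and exist only for illustration.

```python
from itertools import product

ACTIONS = (-1, +1)  # hypothetical toy action set: step left or right
GOAL = 3            # hypothetical target position on a number line


def representation(observation):
    """Map a raw observation to a hidden state (identity in this toy)."""
    return observation


def dynamics(state, action):
    """Stand-in for a learned dynamics function: predict next state and reward."""
    next_state = state + action
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def value(state):
    """Stand-in for a learned value estimate of a hidden state."""
    return -0.1 * abs(GOAL - state)


def plan(observation, depth=3, discount=0.9):
    """Return the first action of the best imagined action sequence.

    The search only calls the model functions above; it never touches
    a real environment. Exhaustive lookahead replaces MCTS here.
    """
    root = representation(observation)
    best_action, best_return = None, float("-inf")
    for sequence in product(ACTIONS, repeat=depth):
        state, total, scale = root, 0.0, 1.0
        for action in sequence:
            state, reward = dynamics(state, action)
            total += scale * reward
            scale *= discount
        total += scale * value(state)  # bootstrap from the value estimate
        if total > best_return:
            best_action, best_return = sequence[0], total
    return best_action
```

Calling plan(0) imagines every three-step sequence inside the model and picks the first move of the best one. The point of the sketch is only that the agent decides by querying its own model of rewards and values, not by simulating the full environment.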
Infinite Nature
Everyone has at least once seen spectacular drone footage shot along a picturesque coastline. An algorithm trained on similar videos from YouTube synthesizes this kind of video from a single static image.
The task is very difficult because the algorithm has to generate new content that can differ greatly from the input: the photo often contains trees and rocks that hide the parts of the landscape behind them.
The novelty of the approach is that it synthesizes images with the scene geometry in mind and covers large distances over hundreds of frames. The dataset is already available, but the source code has not been released yet.
Time Travel Rephotography
A neural network for restoring and colorizing old photos, reminiscent of DeOldify. Unlike conventional image restoration filters, which apply independent operations such as noise reduction, colorization and upscaling, this method uses StyleGAN2 to synthesize a face close to the original. The output is a high-resolution color portrait. The code is also promised for a later release.
pi-GAN
Another GAN model, this one generating a 3D representation of an object from several unlabeled 2D images. The demo shows how the model can be used to rotate a head, similar to what Nvidia previously demonstrated in Maxine.
Neural Scene Flow Fields
A new NeRF method that builds a dynamic scene representation from a video captured with a conventional camera. This makes it possible, for example, to freeze a frame and move the camera or, conversely, to fix the camera and scrub through time. The algorithm handles scenes with complex structure, including thin objects such as gratings and moving objects such as soap bubbles.
YolactEdge
The first instance segmentation method that runs in real time on low-power devices. The source code is already available.
ModNet
A technology that not only removes the background from portraits cleanly, but can also replace the background in video. In practice, this could be a good substitute for a chroma key. Unlike the paid remove.bg, it comes with source code, a Colab notebook and even a web application with a simple interface, although the web app only works with photos.
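Background replacement of this kind usually comes down to alpha compositing: each output pixel is alpha * foreground + (1 - alpha) * background. Below is a minimal sketch of that step only, assuming the matting model's output is already available as a per-pixel alpha matte with values in [0, 1]. The composite function and the tiny sample images are hypothetical and are not part of the ModNet code.

```python
def composite(foreground, background, alpha):
    """Blend two RGB images pixel by pixel using an alpha matte.

    Images are lists of rows of (r, g, b) tuples; alpha is a list of rows
    of floats in [0, 1], where 1 means "keep the foreground pixel".
    """
    result = []
    for fg_row, bg_row, a_row in zip(foreground, background, alpha):
        row = []
        for fg, bg, a in zip(fg_row, bg_row, a_row):
            # Standard compositing formula, applied per channel.
            row.append(tuple(round(a * f + (1 - a) * b) for f, b in zip(fg, bg)))
        result.append(row)
    return result


# Hypothetical 1x3 images: a portrait color over a new blue background.
portrait = [[(200, 100, 50), (200, 100, 50), (200, 100, 50)]]
new_background = [[(0, 0, 250), (0, 0, 250), (0, 0, 250)]]
matte = [[1.0, 0.5, 0.0]]  # person, soft edge, background

blended = composite(portrait, new_background, matte)
```

The middle pixel shows why a soft matte matters: with alpha 0.5 the edge is an even mix of both images, which is what lets hair and other fine detail blend into the new background without a hard cut-out border.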
Svoice
Facebook has finally published the source code of an algorithm that separates the voices of several people speaking at once in an audio recording.
Hypersim
Apple has published a dataset with segmentation masks for synthetic scenes: nearly two terabytes of ultra-high-resolution room renders. The data is labeled at the level of individual pixels.
ArtLine
An open model that turns a photographic portrait into a pencil sketch. So far it does not cope well with clothing textures and shadows, but overall it gives decent results. It is based on the DeOldify architecture, which helps it pick out facial features well.
That's all. December turned out to be surprisingly intense, and the beginning of the year promises to be interesting too. Judging by OpenAI's Dall-E, we can't wait to see what January brings. As they say, stay tuned!