As you know, we always include the most interesting publications on the topic of machine learning in the collection, and priority is given to projects with non-empty repositories. So, February pleased me with a number of services in this regard, so let's start with them. Go:
Papers with Datasets and Libraries
There is such a resource Papers with Code, the mission of which directly corresponds to the name - to aggregate publications from the field of machine learning that have code, as well as give the opportunity to offer their own implementation.
This month, they launched the Available Datasets section , which has already indexed over 3,000 research datasets. In the catalog, you can search for datasets by frequency of mentions, scope, data type and supported language.
In addition, they added the ability to search for pretrained image classification models that can be fine-tuned on your own datasets. At the moment, there are already 300+ of them, and the catalog will continue to grow.
Google Model Search
Accessibility: project page , repository
The success of a neural network often depends on how widely it can be applied to various tasks. When creating a model, you have to make a number of complex architectural decisions - how deep the neural network should be, what types of layers to use in it, etc.
Google has presented a platform that will help you find the right architecture for your dataset and task, which will reduce configuration and coding time and require less computational resources.
The library allows you to run algorithms out of the box on your data - regardless of the subject area, automatically select the optimal architecture, correct ensembles of models or distilled models.
ZenML
Accessibility: project site / repository
MLOps framework that simplifies the transfer of pipelines from laptops to production environments. Guaranteed reproducibility of training experiments due to versioning of data, code and models. The platform also allows you to quickly switch between on-premises and cloud environments, provides ready-made helpers for comparing and visualizing parameters and results, caching pipeline states for fast iterations, and much more.
TensorFlow 3D
Accessibility: Article / Repository
With the proliferation of devices that capture 3D data, such as lidars and depth cameras, the need for technology to process this data and understand the 3D scene has increased. This is necessary to navigate and work in the real world of self-driving cars and robots, as well as to improve AR technologies.
Google unveiled a modular library for applying deep learning to 3D data in TensorFlow. It contains training and assessment pipelines for 3D semantic segmentation, scene classification, 3D object detection, and more.
MeInGame
Availability: article / repository
In computer games, there is often a character editor that allows you to change the appearance of the player using the settings of various parameters. The MeInGame algorithm allows you to create a custom character with just one photo. The neural network predicts the shape of the face and its texture. Although methods based on the 3D Morphable Face Model (3DMM) can generate a 3D portrait from individual images, the mesh topology is usually different from those used in most games. The authors of this algorithm claim to have solved this problem.
SAM
Accessibility: article / repository
Simulating aging from a single photograph of a face is extremely difficult, as it is necessary to simulate changes in individual facial features and even the shape of the head, while maintaining a person's identity.
Internally, a StyleGAN is used, but here the researchers also use a pretrained age regression network with which the encoder generates hidden codes corresponding to the target age. The method treats the continuous aging process as a regression problem between the input age and the target age, providing precise control over the generated image. The model allows you to edit the generated images.
VOGUE
Availability: project page / interactive demo
New StyleGAN application case for virtual clothing fitting. The algorithm transfers clothes from a photograph of one person to a photograph of a person, which is submitted to the entrance. The method is based on the interpolation of the hidden space, taking into account the pose of StyleGAN2, which works with the body shape, hair, skin color of the target person. The algorithm allows the garment to deform according to a given body shape, while maintaining the pattern and details of the material. The output is photorealistic images at a decent 512x512 resolution.
NeRViS
Accessibility: Project Page / Repository
Existing video stabilization techniques either severely crop out frame boundaries or create artifacts and distortion. This algorithm
preliminary estimates dense deformation fields and uses adjacent frames to synthesize a complete stabilized frame. The novelty of the approach is a learning-based hybrid spatial synthesis that eliminates artifacts caused by inaccurate optical flow and fast moving objects.
Stable View Synthesis
Availability: article / repository
Based on a set of photographs depicting a scene from freely distributed viewpoints, the algorithm synthesizes new views of the scene. The method works on geometric scaffolding, which is calculated based on SfM photogrammetry. The target view is rendered by a convolutional network from the tensor of characteristics synthesized for all pixels.
The article was published back in November last year, but the code became available only now.
JigsawGan
Accessibility: article
Generative self-supervised neural network trained to solve puzzles. As an input, the model accepts randomly located parts of the image and without prompts restores the original image from them, that is, the model does not know what the image was originally.
CharacterGAN
Availability: article / repository A
generative neural network, which can be trained on only a few images of a character in different poses, to generate new poses based on the location of key points. This allows you to animate static images. The novelty of the approach is that the image is divided into layers, each of which is processed separately. This solves the problem of obstructions when a foreign object comes to the fore. For convenience, a GUI has been added that allows you to manually adjust poses by key points.
Discrete VAE
Accessibility: The Repository
In the last issue, we talked about the amazing DALL-E. At the end of February, OpenAI created a repository with the name of the model, but the model itself has not yet been released - only a part of the model is inside, namely the PyTorch package for discrete VAE. This is a variational autoencoder that, in our case, generates images from textual descriptions.
Deep nostalgia
Availability: online service
And finally, it's always nice when a simple and understandable product is made based on models. So, the company MyHeritage, which deals with the issues of genealogy and pedigrees, apparently took the First Order Model algorithm, screwed on a convenient user interface and made a service on its basis to โanimateโ photos.
The result is tons of generated custom content and huge viral reach. They also say that the AI โโbusiness is useless.
That's all, thank you for your attention and see you in a month!