Spatial Transformer Networks in MATLAB

This article covers building custom neural network layers, using automatic differentiation, and working with the standard deep learning layers in MATLAB, using a classifier built around a spatial transformer network as the example.





The Spatial Transformer Network (STN) is one example of the differentiable LEGO-style modules that you can use to build and improve your neural network. By applying a trainable affine transformation followed by interpolation, an STN removes spatial variation from the input images, which makes the network as a whole more invariant to it. Roughly speaking, the task of the STN is to rotate or shrink / enlarge the original image so that the main classifier network can identify the desired object more easily. An STN block can be placed inside a convolutional neural network (CNN), where it works largely independently and learns from the gradients coming from the main network (for more details on this topic, see the links: Habr and Manual).
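For reference, the affine transformation and the interpolation step can be written as in the original STN paper (Jaderberg et al., 2015). The notation below is the paper's, not this article's: U is the input image, V is the output, (x^t, y^t) are output grid coordinates, (x^s, y^s) are the sampled input coordinates, and A_θ is the 2-by-3 matrix predicted by the localization network.

```latex
\begin{pmatrix} x_i^{s} \\ y_i^{s} \end{pmatrix}
= A_\theta \begin{pmatrix} x_i^{t} \\ y_i^{t} \\ 1 \end{pmatrix}
= \begin{pmatrix}
    \theta_{11} & \theta_{12} & \theta_{13} \\
    \theta_{21} & \theta_{22} & \theta_{23}
  \end{pmatrix}
  \begin{pmatrix} x_i^{t} \\ y_i^{t} \\ 1 \end{pmatrix}

V_i^{c} = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^{c}\,
          \max\bigl(0,\, 1 - \lvert x_i^{s} - m \rvert\bigr)\,
          \max\bigl(0,\, 1 - \lvert y_i^{s} - n \rvert\bigr)
```

Both expressions are differentiable (almost everywhere) with respect to θ and U, which is what lets the block learn from the gradients of the main network.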





In our case, the task is to classify 99 classes of car windshields, but let's start with something simpler. To get acquainted with the topic, we will take the MNIST database of handwritten digits and build a network from MATLAB's deep learning layers plus a custom affine image transformation layer (you can see the list of all available layers and their functionality here).





To implement the custom transformation layer, we will use the custom layer template and MATLAB's ability to differentiate automatically and build the backward pass for the error derivative. This is implemented with dlarray, the deep learning array type used for custom training loops (the template is described at the link, and dlarray structures are described at the link).
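To give a feel for automatic differentiation with dlarray, here is a minimal sketch that is unrelated to the STN itself. It uses the Deep Learning Toolbox functions dlarray, dlfeval, dlgradient and extractdata; the function name sumOfSquares is a hypothetical example, not something from the project.

```matlab
% Evaluate a function and its gradient with automatic differentiation.
% dlgradient must be called inside a function evaluated through dlfeval.
x = dlarray([1 2 3]);
[y, g] = dlfeval(@sumOfSquares, x);

% The gradient of sum(x.^2) with respect to x is 2*x.
disp(extractdata(g))

function [y, g] = sumOfSquares(x)
    % Hypothetical example function: a scalar "loss" of the input.
    y = sum(x.^2);          % x is a row vector, so y is a scalar dlarray
    g = dlgradient(y, x);   % traced derivative dy/dx
end
```

A custom layer whose forward function is written only with dlarray-compatible operations gets its backward pass in the same way, without a hand-written derivative.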





To take advantage of dlarray, we need to write the affine image transformation by hand, since the MATLAB functions that provide this feature do not support dlarray structures. Below is the transformation function we wrote; the entire project is available here.
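As an illustration of what a hand-written affine transformation involves, here is a minimal sketch using only basic arithmetic and indexing. It is not the project's function: the name affineWarpSketch, the restriction to a single-channel image of type double, and the normalized [-1, 1] coordinate convention (borrowed from the STN paper) are all assumptions. It runs on ordinary numeric arrays; adapting it to dlarray inputs is a further step that is not shown here.

```matlab
% Usage: the identity transform should return the image unchanged.
U = magic(4);
V = affineWarpSketch(U, [1 0 0; 0 1 0]);
disp(max(abs(V(:) - U(:))))

function V = affineWarpSketch(U, theta)
    % Hypothetical helper: warp H-by-W image U with the 2-by-3 matrix theta.
    [H, W] = size(U);
    % Output grid in normalized coordinates, x and y in [-1, 1].
    [xt, yt] = meshgrid(linspace(-1, 1, W), linspace(-1, 1, H));
    % Source coordinates: [xs; ys] = theta * [xt; yt; 1].
    src = theta * [xt(:)'; yt(:)'; ones(1, H*W)];
    % Convert back to 1-based pixel coordinates.
    xs = (src(1, :) + 1) * (W - 1) / 2 + 1;
    ys = (src(2, :) + 1) * (H - 1) / 2 + 1;
    x0 = floor(xs);
    y0 = floor(ys);
    V = zeros(1, H*W);
    % Accumulate the four neighbouring pixels with bilinear weights.
    for dx = 0:1
        for dy = 0:1
            xi = x0 + dx;
            yi = y0 + dy;
            w = max(0, 1 - abs(xs - xi)) .* max(0, 1 - abs(ys - yi));
            % Samples that fall outside the image contribute zero.
            inside = xi >= 1 & xi <= W & yi >= 1 & yi <= H;
            idx = sub2ind([H, W], yi(inside), xi(inside));
            V(inside) = V(inside) + w(inside) .* U(idx);
        end
    end
    V = reshape(V, H, W);
end
```

The weights are built from max, abs and multiplication rather than from a call to an image-processing routine, which is the property that matters when the same computation has to be traced for gradients.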
















































[Figure: Network structure]
[Figure: Training results]











— , — , — RGB, , , . . , , , , 2, , 0, , , . , , STN  , , , . , . STN - , , dropout  STN.





, , [0;255], [0;1], — . .





[Figure: Network input data, digits]
[Figure: Network input data, windshields]

, 255 0.3 0.75, . , .





[Figure: Normalization layer input and output]

, , , , [-10;10] [-50; 50]. MATLAB, dlarray . .










[Figure: Network structure]
[Figure: Training results]

, , , , 90. , , , , , . , , , , .










[Figure: Training results]













