Image classification with PyTorch
Deep learning textbooks tend to be full of dense, impenetrable terminology. I try to keep it to a minimum and always work with a single example that can easily be extended as you get used to working with PyTorch. We use this example throughout the book to demonstrate how to debug a model (Chapter 7) or deploy it to production (Chapter 8).
From now until the end of Chapter 4, we will be building an image classifier. Neural networks are commonly used as image classifiers: you offer the network a picture and ask it a simple question: "What is this?"
Let's start by creating our application in PyTorch.
Classification problem
Here we will create a simple classifier that can tell a fish from a cat. We will iterate on the design and training of our model to make it more and more accurate.
Figures 2.1 and 2.2 depict a fish and a cat in all their glory. I'm not sure whether the fish has a name, but the cat's name is Helvetica.
Let's start by discussing some of the standard classification problems.
Standard difficulties
How would you write a program that can tell a fish from a cat? Perhaps you would write a set of rules describing that a cat has a tail, or that a fish has scales, and apply those rules to an image so the program could classify it. But that would take time, effort, and skill. And what if you come across a Manx cat? Although it is clearly a cat, it has no tail.
These rules become more and more complex as you try to describe all possible scenarios with them. Also, I must admit that I am terrible at visual programming, so the thought of having to manually write code for all these rules is terrifying.
What we want is a function that, given an image, returns cat or fish. It is hard to construct such a function by exhaustively listing all the criteria. But deep learning essentially makes the computer do the hard work of constructing all the rules we just talked about, provided we define the structure, give the network lots of data, and let it know whether its answer was right. That is what we are going to do. Along the way, you will learn some basic techniques for working with PyTorch.
But first, the data
First, we need data. How much data? That depends on various factors. As you'll see in Chapter 4, the idea that any deep learning technique needs huge amounts of data to train a neural network is not necessarily true. However, we are going to train from scratch here, which usually does require access to a lot of data. So we need many images of fish and cats.
You could spend some time downloading a pile of images from an image search on Google, but there is an easier way: the standard collection of images used to train neural networks is ImageNet. It contains over 14 million images in 20 thousand image categories. It is the standard against which all image classifiers are compared. So I take images from there, although you can choose other options if you want.
Besides the data, PyTorch needs a way to determine what is a cat and what is a fish. That's easy enough for us, but it's harder for a computer (which is why we're writing this program!). We use labels attached to the data, and training in this manner is called supervised learning. (If you don't have access to any labels, then, you guessed it, unsupervised learning is used instead.)
If we use ImageNet data, its labels won't be much use, because they contain too much information for us. A label of tabby cat or trout is, to the computer, not the same as cat or fish.
We need to relabel them. Because ImageNet is such a vast collection of images, I have compiled a list of image URLs and labels for fish and cats (https://oreil.ly/NbtEU).
You can run the download.py script in that directory, and it will download the images from the URLs and place them in the appropriate training locations. Relabeling is simple: the script stores images of cats in the train/cat directory and images of fish in the train/fish directory. If you'd rather not use the script, just create these directories and place the corresponding images in the right locations. We now have data, but we need to convert it into a format PyTorch can understand.
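If you are creating the directories by hand, the layout that the loading code below expects looks roughly like this (the file names are placeholders of mine):

train/
    cat/
        cat_001.jpg
        ...
    fish/
        fish_001.jpg
        ...

The val/ and test/ directories we use later follow the same cat/fish structure.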
PyTorch and data loaders
Loading and transforming data into training-ready formats is often the area of data science that takes up too much of our time. PyTorch has evolved established conventions for interacting with data that make it fairly straightforward whether you're working with images, text, or audio.
The two main concepts for working with data are datasets and data loaders. A dataset is a Python class that lets us get at the data we feed to the neural network.
A data loader is what transfers data from a dataset to the network. (It can carry settings such as: how many worker processes feed data into the network? How many images do we pass in at once?)
Let's take a look at the dataset first. Each dataset, whether it contains images, audio, text, 3D landscapes, stock market information, or anything else, can interact with PyTorch as long as it meets the requirements of this abstract Python class:
class Dataset(object):
    def __getitem__(self, index):
        raise NotImplementedError

    def __len__(self):
        raise NotImplementedError
It's pretty simple: we have to implement a method that returns the size of our dataset (__len__) and a method that retrieves an item from the dataset as a (label, tensor) pair. This is called by the data loader as it pushes data into the neural network for training. So we have to write a body for __getitem__ that can take an image, transform it into a tensor, and return it together with its label so that PyTorch can operate on it. That's all clear enough, but this scenario is common enough that you might wonder whether PyTorch can make the task easier.
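Before we see how, here's a minimal sketch of what implementing this contract by hand might look like. This is my own illustration, not code from the book; the sample list, the use of PIL, and the class name are all assumptions:

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class CatsAndFishDataset(Dataset):
    def __init__(self, samples):
        # samples is a list of (image_path, label) pairs,
        # e.g. ("train/cat/001.jpg", 0)
        self.samples = samples
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        # The size of our dataset
        return len(self.samples)

    def __getitem__(self, index):
        # Load one image, convert it to a tensor, and return it with its label
        path, label = self.samples[index]
        image = Image.open(path).convert("RGB")
        return self.to_tensor(image), label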
Creating a training dataset
The torchvision package includes a class called ImageFolder that does pretty much everything for us, as long as our images are in a structure where each directory is a label (for example, all cats are in a directory called cat). Here's what you need for the cat and fish example:
import torchvision
from torchvision import transforms

train_data_path = "./train/"
transforms = transforms.Compose([
    transforms.Resize(64),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
train_data = torchvision.datasets.ImageFolder(root=train_data_path,
                                              transform=transforms)
Something else is going on here: torchvision also lets you specify a list of transforms that will be applied to an image before it reaches the neural network. The default transform is to take image data and turn it into a tensor (the transforms.ToTensor() method shown in the preceding code), but we're also doing a couple of other things that might not seem obvious.
First, GPUs are built to perform fast calculations on regularly sized data. But we probably have an assortment of images at many resolutions. To increase processing performance, we scale every incoming image to the same 64x64 resolution with the Resize(64) transform. Then we convert the images to a tensor, and finally we normalize the tensor around a specific set of mean and standard deviation points.
Normalizing is important because a large amount of multiplication will happen as the input passes through the layers of the neural network; keeping the incoming values between 0 and 1 prevents the values from getting too large during training (known as the exploding gradient problem). This magic incantation is just the mean and standard deviation of the ImageNet dataset as a whole. You could calculate it specifically for the fish and cat subset, but these values are reliable enough. (If you were working on a completely different dataset, you would have to compute that mean and deviation, although many people just use the ImageNet constants and report acceptable results.)
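If you do want statistics specific to your own dataset rather than the ImageNet constants, a rough sketch of the calculation might look like this. This is my own sketch, not the book's; it assumes the dataset already applies Resize and ToTensor (but not Normalize), so every image batches to the same shape:

import torch
from torch.utils.data import DataLoader

def channel_stats(dataset):
    # Accumulate per-channel mean and standard deviation over the whole dataset
    loader = DataLoader(dataset, batch_size=64)
    count = 0
    mean = torch.zeros(3)
    mean_sq = torch.zeros(3)
    for images, _ in loader:
        # images has shape (batch, 3, height, width)
        batch = images.size(0)
        flat = images.view(batch, 3, -1)
        mean += flat.mean(dim=2).sum(dim=0)
        mean_sq += (flat ** 2).mean(dim=2).sum(dim=0)
        count += batch
    mean /= count
    std = (mean_sq / count - mean ** 2).sqrt()
    return mean, std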
Composable transformations also make it easy to perform actions such as image rotation and image shift for data augmentation, which we will return to in Chapter 4.
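As a small preview of Chapter 4, an augmented transform chain might add rotation and shifting like this; the specific angle and translation fractions are arbitrary choices of mine, not the book's:

from torchvision import transforms

augmented_transforms = transforms.Compose([
    transforms.Resize(64),
    transforms.RandomRotation(10),                     # rotate up to +/- 10 degrees
    transforms.RandomAffine(0, translate=(0.1, 0.1)),  # shift up to 10% each way
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])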
In this example, we're resizing the images to 64x64. I made that arbitrary choice in order to make the computation in our upcoming first network fast. Most existing architectures, which you'll see in Chapter 3, use 224x224 or 299x299 for their image inputs. In general, the larger the input size, the more data for the network to learn from. The flip side is that you can usually fit only a smaller batch of images within the GPU's memory.
There's a lot more to datasets than this, and we'll encounter more of it later. But why would we need anything beyond a training dataset?
Validation and test datasets
Our training dataset is set up, but we now need to repeat the same steps for a validation dataset. What's the difference? One pitfall of deep learning (and, in fact, all machine learning) is overfitting: the model gets really good at recognizing what it was trained on but fails on examples it hasn't seen. The model sees a picture of a cat, and if the new picture doesn't closely resemble the cats it was trained on, the model decides it's not a cat, even though it obviously is. To prevent the network from behaving like this, download.py also pulled down a validation set: a series of images of cats and fish that do not occur in the training dataset. At the end of each training cycle (also known as an epoch), we evaluate against this set to make sure the network isn't going wrong. Don't be alarmed: the code for this check is incredibly simple; it's the same code with a few variable names changed:
val_data_path = "./val/"
val_data = torchvision.datasets.ImageFolder(root=val_data_path,
                                            transform=transforms)
We simply reused the transforms chain instead of defining it again.
In addition to the validation dataset, we also need to create a test dataset. It is used to test the model after all training has been completed:
test_data_path = "./test/"
test_data = torchvision.datasets.ImageFolder(root=test_data_path,
                                             transform=transforms)
At first glance, the distinctions between these types of datasets can seem confusing, so I've put together a table indicating which part of training uses each set (Table 2.1).
We can now create data loaders with a few more lines of Python code:
import torch.utils.data as data

batch_size = 64
train_data_loader = data.DataLoader(train_data, batch_size=batch_size)
val_data_loader = data.DataLoader(val_data, batch_size=batch_size)
test_data_loader = data.DataLoader(test_data, batch_size=batch_size)
What's new and noteworthy in this code is batch_size. It tells us how many images will go through the network before we train and update it. In theory, we could set batch_size to the number of images in the test and training datasets so the network sees every image before updating. In practice this usually isn't done, because smaller batches (more commonly known in the literature as mini-batches) need less memory (there's no need to store every piece of information about every image in the dataset), and a smaller batch size makes training faster, since the network updates much more quickly. By default, PyTorch's data loaders are set to a batch_size of 1. You will almost certainly want to change that. Although I chose 64 here, you can experiment to see how big a mini-batch you can use without running out of GPU memory. You may also want to experiment with some additional parameters: you can specify how datasets are sampled, whether the entire dataset is shuffled on each run, and how many worker processes are used to pull data out of the dataset. All of this can be found in the PyTorch documentation.
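For instance, a training loader that reshuffles the data every epoch and uses several worker processes might look like this; the shuffle and num_workers values here are illustrative choices of mine, not a recommendation from the book:

train_data_loader = data.DataLoader(train_data,
                                    batch_size=batch_size,
                                    shuffle=True,   # reshuffle the data each epoch
                                    num_workers=4)  # parallel loading processes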
That covers getting data into PyTorch, so let's now introduce a simple neural network to start classifying our images.
Finally, a neural network!
We'll start with the simplest deep learning network: an input layer that will work with the input tensors (our images); an output layer sized to the number of our output classes (2); and a hidden layer in between. In this first example we'll use fully connected layers. Figure 2.3 shows an input layer of three nodes, a hidden layer of three nodes, and an output layer of two nodes.
In this example, every node in one layer affects every node in the next layer, and each connection has a weight that determines the strength of the signal going from that node into the next layer. (It is these weights that will be updated when we train the network, usually from a random initialization.) As an input passes through the network, we (or PyTorch) can simply do a matrix multiplication of that layer's weights against the input and add the layer's bias. Before handing the result to the next layer, it goes through an activation function, which is simply a way of inserting nonlinearity into our system.
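In code, what a single fully connected layer does is roughly the following. This is a toy sketch of mine using the 3-3-2 network of Figure 2.3, with random weights and the ReLU activation discussed in the next section:

import torch

x = torch.rand(3)      # the three input node values
W = torch.rand(3, 3)   # weights connecting the input layer to the hidden layer
b = torch.rand(3)      # the hidden layer's biases
hidden = torch.relu(W @ x + b)  # matrix multiply, add bias, apply the activation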
Activation functions
Activation function sounds complicated, but the most common activation function you'll come across right now is ReLU, the rectified linear unit. More jargon! But it is simply a function that implements max(0, x), so the result is 0 if the input is negative, or just the input (x) if x is positive. It's that simple!
Another activation function you're most likely to come across is softmax (the multiclass logistic function), which is a little more complicated mathematically. Basically, it produces a set of values between 0 and 1 that add up to 1 (probabilities!), and it weights the values so as to exaggerate differences; that is, it produces one result in a vector that is larger than all the others. You'll often see it used at the end of a classification network, so that the network commits to a prediction about what class it thinks the input belongs to.
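A quick illustration of both functions on a toy tensor (the input values are arbitrary):

import torch
import torch.nn.functional as F

t = torch.tensor([-1.5, 0.0, 2.0])
print(F.relu(t))            # tensor([0., 0., 2.]): negatives clamped to zero
print(F.softmax(t, dim=0))  # three values summing to 1, the largest for 2.0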
Now that we have all these building blocks, we can start building our first neural network.
Neural network creation
Building a neural network in PyTorch will feel familiar if you've done object-oriented programming in Python. We inherit from a class called torch.nn.Module and fill in the __init__ and forward methods:
import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(12288, 84)
        self.fc2 = nn.Linear(84, 50)
        self.fc3 = nn.Linear(50, 2)

    def forward(self, x):
        x = x.view(-1, 12288)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.softmax(self.fc3(x), dim=1)
        return x

simplenet = SimpleNet()
Again, this is not difficult. We do any required setup in __init__(), in this case calling the superclass constructor and defining three fully connected layers (called Linear in PyTorch; Keras calls them Dense). The forward() method describes how data flows through the network, both in training and in prediction (inference). First, we have to transform the 3D tensor of the image (x and y coordinates plus three channels of color information: red, green, blue) into a one-dimensional tensor so that it can be fed into the first Linear layer, and we do that with view(). From there, we apply the layers and activation functions in order, finally returning the softmax output to get our prediction for that image.
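To sanity-check the shapes, you can push a random batch of fake "images" through the untrained network. This snippet is mine, not the book's:

import torch

fake_batch = torch.rand(2, 3, 64, 64)  # two fake 64x64 RGB images
predictions = simplenet(fake_batch)
print(predictions.shape)               # torch.Size([2, 2]): two class scores per image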
The numbers of nodes in the hidden layers are fairly arbitrary, with the exception of the final layer's output, which is 2, matching our two classes of cat or fish. We do require that the data shrinks as it goes down the stack. If a layer went from, say, 50 inputs to 100 outputs, the network might learn by simply passing the 50 inputs to 50 of the 100 outputs and consider its job done. By reducing the size of the output with respect to the input, we force that part of the network to learn a representation of the original input with fewer resources, which hopefully means that it extracts some distinguishing features of the images: for example, that it learns to recognize a fin or a tail.
We have a prediction, and we can compare that with the actual label of the original image to see whether it was correct. But we need some way to allow PyTorch to quantify not just whether a prediction is right or wrong, but just how right or wrong it is. This is handled by a loss function.
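As a preview, a common PyTorch choice for classification is CrossEntropyLoss. The snippet below is a sketch of mine, not the book's; note that nn.CrossEntropyLoss expects raw scores rather than softmax output:

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
scores = torch.tensor([[2.0, 0.5]])  # raw scores for one image: [cat, fish]
target = torch.tensor([0])           # the correct label: class 0 (cat)
loss = loss_fn(scores, target)       # small loss, since the scores favor cat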
ABOUT THE AUTHOR
Ian Pointer is a data science engineer specializing in machine learning solutions (including deep learning techniques) for a number of Fortune 100 clients. Ian currently works at Lucidworks, developing advanced applications and NLP.