In today’s post, I am going to show you how to create a Convolutional Neural Network (CNN) to classify images from the CIFAR-10 dataset. This tutorial is the foundation for the next one, Image Classification with Keras and SageMaker. This post mainly shows you how to prepare your custom dataset so that Keras can accept it.
To proceed you will need a GPU version of TensorFlow; you can find instructions on how to install it here.
This post covers the following:
- Download and process your dataset
- Bring your data to a format acceptable by the CNN
- Build the model
- Train and save the model
Download and process your dataset
The CIFAR-10 dataset consists of 60000 32×32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
You can download the dataset from here. Extract the data to a folder and in the same folder create a script to open your dataset.
To open your dataset run the following:
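A minimal sketch of such a script, assuming you downloaded the Python version of CIFAR-10, whose archive extracts to pickled batch files named data_batch_1 to data_batch_5 plus test_batch. The helper names load_batch and load_cifar10 are my own, not part of any library.

```python
import os
import pickle

import numpy as np


def load_batch(path):
    """Load one CIFAR-10 batch file and return (images, labels)."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    # Each row holds 3072 bytes: 1024 red, then 1024 green, then 1024 blue.
    # Reshape to (N, 3, 32, 32) and move the channels last for Keras.
    images = batch[b"data"].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    labels = np.array(batch[b"labels"], dtype=np.int64)
    return images, labels


def load_cifar10(data_dir):
    """Load the five training batches and the test batch from data_dir."""
    train = [
        load_batch(os.path.join(data_dir, f"data_batch_{i}"))
        for i in range(1, 6)
    ]
    x_train = np.concatenate([images for images, _ in train])
    y_train = np.concatenate([labels for _, labels in train])
    x_test, y_test = load_batch(os.path.join(data_dir, "test_batch"))
    return (x_train, y_train), (x_test, y_test)


# Usage (the folder name is an assumption about where you extracted the data):
# (x_train, y_train), (x_test, y_test) = load_cifar10("cifar-10-batches-py")
```

With the full dataset, x_train should come out with shape (50000, 32, 32, 3) and x_test with shape (10000, 32, 32, 3), matching the split described above.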
Bring your data to a format acceptable by the CNN
Now we have to preprocess the data and transform it into the form Keras expects.
We normalize the pixel values to the range [0, 1] and one-hot encode the labels. With 10 classes, each sample is labeled with a vector of length 10. For example, if the first sample belongs to the third class, its one-hot vector is [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]; for the 7th class it is [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]. In general, we set the entry at the index of the sample's class to 1 and leave the rest as 0s.
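Here is a small sketch of both steps using only NumPy. The function names normalize_images and one_hot are hypothetical; Keras also ships keras.utils.to_categorical, which does the one-hot part for you.

```python
import numpy as np

NUM_CLASSES = 10


def normalize_images(images):
    """Scale uint8 pixel values from 0..255 down to floats in 0..1."""
    return images.astype("float32") / 255.0


def one_hot(labels, num_classes=NUM_CLASSES):
    """Turn integer class ids (0-based) into one-hot rows."""
    labels = np.asarray(labels).reshape(-1)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[labels]
```

Class ids are 0-based, so the third class is id 2 and one_hot([2]) puts the 1 in the third position.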
Build the model
Now it is time to create the model. I’ve used an existing model, which I found here.
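To give you an idea of what such a model can look like, here is a small stacked-convolution CNN written with the Keras Sequential API. Treat it as an illustrative sketch: the layer sizes and dropout rates are my assumptions, and it is not necessarily the same architecture as the model linked above.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(input_shape=(32, 32, 3), num_classes=10):
    """Build a small CNN for 32x32 colour images (illustrative sizes)."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        # First convolutional block: 32 filters.
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        # Second convolutional block: 64 filters.
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        # Classifier head: one softmax output per class.
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```

Calling build_model().summary() prints the layers and parameter counts, which is a quick way to check that the output layer has one unit per class.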
Train and save the model
Last but not least, we augment the pictures with the help of Keras's ImageDataGenerator class. Essentially, you manipulate the pictures in order to cover more ground: it rotates, flips, and shifts the pictures and generates “distorted” images from the initial dataset. You can find more information about this technique here. Finally, we have to compile the model and train it.
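A short sketch of the augmentation step on its own. The ranges below are assumptions chosen for illustration, and the random array only stands in for the real images so the snippet runs by itself. Note that rotations and shifts in ImageDataGenerator rely on SciPy being installed.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings (illustrative values, tune them for your data).
datagen = ImageDataGenerator(
    rotation_range=15,        # rotate by up to 15 degrees
    width_shift_range=0.1,    # shift horizontally by up to 10% of the width
    height_shift_range=0.1,   # shift vertically by up to 10% of the height
    horizontal_flip=True,     # randomly mirror images left to right
)

# Stand-in data: 8 random "images" with one-hot labels for 10 classes.
rng = np.random.default_rng(0)
images = rng.random((8, 32, 32, 3)).astype("float32")
labels = np.eye(10, dtype="float32")[rng.integers(0, 10, size=8)]

# flow() yields endless batches of randomly distorted images.
batch_images, batch_labels = next(datagen.flow(images, labels, batch_size=4))
print(batch_images.shape, batch_labels.shape)
```

Each call to next() on the flow returns a fresh batch with new random distortions, so the model rarely sees exactly the same picture twice.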
We will train the model for 10 epochs.
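Putting compile, train, and save together could look roughly like the sketch below. The function name train_and_save, the Adam optimizer, the learning rate, the batch size, and the file name cifar10_cnn.h5 are all my assumptions for illustration, so swap in whatever your setup uses.

```python
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def train_and_save(model, x_train, y_train, x_test, y_test,
                   epochs=10, batch_size=32, save_path="cifar10_cnn.h5"):
    """Compile the model, train it on augmented batches, and save it."""
    model.compile(
        loss="categorical_crossentropy",   # matches one-hot encoded labels
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        metrics=["accuracy"],
    )
    datagen = ImageDataGenerator(
        rotation_range=15,
        width_shift_range=0.1,
        height_shift_range=0.1,
        horizontal_flip=True,
    )
    # Train on augmented batches, validate on the untouched test images.
    history = model.fit(
        datagen.flow(x_train, y_train, batch_size=batch_size),
        epochs=epochs,
        validation_data=(x_test, y_test),
    )
    model.save(save_path)
    return history
```

The returned history object keeps the per-epoch loss and accuracy, which is handy for checking whether the model is still improving when training stops.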
What we’ve done so far:
- We downloaded the CIFAR-10 dataset
- We preprocessed the dataset
- We created a model and compiled it
- We trained the model and saved it
You can tweak the parameters to get better accuracy. I ran it for 100 epochs and got almost 78% accuracy. It can surely go much further since it was still undertrained! Let me know what your model scored in the comment section below or tweet it to me at @siaterliskonsta! Also, let me know if you need a tutorial on how to install the GPU version of TensorFlow.
In the next tutorial, we will see how to deploy this code to Amazon SageMaker, how to train it, and how to retrieve the model in order to make predictions.
That’s it for today! If you have any questions, send them to me on Twitter @siaterliskonsta; I would love to hear them all and I will do my best to answer them! The following SageMaker post will be way simpler! We will use AWS’s algorithms! Till next time, take care and bye bye!