• This project was inspired by a smaller Sketcher application developed by Zaid Alyafeai.
  • There are two models present on this page. Both were trained on the full 345-class dataset from Google.
  • An earlier, simpler* Colaboratory Notebook prototype with only 160 classes is available here: Colab

    *This smaller prototype was trained using Google's Tensor Processing Unit (TPU).

  • The model architecture used for training is defined by the following convolutional neural network (CNN) structure:

    CNN_Model.PNG
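  • As a rough illustration only, the sketch below shows what a structure like this looks like in Keras code. The layer counts, filter sizes, and dropout rate are placeholder assumptions, not the exact values shown in CNN_Model.PNG.

```python
# Illustrative sketch of a small CNN for 28x28 grayscale sketches and 345 classes.
# Layer sizes below are assumptions, not the tuned architecture from the image above.
from tensorflow.keras import layers, models

def build_sketch_cnn(num_classes: int = 345) -> models.Model:
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),  # 784-pixel bitmaps reshaped to 28x28x1
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_sketch_cnn()
model.summary()
```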

  • The specific architecture was arrived at through 50 rounds of optimization, performed with scikit-optimize, searching over the following parameters:
  • Optimize_Parameter.PNG
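  • As a hedged sketch of how such a search can be wired up with scikit-optimize: the parameter names, ranges, and the choice of gp_minimize below are illustrative assumptions; the actual search space is the one shown in Optimize_Parameter.PNG.

```python
# Sketch of a 50-call scikit-optimize search over CNN hyperparameters.
# Parameter names/ranges and the use of gp_minimize are assumptions for illustration.
import numpy as np
from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args
from tensorflow.keras import layers, models, optimizers

# Small random placeholders standing in for the 60,000/5,000 optimization subset.
x_opt = np.random.rand(1000, 28, 28, 1).astype("float32")
y_opt = np.random.randint(0, 345, size=1000)
x_val = np.random.rand(200, 28, 28, 1).astype("float32")
y_val = np.random.randint(0, 345, size=200)

search_space = [
    Real(1e-4, 1e-2, prior="log-uniform", name="learning_rate"),
    Integer(1, 3, name="conv_blocks"),
    Integer(16, 128, name="filters"),
    Integer(64, 512, name="dense_units"),
]

@use_named_args(search_space)
def objective(learning_rate, conv_blocks, filters, dense_units):
    # Build a candidate CNN from the sampled hyperparameters.
    model = models.Sequential([layers.Input(shape=(28, 28, 1))])
    for _ in range(conv_blocks):
        model.add(layers.Conv2D(filters, (3, 3), activation="relu", padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(dense_units, activation="relu"))
    model.add(layers.Dense(345, activation="softmax"))
    model.compile(optimizer=optimizers.Adam(learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_opt, y_opt, validation_data=(x_val, y_val),
                        epochs=2, batch_size=128, verbose=0)
    # gp_minimize minimizes, so return the negative validation accuracy.
    return -history.history["val_accuracy"][-1]

result = gp_minimize(objective, search_space, n_calls=50, random_state=0)
print("Best hyperparameters found:", result.x)
```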

  • The optimization was run on a subset of the full dataset, specifically 60,000 training samples and 5,000 validation samples.
  • Here is the convergence plot of the optimization:
  • Convergence_plot.PNG
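  • For reference, a plot like this can be drawn directly from the search result with scikit-optimize's built-in helper (assuming result is the object returned by the optimizer, as in the sketch above):

```python
# Convergence plot of the scikit-optimize run; `result` is the OptimizeResult
# returned by gp_minimize in the sketch above (the variable name is an assumption).
import matplotlib.pyplot as plt
from skopt.plots import plot_convergence

plot_convergence(result)
plt.show()
```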

  • The full dataset contained 1,656,000 samples, split with an 80/20 train/validation ratio (1,324,800 training and 331,200 validation samples).
  • training.PNG
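  • A split of this shape can be produced, for example, with scikit-learn; the use of train_test_split, stratification, and the random seed below are assumptions for illustration.

```python
# Sketch of an 80/20 train/validation split. At full scale (1,656,000 samples)
# this split yields 1,324,800 training and 331,200 validation samples.
import numpy as np
from sklearn.model_selection import train_test_split

# Small random placeholder standing in for the full dataset of flattened 784-pixel bitmaps.
x_all = np.zeros((10_000, 784), dtype="uint8")
y_all = np.random.randint(0, 345, size=10_000)

x_train, x_val, y_train, y_val = train_test_split(
    x_all, y_all, test_size=0.2, stratify=y_all, random_state=42)

print(x_train.shape, x_val.shape)  # (8000, 784) (2000, 784)
```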

  • Below are two charts, showing the accuracy and loss, respectively, over the training epochs:
  • Accuracy_over_epoch.PNG Loss_over_epoch.PNG
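  • Charts like these can be generated from the History object that Keras returns from model.fit; the function below is a generic sketch, not the exact plotting code used for the images above.

```python
# Plot training/validation accuracy and loss per epoch from a Keras History object.
import matplotlib.pyplot as plt

def plot_history(history):
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(12, 4))

    ax_acc.plot(history.history["accuracy"], label="train")
    ax_acc.plot(history.history["val_accuracy"], label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()

    ax_loss.plot(history.history["loss"], label="train")
    ax_loss.plot(history.history["val_loss"], label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    fig.tight_layout()
    plt.show()

# Usage: plot_history(model.fit(...)) after training completes.
```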

  • Minor overfitting is present after roughly the 18th epoch.
  • The 28x28-pixel model was the first one trained. When reviewing its performance, I found that it had difficulty differentiating between visually similar classes.
  • For example, the model struggled to distinguish between the following class groups: [circle, stop sign, octagon, hexagon], [house, barn], and [butterfly, bowtie, ant].
  • The raw dataset contained 256x256 images, so my theory was that a less aggressive downsampling would yield higher accuracy, at least within these visually similar groups.
  • However, increasing the size of each training image by a factor of four (784 pixels to 3,136, i.e. 28x28 to 56x56) introduced a number of challenges due to the hardware limitations of the training environment.
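  • Producing the larger inputs is simply a matter of downsampling the raw images less aggressively; the sketch below uses tf.image.resize with area interpolation, which is an assumption about the resizing method actually used.

```python
# Downsample raw 256x256 images to 56x56 (3,136 pixels) instead of 28x28 (784 pixels).
# The area-interpolation choice is an assumption for illustration.
import numpy as np
import tensorflow as tf

raw = np.random.randint(0, 256, size=(8, 256, 256, 1), dtype=np.uint8)  # placeholder batch

small = tf.image.resize(raw, (28, 28), method="area")    # 784 pixels per image
larger = tf.image.resize(raw, (56, 56), method="area")   # 3,136 pixels per image

print(small.shape, larger.shape)  # (8, 28, 28, 1) (8, 56, 56, 1)
```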
  • Training was done on Google Colab, chosen for its price, accessibility, and ease of CUDA integration. However, I was limited to the hardware Google provided, which included only 15GB of GPU RAM.
  • This was enough RAM to load all 1.6 million training samples when they were only 784 pixels each, but once they were 3,136 pixels each I had to split the dataset into 3 batches.
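  • The memory arithmetic behind the split, assuming the full 1,656,000 samples are held as float32 (the storage dtype is an assumption), works out roughly as follows.

```python
# Rough memory footprint of the dataset at 28x28 vs 56x56, assuming float32 pixels.
import numpy as np

samples = 1_656_000
bytes_per_value = 4  # float32

for pixels in (784, 3136):  # 28x28 vs 56x56
    gib = samples * pixels * bytes_per_value / 2**30
    print(f"{pixels} pixels/sample -> ~{gib:.1f} GiB")
# 784 pixels/sample  -> ~4.8 GiB  (fits within the ~15 GB limit)
# 3136 pixels/sample -> ~19.3 GiB (does not fit, hence the 3-way split)

# Splitting arrays into 3 roughly equal chunks (placeholder data shown):
x_train = np.zeros((9_000, 3136), dtype="float32")
y_train = np.zeros(9_000, dtype="int32")
x_chunks = np.array_split(x_train, 3)
y_chunks = np.array_split(y_train, 3)
```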
  • The training itself was run using the following scheme:
  • training.PNG
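  • In outline, training over the 3 chunks looks something like the sketch below; the epoch counts, batch size, and chunk-cycling order are assumptions, and the actual scheme is the one shown in training.PNG.

```python
# Sketch of training the 56x56 model over 3 dataset chunks in turn.
# Epoch/batch-size values and the chunk rotation are illustrative assumptions.
import numpy as np
from tensorflow.keras import layers, models

# Small random placeholders standing in for the 3 pre-split portions of the dataset.
x_chunks = [np.random.rand(1000, 56, 56, 1).astype("float32") for _ in range(3)]
y_chunks = [np.random.randint(0, 345, size=1000) for _ in range(3)]
x_val = np.random.rand(500, 56, 56, 1).astype("float32")
y_val = np.random.randint(0, 345, size=500)

model = models.Sequential([
    layers.Input(shape=(56, 56, 1)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(345, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Cycle through the chunks so the model sees the whole dataset on each pass.
for _ in range(5):
    for x_chunk, y_chunk in zip(x_chunks, y_chunks):
        model.fit(x_chunk, y_chunk,
                  validation_data=(x_val, y_val),
                  epochs=1, batch_size=256, verbose=1)
```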