• This project was inspired by a smaller Sketcher application developed by Zaid Alyafeai.
  • There are two models present on this page. Both were trained on the full 345-class dataset from Google.
  • An earlier, simpler* Colaboratory Notebook prototype with only 160 classes is available here: Colab

    *This smaller prototype was trained using Google's Tensor Processing Unit (TPU).

  • The model architecture used for training is defined by the following convolutional neural network (CNN) structure:

    CNN_Model.PNG
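  • As a rough illustration only, the sketch below shows what a structure like this looks like in Keras code. The layer counts, filter sizes, and dropout rate are placeholder assumptions, not the exact values shown in CNN_Model.PNG.

```python
# Illustrative sketch of a small CNN for 28x28 grayscale sketches and 345 classes.
# Layer sizes below are assumptions, not the tuned architecture from the image above.
from tensorflow.keras import layers, models

def build_sketch_cnn(num_classes: int = 345) -> models.Model:
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),  # 784-pixel bitmaps reshaped to 28x28x1
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_sketch_cnn()
model.summary()
```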

  • The specific architecture was arrived at through 50 rounds of optimization, performed with scikit-optimize, searching over the following parameters:
  • Optimize_Parameter.PNG
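  • As a hedged sketch of how such a search can be wired up with scikit-optimize: the parameter names, ranges, and the choice of gp_minimize below are illustrative assumptions; the actual search space is the one shown in Optimize_Parameter.PNG.

```python
# Sketch of a 50-call scikit-optimize search over CNN hyperparameters.
# Parameter names/ranges and the use of gp_minimize are assumptions for illustration.
import numpy as np
from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args
from tensorflow.keras import layers, models, optimizers

# Small random placeholders standing in for the 60,000/5,000 optimization subset.
x_opt = np.random.rand(1000, 28, 28, 1).astype("float32")
y_opt = np.random.randint(0, 345, size=1000)
x_val = np.random.rand(200, 28, 28, 1).astype("float32")
y_val = np.random.randint(0, 345, size=200)

search_space = [
    Real(1e-4, 1e-2, prior="log-uniform", name="learning_rate"),
    Integer(1, 3, name="conv_blocks"),
    Integer(16, 128, name="filters"),
    Integer(64, 512, name="dense_units"),
]

@use_named_args(search_space)
def objective(learning_rate, conv_blocks, filters, dense_units):
    # Build a candidate CNN from the sampled hyperparameters.
    model = models.Sequential([layers.Input(shape=(28, 28, 1))])
    for _ in range(conv_blocks):
        model.add(layers.Conv2D(filters, (3, 3), activation="relu", padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(dense_units, activation="relu"))
    model.add(layers.Dense(345, activation="softmax"))
    model.compile(optimizer=optimizers.Adam(learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_opt, y_opt, validation_data=(x_val, y_val),
                        epochs=2, batch_size=128, verbose=0)
    # gp_minimize minimizes, so return the negative validation accuracy.
    return -history.history["val_accuracy"][-1]

result = gp_minimize(objective, search_space, n_calls=50, random_state=0)
print("Best hyperparameters found:", result.x)
```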

  • The optimization was run on a subset of the full dataset, specifically 60,000 training samples and 5,000 validation samples.
  • Here is the convergence plot of the optimization:
  • Convergence_plot.PNG
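  • For reference, a plot like this can be drawn directly from the search result with scikit-optimize's built-in helper (assuming result is the object returned by the optimizer, as in the sketch above):

```python
# Convergence plot of the scikit-optimize run; `result` is the OptimizeResult
# returned by gp_minimize in the sketch above (the variable name is an assumption).
import matplotlib.pyplot as plt
from skopt.plots import plot_convergence

plot_convergence(result)
plt.show()
```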

  • The full dataset contained 1,656,000 samples, split with an 80/20 train/validation ratio (1,324,800 training and 331,200 validation samples).
  • training.PNG
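  • A split of this shape can be produced, for example, with scikit-learn; the use of train_test_split, stratification, and the random seed below are assumptions for illustration.

```python
# Sketch of an 80/20 train/validation split. At full scale (1,656,000 samples)
# this split yields 1,324,800 training and 331,200 validation samples.
import numpy as np
from sklearn.model_selection import train_test_split

# Small random placeholder standing in for the full dataset of flattened 784-pixel bitmaps.
x_all = np.zeros((10_000, 784), dtype="uint8")
y_all = np.random.randint(0, 345, size=10_000)

x_train, x_val, y_train, y_val = train_test_split(
    x_all, y_all, test_size=0.2, stratify=y_all, random_state=42)

print(x_train.shape, x_val.shape)  # (8000, 784) (2000, 784)
```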

  • Below are two charts, showing the accuracy and loss, respectively, over the training epochs:
  • Accuracy_over_epoch.PNG Loss_over_epoch.PNG
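  • Charts like these can be generated from the History object that Keras returns from model.fit; the function below is a generic sketch, not the exact plotting code used for the images above.

```python
# Plot training/validation accuracy and loss per epoch from a Keras History object.
import matplotlib.pyplot as plt

def plot_history(history):
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(12, 4))

    ax_acc.plot(history.history["accuracy"], label="train")
    ax_acc.plot(history.history["val_accuracy"], label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()

    ax_loss.plot(history.history["loss"], label="train")
    ax_loss.plot(history.history["val_loss"], label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    fig.tight_layout()
    plt.show()

# Usage: plot_history(model.fit(...)) after training completes.
```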

  • Minor overfitting is present after roughly the 18th epoch.
  • The 28x28-pixel model was the first one trained. When reviewing its performance, I found that it had difficulty differentiating between visually similar classes.
  • For example, the model struggled to distinguish between the following class groups: [circle, stop sign, octagon, hexagon], [house, barn], and [butterfly, bowtie, ant].
  • The raw dataset contained 256x256 images, so my theory was that a less aggressive downsampling would yield higher accuracy, at least within these visually similar groups.
  • However, increasing the size of each training image by a factor of four (784 pixels to 3,136, i.e. 28x28 to 56x56) introduced a number of challenges due to the hardware limitations of the training environment.
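  • Producing the larger inputs is simply a matter of downsampling the raw images less aggressively; the sketch below uses tf.image.resize with area interpolation, which is an assumption about the resizing method actually used.

```python
# Downsample raw 256x256 images to 56x56 (3,136 pixels) instead of 28x28 (784 pixels).
# The area-interpolation choice is an assumption for illustration.
import numpy as np
import tensorflow as tf

raw = np.random.randint(0, 256, size=(8, 256, 256, 1), dtype=np.uint8)  # placeholder batch

small = tf.image.resize(raw, (28, 28), method="area")    # 784 pixels per image
larger = tf.image.resize(raw, (56, 56), method="area")   # 3,136 pixels per image

print(small.shape, larger.shape)  # (8, 28, 28, 1) (8, 56, 56, 1)
```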
  • Training was done on Google Colab, chosen for its price, accessibility, and ease of CUDA integration. However, I was limited to the hardware Google provided, which included only 15GB of GPU RAM.
  • This was enough RAM to load all 1.6 million training samples when they were only 784 pixels each, but once they were 3,136 pixels each I had to split the dataset into 3 batches.
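  • The memory arithmetic behind the split, assuming the full 1,656,000 samples are held as float32 (the storage dtype is an assumption), works out roughly as follows.

```python
# Rough memory footprint of the dataset at 28x28 vs 56x56, assuming float32 pixels.
import numpy as np

samples = 1_656_000
bytes_per_value = 4  # float32

for pixels in (784, 3136):  # 28x28 vs 56x56
    gib = samples * pixels * bytes_per_value / 2**30
    print(f"{pixels} pixels/sample -> ~{gib:.1f} GiB")
# 784 pixels/sample  -> ~4.8 GiB  (fits within the ~15 GB limit)
# 3136 pixels/sample -> ~19.3 GiB (does not fit, hence the 3-way split)

# Splitting arrays into 3 roughly equal chunks (placeholder data shown):
x_train = np.zeros((9_000, 3136), dtype="float32")
y_train = np.zeros(9_000, dtype="int32")
x_chunks = np.array_split(x_train, 3)
y_chunks = np.array_split(y_train, 3)
```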
  • The training itself was run using the following scheme:
  • training.PNG
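  • In outline, training over the 3 chunks looks something like the sketch below; the epoch counts, batch size, and chunk-cycling order are assumptions, and the actual scheme is the one shown in training.PNG.

```python
# Sketch of training the 56x56 model over 3 dataset chunks in turn.
# Epoch/batch-size values and the chunk rotation are illustrative assumptions.
import numpy as np
from tensorflow.keras import layers, models

# Small random placeholders standing in for the 3 pre-split portions of the dataset.
x_chunks = [np.random.rand(1000, 56, 56, 1).astype("float32") for _ in range(3)]
y_chunks = [np.random.randint(0, 345, size=1000) for _ in range(3)]
x_val = np.random.rand(500, 56, 56, 1).astype("float32")
y_val = np.random.randint(0, 345, size=500)

model = models.Sequential([
    layers.Input(shape=(56, 56, 1)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(345, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Cycle through the chunks so the model sees the whole dataset on each pass.
for _ in range(5):
    for x_chunk, y_chunk in zip(x_chunks, y_chunks):
        model.fit(x_chunk, y_chunk,
                  validation_data=(x_val, y_val),
                  epochs=1, batch_size=256, verbose=1)
```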