Sunday, February 15, 2015

IFT6266 Week 5

The convnet has gone no further this week since it is still massively overfitting, but I had a few interesting discussions with Roland about computationally efficient pooling, which should be useful once I solve my current issues.

I also got the convolutional VAE working for MNIST. If I can get a good run on CIFAR10, it might also be useful to slap a one- or two-layer MLP on the hidden-space representation to see if that gets above 80% for cats and dogs. If not, it would also be fun to train the VAE on the cats and dogs dataset itself, folding in all the data, and then fine-tune for prediction. This is a sidetrack from "the list" but could be fruitful.
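
To make that second idea concrete, here is a rough sketch of a one-hidden-layer MLP classifier on the VAE's latent codes. This is just an illustration, not the actual plan: the layer sizes are made up, biases are omitted, and it assumes some hypothetical encode(images) function from the ConvVAE that returns the latent mean for each image.

import numpy as np
import theano
import theano.tensor as T

rng = np.random.RandomState(1999)
n_latent, n_hidden, n_classes = 100, 200, 2  # made-up sizes for illustration

def init(shape):
    # placeholder Gaussian initialization
    return theano.shared(np.asarray(rng.normal(0, 0.01, shape),
                                    dtype=theano.config.floatX))

W1, W2 = init((n_latent, n_hidden)), init((n_hidden, n_classes))

Z = T.matrix('Z')   # latent means from the hypothetical encode(images)
y = T.ivector('y')  # 0 = cat, 1 = dog

h = T.maximum(T.dot(Z, W1), 0.)     # relu hidden layer
p_y = T.nnet.softmax(T.dot(h, W2))
loss = T.mean(T.nnet.categorical_crossentropy(p_y, y))
updates = [(p, p - 0.01 * T.grad(loss, p)) for p in (W1, W2)]

train = theano.function([Z, y], loss, updates=updates)
predict = theano.function([Z], T.argmax(p_y, axis=1))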

Here are some samples from the ConvVAE on MNIST (the current code is here: https://gist.github.com/kastnerkyle/f3f67424adda343fef40/9b6bf8c66c112d0ca8eb87babb717930a7d42913 ).


Monday, February 9, 2015

IFT6266 Week 4

I now have the convolutional-deconvolutional VAE working as a standalone script and am training it on LFW to see the results. The code can be found here: https://gist.github.com/kastnerkyle/f3f67424adda343fef40

I have also finished coding a convnet in pure Theano, which heavily overfits the dogs and cats data. See convnet.py here: https://github.com/kastnerkyle/ift6266h15

Current training stats:
Epoch 272
Train Accuracy  0.993350
Valid Accuracy  0.501600
Loss 0.002335

The architecture is as follows (a rough Theano sketch of it comes after the list):
load in data as color and resize all to 48x48
1000 epochs, batch size 128
SGD with 0.01 learning rate, no momentum

layer 1 - 10 filters, 3x3 kernel, 2x2 max pool, relu
layer 2 - 10 filters, 3x3 kernel, 1x1 max pool, relu
layer 3 - 10 filters, 3x3 kernel, 1x1 max pool, relu
layer 4 - fully connected 3610x100 (10 feature maps of 19x19, flattened), relu
layer 5 - softmax
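
For reference, here is a minimal Theano sketch of the architecture above. It is not the actual convnet.py: biases, data loading, and the training loop are left out, and the Gaussian initialization is just a placeholder.

import numpy as np
import theano
import theano.tensor as T
from theano.tensor.signal.downsample import max_pool_2d

rng = np.random.RandomState(1999)

def init(shape):
    # placeholder Gaussian initialization
    return theano.shared(np.asarray(rng.normal(0, 0.01, shape),
                                    dtype=theano.config.floatX))

# three conv layers of 10 3x3 filters, then 3610 -> 100 -> softmax over 2 classes
W1 = init((10, 3, 3, 3))
W2 = init((10, 10, 3, 3))
W3 = init((10, 10, 3, 3))
W4 = init((3610, 100))
W5 = init((100, 2))

X = T.tensor4('X')  # (batch, 3, 48, 48) color images
y = T.ivector('y')  # 0 = cat, 1 = dog

def relu(x):
    return T.maximum(x, 0.)

h1 = relu(max_pool_2d(T.nnet.conv2d(X, W1), (2, 2), ignore_border=True))  # -> (10, 23, 23)
h2 = relu(T.nnet.conv2d(h1, W2))      # 1x1 pool is a no-op -> (10, 21, 21)
h3 = relu(T.nnet.conv2d(h2, W3))      # -> (10, 19, 19)
h4 = relu(T.dot(h3.flatten(2), W4))   # 10 * 19 * 19 = 3610
p_y = T.nnet.softmax(T.dot(h4, W5))

loss = T.mean(T.nnet.categorical_crossentropy(p_y, y))
params = [W1, W2, W3, W4, W5]
updates = [(p, p - 0.01 * T.grad(loss, p)) for p in params]  # SGD, lr 0.01, no momentum

train = theano.function([X, y], loss, updates=updates)
predict = theano.function([X], T.argmax(p_y, axis=1))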

The next step is obviously to add dropout. With this much overfitting, I am hopeful that this architecture can get me above 80%. Other things to potentially add include ZCA preprocessing, maxout instead of relu, network-in-network, inception layers, and more. I am also considering bumping the default image size to 64x64 and adding random subcrops, image flipping, and other preprocessing tricks.
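
As a quick sketch of the subcrop-and-flip idea (pure numpy, and just an illustration: it assumes the images have already been resized to 64x64 and crops them back down to 48x48):

import numpy as np

def augment(batch, crop=48, rng=np.random):
    # batch: (n, 3, 64, 64) array of color images
    n, c, h, w = batch.shape
    out = np.empty((n, c, crop, crop), dtype=batch.dtype)
    for i in range(n):
        # random subcrop
        top = rng.randint(0, h - crop + 1)
        left = rng.randint(0, w - crop + 1)
        img = batch[i, :, top:top + crop, left:left + crop]
        # horizontal flip half the time
        if rng.rand() > 0.5:
            img = img[:, :, ::-1]
        out[i] = img
    return out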

Once above 80%, I want to experiment with some of the "special sauce" from Dr. Ben Graham - fractional max pooling and spatially sparse convolution. His minibatch dropout also seems quite nice!

Sunday, February 1, 2015

IFT6266 Week 3

Alec Radford shared some very interesting results on LFW using a convolutional VAE (https://t.co/mfoK8hcop5 and https://twitter.com/AlecRad/status/560200349441880065). I have been working to convert his code into something more generally usable, as his version (https://gist.github.com/Newmu/a56d5446416f5ad2bbac) depends on other local code from Indico.

This *probably* won't be the thing that gets above our 80% baseline, but it would be cool to get it working for another dataset. It may also be interesting for other projects since we know convolutional nets can work well for sound.