The convnet has made no further progress this week since it started massively overfitting, but I had a few interesting discussions with Roland about computationally efficient pooling, which should be useful once I solve my current issues.
I also got the convolutional VAE working for MNIST. If I can get a good run on CIFAR10, it might also be useful to put a one- or two-layer MLP on the hidden space representation to see if that gets above 80% on cats and dogs. If not, it would also be fun to train the VAE on the dataset itself, folding in all the data, and then finetune for prediction. This is a sidetrack from "the list" but could be fruitful.
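As a sketch of what that classifier head might look like, here is a tiny one-hidden-layer MLP in numpy trained with plain SGD on stand-in latent codes (the 200-dimensional codes, labels, and all shapes here are made up for illustration, not taken from the actual VAE):

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical stand-ins: 200-d VAE latent codes, binary cat/dog labels.
z = rng.randn(256, 200)           # latent codes from the encoder
y = rng.randint(0, 2, size=256)   # 0 = cat, 1 = dog

# One hidden layer (relu) + sigmoid output, trained with SGD.
W1 = rng.randn(200, 100) * 0.01
b1 = np.zeros(100)
W2 = rng.randn(100, 1) * 0.01
b2 = np.zeros(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.1
for _ in range(100):
    h = np.maximum(0, z.dot(W1) + b1)      # relu hidden layer
    p = sigmoid(h.dot(W2) + b2).ravel()    # P(dog | z)
    # gradient of mean cross-entropy w.r.t. the pre-sigmoid output
    grad_out = (p - y)[:, None] / len(y)
    grad_h = grad_out.dot(W2.T) * (h > 0)  # backprop through relu
    W2 -= lr * h.T.dot(grad_out)
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * z.T.dot(grad_h)
    b1 -= lr * grad_h.sum(axis=0)

acc = ((p > 0.5) == y).mean()  # training accuracy on the stand-in data
```

With real latent codes the hope is that the VAE has already done most of the work, so even a head this small could be informative.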
Here are some samples from the ConvVAE on MNIST (the current code is at https://gist.github.com/kastnerkyle/f3f67424adda343fef40/9b6bf8c66c112d0ca8eb87babb717930a7d42913 ).
Sunday, February 15, 2015
Monday, February 9, 2015
IFT6266 Week 4
I now have the convolutional-deconvolutional VAE working as a standalone script, and am training it on LFW to see the results. The code can be found here: https://gist.github.com/kastnerkyle/f3f67424adda343fef40
I have also completed coding a convnet in pure theano which heavily overfits the dogs and cats data. See convnet.py here: https://github.com/kastnerkyle/ift6266h15
Current training stats:
Epoch 272
Train Accuracy 0.993350
Valid Accuracy 0.501600
Loss 0.002335
The architecture is:
load in data as color and resize all to 48x48
1000 epochs, batch size 128
SGD with 0.01 learning rate, no momentum
layer 1 - 10 filters, 3x3 kernel, 2x2 max pool, relu
layer 2 - 10 filters, 3x3 kernel, 1x1 max pool, relu
layer 3 - 10 filters, 3x3 kernel, 1x1 max pool, relu
layer 4 - fully connected 3610x100, relu
layer 5 - softmax
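As a sanity check on the 3610 input size of the fully connected layer, the feature map shapes can be traced by hand (assuming "valid" convolutions, the Theano conv2d default, and non-overlapping max pooling):

```python
# Trace the feature-map side length through the layers above.

def conv_out(size, kernel):
    # a valid convolution shrinks each side by kernel - 1
    return size - kernel + 1

def pool_out(size, pool):
    # non-overlapping max pooling divides each side by the pool size
    return size // pool

size = 48                              # input resized to 48x48
size = pool_out(conv_out(size, 3), 2)  # layer 1: 46 -> 23
size = pool_out(conv_out(size, 3), 1)  # layer 2: 21 (1x1 pool is a no-op)
size = pool_out(conv_out(size, 3), 1)  # layer 3: 19

n_filters = 10
flat = n_filters * size * size
print(flat)  # 3610, matching the 3610x100 fully connected layer
```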
The next step is quite obviously to add dropout. With this much overfitting I am hopeful that this architecture can get me above 80%. Other things to potentially add include ZCA preprocessing, maxout instead of relu, network-in-network, inception layers, and more. Also considering bumping the default image size to 64x64, random subcrops, image flipping, and other preprocessing tricks.
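For reference, a minimal numpy sketch of the inverted-dropout variant I have in mind (the train-time rescaling is my choice here, not something already in convnet.py):

```python
import numpy as np

def dropout(activations, p_drop=0.5, rng=None, train=True):
    """Inverted dropout: zero each unit with probability p_drop at
    train time, and rescale the survivors so the expected activation
    is unchanged, leaving nothing to do at test time."""
    if not train or p_drop == 0.0:
        return activations
    rng = rng or np.random.RandomState(0)
    mask = rng.binomial(1, 1.0 - p_drop, size=activations.shape)
    return activations * mask / (1.0 - p_drop)

x = np.ones((4, 6))
out = dropout(x, p_drop=0.5)
# kept units become 2.0, dropped units become 0.0
```

In the Theano graph this would apply after each relu, with `train` switched off at validation time.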
Once above 80%, I want to experiment with some of the "special sauce" from Dr. Ben Graham - fractional max pooling and spatially sparse convolution. His minibatch dropout also seems quite nice!
Sunday, February 1, 2015
IFT6266 Week 3
Alec Radford shared some very interesting results on LFW using a convolutional VAE (https://t.co/mfoK8hcop5 and https://twitter.com/AlecRad/status/560200349441880065). I have been working to convert his code into something more generally usable, as his version (https://gist.github.com/Newmu/a56d5446416f5ad2bbac) depends on other local code from Indico.
This *probably* won't be the thing that gets above our 80% baseline, but it would be cool to get it working for another dataset. It may also be interesting for other projects since we know convolutional nets can work well for sound.
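Since the reparameterization trick is the heart of any VAE port, here is a minimal numpy sketch of the sampling step and KL term (a diagonal Gaussian encoder is assumed; the names are mine, not from Alec's code):

```python
import numpy as np

rng = np.random.RandomState(42)

def sample_latent(mu, log_sigma):
    # z = mu + sigma * eps with eps ~ N(0, I); writing the sample this
    # way lets gradients flow through mu and log_sigma while keeping
    # z stochastic
    eps = rng.randn(*mu.shape)
    return mu + np.exp(log_sigma) * eps

def kl_divergence(mu, log_sigma):
    # KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions
    return 0.5 * np.sum(mu ** 2 + np.exp(2 * log_sigma)
                        - 1 - 2 * log_sigma)

mu = np.zeros((1, 8))
log_sigma = np.zeros((1, 8))
z = sample_latent(mu, log_sigma)
print(kl_divergence(mu, log_sigma))  # 0.0 when q matches the prior
```

The full loss is this KL term plus a reconstruction term from the decoder, which is where the deconvolutional half comes in.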