Wednesday, March 25, 2015

IFT6266 Week 8

After looking at batch normalization, I really think that the gamma and beta terms are correcting for the bias in the minibatch estimates of the mean and variance, but I have not confirmed this. I am also toying with ideas along the same lines as Julian's, except using reinforcement learning to choose the minibatch that gives the largest expected reduction in training or validation error, rather than controlling hyperparameters as he does. One possible option would be something like CACLA for real-valued actions and LSPI for discrete "switches".
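
For reference, here is a minimal NumPy sketch of the training-time batch normalization transform, just to make the roles of gamma and beta explicit. This is an illustration with made-up names and shapes, not the layer from my repo:

import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    # x: (batch_size, n_features) activations for one minibatch
    mean = x.mean(axis=0)                    # per-feature minibatch mean
    var = x.var(axis=0)                      # per-feature minibatch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize to ~zero mean, unit variance
    # gamma and beta are learned per-feature scale and shift applied after normalization
    return gamma * x_hat + beta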

Batch normalization (and Nesterov momentum) seems to help. After only 11 epochs, a roughly 50% smaller network is able to reach equivalent validation performance (a sketch of the Nesterov update follows the numbers below).

Epoch 11
Train Accuracy  0.874000
Valid Accuracy  0.802000
Loss 0.364031
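
For completeness, a tiny sketch of one common practical reformulation of the Nesterov momentum update (velocity step plus lookahead). Variable names are illustrative, not taken from my code:

import numpy as np

def nesterov_update(param, velocity, grad, lr=0.01, momentum=0.95):
    # Update the velocity with the current gradient, then apply the
    # "lookahead" correction to the parameters.
    velocity_new = momentum * velocity - lr * grad
    param_new = param + momentum * velocity_new - lr * grad
    return param_new, velocity_new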

The code for the batch normalization layer is here:
https://github.com/kastnerkyle/ift6266h15/blob/master/normalized_convnet.py#L46

With the same-sized network as before, validation accuracy stays fairly consistently around 80%, but the model begins to overfit heavily. The best validation scores, with 0.95 Nesterov momentum, are:

Epoch 10
Train Accuracy  0.875050
Valid Accuracy  0.813800
Loss 0.351992

Epoch 36
Train Accuracy  0.967650
Valid Accuracy  0.815800

Epoch 96
Train Accuracy  0.992100
Valid Accuracy  0.822000

Next I plan to try batch normalization on fully connected and convolutional VAEs, first on MNIST, then LFW, then probably cats and dogs. It would be nice to either a) dig into batch normalization and do it properly, or simplify the equations somehow, or b) do some reinforcement learning like Julian is doing, but on the minibatch selection process. However, time is short!
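
To make the minibatch-selection idea a bit more concrete, here is a toy epsilon-greedy sketch (not CACLA or LSPI, and not something I have implemented): score a handful of candidate minibatches with the current model and greedily train on the one it currently does worst on, as a crude proxy for the largest expected error reduction. All names here are hypothetical:

import numpy as np

def pick_minibatch(candidates, loss_fn, epsilon=0.1, rng=np.random):
    # candidates: list of (X, y) minibatches
    # loss_fn: scores a minibatch under the current model parameters
    if rng.rand() < epsilon:
        # explore: occasionally pick a random minibatch
        return candidates[rng.randint(len(candidates))]
    # exploit: pick the minibatch with the highest current loss,
    # a crude proxy for "largest expected reduction in error"
    losses = [loss_fn(X, y) for X, y in candidates]
    return candidates[int(np.argmax(losses))]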
