Switching the optimizer from SGD with Nesterov momentum to rescaling RMSProp with Nesterov momentum has proved to be quite valuable. The feedforward model now trains to the "good sample" level in about 45 minutes. The current code is here: https://github.com/kastnerkyle/ift6266h15
However, the convolutional model takes 3 days! Something might be wrong...
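For reference, here is roughly what an RMSProp update with Nesterov momentum looks like. This is a minimal NumPy sketch, not the code from the repository: the function name, the `state` dictionary, and the hyperparameter values (`lr`, `momentum`, `decay`, `eps`) are all assumptions made for illustration, and any extra rescaling step the real optimizer applies is not shown. It divides each gradient by a running root-mean-square of recent gradients, then applies the usual "lookahead" form of Nesterov momentum.

```python
import numpy as np


def rmsprop_nesterov_step(param, grad, state, lr=1e-3, momentum=0.9,
                          decay=0.9, eps=1e-8):
    """One update step (illustrative; hyperparameters are assumed values)."""
    # Running average of squared gradients, kept per parameter.
    state["ms"] = decay * state["ms"] + (1.0 - decay) * grad ** 2
    # Gradient scaled by the root-mean-square of recent gradients.
    scaled = grad / np.sqrt(state["ms"] + eps)
    # Velocity update, as in classical momentum.
    state["v"] = momentum * state["v"] - lr * scaled
    # Nesterov lookahead: step along the new velocity plus the fresh gradient.
    return param + momentum * state["v"] - lr * scaled


def run_quadratic_demo(steps=2000):
    """Minimize a toy quadratic, returning (initial_loss, final_loss)."""
    target = np.array([1.0, -1.0])
    x = np.array([3.0, -2.0])
    state = {"ms": np.zeros_like(x), "v": np.zeros_like(x)}

    def loss(p):
        return 0.5 * np.sum((p - target) ** 2)

    initial = loss(x)
    for _ in range(steps):
        grad = x - target  # gradient of the quadratic loss
        x = rmsprop_nesterov_step(x, grad, state)
    return initial, loss(x)
```

Calling `run_quadratic_demo()` on this toy problem should show the loss dropping well below its starting value. It says nothing about why the convolutional model is slow, but it makes the update rule concrete.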
Samples from the feedforward model:
Reconstructions from the feedforward model:
Samples from the convolutional model:
Reconstructions from the convolutional model: