In [3]: import numpy as np
In [4]: a = np.array([1., 2., 3., 4., 5.])
In [5]: b = np.array([[1.], [2.], [3.], [4.], [5.]])
In [6]: a
Out[6]: array([ 1., 2., 3., 4., 5.])
In [7]: b
Out[7]:
array([[ 1.],
[ 2.],
[ 3.],
[ 4.],
[ 5.]])
In [8]: (a == b).mean()
Out[8]: 0.20000000000000001
The correct way to compute the accuracy is (a.flatten() == b.flatten()).astype('float32').mean()
Without the flatten, the (N,) array and the (N, 1) array broadcast to an N x N matrix of pairwise comparisons, so for a task with 2 classes the reported accuracy will always hover around 50% and only change slightly as the predictions change!
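Here is the same failure reproduced end to end, with made-up prediction and label values purely for illustration:

import numpy as np

preds  = np.array([0., 1., 1., 0., 1.])            # shape (5,)  -- hypothetical predictions
labels = np.array([[0.], [1.], [0.], [0.], [1.]])  # shape (5, 1) -- hypothetical labels

# Bug: comparing (5,) against (5, 1) broadcasts to a (5, 5) matrix of
# pairwise comparisons, so the mean runs over 25 entries instead of 5.
buggy = (preds == labels).mean()

# Fix: flatten both arrays so the comparison is elementwise.
correct = (preds.flatten() == labels.flatten()).astype('float32').mean()

print(buggy, correct)  # 0.48 vs 0.8 for these made-up values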
It is an obvious bug in hindsight, but it took a long time to find. Now after 100 epochs, the results are as follows:
Epoch 99
Train Accuracy 0.982200
Valid Accuracy 0.751000
Loss 0.021493
The general network architecture is (a rough code sketch follows the list):
Resize all images to 64x64, then crop the center 48x48
No flipping, random subcrops, or any other augmentation
32 kernels, 4x4 with 2x2 pooling and 2x2 strides
64 kernels, 2x2 with 2x2 pooling
128 kernels, 2x2 with 2x2 pooling
512x128 fully connected
128x64 fully connected
64x2 fully connected
All initialized uniformly [-0.1, 0.1]
ReLU activations at every layer
Softmax cost
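The original framework isn't stated, so here is a rough PyTorch sketch of the stack above, just to make the shapes concrete. The 3 input channels and putting the 2x2 stride on the first convolution are my assumptions, but that stride is what makes the flattened features come out to 512 for the first fully connected layer.

import torch.nn as nn

# Rough sketch of the architecture above; not the original code.
model = nn.Sequential(
    # 48x48 input; 4x4 conv, stride 2 -> 23x23; 2x2 max pool -> 11x11
    nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),
    nn.MaxPool2d(2),
    # 2x2 conv -> 10x10; 2x2 max pool -> 5x5
    nn.Conv2d(32, 64, kernel_size=2), nn.ReLU(),
    nn.MaxPool2d(2),
    # 2x2 conv -> 4x4; 2x2 max pool -> 2x2, so 128 * 2 * 2 = 512 features
    nn.Conv2d(64, 128, kernel_size=2), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 2),  # softmax cost = cross-entropy loss over these 2 logits
)

# All parameters initialized uniformly in [-0.1, 0.1].
for p in model.parameters():
    nn.init.uniform_(p, -0.1, 0.1)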
I plan to try:
Adding random horizontal flipping (see the sketch after this list)
ZCA with 0.1 bias
Dropout or batch normalization
PReLU
Dark knowledge / knowledge distillation
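For concreteness, the current preprocessing plus the planned horizontal flip would look roughly like this with torchvision transforms (again a sketch, not the original pipeline):

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((64, 64)),        # resize every image to 64x64
    transforms.CenterCrop(48),          # crop the center 48x48
    transforms.RandomHorizontalFlip(),  # planned augmentation, not used in the run above
    transforms.ToTensor(),
])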