Aditya Rastogi
2 min read · Apr 24, 2020


Thanks for asking, it’s a good question.

From the paper,

Once training is done, the authors throw away the projection head g(·) (an MLP with one hidden layer) and use the encoder f(·) and the representation h for downstream tasks.

The authors conjecture that the importance of using the representation before the nonlinear projection is due to the loss of information induced by the contrastive loss.

Below is the complete neural network architecture I used.
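In code, the layer ordering looks roughly like the minimal PyTorch sketch below. Note that the convolutional backbone and all layer widths, except the 100-dimensional output of ‘added_relu1’ mentioned later, are placeholder assumptions, not the exact architecture:

```python
import torch.nn as nn

class SimCLRNet(nn.Module):
    # Layer names follow the ones referenced in this post; the backbone
    # and all widths except the 100-d 'added_relu1' output are assumed.
    def __init__(self, proj_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(      # assumed conv encoder
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc1 = nn.Linear(64, 100)
        self.added_relu1 = nn.ReLU()        # 100-dimensional output
        self.fc2 = nn.Linear(100, 50)       # assumed width
        self.added_relu2 = nn.ReLU()        # projection head g(.)
        self.fc3 = nn.Linear(50, proj_dim)  # (ReLU -> Linear)

    def forward(self, x):
        h = self.added_relu1(self.fc1(self.backbone(x)))
        z = self.fc3(self.added_relu2(self.fc2(h)))
        return z
```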

After the training was completed, I threw away the ‘fc3’ and ‘added_relu2’ layers from my architecture, so these two layers constituted my projection head. Note that because the projection head contains a ReLU layer, it is still a nonlinear transformation, but unlike the MLP in the paper, it has no hidden layer.
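In PyTorch, one simple way to throw away these layers (a sketch, assuming the module layout above) is to replace them with identity mappings:

```python
import torch.nn as nn

# Drop the projection head: the forward pass now returns the
# representation after 'fc2' instead of the projection z.
model.added_relu2 = nn.Identity()
model.fc3 = nn.Identity()
```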

The authors observe that a nonlinear projection is better than a linear projection (+3%), and much better than no projection (>10%).

Throwing away ‘fc2’ as well would make the discarded block ‘fc2’ → ‘added_relu2’ → ‘fc3’, i.e., a linear layer followed by a ReLU and another linear layer, which is exactly the one-hidden-layer MLP projection from the paper. So, in principle, the representation before it should give better accuracy. I tried it and removed ‘fc2’ as well, and I indeed got an improved accuracy of 65.2% on the test set. Here is the t-SNE visualization of the outputs of the ‘added_relu1’ layer (100-dimensional) on the test and train (10%) datasets.

t-SNE visualization of the outputs of the added_relu1 layer on the test and 10% train datasets
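For anyone who wants to reproduce such a plot, here is a minimal sketch using scikit-learn’s TSNE on the frozen features (the test_loader and the truncated model returning the 100-dimensional ‘added_relu1’ outputs are assumptions):

```python
import numpy as np
import torch
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Collect frozen features from the truncated network.
feats, labels = [], []
model.eval()
with torch.no_grad():
    for x, y in test_loader:
        feats.append(model(x).cpu().numpy())  # 100-d 'added_relu1' outputs
        labels.append(y.numpy())
feats = np.concatenate(feats)
labels = np.concatenate(labels)

# Embed into 2-D with t-SNE and color points by class label.
emb = TSNE(n_components=2).fit_transform(feats)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=2, cmap="tab10")
plt.show()
```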

And here are the accuracy and loss curves vs. the number of epochs from training the linear classifier on top of the frozen 100-dimensional outputs of the ‘added_relu1’ layer.

Plots of accuracy and loss vs. the number of epochs obtained while training the linear classifier on top of the frozen 100-dimensional outputs of the ‘added_relu1’ layer.
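The linear evaluation itself is just a single nn.Linear trained with cross-entropy on the frozen features. A sketch follows; the optimizer, learning rate, epoch count, and 10-class output here are placeholders, not the exact settings:

```python
import torch
import torch.nn as nn

# Linear probe on the frozen 100-dimensional features; 10 classes assumed.
probe = nn.Linear(100, 10)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.eval()                      # encoder stays frozen
for epoch in range(100):
    for x, y in train_loader:     # assumed DataLoader of (image, label)
        with torch.no_grad():
            h = model(x)          # frozen 'added_relu1' outputs
        loss = criterion(probe(h), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```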

I wondered whether we could get further improvements in accuracy by throwing away more layers. But after also removing ‘added_relu1’, the accuracy dropped to 63.2%, and after removing ‘fc1’ too, it dropped drastically to 53.2%.

Thank you for taking the time to read this update.
