Lessons from training deep models, part I

or: Lessons from Deep Reinforcement Learning (CS 295) Mini Project

I wrote these things at the height of our cramming for our CS 295 Mini-project last summer (March-April 2019). We were attempting to improve an image captioning model by training an agent to minimize the difference between the contextual representations (as provided by BERT, if I recall correctly) of the predicted caption and the ground-truth caption.

This has lived in my notes since. Untouched. Even though I originally planned to rewrite it to make the thoughts more complete. Reading it now, more than a year later, I can’t remember most of why I wrote these things, but I am laughing now at all my stress and frustration back then. Hahaha.


May 9, 2019

  1. DO NOT IGNORE UserWarnings!!! When it says it’s deprecated, IT’S DEPRECATED. Adjust your fucking code.
  2. Google is a PASSIVE treasure trove. (It’s actually hard to get answers from people over at StackOverflow or elsewhere. Don’t take it for granted. Or, idk.)
  3. Do not underestimate the equations given in research papers. I ignored one equation because I thought it was unnecessary for the overall computation. I was wrong and wasted a few hours trying to make the network learn without that stupid softmax. (In hindsight, it’s pretty obvious and I’m just a dumb shit.)
  4. Good coding practices should be a mandatory course for ALL programmers. Being a researcher who never had a software engineering job should not be an excuse! Seeing source code with 2-3 letter variable names corresponding to acronyms (or worse, equation indices) is panic-inducing and NOT GOOD FOR THE HEART.
  5. You’re not as smart as you think you are. DOUBLE (even triple) check every detail of your implementation. It’s easy to feel great every time something seems to work fine, but “works fine” doesn’t mean “correct.”
  6. Just before full-on panicking when you get “stuck,” STAND UP AND WALK AROUND. Do something else. Make some tea. Listen to music, take a bath. Go for a run. Taking a moment to clear your head is VASTLY underrated. Doing so removes you from the specific paradigm you were stuck in and from the thought loops that keep you from reaching a solution. It automatically resets your perspective: either you’ll suddenly realize what you were missing, or you’ll come back to the problem with a fresh start.
  7. Deep learning is not crammable. I repeat. DEEP LEARNING IS NOT CRAMMABLE! Trying to cram it induces anxiety, and you’ll hover over the training metrics like an ultra-protective parent smothering its baby and not allowing it to grow to its potential. It’s easy to panic when you see the model not behaving the way you want it to, but sometimes (or most of the time?) the best thing to do is leave it alone for a few epochs. Do not give in to the temptation to reset the network with tweaked hyperparameters or something. If you do this frequently, you’ll end up with so many restarts and no trained model at all. You risk losing the insight gained from a failed run, or worse, the model was actually learning properly when you thought it wasn’t, and now you have to rerun it after wasting precious training time! Guilty.
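Lesson 1 can even be enforced mechanically. Here’s a minimal sketch (hypothetical, not from our actual project) of promoting DeprecationWarnings to hard errors in Python, so the code crashes loudly instead of letting you scroll past the warning:

```python
import warnings

# Promote deprecation warnings to hard errors so they can't be ignored.
warnings.filterwarnings("error", category=DeprecationWarning)

def old_api():
    """Stand-in for a deprecated library function (hypothetical)."""
    warnings.warn("old_api is deprecated; use new_api", DeprecationWarning)
    return 42

try:
    old_api()
    crashed = False
except DeprecationWarning:
    # With the filter above, the warning surfaces as an exception,
    # forcing you to actually adjust the code.
    crashed = True

print(crashed)  # prints: True
```

Running your training script with `python -W error::DeprecationWarning train.py` achieves the same thing without touching the code.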

A short addition, some weeks later

For the midyear semester, I worked on a new mini-project to complete CS 298. And I know I wrote another note called “new lessons”, so I wanted to call this Part II. But it turns out I only wrote one bullet point, so…

August 11, 2019

  1. WRITE A DRAFT OF THE PAPER BEFORE FINALIZING YOUR EXPERIMENTS

    I could see the value in this, but I was so engrossed with the experiments that I ignored it. And after I finished the experiments, I still felt a bit confused about everything I was doing. What’s this experiment for? What do I actually want to show? Why did I do this? Yes, I should have listened to my friend when he was raving about the benefits of writing some of the paper first… I learned (am learning) the hard way. I just started writing my paper, and doing so forced me to make my intentions clear and let me iron out the variables for the experiments… (Yes, I’m redoing some of them…) It’s alright to run a few experiments just to test the waters, but before going all-in, it’s definitely better to write the Methodology section and map out a template for the Results and Discussion.