I seem to keep going back and forth on whether to have one giant model computing both steering and gas, or separate models. This week it was one huge model. In fact, it was even bigger than before, as I downloaded another 2x of DeepDrive data. That also meant going through the data again to remove bad sections. The data itself isn’t corrupt, but there are scenes where the car drives off a cliff…
Even with more data and a fair bit of training, though, I was running into overfitting problems. The regularization from last week helped, but it wasn’t enough. Isaac Gerg was kind enough to suggest adding Dropout layers, and off stream that did help. But the model is still pretty crummy. It may just need more data, given how noisy it generally is.
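For the curious, here’s roughly what that looks like in Keras. The layer sizes, input shape, and the 0.5 rate are stand-ins, not my actual network:

```python
# A minimal sketch of sprinkling Dropout into a Keras model.
# Layer sizes, dropout rate, and input shape are all placeholders.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten

model = Sequential()
model.add(Flatten(input_shape=(64, 64, 3)))  # flattened 64x64 RGB frame
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))  # randomly zero half the activations during training
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1))      # single steering output
```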
I may also build a “backup” model where, instead of trying to learn correct driving behavior, the car goes forward until it hits something, then backs up and turns in a random direction. Call it the “Roomba Initiative”.
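In code, that would be nothing more than a dumb loop. A toy sketch, with a completely made-up car API:

```python
# Toy sketch of the "Roomba Initiative": no learning, just bump and turn.
# The `car` object and all of its methods are hypothetical placeholders.
import random

def roomba_step(car):
    if car.collided():
        car.reverse(seconds=1.0)
        car.steer(random.choice([-1.0, 1.0]))  # turn a random direction
    else:
        car.throttle(1.0)                      # full speed ahead
```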
Over the weekend, I made the crucial error of messing with my data. I had grown tired of having to preprocess it every time I wanted to model something. And it was irritating that it was split across several files when, at the size I actually wanted, the raw data could fit in memory. So I compressed it.
Really, I shrank the data by preprocessing the images down to 64x64 pixels (cutting the size by a factor of about 16) and saving only the speed/target information needed, with no duplication. While I initially had some scaling errors (note to self: only add 1 once :p), it came out great, and the model is now much easier to use. The resulting files are up on Google Drive; take a look.
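The gist of the shrinking step, as a sketch. I’m assuming PIL for the resizing and numpy’s compressed archive format; the array names and file path are made up:

```python
# Sketch of the shrink: downscale frames to 64x64 and pack everything
# into a single compressed file. Names and path are illustrative only.
import numpy as np
from PIL import Image

def shrink(frames, speeds, steering, path='driving_64x64.npz'):
    small = np.stack([np.asarray(Image.fromarray(f).resize((64, 64)))
                      for f in frames])
    np.savez_compressed(path, images=small, speed=speeds, steering=steering)
```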
Speaking of the model, this week I wanted to tackle the steering model a bit. Deciding that the previous neural network design was arbitrary, I arbitrarily removed some layers, added regularization, and changed the optimizer. Fiddling with so many things at once was a mistake. The model started suffering from insanely bad losses, I have to assume because it was predicting wildly off-base steering results. We’re talking total mean squared error in the millions. Comically bad.
Watching an animation of the model, I saw that one section of frames was particularly bad: the car was stuck on another car, trying to turn the wrong way to get out. I think the in-game AI had trouble in that scene, leading my model to learn garbage. So I ripped out 250 frames of inputs from the data set.
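Mechanically, that’s just slicing the bad run out of the arrays. Something like this, with a made-up start index:

```python
# Excise a contiguous run of 250 bad frames. The start index is made up,
# and the file is the hypothetical shrunken archive from earlier.
import numpy as np

data = np.load('driving_64x64.npz')
images, steering = data['images'], data['steering']

bad_start = 4000  # first bad frame (illustrative)
keep = np.r_[0:bad_start, bad_start + 250:len(images)]
images, steering = images[keep], steering[keep]
```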
That wasn’t enough, though. It took dialing down the regularization and moving back to the Adam optimizer to start seeing a “reasonable” steering model. I didn’t have time to fully train on stream, but I suspect it will climb to something OK.
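For reference, the shape of that change, with placeholder values (and Keras 2 argument names assumed):

```python
# A lighter L2 penalty and Adam again. The 1e-4 weight, layer size,
# and input shape are placeholders, not the values used on stream.
from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import l2
from keras.optimizers import Adam

net = Sequential()
net.add(Dense(64, activation='relu', kernel_regularizer=l2(1e-4),
              input_shape=(64 * 64 * 3,)))
net.add(Dense(1))
net.compile(optimizer=Adam(), loss='mse')
```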
Frustrated with the steering, I turned back to the gas/brake/reverse model. After a few false starts, I got this running with minimal regularization and was surprised to see that just a few epochs of training produced a halfway decent model. By this, I mean the training accuracy was 80% and the validation accuracy was 60%. Not great by any stretch, but it was a big improvement over the “always predict gas” model from before. With a little more training, this model might get there.
The deadline is coming due, however, for a model to see live action. Next week we’ll have to tackle either training these models simultaneously in a single network, or training two separate models. The single network has the advantage of less computation overall, but it might not be as accurate for the individual outputs. Oh well.
Starting from the animation code sent to me last week, I was able to display all the predictions my steering model was making. With just a little bit of training (10 epochs on 10,000 data points), it actually performed fairly well. And it looks cool to boot.
Building an acceleration model wasn’t too bad; it’s almost identical in structure to the steering model, except for a different final layer. Since we can only control the acceleration through gas, brake, and reverse, I decided to switch to a categorical model. In truth, I should account for the ordering of these; that is, I care that brake is closer to gas than reverse is. If the model’s going to be wrong, I’d prefer that it’s only a little wrong. In classical statistics, you can handle this with an Ordered Probit Model. In layman’s terms, you can think of this as doing linear regression, plus estimating the thresholds between classes. It’s statistically responsible.
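To make the threshold idea concrete, here’s a toy version with hand-picked cut points; a real ordered probit would estimate them from the data:

```python
# Toy version of the ordered-class idea: one latent "acceleration" score
# plus cut points between the ordered classes. The thresholds here are
# arbitrary; an ordered probit model would fit them.
import numpy as np

classes = np.array(['reverse', 'brake', 'gas'])
cuts = np.array([-0.5, 0.5])  # boundaries: reverse|brake, brake|gas

def ordinal_class(latent):
    # searchsorted finds which interval the latent score falls into
    return classes[np.searchsorted(cuts, latent)]

print(ordinal_class(-0.9), ordinal_class(0.1), ordinal_class(2.0))
# -> reverse brake gas
```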
Acknowledging that, I moved forward with a standard logistic regression on the bottom of my neural network. This is partially because I chose the thresholds between classes myself, and arbitrarily at that. But the resulting model was almost always predicting “drive forward”, likely because that’s the overwhelmingly common state (about 90% of the time). That’s the next obstacle.
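For reference, the head I actually used looks roughly like this: a plain three-way softmax with categorical cross-entropy (the feature size is a placeholder). With roughly 90% of the labels being “gas”, an unweighted fit like this will happily lean on the majority class:

```python
# Roughly the unordered alternative: a three-way softmax head.
# The 64-feature input shape is a placeholder.
from keras.models import Sequential
from keras.layers import Dense

head = Sequential()
head.add(Dense(3, activation='softmax', input_shape=(64,)))
head.compile(optimizer='adam', loss='categorical_crossentropy',
             metrics=['accuracy'])
```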
Off-stream, I was able to resolve the “all zeros” problem in my predictions. As I suspected previously, the activation functions were giving me trouble.
Switching to something simpler, I wanted to study the steering model. The easiest way to implement this was to just cut out the acceleration part of my existing model. Nothing like commenting out lines to improve a model, eh?
One hangup I had was that the output started to be very restricted again, often zero. I believe this was due to the final “sigmoid” activation function. This function correctly limits outputs to be between 0 and 1, but there’s a price. It’s statistically responsible when you’re computing the probability of something. But I’m really computing a floating-point value that happens to be between 0 and 1 (or -1 and 1 if you don’t scale). Ideally, I’d like the distribution of output values to look something like the distribution of actual steering values. To that end, I changed the final activation to “linear”. The downside of this is that the model can now suggest steering values below 0 and above 1. Practically speaking, this isn’t an issue; I can just clamp the values output by the model. But during training, it’s going to get a little confused. With a custom activation function, I could probably handle this as well, but I’m not overly concerned.
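The change itself is tiny. A sketch with placeholder sizes and random stand-in inputs:

```python
# The swap: linear final activation during training, clamping only at
# inference time. Layer sizes and inputs are stand-ins.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

net = Sequential()
net.add(Dense(32, activation='relu', input_shape=(64,)))
net.add(Dense(1, activation='linear'))  # unbounded, unlike sigmoid
net.compile(optimizer='adam', loss='mse')

raw = net.predict(np.random.rand(5, 64))  # may fall outside [0, 1]
steer = np.clip(raw, 0.0, 1.0)            # clamp before use in-game
```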
During the stream, I also explored how to visualize the steering. An astute viewer, robertsdionne, sent me some code. I had no idea that matplotlib had a built-in animation module. I’ll definitely give that a try.
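For anyone who wants to try it, a bare-bones FuncAnimation sketch; the frames and steering values here are random filler:

```python
# Step through frames, showing actual vs. predicted steering in the
# title. All data here is random stand-in material.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

frames = np.random.rand(100, 64, 64)
actual, predicted = np.random.rand(100), np.random.rand(100)

fig, ax = plt.subplots()
im = ax.imshow(frames[0], cmap='gray')

def update(i):
    im.set_data(frames[i])
    ax.set_title('actual %.2f / predicted %.2f' % (actual[i], predicted[i]))
    return (im,)

anim = FuncAnimation(fig, update, frames=len(frames), interval=50)
plt.show()
```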
While I was figuring out my own visualization routine, I let the model train for a bit. Ten minutes of churning away while I organized my thoughts is probably the most training it’s been allowed to have. The results were surprisingly decent. Plots of actual and predicted steering matched up reasonably well. And it didn’t wobble too much when it really wanted to drive straight. This gives me hope that with proper training, this model can actually work!
While my transition from ArchLinux to Ubuntu went smoothly, I can’t say the same for the self-driving car model. I discovered that it was always predicting the same values no matter the inputs! In this video, I suspected that the problem was unscaled inputs, but fixing that didn’t resolve the problem.
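The scaling fix itself was a one-liner, assuming byte-valued pixels:

```python
# Rescale raw pixel bytes to [0, 1] before training (assumes uint8 input).
images = images.astype('float32') / 255.0
```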
Now I’m thinking that it may be the activations. I was using rectified linear units (ReLU) for the convolutional layers. It could be that negative values are driving all my gradients to zero. In the DeepDrive model, Craig uses ReLU as well, but the CommaAI model uses “ELU”, the exponential linear unit. Basically, ELU is like ReLU but “leaks” a little when the input is below zero, and it does so in a nonlinear fashion. This might solve my gradient problem.
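Written out as plain functions, the difference is easy to see:

```python
# ReLU vs. ELU as plain numpy functions, showing the "leak" below zero.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # negatives decay smoothly toward -alpha instead of clipping to zero,
    # so gradients don't vanish outright for negative inputs
    return np.where(x > 0, x, alpha * np.expm1(x))
```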