It’s time for another LineByLine, now with a recurrent neural net. Keras includes a curious example of an RNN for MNIST, so I wanted to take a look at that. The premise is that we should be able to classify digits by doing the same computation on pixels one at a time, making our prediction only at the very end. It’s not so different from a convolutional neural net, except it processes exactly one pixel per step, and only the final output is used for the prediction. Well, that and the recurrence carries some history between successive pixels.
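To make that concrete, here’s a minimal sketch of the data setup, assuming the standard Keras MNIST loader. The whole trick is the reshape: each 28×28 image becomes a sequence of 784 single-pixel timesteps.

```python
from keras.datasets import mnist
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Flatten each 28x28 image into a (784, 1) sequence:
# 784 timesteps, one pixel value per step.
x_train = x_train.reshape(-1, 784, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784, 1).astype("float32") / 255.0

# One-hot encode the ten digit classes.
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```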

Another key thing I learned by working through this is that the recurrent weights are initialized to the identity matrix. That way, at the start of training, a given node only passes its activation on to itself in the next time step. The off-diagonal weights start at zero but won’t necessarily stay there, of course; still, it seems like a reasonable starting point, and the research bears this out.
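Here’s a sketch of what that looks like in Keras 2 syntax: a `SimpleRNN` with ReLU activations and the recurrent weights initialized to the identity (the idea from Le, Jaitly, and Hinton’s IRNN paper, which the Keras example follows). The hidden size and optimizer below are my placeholders, not necessarily the example’s exact hyperparameters.

```python
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense
from keras.initializers import Identity, RandomNormal

model = Sequential()
# Recurrent weights start as the identity; input weights start near zero.
model.add(SimpleRNN(100,
                    activation="relu",
                    kernel_initializer=RandomNormal(stddev=0.001),
                    recurrent_initializer=Identity(gain=1.0),
                    input_shape=(784, 1)))
model.add(Dense(10, activation="softmax"))

model.compile(loss="categorical_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])
```

Training from there is a plain `model.fit` over the reshaped sequences; be ready to wait.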

One thing to be aware of, however, is that this is a terrible way to solve MNIST. It’s slow as heck (an RNN stepping through 784 timesteps offers minimal parallelization), and the accuracy is crummy. But as an instructive tool, it’s highly valuable. I see people bash MNIST, and while there are good reasons for disliking it, its educational value is huge.