After a break last week, I stormed back into this font modeling problem. The jittered convolutional model gave reasonable results but required a huge model. To get around this, I reframed the problem as a time series. Rather than feeding the model one large 16 x 128 array of pixels, I split the image into 8 chunks, each one 16 rows by 8 columns. For each chunk, the model computes some features from that chunk’s pixels and combines them with features extracted from the previous chunks. We slowly build up confidence about the font by looking at 1-ish character at a time. Sounds almost human.
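To make the chunking idea concrete, here is a rough NumPy sketch of walking across an image one vertical slice at a time while carrying a running state forward. It is not the model from the video: the plain tanh recurrence, the random untrained weights, the hidden size of 32, and the names `split_into_chunks` and `run_over_chunks` are all assumptions made for illustration. The chunk width is left as a parameter; at 8 columns a 128-column image splits into 16 chunks, and at 16 columns it splits into 8.

```python
import numpy as np


def split_into_chunks(image, chunk_width=8):
    """Cut a (rows, cols) image into equal-width vertical slices, left to right."""
    rows, cols = image.shape
    if cols % chunk_width != 0:
        raise ValueError("image width must be a multiple of chunk_width")
    return np.split(image, cols // chunk_width, axis=1)


def run_over_chunks(chunks, w_in, w_state):
    """Fold the chunks into one feature vector, one chunk per time step.

    Each step mixes features of the current chunk (w_in @ pixels) with
    the state carried over from the previous chunks (w_state @ state).
    """
    state = np.zeros(w_state.shape[0])
    for chunk in chunks:
        state = np.tanh(w_in @ chunk.ravel() + w_state @ state)
    return state


# Hypothetical setup: a random stand-in image and random, untrained weights.
rng = np.random.default_rng(0)
image = rng.random((16, 128))
chunks = split_into_chunks(image, chunk_width=8)

hidden = 32  # assumed state size, not taken from the video
w_in = rng.normal(scale=0.1, size=(hidden, 16 * 8))
w_state = rng.normal(scale=0.1, size=(hidden, hidden))

final_state = run_over_chunks(chunks, w_in, w_state)
```

In a real recognizer, the final state (or the state after every chunk) would feed a classifier over fonts, and the weights would be learned rather than drawn at random. The appeal over one big convolutional model is that the same small set of weights is reused for every chunk.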

Since I just barely got the model to work in the 5 minutes of overtime on this video, I didn’t have a chance to train it for more than a handful of epochs. But it seemed to be doing a bang-up job even with that paltry amount of training time.

Next week, I’ll refine this model. In the end, we’ll have a decent font recognizer that can work on a small slice of text.