Further investigation into the “For well thou know’st” data set revealed that just the overall width and height of the space taken up by each font accounted for most of the predictive power.

This is really an artifact of how I created the data, so I don’t think it’s fair to model it that way. To solve this, I remade the data set using a variable number of characters for each font. Instead of always using 16 characters, I now use as many as necessary to fill a width of 64 pixels (which is half the size of the previous data), up to a maximum of 32 characters. Because I had been careful about creating and reading in my data in the first place, this didn’t take long at all.
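
For readers who want to try something similar, here is a minimal sketch of the "fill 64 pixels, up to 32 characters" rule. It is an illustration under assumptions, not the code used to build the data set: the function name `chars_to_fill`, the `measure_width` callback, and the two stand-in fonts are all hypothetical, and I am assuming "as many as necessary" means the shortest run of characters whose rendered width reaches the target.

```python
from typing import Callable

TARGET_WIDTH = 64  # pixels to fill (from the post)
MAX_CHARS = 32     # upper limit on characters (from the post)


def chars_to_fill(
    text: str,
    measure_width: Callable[[str], int],
    target_width: int = TARGET_WIDTH,
    max_chars: int = MAX_CHARS,
) -> str:
    """Return the shortest prefix of `text` whose measured width reaches
    `target_width`, never using more than `max_chars` characters.

    `measure_width` is a hypothetical callback that reports how many
    pixels wide a string renders in a given font.
    """
    limit = min(len(text), max_chars)
    for n in range(1, limit + 1):
        if measure_width(text[:n]) >= target_width:
            return text[:n]
    # The font is too narrow to reach the target; stop at the cap.
    return text[:limit]


# Stand-in "fonts" with a fixed width per character (assumptions for the demo).
def narrow_font(s: str) -> int:
    return 3 * len(s)


def wide_font(s: str) -> int:
    return 9 * len(s)


sample = "x" * 40
print(len(chars_to_fill(sample, narrow_font)))  # characters used by the narrow font
print(len(chars_to_fill(sample, wide_font)))    # characters used by the wide font
```

In practice `measure_width` would wrap whatever text-measuring call the rendering library provides. The point of the rule is that wide fonts get fewer characters and narrow fonts get more, so the overall width of the rendered text no longer gives the font away.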