Sometimes, you crash and burn. I wanted to take a look at autoencoders this week since that’s an area I haven’t worked with much. Things were going fine as I created a simple model with 32 encoded features. The results were crummy, sure, but it ran alright.
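
For the curious, here's roughly the kind of model I mean: a dense autoencoder that squeezes each flattened image down to 32 features and reconstructs it. I'm sketching it in Keras, and the particular layers here are just illustrative, not exactly what I ran:

```python
# Roughly the kind of model described above: a dense autoencoder with a
# 32-feature bottleneck. (Keras and these layer choices are just for illustration.)
from tensorflow import keras

input_dim = 16 * 64  # 1024 pixels per flattened image

autoencoder = keras.Sequential([
    keras.layers.Input(shape=(input_dim,)),
    keras.layers.Dense(32, activation="relu"),            # encode: 1024 -> 32
    keras.layers.Dense(input_dim, activation="sigmoid"),  # decode: 32 -> 1024
])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_flat, x_flat, epochs=10, batch_size=256)
```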

But I wasn’t careful about my memory management. For data, I was using the text images from my font classification project. I store the pixels as unsigned bytes, or np.uint8s. Each image is 16 * 64 = 1024 pixels and so takes up that many bytes. There are about 500,000 images total, so that’s more than 500,000,000 bytes, or about half a GB. No big deal. Ah, but then I divided by 255 to convert these to floats. No big deal, right? Wrong. Because I didn’t specify a datatype when I did the conversion in NumPy, I got np.float64, which uses 8 bytes per element. So now my data is 4 GB. Still not too bad, until I made two copies of it: one flattened and one with the image dimensions. My machine nominally has 8 GB of memory, but between this indulgence, streaming, and various other programs, everything froze.
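
To spell the arithmetic out in NumPy terms (a small, safe-to-run sketch; the image count is approximate):

```python
import numpy as np

n_images = 500_000
pixels_per_image = 16 * 64                 # 1024 pixels, 1 byte each as uint8

# Stored as unsigned bytes: about half a GB
raw_bytes = n_images * pixels_per_image * np.dtype(np.uint8).itemsize
print(raw_bytes / 1e9)                     # ~0.5 GB

# True division of a uint8 array silently promotes to float64 (8 bytes per element)
print((np.ones((2, 2), dtype=np.uint8) / 255).dtype)   # float64

float_bytes = n_images * pixels_per_image * np.dtype(np.float64).itemsize
print(float_bytes / 1e9)                   # ~4.1 GB

# Two separate float64 arrays (flattened plus image-shaped) and an 8 GB machine tips over
print(2 * float_bytes / 1e9)               # ~8.2 GB
```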

Important lesson: be careful about your memory. Python will hide things from you, and that can make it that much easier to hurt yourself.
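
In this case the fix is mostly about being explicit: ask for float32 instead of letting the division promote to float64, check .nbytes before duplicating anything, and lean on views instead of copies. Something along these lines (not verbatim what my code looked like) would have kept things comfortable:

```python
import numpy as np

# Small stand-in for the real ~500,000-image uint8 array
images = np.zeros((50_000, 16, 64), dtype=np.uint8)

# Be explicit about the dtype: float32 is half the size of float64
scaled = images.astype(np.float32) / 255
print(scaled.dtype, scaled.nbytes / 1e9)   # float32, ~0.2 GB (roughly 2 GB at full size)

# reshape() returns a view onto the same buffer, not a second copy
flat = scaled.reshape(len(scaled), -1)
print(flat.base is scaled)                 # True: the flattened version costs no extra memory
```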