Computing Workflows, Data Science, and such


TensorFlow for Python3

If you recall my initial video, I was irritated that TensorFlow had been released only for Python2. Well, after much hard work, Google ported TF to Python3! This week’s stream is me reliving the joys of installing TensorFlow, but now for Python3. Watch me make a fool of myself struggling with Docker, pip, and GPUs.

As noted in the comments on the video, I did eventually succeed with Docker once I gave myself five minutes away from the stream. That's one of the dangers of being live: I can't really take a break to read, since that wouldn't be very entertaining or informative for you guys. At least if I'm failing, that's hopefully amusing and gives you some confidence :p

Mahalanobis Distance in TensorFlow Part 3

Finally, after numerous screwups, I completed my full TensorFlow implementation of the Mahalanobis distance. This video also marks the first appearance of my portable markerboard; hopefully my handwriting isn't too much of a scrawl.

The function I implemented takes a data set and spits out the Mahalanobis distance of every point to the mean. This is an interesting proof of concept, since the same kind of operation occurs in k-means clustering (with whatever distance metric you choose). There, however, you would probably use the global covariance matrix but the mean of the given cluster. I might take a stab at that next week.
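For concreteness, here is a minimal sketch of that computation in current TensorFlow (eager mode, so not the graph-and-session style from the stream); the function name and shapes are my own, not the code from the video:

    import tensorflow as tf

    def mahalanobis_to_mean(X):
        """Mahalanobis distance from each row of X to the column mean.

        X: (n, d) float tensor, one observation per row.
        Returns an (n,) tensor of distances.
        """
        mu = tf.reduce_mean(X, axis=0)
        diff = X - mu                                       # (n, d)
        n = tf.cast(tf.shape(X)[0], X.dtype)
        cov = tf.matmul(diff, diff, transpose_a=True) / (n - 1.0)
        cov_inv = tf.linalg.inv(cov)
        # Row-wise quadratic form (x - mu)^T S^{-1} (x - mu), computed
        # without ever building an n-by-n matrix.
        m2 = tf.reduce_sum(tf.matmul(diff, cov_inv) * diff, axis=1)
        return tf.sqrt(m2)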

The implementation makes full use of matrix computations, avoiding clunky for-loops wherever possible. If you know a better way to extract the diagonal of a matrix in TensorFlow than this tf.pack method, I'd love to hear it. A series of matrix multiplies and adds should be doable, but it might not be any better. Still, it'd be good to see an alternate design.
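For what it's worth, later TensorFlow releases ship a dedicated op for the diagonal: tf.linalg.diag_part (tf.diag_part in the 1.x API). Here's a hedged comparison of the two designs with made-up data; the multiply-and-sum variant is the "matrix multiplies and adds" route, and it avoids materializing the n-by-n product:

    import tensorflow as tf

    diff = tf.constant([[1.0, 2.0],
                        [3.0, 0.5],
                        [0.0, 1.0]])          # (n, d) mean-centered data, made up
    cov_inv = tf.linalg.inv(tf.constant([[2.0, 0.3],
                                         [0.3, 1.0]]))

    # Design 1: build the full (n, n) product and keep only its diagonal,
    # which holds each row's quadratic form.
    full = tf.matmul(tf.matmul(diff, cov_inv), diff, transpose_b=True)
    m2_diag = tf.linalg.diag_part(full)

    # Design 2: elementwise multiply and row-sum; same numbers, no (n, n) matrix.
    m2_sum = tf.reduce_sum(tf.matmul(diff, cov_inv) * diff, axis=1)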

Mahalanobis Distance in TensorFlow Part 2

This week, I improved my implementation of the Mahalanobis distance a bit. Where previously I was still using NumPy to compute the inverse of the covariance matrix, I thought it would be fun to do that in TensorFlow itself. Conveniently, TF has a function for inverting a matrix.
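A two-line illustration with a made-up matrix (the op has moved around between releases; as far as I can tell it is tf.linalg.inv in current TensorFlow and was exposed as tf.matrix_inverse in the old API):

    import tensorflow as tf

    cov = tf.constant([[2.0, 0.5],
                       [0.5, 1.0]])
    cov_inv = tf.linalg.inv(cov)   # tf.matrix_inverse in older releases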

What TF lacks, or at least what I didn't find, is a function to compute the covariance between the variables of a data set. So I also set out to implement that. After quickly whipping up a version in NumPy, I started translating it to TF. You'll have to watch to see how far I got.
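For reference, here is a minimal sketch of the kind of translation involved, assuming rows are observations so the result matches np.cov(X, rowvar=False); this is my own reconstruction, not the code from the stream, and the .numpy() check assumes eager-mode TF:

    import numpy as np
    import tensorflow as tf

    def tf_covariance(X):
        """Sample covariance of the columns of X (rows are observations)."""
        n = tf.cast(tf.shape(X)[0], X.dtype)
        centered = X - tf.reduce_mean(X, axis=0)
        return tf.matmul(centered, centered, transpose_a=True) / (n - 1.0)

    X = np.random.rand(50, 3).astype(np.float32)
    assert np.allclose(tf_covariance(X).numpy(), np.cov(X, rowvar=False), atol=1e-5)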

Next week I'll continue looking at the Mahalanobis distance implementation, as well as give some general thoughts on TensorFlow. While I'm still new to the library, I can now say that I've worked through a few tutorials and a small project of my own. I was initially kind of turned off by the style of TF, but I think I'm warming up to it.

Starting next year, I will stream similar sessions for Theano, another “deep learning” framework for Python. This one doesn’t have the backing of Google, but it is open source and has a strong community. While I’ve fiddled with it once or twice, I’ll be starting from scratch with the installation, so everyone can see me screw up right away.

Mahalanobis Distance in TensorFlow

Last night I decided to stray from tutorials and implement the Mahalanobis distance in TensorFlow. This metric is like standard Euclidean distance, except that it accounts for known correlations among the variables in your data set. My friend over at Math Misery is really into it, so I thought, why not?
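For reference, the distance from a point x to the mean mu under covariance matrix S is

    d_M(x) = \sqrt{ (x - \mu)^\top \, S^{-1} \, (x - \mu) }

and with S equal to the identity it collapses to ordinary Euclidean distance.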

tl;dr I did manage to program the Mahalanobis distance (albeit using NumPy to invert the covariance matrix) and got the same result as the SciPy and pure-NumPy versions. I think next week I'll try to do it more purely in TensorFlow and work on optimizing it a bit.
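Here's a hedged sketch of that kind of cross-check, with made-up data: scipy.spatial.distance.mahalanobis takes the inverse covariance matrix directly, so NumPy still does the inversion, just as in my version:

    import numpy as np
    from scipy.spatial.distance import mahalanobis

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    mu = X.mean(axis=0)
    VI = np.linalg.inv(np.cov(X, rowvar=False))   # inverse covariance

    # SciPy handles one point at a time.
    d_scipy = np.array([mahalanobis(x, mu, VI) for x in X])

    # Pure NumPy: every row's quadratic form at once.
    diff = X - mu
    d_numpy = np.sqrt(np.einsum('ij,jk,ik->i', diff, VI, diff))

    assert np.allclose(d_scipy, d_numpy)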

Basic TensorFlow Usage

After the fumbling around last time, I found a more basic introduction to TensorFlow. This time, instead of embarking blindly on a machine learning task, I simply got more comfortable with the library, learning a few things along the way.

One of the cool things I learned, worth repeating here, is that you can substitute your own values for arbitrary variables inside a TensorFlow graph. Normally you use tf.placeholder variables for your inputs, but you can also supply values via feed_dict to override tf.Variable and tf.constant values. This, combined with interactive sessions, allows for better debugging, sanity checking, and simple model exploration while building things up. It will still take some practice to beat into my head that most TF lines merely define part of the graph, and that sess.run is what actually executes it. But it's starting to make sense.
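A minimal graph-style sketch of the trick, written against the 1.x-era API (names and values are made up):

    import tensorflow as tf   # 1.x-era graph API; lives under tf.compat.v1 in TF 2

    a = tf.constant(2.0)
    w = tf.Variable(3.0)
    y = a * w

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(y))                              # 6.0, the graph's own values
        # feed_dict can override non-placeholder tensors too.
        print(sess.run(y, feed_dict={a: 5.0, w: 4.0}))  # 20.0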
