Computing Workflows, Data Science, and such


Book Review: An Introduction to Mathematical Modeling

An Introduction to Mathematical Modeling by Edward A. Bender

Let me preface this review by saying that I’m biased; this is one of my favorite books. Not just my favorite math text, but my overall favorite. This is one of the few books that I’ve read multiple times. It’s also from 1978, but I’ve seen used copies for sale on many websites. And finally, it clocks in at a satisfyingly binary 256 pages.

This book is a description (with worked examples and discussions!) of various mathematical modeling methods. I can’t stress enough that this is about truly designing intelligent models for problems; you won’t find the term “machine learning” anywhere in the text. After a brief intro to “what is modeling”, it dives right into various techniques with real worked problems. Early sections build a few basic tools, but most chapters are self-contained. It’s easy to jump into a chapter or specific problem and come out with a full understanding. In terms of difficulty, most of the math is actually straightforward. Any technical undergrad or sharp teenager can follow it. And if you get stuck, an appendix on probability fills most of the gaps. By the end, you’ll have explored a swath of applied models.

I want to say again that this text is not about data analysis. Models are fit and backed up with data as appropriate, but the emphasis is on understanding how to construct intelligent models. Your latest neural net might reach the same conclusions, but it will require mountains of data, and you still won’t know why the model is what it is. This really reflects the era of the book. In 1978, computing power was limited, so you invested proportionally more effort into building a great model to minimize the required processing, a skill that’s still useful today.

Without giving away spoilers, I want to highlight a few of the worked examples. “The Nuclear Missile Arms Race” in Graphical Methods is one of my favorites. Without needing any numerical data, Bender derives a qualitative effect of missile technology and disarmament. His simple figures make his argument crystal clear, far better than many of today’s distracting graphics. Also relevant these days is “Are Fair Elections Possible?” in Potpourri. While this is one of the more theoretical, math-heavy sections, I still found it very accessible. It features a logical proof about trying to avoid a dictator in an election with more than two candidates. And it even gives a hypothetical real-world example of deciding between contract choices. This is another situation where we can learn something even in the absence of data. Finally, close to my heart is “Dynamics of Car Following” in Local Stability Theory. This is a “typical” problem in the book, with a three-part description. After laying out the basic problem, it shows a short table of data that we’ll use to build or verify our model. Then it derives a reasonable model from first principles (physics, biology, economics, etc.). Finally, it applies the data, discusses the results, and acknowledges the limitations. No pretensions, just analysis. If you’re interested in self-driving cars, you can also check out my model (with fun graphics!) on GitHub. This book contains dozens of worked problems; you’ll probably identify with several of them.

While this is a wonderful, insightful book, it doesn’t cover everything under the sun. I mentioned it’s from 1978; modeling has advanced since then. Cheap computing opens up new possibilities, and new traps. One great thing about easy computing is exploratory data analysis: if you can visualize the data the right way, the model may become obvious. Also, Bender admits that no single model is deeply developed; he even mentions that he had to cut an “in-depth” chapter. Even so, I think it’s still worthwhile. My final tally: 97/100 (5-point scales are an archaic blight on the world). A wide variety of math and science topics, all accessible and modular, makes this a must-read!

TensorFlow MNIST and Graph Stream

I streamed some more poking around with TensorFlow. First is the ‘advanced’ tutorial on the MNIST digit recognition data. It still holds your hand, which is nice when learning new software. I streamed the ‘basic’ MNIST tutorial as well, but YouTube is still processing that one a week after the fact. I don’t know if it will ever finish.

The next tutorial was a less guided version of the same MNIST data, but I think doing the previous tutorials just confused me about this one. For a TensorFlow expert, this tutorial was unnecessary, and for a beginner, it was a little uncoordinated. And I made plenty of foolish mistakes on my own. At the end, I switched to trying to use TensorBoard to view an actual graph of the neural net, but it failed to produce anything. I like the idea of a slick web interface for poking around in a graph, but maybe it needs some fine-tuning or more tutorials. And it would have been nice if there were an obvious way to just get the “dot” files out so I could make my own graph. Again, maybe I didn’t devote enough time to this, but I would have liked it to be simple.
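
For reference, here’s roughly the workflow I was going for: build a graph, dump it with a summary writer, and point TensorBoard at the log directory. This is only a minimal sketch assuming the TensorFlow 1.x-era summary API, not whatever I actually typed on stream:

```python
import tensorflow as tf

# A tiny graph, just so there's something to look at.
x = tf.placeholder(tf.float32, [None, 784], name="input")
W = tf.Variable(tf.zeros([784, 10]), name="weights")
b = tf.Variable(tf.zeros([10]), name="bias")
logits = tf.add(tf.matmul(x, W), b, name="logits")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Write the graph definition to a log directory for TensorBoard.
    writer = tf.summary.FileWriter("logs", sess.graph)
    writer.close()

# Then, from a shell:
#   tensorboard --logdir=logs
# and open the "Graphs" tab in the browser.
```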

Streaming Schedule

tl;dr Watch me stumble through machine learning software tutorials on Tuesdays at 6 PM Eastern.

After testing the streaming capabilities and getting some good feedback, I’m settling on Tuesday nights, 6-7 PM Eastern US time, for regular broadcasting. Archive videos will be posted to YouTube if you can’t make the stream. Even then, you can still post in the chat, and I’ll see it later.

When time and opportunity allow, I may stream at other times. I’ll try to post here in that case.

TensorFlow Installation Stream

I learned about Google’s new ML library, TensorFlow. I’ve been live streaming my attempts at the tutorials. See me bumble my way through installation and a simple multinomial regression:

More videos of me going through their tutorials will become available as I stream and YouTube processes them.
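
For context, the “simple multinomial regression” in that first tutorial is a softmax (multinomial logistic) model over the raw MNIST pixels. Here’s a condensed sketch of that kind of model, assuming the TensorFlow 1.x-era tutorial API; it’s not the exact code from the video:

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# MNIST images arrive as flattened 28x28 = 784 pixel vectors,
# with one-hot labels over the 10 digit classes.
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])
y_true = tf.placeholder(tf.float32, [None, 10])

# A single softmax layer: multinomial logistic regression.
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Cross-entropy loss, minimized with plain gradient descent.
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y), axis=1))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

accuracy = tf.reduce_mean(
    tf.cast(tf.equal(tf.argmax(y, 1), tf.argmax(y_true, 1)), tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        batch_x, batch_y = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_x, y_true: batch_y})
    print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                        y_true: mnist.test.labels}))
```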

Pulling Yourself Down by the Bootstraps

“Bootstrapping” is an attempt to get more out of less data. In a serious statistical model, you split your data into training and validation sets. Essentially, you build a model with the training set, then you “check your work” with the validation set. If your model performs as well on the validation set as on the training set, then you have some confidence that the model didn’t “overtrain” on (memorize) the data. Bootstrapping cranks this process up to 11.
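
As a toy illustration of that split, here’s a minimal sketch with scikit-learn; the data and the linear model are made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Made-up data purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

# Hold a quarter of the data back for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# If the validation error is much worse than the training error,
# the model has probably memorized ("overtrained" on) the training set.
train_err = mean_absolute_error(y_train, model.predict(X_train))
val_err = mean_absolute_error(y_val, model.predict(X_val))
print(train_err, val_err)
```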

In an ideal world, you have plenty of data to afford setting some aside for validation, but back in the real world, you take what you can get. Bootstrapping says, “Not enough data? No problem. Train on a random subset of the data. In fact, train on a bunch of different random subsets. Rinse and repeat.” By picking many different subsets and averaging, you hope to avoid overtraining on any one particular set.
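
Here’s a minimal sketch of that resampling loop; one common flavor is to resample the rows with replacement, refit, and combine the fits. Again, the data and model are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data again, just to have something to resample.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=100)

n_boot = 200
coefs = []
for _ in range(n_boot):
    # Draw a bootstrap resample: pick rows at random, with replacement,
    # so some points appear several times and others not at all.
    idx = rng.integers(0, len(X), size=len(X))
    fit = LinearRegression().fit(X[idx], y[idx])
    coefs.append(fit.coef_)

coefs = np.array(coefs)
# Combine by averaging; the spread across resamples hints at how
# sensitive the fit is to any one particular subset of the data.
print(coefs.mean(axis=0), coefs.std(axis=0))
```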

The paper we’re exploring today tried to use bootstrapping but never quite justified it. Here’s a pre-print of the article (raw tex file). Briefly, this study models crash counts on sections of roadway by building a “Negative Binomial” model (a fancy least squares for predicting counts rather than continuous values) using roadway features as independent variables. The crux of the study, however, is the “Mean Absolute Error” (MAE) of a given model, basically how wrong it is. The authors claim that by bootstrapping their model building, they get lower MAEs than simply building one big model (Figure 3). This preprint leaves out some details, but this point comes out strongly.
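
To put names to the pieces, here’s a generic sketch of fitting a negative binomial count model and computing an MAE with statsmodels. The “roadway” features and crash counts below are simulated placeholders I made up; this is not the study’s data or its exact model specification:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated stand-in for roadway segments; features and counts are
# invented and have nothing to do with the study's actual data.
rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "lane_width": rng.normal(3.5, 0.3, n),   # hypothetical feature
    "aadt": rng.lognormal(9.0, 0.5, n),      # hypothetical traffic volume
})
mu = np.exp(-8.0 + 0.9 * np.log(df["aadt"]) - 0.4 * df["lane_width"])
heterogeneity = rng.gamma(shape=2.0, scale=0.5, size=n)  # adds over-dispersion
df["crashes"] = rng.poisson((mu * heterogeneity).to_numpy())  # placeholder counts

X = sm.add_constant(np.column_stack([np.log(df["aadt"]), df["lane_width"]]))
y = df["crashes"].values

# Negative binomial regression: a count model that tolerates variance
# larger than the mean, unlike plain Poisson (or least squares).
model = sm.NegativeBinomial(y, X).fit(disp=0)

# Mean Absolute Error: the average size of the prediction miss.
mae = np.abs(y - model.predict(X)).mean()
print(model.params, mae)
```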

This study fails, however, because it doesn’t address the problems bootstrapping may introduce. The whole point of holding data back in a validation set is to avoid overtraining. But in bootstrapping, you use the same data points multiple times to build one model. Now, it may be that this violation of independence between bootstrap iterations is insignificant. But that needs to be stated and supported in the article.

Confession time: this is one of my earliest scholarly works. Back in summer 2008, we didn’t have the time to understand and qualify bootstrapping’s limitations, so we scrapped the paper. Thankfully, we were able to reuse the Negative Binomial model in another paper. But in a more sleep-deprived or results-desperate environment, someone might have ignored the omission and published anyway.
