Thanks for adding some historical perspective.

It’s easy to forget that the task of “function fitting” attracted attention in ML for different reasons than in Statistics.

LW

Even today, I think that the primary motivation of the vast majority of machine learning is still engineering. We seek to create a system that gives good performance on some task. We are generally not concerned about interpreting the fitted model or testing hypotheses about the underlying phenomena. In fact, we naturally assume that the fitted model is very complicated and that we probably can’t understand it. (And our complex non-parametric models pretty much guarantee this!) After all, we got to this point by concluding that we couldn’t write F by hand.

But there are some forces driving ML people toward models that try to capture the underlying causal structure of the phenomena. One is what we call “transfer learning”. We learn F on one set of (x,y) pairs, but then we want it to generalize well to (x,y) pairs drawn from a very different context. To the extent that F captures the underlying causal structure of the phenomenon, it is more likely to generalize to such novel situations. Hence, sometimes good engineering requires good science.

In summary, I don’t think it is accurate to say that machine learning is the same as (computational) statistics. But there is a huge overlap, both in the underlying mathematics and in the resulting techniques.

Working part of the time in a CS department and part of the time in a statistics department, I see the two as really part of a bigger Big Field (what would be a good name for it?). And indeed, more and more people in ML and statistics do similar things. Still, as mentioned above, there are a few topics done in only one of the two subfields. I’d like to add reinforcement learning (done only in ML), and good old testing, setting up protocols for gathering data, and reporting statistical evidence in court (such as DNA matches), all done only in statistics. My friends in ML are often surprised when I tell them of my interest in testing and sampling plans, and my friends in stats often don’t even know what reinforcement learning is.

You’re welcome!

Good luck with your company!

I took Stats 301 with you back in 1998, and it was pivotal in drawing me away from math theory and into statistics and applied math. I went on to get my PhD in physics and founded a company in the ML/CV space. Always meant to thank you, and now’s my chance. 🙂

Nice post and blog! I’ll keep on reading.

In statistics, the coefficients are the end-game. The coefficients are used to indicate, e.g., the effectiveness of a drug, and their integrity is important. Consider the linear regression Y = a_1X_1 + a_2X_2 + \cdots, whose least-squares solution requires inverting the normal matrix X^TX. If, due to a lack of richness in the training data (for example, nearly collinear predictors), the normal matrix is ill-conditioned, then small “measurement error” can cause large changes in the coefficients. For that reason, a prior should be specified.
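A small NumPy sketch of this point, with made-up data (two nearly collinear predictors stand in for the lack of richness): refitting under tiny measurement noise swings the coefficients, while a ridge penalty — equivalently, a Gaussian prior on the coefficients — stabilizes them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two nearly collinear predictors make X^T X ill-conditioned.
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 1e-4 * rng.normal(size=n)  # almost a copy of x1
X = np.column_stack([x1, x2])
y_clean = X @ np.array([1.0, 2.0])   # "true" coefficients a = (1, 2)

# Fit twice by solving the normal equations, each time with tiny,
# independent measurement noise on y.
a_hat = [np.linalg.solve(X.T @ X, X.T @ (y_clean + 1e-3 * rng.normal(size=n)))
         for _ in range(2)]

# A ridge penalty (a Gaussian prior on the coefficients) shifts the small
# eigenvalues of X^T X away from zero, so the same noise barely moves a.
lam = 1e-2
a_ridge = [np.linalg.solve(X.T @ X + lam * np.eye(2),
                           X.T @ (y_clean + 1e-3 * rng.normal(size=n)))
           for _ in range(2)]

print("coefficient change between refits, no prior:", np.abs(a_hat[0] - a_hat[1]))
print("coefficient change between refits, ridge:   ", np.abs(a_ridge[0] - a_ridge[1]))
```

The noise scale (1e-3) is tiny relative to the signal, yet without the prior the unregularized coefficients move by orders of magnitude more than the ridge ones.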

In machine learning, a predictive model Y = F(X, A) is the end-game. Consider the same linear regression with an ill-conditioned normal matrix. Small measurement errors again produce large changes in A. However, if, once put into use, the model F sees inputs X similar to the training data, then those large changes in A result in only small changes in F(\cdot, A). In other words, our predictive model will work fine. In practice, however, we cannot guarantee similar X, so we should regularize in order to allow the model to generalize to new inputs.
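The same kind of toy NumPy sketch (made-up data, nearly collinear predictors) illustrates why the prediction can survive coefficient instability: the instability lies almost entirely along a direction that training-like inputs never exercise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: nearly collinear predictors, so X^T X is ill-conditioned.
n = 100
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 1e-4 * rng.normal(size=n)])
y_clean = X @ np.array([1.0, 2.0])

def fit(y):
    # Ordinary least squares via the normal equations, no regularization.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Two fits under tiny, independent measurement noise give very different A...
a1 = fit(y_clean + 1e-3 * rng.normal(size=n))
a2 = fit(y_clean + 1e-3 * rng.normal(size=n))

# ...yet the predictions F(x, A) = x @ A nearly agree on an input that looks
# like the training data (second feature tracking the first)...
x_similar = np.array([0.5, 0.5])
gap_similar = abs(x_similar @ a1 - x_similar @ a2)

# ...but disagree badly on an input that breaks that pattern.
x_novel = np.array([0.5, -0.5])
gap_novel = abs(x_novel @ a1 - x_novel @ a2)

print("prediction gap on familiar input:", gap_similar)
print("prediction gap on novel input:   ", gap_novel)
```

The unstable coefficient direction is (1, -1): it is invisible to inputs where the two features move together, which is exactly why F works fine on training-like X and why regularization is needed before trusting it elsewhere.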
