—Larry Wasserman

Welcome to my blog, which will discuss topics in Statistics and Machine Learning. Some posts will be technical and others will be non-technical. Since this blog is about topics in both Statistics and Machine Learning, perhaps I should address the question: What is the difference between these two fields?

The short answer is: None. They are both concerned with the same question: how do we learn from data?

But a more nuanced view reveals that there are differences due to historical and sociological reasons. Statistics is an older field than Machine Learning (but young compared to Math, Physics, etc.). Thus, ideas about collecting and analyzing data in Statistics are rooted in the times before computers even existed. Of course, the field has adapted as times have changed, but history matters, and the result is that the way Statisticians think, teach, approach problems and choose research topics is often different from that of their colleagues in Machine Learning. I am fortunate to be at an institution (Carnegie Mellon) which is active in both (and I have appointments in both departments) so I get to see the similarities and differences.

If I had to summarize the main difference between the two fields I would say:

Statistics emphasizes formal statistical inference (confidence intervals, hypothesis tests, optimal estimators) in low dimensional problems.

Machine Learning emphasizes high dimensional prediction problems.

But this is a gross over-simplification. Perhaps it is better to list some topics that receive more attention from one field rather than the other. For example:

Statistics: survival analysis, spatial analysis, multiple testing, minimax theory, deconvolution, semiparametric inference, bootstrapping, time series.

Machine Learning: online learning, semisupervised learning, manifold learning, active learning, boosting.

But the differences become blurrier all the time. Check out two flagship journals:

The Annals of Statistics and

The Journal of Machine Learning Research.

The overlap in topics is striking. And many topics get started in one field and then are developed further in the other. For example, Reproducing Kernel Hilbert Space (RKHS) methods are hot in Machine Learning but they began in Statistics (thanks to Manny Parzen and Grace Wahba). Similarly, much of online learning has its roots in the work of the statisticians David Blackwell and Jim Hannan. And of course there are topics that are highly active in both areas such as concentration of measure, sparsity and convex optimization. There are also differences in terminology. Here are some examples:

| Statistics | Machine Learning |
|---|---|
| Estimation | Learning |
| Classifier | Hypothesis |
| Data point | Example/Instance |
| Regression | Supervised Learning |
| Classification | Supervised Learning |
| Covariate | Feature |
| Response | Label |

and of course:

Statisticians use R.

Machine Learners use Matlab.

Overall, the two fields are blending together more and more and I think this is a good thing.

## 11 Comments

There are some aspects of statistics that I really don’t think machines should do, for example when it comes to expert assessment of informal evidence/information as required for choice of subjective priors or setting up dissimilarity measures that reflect how “dissimilarity” is conceptualised in an area. My impression is that the field of “Machine Learning” communicates by its name that it doesn’t want to cover these aspects (or if some Machine Learners do it, I suspect that they tend to over-automatise). Of course many statisticians don’t want this stuff either, because it’s so obviously subjective, but I’d still assign it to Stats.

I have had this discussion with at least a few people, and my (often disagreed with) opinion is that while there is heavy overlap between topics studied by statisticians and machine learners (I actually think that the overlap is the bulk of topics) there really are some things that only one of the two communities is really active in.

The topics in a “computational learning theory” class (http://www.machinelearning.com/) really are quite distinct from any stats learning theory class I have taken because of its focus on different learning scenarios (active learning, membership query learning, learning under a mistake bound, online learning etc.) and also on computational hardness (a typical theorem would be: “Learning parities/halfspaces/something else with noise is hard in the statistical query model.”). Showing hardness (or computability) in a Turing computation model for learning problems is something that pulls this kind of learning theory closer to computational complexity theory and possibly a bit further away from statistics.

Really though, the fact that one has to make an effort to bring out the distinction between ML and stats means (to me at least) that the two really are quite similar.

One difference that has interested me for a bit is the use of regularization.

In statistics, the coefficients are the end-game. The coefficients are used to indicate e.g. the effectiveness of a drug, and their integrity is important. Consider the linear regression $Y = a_1 X_1 + a_2 X_2 + \cdots$, which is solved by inverting the normal matrix $X^T X$. If, due to lack of training data richness, the normal matrix is ill-conditioned, then small “measurement error” can cause large changes in the coefficients. For that reason, a prior should be specified.

In machine learning, a predictive model $Y = F(X, A)$ is the end-game. Consider the same linear regression with an ill-conditioned normal matrix. Then, with small measurement error, we see large changes in $A$. However, if, once put into use, the model $F$ sees inputs $X$ that are similar to the training data, then large changes in $A$ will result in only small changes in $F(\cdot, A)$. In other words, our predictive model $F$ will work fine. In practice, however, we cannot guarantee similar $X$, so we should regularize in order to allow the model to generalize to new input.
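
The contrast in this comment can be checked numerically. What follows is a minimal sketch in plain Python, not the commenter's own code: the data, the helper names `fit_two_features`, `predict` and `coef_shift`, the perturbation size, and the ridge penalty of 0.01 are all assumptions made up for illustration. Two nearly identical features make the normal matrix ill-conditioned, and the perturbation is deliberately aligned with the tiny difference between them, which is the direction the data barely constrains.

```python
def fit_two_features(x1, x2, y, ridge=0.0):
    """Solve (X^T X + ridge * I) a = X^T y for two features via Cramer's rule."""
    s11 = sum(u * u for u in x1) + ridge
    s22 = sum(v * v for v in x2) + ridge
    s12 = sum(u * v for u, v in zip(x1, x2))
    b1 = sum(u * t for u, t in zip(x1, y))
    b2 = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return ((b1 * s22 - b2 * s12) / det, (b2 * s11 - b1 * s12) / det)


def predict(a, x):
    """Linear model F(x, a) = a_1 * x_1 + a_2 * x_2 (no intercept)."""
    return a[0] * x[0] + a[1] * x[1]


def coef_shift(a, b):
    """Largest absolute change in any coefficient."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


# Hypothetical data: x2 differs from x1 by only +/- 0.001, so X^T X is
# ill-conditioned. The true coefficients are (1, 1).
signs = [1 if i % 2 == 0 else -1 for i in range(10)]
x1 = [(i + 1) / 10 for i in range(10)]
x2 = [u + 1e-3 * s for u, s in zip(x1, signs)]
y_clean = [u + v for u, v in zip(x1, x2)]

# "Measurement error" of size 0.01, aligned with the tiny gap x2 - x1.
y_noisy = [t + 0.01 * s for t, s in zip(y_clean, signs)]

ols_clean = fit_two_features(x1, x2, y_clean)
ols_noisy = fit_two_features(x1, x2, y_noisy)
ridge_clean = fit_two_features(x1, x2, y_clean, ridge=0.01)
ridge_noisy = fit_two_features(x1, x2, y_noisy, ridge=0.01)

near = (0.5, 0.501)  # resembles the training inputs
far = (1.0, 0.0)     # unlike any training input

print("OLS coefficient shift:  ", coef_shift(ols_clean, ols_noisy))
print("Ridge coefficient shift:", coef_shift(ridge_clean, ridge_noisy))
print("OLS prediction shift, similar input:  ",
      abs(predict(ols_noisy, near) - predict(ols_clean, near)))
print("OLS prediction shift, dissimilar input:",
      abs(predict(ols_noisy, far) - predict(ols_clean, far)))
```

In this toy setup the least squares coefficients move from roughly (1, 1) to roughly (-9, 11), yet the prediction at the input that resembles the training data moves by only about 0.01. At the dissimilar input (1, 0) the prediction moves by about 10, which is the failure to generalize that the comment warns about. With the ridge penalty the coefficients barely move at all.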

Wow! As a young mathematician, I work in a company where both are used, but I didn’t know the formal difference until now. I would have said “Machine Learning is teaching computers to do Statistics with tons of data”.

Nice post and blog! I’ll keep on reading.

IMHO, the difference is: statistics is applied math for data analysis; machine learning is engineering (how to build adaptive systems that can use data to adapt and improve their performance). Roughly speaking, the difference is the same as that between mechanics in physics and mechanical engineering. The math is the same, but the point of view is completely different: describe or predict versus design a system.

Glad to see you started a blog! I just found out about it today via Andrew Gelman.

I took Stats 301 with you back in 1998, and it was pivotal in drawing me away from math theory and into statistics and applied math. I went on to get my PhD in physics and found a company in the ML/CV space. Always meant to thank you, and now’s my chance. :)

You’re welcome!

Good luck with your company.

Hi Larry,

Working part of the time in a CS department and part of the time in a statistics department, for me the two are really part of a bigger Big Field (what would be a good name for it?). And indeed more and more people in ML and statistics do similar things. Still, as mentioned above, there are a few topics done only in one of the two subfields. I’d like to add reinforcement learning (only done in ML) and good old testing and setting up protocols for gathering data and reporting of statistical evidence in court (such as DNA matches), all only done in statistics. My friends in ML are often surprised when I tell them of my interest in testing and sampling plans, and my friends in stats often don’t even know what reinforcement learning is.

Machine Learning grew out of the desire of computer scientists (especially in artificial intelligence) to build computer programs “from data” as opposed to from written specifications. Many of the tasks that AI researchers would like to automate (computer vision, speech understanding, robotic manipulation, language translation) cannot be easily specified in any formal way. People are unable to introspect to come up with a specification–in contrast to, for example, specifying the desired behavior of an accounting system. For many of these tasks, it is easy to obtain “input-output” examples y=F(x) of the desired behavior y of the program F on individual inputs x. Hence, machine learning people started building computer programs to construct F from (x,y) pairs. We didn’t realize there was any connection to statistics at the start. We knew there was a “correct” F, and we wanted to find it. But then we started to notice odd things. One was that if we chose a function space H that was overly simple and used the best F in that space, it often worked better than choosing a function space that we believed contained the true F. We had discovered over-fitting and the bias-variance tradeoff. Another was the importance of understanding the data and fit of F to the data graphically. At some point, we found out that statisticians already knew all of this and that they had formalisms for understanding it!

Even today, I think that the primary motivation of the vast majority of machine learning is still engineering. We seek to create a system that gives good performance on some task. We are generally not concerned about interpreting the fitted model or testing hypotheses about the underlying phenomena. In fact, we naturally assume that the fitted model is very complicated and that we probably can’t understand it. (And our complex non-parametric models pretty much guarantee this!) After all, we got to this point by concluding that we couldn’t write F by hand.

But there are some forces driving ML people toward models that try to capture the underlying causal structure of the phenomena. One is what we call “transfer learning”. We learn F on one set of (x,y) pairs, but then we want it to generalize well to (x,y) pairs drawn from a very different context. To the extent that F captures the underlying causal structure of the phenomenon, it is more likely to generalize to such novel situations. Hence, sometimes good engineering requires good science.

In summary, I don’t think it is accurate to say that machine learning is the same as (computational) statistics. But there is a huge overlap–both in the underlying mathematics and in the resulting techniques.

Thanks for adding some historical perspective.

It’s easy to forget that the task of “function fitting” attracted attention in ML for different reasons than in Statistics.

LW

David Sarnoff (Russian-born American inventor and pioneer in the development of both radio and television broadcasting, 1891-1971) once said: “The human brain must continue to frame the problems for the electronic machine to solve.” I agree.
