## The Gap

No, not the store. I am referring to the gap between the invention of a method and the theoretical justification for that method.

It is hard to quantify the gap because, for any method, there is always dispute about who invented (discovered?) it, and who nailed the theory. For example, maximum likelihood is usually credited to Ronald Fisher, but one could argue that it was really invented by Gauss or Laplace. And who provided the theory for maximum likelihood? Well, most people would say Fisher. But you could also argue that Le Cam was the one who really made it rigorous.

So, with these caveats in mind, here are a few examples of the gap. I am interested to hear your examples (as well as your opinions about mine).

| Method | Inventor | Theoretical Justification |
| --- | --- | --- |
| Maximum Likelihood | Fisher 1912 | Fisher 1922 |
| The Bootstrap | Efron 1979 | Giné and Zinn 1990 |
| The Lasso | Tibshirani 1996 | Greenshtein and Ritov 2004 |
| False Discovery Rate | Schweder and Spjøtvoll 1982 | Benjamini and Hochberg 1995 |
| Wavelet Denoising | Haar 1905 | Donoho and Johnstone 1998 |
| Cross-Validation | Stone 1974 | Györfi, Kohler, Krzyżak, and Walk 2002 |
| Causal Inference (counterfactual version) | Rubin 1974 | Rubin 1974; Robins 1989 |
| Causal Inference (graphical version) | Wright 1934 | Spirtes, Glymour, Scheines 1987; Pearl and Verma 1991 |

Every item in the above list is debatable. For example, I consider Giné and Zinn (1990) the first definitive result clarifying necessary and sufficient conditions that justify the bootstrap. But one could well argue that the earlier papers of Bickel and Freedman (1981) and Singh (1981) deserve that label.
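For readers who have not seen it, the method whose justification took eleven years is simple to state: resample the data with replacement and recompute the statistic. A minimal sketch in Python (the function name, toy data, and the replication count `B` are mine, chosen for illustration):

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.mean, B=2000, seed=0):
    """Estimate the standard error of `stat` by resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate: draw n points from the data with replacement, recompute the statistic.
    reps = [stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(B)]
    return statistics.stdev(reps)

# Toy data: for the mean, the bootstrap standard error should land near
# the classical estimate s / sqrt(n) (about 0.41 here).
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
se = bootstrap_se(data)
```

The Giné-Zinn result concerns when distributions of such replicates validly approximate the sampling distribution, uniformly over classes of functions, not just for the mean.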

What items do you disagree with?

### Partial List of References

Aldrich, John. (1997). R.A. Fisher and the making of maximum likelihood 1912–1922. Statistical Science, 12, 162–176.

Donoho, David L. and Johnstone, Iain M. (1998). Minimax estimation via wavelet shrinkage. The Annals of Statistics, 26, 879–921.

Efron, Bradley. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics, 7, 1–26.

Giné, Evarist and Zinn, Joel. (1990). Bootstrapping general empirical measures. The Annals of Probability, 851–869.

Greenshtein, Eitan and Ritov, Ya'acov. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 10, 971–988.

Györfi, László and Kohler, Michael and Krzyżak, Adam and Walk, Harro. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer.

Tibshirani, Robert. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 267–288.

P.S. Thanks to Ryan Tibshirani and David Choi for discussions on this topic.

1. Posted March 7, 2013 at 11:04 pm

Can you give examples (perhaps in another post) of the opposite – algorithms/hypothesis-tests/ideas that were theoretically near-optimal but became practically usable much later?

• Posted March 8, 2013 at 7:53 am

Good question. Maybe some commenters will chime in?

• Martin Azizyan
Posted March 8, 2013 at 9:40 am

Anything Bayesian immediately comes to mind — often impractical without fast computers. Of course the “theoretically near-optimal” part is open to interpretation.

• Posted March 8, 2013 at 7:44 pm

I think convex optimization broadly has a very rich history of methods and gaps. Even restricting to LPs: they were first formulated by Kantorovich in the 30s (though people were solving linear systems long before), then Dantzig came up with the simplex method during WWII, Khachiyan came up with the ellipsoid method in the seventies, Karmarkar came up with interior point methods in the 80s, and Spielman-Teng gave a smoothed analysis of the simplex method in the 2000s.

Method: Simplex, Inventor: Dantzig (40s), Theoretical justification: Spielman-Teng (2000s)

(Theoretically good) Poly-time methods were known for a few years before they became practically useful.

• Posted March 30, 2013 at 11:06 am

I have read claims that much of modern AI & machine learning can be seen as just this: algorithms invented decades ago, but only made useful with modern fast computers and large data sets. (One example in particular being the field of computer vision.) I don’t know how true this is, but it seems plausible for examples that I know of: garbage collection was invented back in the ’50s or something, and only in the ’90s did it really become standard issue for new languages to the point where contemporary programmers take it purely for granted; and Bayesian spam filters (mid 90s to early 00s?) are naive Bayesian classifiers and I wouldn’t be surprised if that was arguably centuries old.
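The spam filters mentioned above are usually naive Bayes classifiers: score each class by its log prior plus the summed log probabilities of the observed words, assuming the words are independent given the class. A self-contained sketch (the helper names, toy documents, and Laplace smoothing choice are mine):

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (tokens, label) pairs with label in {"spam", "ham"}."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for tokens, label in docs:
        counts[label].update(tokens)   # per-class word counts
        totals[label] += 1             # per-class document counts
    vocab = set(counts["spam"]) | set(counts["ham"])
    return counts, totals, vocab

def classify(tokens, counts, totals, vocab):
    scores = {}
    for label in ("spam", "ham"):
        n = sum(counts[label].values())
        # log prior + log likelihood, with add-one (Laplace) smoothing
        score = math.log(totals[label] / sum(totals.values()))
        for t in tokens:
            score += math.log((counts[label][t] + 1) / (n + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

train_docs = [
    ("win cash now".split(), "spam"),
    ("free prize win".split(), "spam"),
    ("meeting at noon".split(), "ham"),
    ("lunch at noon tomorrow".split(), "ham"),
]
counts, totals, vocab = train(train_docs)
```

The conditional-independence assumption is centuries old in spirit (it is just Bayes' rule plus a factorized likelihood), which is the commenter's point: the idea long predates its practical deployment.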

2. Posted March 8, 2013 at 5:52 am

There is some controversy in what I am about to say, based on Yoram Bresler's slides (http://www.eecs.umich.edu/ssp2012/bresler.pdf):

Compressive Sensing:

Inventor: Prony (1795)
Theoretical Justification: Donoho, Tao, Candès, Romberg (2004)

• Posted March 8, 2013 at 7:52 am

I did not know about Prony. That may win the biggest gap award.

3. Posted March 8, 2013 at 6:08 am

One might argue that the counterfactual approach to causal inference was first due to Neyman (1923), which increases the gap quite a bit!

Ref:
Neyman, J. (1923). Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes [On the application of probability theory to agricultural experiments: Essay on principles]. Roczniki Nauk Rolniczych, 10, 1–51.

• Posted March 8, 2013 at 7:55 am

In fact Rubin always credits Neyman, so I guess you are right.

4. Posted March 8, 2013 at 6:54 am

Didn’t Hansen both propose and give theoretical justification for using GMM in his famous 1982 paper?

5. Rohde, Charles

• Posted March 8, 2013 at 8:05 pm

In this case there’s an eternal gap!

• Posted March 8, 2013 at 8:06 pm

I don’t think this is really a “method”

6. Nicole Jinn

What is the difference between the counterfactual version and the graphical version of Causal Inference? I did not know that there are multiple types of Causal Inference. I have done a little bit of research in Causal Inference, which almost always employs the graphical version, at least in the statistics literature I have read. Also, how do you define the invention (or discovery) of a method? There has been (and continues to be) a lot of debate on the existence of a logic of discovery.

7. Paulo

Hi, Larry!

Cool post. I'd suggest you add the gap between Rosenblatt's introduction of the Perceptron

Rosenblatt, Frank. (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Cornell Aeronautical Laboratory. Psychological Review, 65(6), 386–408. doi:10.1037/h0042519

and Novikoff’s proof that the Perceptron algorithm converges after a finite number of iterations if the data set is linearly separable.

Novikoff, A. B. (1962). On convergence proofs on perceptrons. Symposium on the Mathematical Theory of Automata, 12, 615–622. Polytechnic Institute of Brooklyn.
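The algorithm whose convergence Novikoff proved is a one-line update: whenever a point is misclassified, add its label times its features to the weight vector. A minimal sketch (the toy data and the `max_epochs` safeguard are mine; Novikoff's bound makes the epoch cap unnecessary for separable data, since the number of mistakes is at most (R/γ)² for margin γ and data radius R):

```python
def perceptron(points, labels, max_epochs=100):
    """Rosenblatt's Perceptron: on each mistake, add y*x to the weights."""
    dim = len(points[0])
    w = [0.0] * (dim + 1)                    # last coordinate is the bias
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            xb = list(x) + [1.0]             # append constant feature for the bias
            if y * sum(wi * xi for wi, xi in zip(w, xb)) <= 0:
                for i, xi in enumerate(xb):  # Rosenblatt's correction step
                    w[i] += y * xi
                mistakes += 1
        if mistakes == 0:                    # a full pass with no errors: done
            return w
    return None  # not separated within max_epochs (data may not be separable)

# A linearly separable toy set: label = sign(x0 - x1)
pts = [(2, 0), (3, 1), (0, 2), (1, 3)]
ys = [1, 1, -1, -1]
w = perceptron(pts, ys)
```

On non-separable data the loop never terminates on its own, which is exactly why the finite-mistake guarantee needs the separability hypothesis.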

All my best,

Zen.

8. Tristan