No, not the store. I am referring to the gap between the invention of a method and the theoretical justification for that method.

It is hard to quantify the gap because, for any method, there is always dispute about who invented (discovered?) it, and who nailed the theory. For example, maximum likelihood is usually credited to Ronald Fisher but one could argue that it was really invented by Gauss or Laplace. And who provided the theory for maximum likelihood? Well, most people would say Fisher. But you could also argue that Le Cam was the one who really made it rigorous.

So, with these caveats in mind, here are a few examples of the gap. I am interested to hear your examples (as well as your opinions about mine).

| Method | Inventor | Theoretical Justification |
|---|---|---|
| Maximum Likelihood | Fisher 1912 | Fisher 1922 |
| The Bootstrap | Efron 1979 | Giné and Zinn 1990 |
| The Lasso | Tibshirani 1996 | Greenshtein and Ritov 2004 |
| False Discovery Rate | Schweder and Spjøtvoll 1982 | Benjamini and Hochberg 1995 |
| Wavelet Denoising | Haar 1905 | Donoho and Johnstone 1998 |
| Cross-validation | Stone 1974 | Györfi, Kohler, Krzyzak and Walk 2002 |
| Causal Inference (counterfactual version) | Rubin 1974 | Rubin 1974, Robins 1989 |
| Causal Inference (graphical version) | Wright 1934 | Spirtes, Glymour and Scheines 1987; Pearl and Verma 1991 |

Every item in the above list is debatable. For example, I consider Giné and Zinn (1990) the first definitive result that clarifies necessary and sufficient conditions to justify the bootstrap. But one could well argue that the earlier papers of Bickel and Freedman (1981) and Singh (1981) deserve that label.
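For readers who know the bootstrap only by name: the method itself is strikingly simple, which is part of why practice ran so far ahead of theory. A minimal sketch in Python (the data and function names here are illustrative, not from any of the papers above):

```python
import random
import statistics

def bootstrap_se(data, stat, n_boot=2000, seed=0):
    """Estimate the standard error of `stat` by resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Draw n observations from the data, with replacement,
        # and recompute the statistic on the resample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(stat(resample))
    # The spread of the bootstrap replicates estimates the sampling variability.
    return statistics.stdev(replicates)

data = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9, 2.2, 3.6]
se_mean = bootstrap_se(data, statistics.mean)
```

The hard part, which is what Giné and Zinn (and Bickel–Freedman and Singh before them) settled, is characterizing exactly when this resampling distribution is a valid approximation to the true sampling distribution.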

What items do you disagree with?

What items would you add?

**Partial List of References**

Aldrich, John. (1997). R. A. Fisher and the making of maximum likelihood 1912–1922. *Statistical Science*, 12, 162–176.

Donoho, David L. and Johnstone, Iain M. (1998). Minimax estimation via wavelet shrinkage. *The Annals of Statistics*, 26, 879–921.

Efron, Bradley. (1979). Bootstrap methods: another look at the jackknife. *The Annals of Statistics*, 7, 1–26.

Giné, Evarist and Zinn, Joel. (1990). Bootstrapping general empirical measures. *The Annals of Probability*, 851–869.

Greenshtein, Eitan and Ritov, Ya'acov. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. *Bernoulli*, 10, 971–988.

Györfi, László, Kohler, Michael, Krzyzak, Adam and Walk, Harro. (2002). *A Distribution-Free Theory of Nonparametric Regression*. Springer.

Tibshirani, Robert. (1996). Regression shrinkage and selection via the lasso. *Journal of the Royal Statistical Society, Series B*, 58, 267–288.

P.S. Thanks to Ryan Tibshirani and David Choi for discussions on this topic.

## 19 Comments

Can you give examples (perhaps in another post) of the opposite – algorithms/hypothesis-tests/ideas that were theoretically near-optimal but became practically usable much later?

Good question. Maybe some commenters will chime in?

Anything Bayesian immediately comes to mind — often impractical without fast computers. Of course the “theoretically near-optimal” part is open to interpretation.

I think convex optimization broadly has a very rich history of methods and gaps. Even restricting to LPs: they were first formulated by Kantorovich in the 1930s (though people were solving linear systems long before), then Dantzig came up with the simplex method during WWII, Khachiyan came up with the ellipsoid method in the seventies, Karmarkar came up with interior point methods in the 80s, and Spielman and Teng gave a smoothed analysis of the simplex method in the 2000s.

Method: Simplex, Inventor: Dantzig (40s), Theoretical justification: Spielman-Teng (2000s)

(Theoretically good) Poly-time methods were known for a few years before they became practically useful.

I have read claims that much of modern AI and machine learning can be seen as just this: algorithms invented decades ago, but only made useful by modern fast computers and large data sets. (One example in particular being the field of computer vision.) I don't know how true this is, but it seems plausible for the examples I know of: garbage collection was invented back in the '50s, and only in the '90s did it become standard issue for new languages, to the point where contemporary programmers take it purely for granted; and Bayesian spam filters (mid '90s to early '00s?) are naive Bayes classifiers, an idea that is arguably centuries old.
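The spam-filter example above is easy to make concrete. A minimal naive Bayes classifier sketch in Python, with add-one smoothing (the toy "spam/ham" documents are invented for illustration):

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns per-label word counts and label counts."""
    counts = {}               # label -> Counter of word frequencies
    label_totals = Counter()  # label -> number of training documents
    for tokens, label in docs:
        counts.setdefault(label, Counter()).update(tokens)
        label_totals[label] += 1
    return counts, label_totals

def classify(tokens, counts, label_totals):
    """Pick the label maximizing log P(label) + sum of log P(word | label),
    using add-one (Laplace) smoothing for unseen words."""
    n_docs = sum(label_totals.values())
    vocab = {w for c in counts.values() for w in c}
    best, best_score = None, -math.inf
    for label, c in counts.items():
        total = sum(c.values())
        score = math.log(label_totals[label] / n_docs)  # log prior
        for w in tokens:
            score += math.log((c[w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [("win money now".split(), "spam"),
        ("cheap money win".split(), "spam"),
        ("meeting agenda today".split(), "ham"),
        ("lunch meeting tomorrow".split(), "ham")]
counts, totals = train_nb(docs)
```

The "centuries old" remark fits: the model is nothing more than Bayes' theorem plus an independence assumption across words.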

There is some amount of controversy in what I am about to say; I am drawing on Yoram Bresler's slides (http://www.eecs.umich.edu/ssp2012/bresler.pdf).

Compressive Sensing:

Inventor: Prony (1795)

Theoretical Justification: Donoho, Tao, Candes, Romberg (2004)

I did not know about Prony. That may win the biggest gap award.

FYI

http://nuit-blanche.blogspot.com/2013/05/sunday-morning-insight-200-year-gap.html

One might argue that the counterfactual approach to causal inference was first due to Neyman (1923), which increases the gap quite a bit!

Ref:

Neyman, J. (1923). Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. *Roczniki Nauk Rolniczych*, 10, 1–51.

In fact Rubin always credits Neyman, so I guess you are right.

Didn’t Hansen both propose and give theoretical justification for using GMM in his famous 1982 paper?

How about the likelihood principle?

In this case there’s an eternal gap!

I don’t think this is really a “method”

What is the difference between the counterfactual version and the graphical version of Causal Inference? I did not know that there are multiple types of Causal Inference. I have done a little bit of research in Causal Inference, which almost always employs the graphical version, at least in the statistics literature I have read. Also, how do you define the invention (or discovery) of a method? There has been (and continues to be) a lot of debate on the existence of a logic of discovery.

The counterfactual approach is described here

http://en.wikipedia.org/wiki/Rubin_causal_model

I don’t have a definition of “inventing a method.”

I had hoped I had made that clear in the post.

Hi, Larry!

Cool post. I'd suggest you add the gap between Rosenblatt's introduction of the perceptron

Rosenblatt, Frank. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. *Psychological Review*, 65(6), 386–408. doi:10.1037/h0042519

and Novikoff's proof that the perceptron algorithm converges after a finite number of updates if the data set is linearly separable:

Novikoff, A. B. (1962). On convergence proofs on perceptrons. *Symposium on the Mathematical Theory of Automata*, 12, 615–622. Polytechnic Institute of Brooklyn.
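For concreteness, here is a minimal sketch of the perceptron update rule on 2D data (the toy points below are invented for illustration). Novikoff's theorem is what guarantees the mistake counter stops growing on linearly separable data:

```python
def perceptron(points, max_epochs=100):
    """Rosenblatt's perceptron on 2D points with labels in {-1, +1}.
    Returns (weights, bias, total mistakes). Novikoff's theorem bounds
    the number of mistakes by (R / gamma)**2 when the data are linearly
    separable with margin gamma and all point norms are at most R."""
    w = [0.0, 0.0]
    b = 0.0
    mistakes = 0
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), y in points:
            # A point is misclassified if the signed score has the wrong sign.
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1   # nudge the separating line toward the point
                w[1] += y * x2
                b += y
                mistakes += 1
                errors += 1
        if errors == 0:          # a full clean pass: converged
            break
    return w, b, mistakes

# A linearly separable toy set: positives upper-right, negatives lower-left.
pts = [((2.0, 2.0), 1), ((3.0, 1.5), 1), ((-2.0, -1.0), -1), ((-1.5, -2.5), -1)]
w, b, m = perceptron(pts)
```

So the gap here is 1958 to 1962, short by the standards of the post, but a clean case of a working algorithm preceding the proof of why it works.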

All my best,

Zen.

Some quantum-mechanical Feynman-type invariants involving integration over a path space. Invented by physicists in the 50s / 60s, super useful but still theoretically unproven despite substantial effort.

Such things have been going on for a very long time in the history of science: after Leibniz and Newton invented calculus, it took over a century before folks like Cauchy, Weierstrass, and Riemann finally developed a solid theoretical foundation in terms of limits, epsilons, and so on.

Later, the “operational calculus” was used by engineers to solve problems in cable theory even though mathematicians kvetched about Heaviside’s lack of rigor when he invented it (eventually, of course, rigorous foundations based mostly on Taylor series expansions and Euler’s theorem were developed).