The Gap

No, not the store. I am referring to the gap between the invention of a method and the theoretical justification for that method.

It is hard to quantify the gap because, for any method, there is always dispute about who invented (discovered?) it, and who nailed the theory. For example, maximum likelihood is usually credited to Ronald Fisher but one could argue that it was really invented by Gauss or Laplace. And who provided the theory for maximum likelihood? Well, most people would say Fisher. But you could also argue that Le Cam was the one who really made it rigorous.

So, with these caveats in my mind, here are a few examples of the gap. I am interested to hear your examples (as well as your opinions about my examples.)

Maximum Likelihood Fisher 1912 Fisher 1922
The Bootstrap Efron 1979 Gine and Zinn 1990
The Lasso Tibshirani 1996 Greenshtein and Ritov 2004
False Discovery Rate Schweder, Spotvoll 1982 Benjamini and Hochberg 1995
Wavelet Denoising Haar 1905 Donoho and Johnstone 1998
Cross-validation Stone 1974 Gyorfi, Kohler, Krzyzak,
and Walk 2002
Causal Inference Rubin 1974 Rubin 1974, Robins 1989
(Counterfactual Version)
Causal Inference Wright 1934 Spirtes, Glymour, Scheines 1987
(Graphical Version) and Pearl and Verma 1991

Every item in the above list is debatable. For example, I consider Gine amd Zinn (1990) the first definitive result that clarifies necessary and sufficient conditions to justify the bootstrap. But one could well argue that earlier papers Bickel and Freedman (1981) and Singh (1981) deserve that label.

What items do you disagree with?

What items would you add?

Partial List of References

John Aldrich. (1997). R.A. Fisher and the making of maximum likelihood 1912-1922. Statist. Sci., 12, 162-176.

Donoho, David L and Johnstone, Iain M. Minimax estimation via wavelet shrinkage. The Annals of Statistics, 26, 879-921.

Efron, Bradley. (1997). Bootstrap methods: another look at the jackknife. The annals of Statistics, 7, 1-26.

Gine, Evarist and Zinn, Joel. (1990). Bootstrapping general empirical measures. The Annals of Probability, 851–869.

Greenshtein, Eitan and Ritov, Ya’Acov. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 10, 971-988.

Gy{ö}rfi, L{á}szl{ó} and Kohler, Michael and Krzyzak, Adam and Walk, Harro. (2002). A distribution-free theory of nonparametric regression, Springer.

Tibshirani, Robert. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B, 267–288.

P.S. Thanks to Ryan Tibshirani and David Choi for discussions on this topic.



  1. Posted March 7, 2013 at 11:04 pm | Permalink

    Can you give examples (perhaps in another post) of the opposite – algorithms/hypothesis-tests/ideas that were theoretically near-optimal but became practically usable much later?

    • Posted March 8, 2013 at 7:53 am | Permalink

      Good question
      Maybe some commenters will chime in?

      • Martin Azizyan
        Posted March 8, 2013 at 9:40 am | Permalink

        Anything Bayesian immediately comes to mind — often impractical without fast computers. Of course the “theoretically near-optimal” part is open to interpretation.

      • Posted March 8, 2013 at 7:44 pm | Permalink

        I think convex optimization broadly has a very rich history of methods and gaps — but even restricting to LPs — they were first formulated by Kantorovich in the 30s (though people were solving linear systems long before), then Dantzig came up with the simplex method during WWII, and then Khachiyan came up with the Ellipsoid in the seventies, Karmakar in the 80s came up with interior point methods, and Spielman-Teng gave a smoothed analysis for the simplex in the 2000s.

        Method: Simplex, Inventor: Dantzig (40s), Theoretical justification: Spielman-Teng (2000s)

        (Theoretically good) Poly-time methods were known for a few years before they became practically useful.

    • Posted March 30, 2013 at 11:06 am | Permalink

      I have read claims that much of modern AI & machine learning can be seen as just this: algorithms invented decades ago, but only made useful with modern fast computers and large data sets. (One example in particular being the field of computer vision.) I don’t know how true this is, but it seems plausible for examples that I know of: garbage collection was invented back in the ’50s or something, and only in the ’90s did it really become standard issue for new languages to the point where contemporary programmers take it purely for granted; and Bayesian spam filters (mid 90s to early 00s?) are naive Bayesian classifiers and I wouldn’t be surprised if that was arguably centuries old.

  2. Posted March 8, 2013 at 5:52 am | Permalink

    There is some amount of controversy in what I am about to say using Yoram Bressler’s slides (

    Compressive Sensing:

    Inventor: Prony (1795)
    Theoretical Justification: Donoho, Tao, Candes, Romberg (2004)

  3. Posted March 8, 2013 at 6:08 am | Permalink

    One might argue that the counter-factual approach to causal inference was first due to Neymann (1923), which increases the gap quite a bit!

    Neymann, J. Sur les applications de la théorie des probabilités aux experiences agricoles: Essai des principes. Roczniki Nauk Rolniczki 10, 1–51, 1923.

    • Posted March 8, 2013 at 7:55 am | Permalink

      In fact Rubin always credits Neyman, so I guess you are right.

  4. Posted March 8, 2013 at 6:54 am | Permalink

    Didn’t Hansen both propose and give theoretical justification for using GMM in his famous 1982 paper?

  5. Rohde, Charles
    Posted March 8, 2013 at 2:03 pm | Permalink

    How about the likelihood principle?

    From: Normal Deviate <> Reply-To: Normal Deviate <>

    • Posted March 8, 2013 at 8:05 pm | Permalink

      In this case there’s an eternal gap!

    • Posted March 8, 2013 at 8:06 pm | Permalink

      I don’t think this is really a “method”

  6. Nicole Jinn
    Posted March 8, 2013 at 8:26 pm | Permalink

    What is the difference between the counterfactual version and the graphical version of Causal Inference? I did not know that there are multiple types of Causal Inference. I have done a little bit of research in Causal Inference, which almost always employs the graphical version, at least in the statistics literature I have read. Also, how do you define the invention (or discovery) of a method? There has been (and continues to be) a lot of debate on the existence of a logic of discovery.

  7. Paulo
    Posted March 19, 2013 at 3:39 pm | Permalink

    Hi, Larry!

    Cool post. I’d suggest you to add the gap between Rosenblatt’s introduction of the Perceptron

    Rosenblatt, Frank (1958), The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain,
    Cornell Aeronautical Laboratory, Psychological Review, v65, No. 6, pp. 386, 408. doi:10.1037/h0042519

    and Novikoff’s proof that the Perceptron algorithm converges after a finite number of iterations if the data
    set is linearly separable.

    Novikoff, A. B. (1962). On convergence proofs on perceptrons. Symposium on the Mathematical Theory of Automata, 12, 615-622. Polytechnic Institute of Brooklyn.

    All my best,


  8. Tristan
    Posted March 26, 2013 at 1:03 pm | Permalink

    Some quantum-mechanical Feynman-type invariants involving integration over a path space. Invented by physicists in the 50s / 60s, super useful but still theoretically unproven despite substantial effort.

  9. Matt Healy
    Posted June 29, 2013 at 5:58 pm | Permalink

    Such things have been going on for a very long time in the history of science: after Leibniz and Newton invented calculus it took over a century before folks like Cauchy, Weierstrass, and Riemann finally developed a solid theoretical foundation in terms of limits, epsilon, etc.

    Later, the “operational calculus” was used by engineers to solve problems in cable theory even though mathematicians kvetched about Heaviside’s lack of rigor when he invented it (eventually, of course, rigorous foundations based mostly on Taylor series expansions and Euler’s theorem were developed).

%d bloggers like this: