There are many ways to discuss the quality of estimators in statistics. Today I want to review three common notions: presistency, consistency and sparsistency. I will discuss them in the context of linear regression. (Yes, that’s presistency, not persistency.)
Suppose the data are $(X_1, Y_1), \ldots, (X_n, Y_n)$ where $Y_i = \beta^T X_i + \epsilon_i$, $Y_i \in \mathbb{R}$ and $X_i \in \mathbb{R}^d$. Let $\hat\beta$ be an estimator of $\beta$.
Probably the most familiar notion is consistency. We say that $\hat\beta$ is consistent if

$$\| \hat\beta - \beta \| \stackrel{P}{\to} 0$$

as $n \to \infty$.
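To make this concrete, here is a minimal simulation sketch in Python. The dimension, noise level, sample sizes, and seed are arbitrary choices of mine for illustration, not anything dictated by the theory:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
beta = rng.normal(size=d)  # the "true" coefficient vector

for n in [100, 1_000, 10_000]:
    X = rng.normal(size=(n, d))
    y = X @ beta + rng.normal(size=n)                # Y_i = beta^T X_i + eps_i
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares
    print(n, np.linalg.norm(beta_hat - beta))        # ||beta_hat - beta|| shrinks as n grows
```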
In recent years, people have become interested in sparsistency (a term invented by Pradeep Ravikumar). Define the support of $\beta$ to be the location of the nonzero elements:

$$\mathrm{support}(\beta) = \{ j : \beta_j \neq 0 \}.$$

Then $\hat\beta$ is sparsistent if

$$\mathbb{P}\bigl( \mathrm{support}(\hat\beta) = \mathrm{support}(\beta) \bigr) \to 1$$

as $n \to \infty$.
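As an informal illustration (not a proof), one can check support recovery in simulation. Scikit-learn's Lasso, the regularization level, and the sparse truth below are my own choices:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, d = 500, 20
beta = np.zeros(d)
beta[:3] = [2.0, -1.5, 1.0]  # sparse truth: support(beta) = {0, 1, 2}

X = rng.normal(size=(n, d))
y = X @ beta + rng.normal(size=n)

beta_hat = Lasso(alpha=0.1).fit(X, y).coef_

def support(b):
    return set(np.flatnonzero(np.abs(b) > 1e-8))

# Sparsistency asks that this event occur with probability tending to one.
print(support(beta_hat) == support(beta))
```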
The last one is what I like to call presistence. I just invented this word. Some people call it risk consistency or predictive consistency. Greenshtein and Ritov (2004) call it persistency, but this creates confusion for those of us who work with persistent homology. Of course, presistence comes from shortening “predictive consistency.”
Let $(X, Y)$ be a new pair. The predictive risk of $\beta$ is

$$R(\beta) = \mathbb{E}(Y - \beta^T X)^2.$$

Let $B_n$ be some set of $\beta$'s and let $\beta_n^*$ be the best $\beta$ in $B_n$. That is, $\beta_n^*$ minimizes $R(\beta)$ subject to $\beta \in B_n$. Then $\hat\beta$ is presistent if

$$R(\hat\beta) - R(\beta_n^*) \stackrel{P}{\to} 0.$$

This means that $\hat\beta$ predicts nearly as well as the best choice of $\beta \in B_n$.
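The predictive risk is easy to approximate by Monte Carlo on fresh pairs. Here is a small sketch; the helper names and the data-generating process are hypothetical, chosen only for illustration, and note that nothing requires the truth to be linear:

```python
import numpy as np

def predictive_risk(beta, draw_pairs, m=200_000, seed=0):
    """Monte Carlo estimate of R(beta) = E[(Y - beta^T X)^2] using fresh pairs."""
    rng = np.random.default_rng(seed)
    X, Y = draw_pairs(rng, m)
    return np.mean((Y - X @ beta) ** 2)

# A hypothetical data-generating process; it need not be linear.
def draw_pairs(rng, m):
    X = rng.normal(size=(m, 3))
    Y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(size=m)
    return X, Y

print(predictive_risk(np.array([1.0, 0.5, 0.0]), draw_pairs))
```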
As an example, consider the set of sparse vectors

$$B_n = \Bigl\{ \beta : \sum_{j=1}^{d} |\beta_j| \le L_n \Bigr\}.$$

(The dimension $d$ is allowed to depend on $n$, which is why we have a subscript on $B_n$.) In this case, $\beta_n^*$ can be interpreted as the best sparse linear predictor. The corresponding sample estimator $\hat\beta$, which minimizes the sum of squares subject to being in $B_n$, is the lasso estimator. Greenshtein and Ritov (2004) proved that the lasso is presistent under essentially no conditions.
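To see the “essentially no conditions” point in action, here is a sketch comparing the lasso's predictive risk to that of the best linear predictor when the linear model is wrong. The nonlinear truth, the penalized (rather than constrained) form of the lasso, and all tuning choices below are mine:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)

def draw_pairs(m):
    X = rng.normal(size=(m, 10))
    Y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(size=m)  # not linear in X
    return X, Y

# Fit the (penalized-form) lasso on a training sample.
X, Y = draw_pairs(2_000)
beta_hat = Lasso(alpha=0.05, fit_intercept=False).fit(X, Y).coef_

# Approximate the best linear predictor beta* by least squares on a large fresh sample.
Xf, Yf = draw_pairs(500_000)
beta_star = np.linalg.lstsq(Xf, Yf, rcond=None)[0]

def risk(b):  # Monte Carlo predictive risk on the fresh sample
    return np.mean((Yf - Xf @ b) ** 2)

print(risk(beta_hat) - risk(beta_star))  # excess risk stays small despite misspecification
```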
This is the main message of this post: To establish consistency or sparsistency, we have to make lots of assumptions. In particular, we need to assume that the linear model is correct. But we can prove presistence with virtually no assumptions. In particular, we do not have to assume that the linear model is correct.
Presistence seems to get less attention than consistency or sparsistency, but I think it is the most important of the three.
Bottom line: presistence deserves more attention. And, if you have never read Greenshtein and Ritov (2004), I highly recommend that you read it.
Reference:
Greenshtein, Eitan and Ritov, Ya'acov (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 10, 971-988.
7 Comments
From what I’ve seen, predictive consistency in various senses and settings is always easier to achieve than parameter consistency. (There was some funky thing about this in Grünwald’s MDL book that I’ve been meaning to go back and review…)
Indeed. Also, if only at first glance, ‘presistency’ seems to have connections to universal codes, and the ‘close to the best’ property is rather reminiscent of Shtarkov/Normalized Maximum Likelihood in particular.
To prove consistency you need assumptions about the data. To prove “presistency” you need assumptions on your comparison class of estimators, B_n. “No assumptions about the data” is usually used as a selling point for expert-advice approaches. That's fine, as long as you remember the other part: assumptions on the comparison class. But please don't take the next step of calling the setting “adversarial.”
Dumb question (so, suitable apologies): the convergence mode of sparsistency is not specified; I presume it is in the a.s. sense. The other two measures are required to converge only in probability. Why is it important that the convergence of sparsistency be stronger?
It is in probability: $\mathbb{P}\bigl( \mathrm{support}(\hat\beta) = \mathrm{support}(\beta) \bigr) \to 1$.
I think one of the reasons presistency receives less attention (despite its importance) is that people inevitably want to interpret the regression coefficient vector beta. Unfortunately, at least for the typical case, such interpretation depends on pretty strong assumptions — the same sort of assumptions as for sparsistency, such as that your model class contains Nature’s true model.
Recall that you do not always need to make lots of assumptions to show consistency, as highlighted in Stone, C. J. (1977), Consistent nonparametric regression, Ann. Statist., 5, 595-620, where consistency is shown with no conditions on the joint distribution of (X, Y).