If you have regression at equally spaced values

with constant variance and normal error then, yes,

wavelet estimators give you a precise rule for choosing

the tuning parameter. But this situation is very special.

For ordinary regression or density estimation, cross-validation

is much more popular.
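To make the cross-validation alternative concrete, here is a minimal sketch (my own illustration, not from the post) of leave-one-out likelihood cross-validation for choosing the bandwidth of a Gaussian kernel density estimator:

```python
import numpy as np

def kde_loo_loglik(data, h):
    """Leave-one-out log-likelihood of a Gaussian KDE with bandwidth h."""
    n = len(data)
    diffs = data[:, None] - data[None, :]
    K = np.exp(-0.5 * (diffs / h) ** 2) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(K, 0.0)          # leave each point out of its own estimate
    loo = K.sum(axis=1) / (n - 1)
    return np.log(loo).sum()

rng = np.random.default_rng(0)
x = rng.normal(size=200)
hs = np.linspace(0.05, 1.0, 40)       # candidate bandwidths
best_h = max(hs, key=lambda h: kde_loo_loglik(x, h))
```

The point is only that the rule is generic: nothing in it uses equal spacing, constant variance, or normality.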

Is it true that an adaptive density estimator does not suffer from the tuning-parameter problem? For example, the adaptive wavelet estimator of Donoho and Johnstone? And how does an adaptive estimator compare to a kernel estimator with a tuned bandwidth?

Thank you!

Thanks for the reference.

—LW

You might have a look at a couple of possibly useful references:

“Cross-Validation and Mean-Square Stability” by Satyen Kale, Ravi Kumar, and Sergei Vassilvitskii

and

http://www.oldenbourg-link.com/doi/abs/10.1524/stnd.2006.24.3.351

Another interesting question concerns the common practice of selecting the tuning parameter by cross-validation and then re-training the selected estimator on the entire data set. Again, one would hope to be able to prove tighter bounds for this than for the version trained on only (k-1)/k of the data.
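For concreteness, here is a sketch of that common practice (my own illustration, not from the comment), using ridge regression with a regularisation parameter chosen by 5-fold cross-validation and then refit on the full sample:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge solution (X'X + lam*I)^{-1} X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def kfold_cv_mse(X, y, lam, k=5):
    """Average held-out MSE of ridge with parameter lam over k folds."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        beta = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
beta_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ beta_true + rng.normal(size=100)

lams = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(lams, key=lambda l: kfold_cv_mse(X, y, l))
beta_final = ridge_fit(X, y, best_lam)   # re-train on the entire data set
```

The theoretical question raised above is exactly about the gap between `beta_final` (trained on all n points) and the fold estimators (each trained on (k-1)/k of them) that the cross-validation guarantees actually refer to.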

Both of the above questions seem like they might require additional assumptions on the estimation procedure in order to make progress. Do you know of any work in these directions?

The essence of my view is that scientists want to be “objective”, which far too often tempts them into not deciding things that need to be decided. Not everything can be done in a data-driven way. If you have a loss function, OK, the data can help you to optimise it. But the data can’t tell you which loss function to choose, and so the data really can’t tell you whether you should be interested in squared loss or some other loss, which actually depends on what you want to do with the result. In density estimation, the scientist really needs to decide how much smoothness he or she wants. The data can’t decide this, because the smoothness of a density is, strictly speaking, not observable.

In cluster analysis, if you have two nicely separated Gaussian mixture components and move them closer and closer to each other, at some point this will look like a single cluster, not two. And the researcher has to decide from which cutoff downwards this should be treated as a single cluster. There is no way the data can do this (OK, you could test unimodality but in several applications this is not the cluster concept you’re after). So there is no way to have the data properly decide the number of clusters without any kind of tuning. We should actually *want* to tune things so that they are of use to us (this assumes that it is well understood what tuning constants do, so I’m still in favour of getting rid of them where it’s not).
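The two-Gaussian picture can be made numerically concrete. For an equal-weight mixture of N(-d, 1) and N(d, 1), the density is bimodal exactly when d > 1, so “how separated is separated enough” really is a cutoff someone must choose. A small check (my own sketch, pure NumPy):

```python
import numpy as np

def n_modes(delta, grid=np.linspace(-6, 6, 2001)):
    """Count local maxima of the density 0.5*N(-delta,1) + 0.5*N(delta,1)."""
    phi = lambda z: np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    f = 0.5 * phi(grid + delta) + 0.5 * phi(grid - delta)
    interior_max = (f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])
    return int(interior_max.sum())
```

With `delta = 0.5` the mixture has a single mode, while `delta = 2.0` gives two, even though the data-generating process has two components in both cases.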

Something about loss function in this spirit is

C. Hennig and M. Kutlukaya: Some thoughts about the design of loss functions. REVSTAT 5 (2007), 19-39 (freely available online).

That’s true. Shape-constrained density estimation is often tuning-parameter-free, which is nice.

This area is very promising, at least as I see it.
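As one concrete example (my own sketch, not from the thread): the Grenander estimator of a non-increasing density on [0, ∞) has no tuning parameter at all. It is the left derivative of the least concave majorant of the empirical CDF, which can be computed by pooling adjacent violators of the raw ECDF slopes:

```python
import numpy as np

def grenander(x):
    """Grenander estimator of a non-increasing density on [0, inf).

    Assumes distinct positive observations.  Returns (pts, heights):
    the estimate equals heights[i] on the interval (pts[i], pts[i+1]].
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    pts = np.concatenate([[0.0], x])
    widths = np.diff(pts)                # lengths of the ECDF steps
    slopes = (1.0 / n) / widths          # raw ECDF slopes
    # Weighted pool-adjacent-violators for a non-increasing fit:
    vals, wts, lens = [], [], []
    for s, w in zip(slopes, widths):
        vals.append(s); wts.append(w); lens.append(1)
        while len(vals) > 1 and vals[-2] < vals[-1]:
            w2 = wts[-1] + wts[-2]
            v2 = (vals[-1] * wts[-1] + vals[-2] * wts[-2]) / w2
            l2 = lens[-1] + lens[-2]
            vals = vals[:-2] + [v2]
            wts = wts[:-2] + [w2]
            lens = lens[:-2] + [l2]
    heights = np.repeat(vals, lens)
    return pts, heights

rng = np.random.default_rng(2)
sample = rng.exponential(size=300)
pts, heights = grenander(sample)
```

The monotonicity constraint does all the regularisation, so nothing plays the role of a bandwidth.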

Also, whenever dealing with testing procedures, the choice of the tuning parameter becomes even more serious since there is no clear way of choosing an appropriate loss function.

I quite like the post! Very good and didactic.
