CoLT does not claim to have discovered the basic ideas of non-Bayesian statistics. However, it has explored this space in new ways.

For example, consider something as simple as learning a linear threshold function. If perfect discrimination is possible, that’s easily doable in polynomial time. Now let’s modify the problem slightly by making it more realistic: you don’t expect perfect discrimination to be possible, but you’d like to get as close to perfect discrimination as you can. That’s an NP-hard problem; it can’t be done in polynomial time (unless P=NP). OK, you say, I’d be satisfied with coming within some constant factor of the minimum, e.g., no more than 1.2 times the minimum error. That’s still NP-hard, for *any* constant factor, no matter how large. The same goes for the variable selection problem: not only is it NP-hard to minimize the number of regressors needed for perfect discrimination, it’s also NP-hard to come within any constant factor of the minimum.
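To make the separable case concrete, here is a minimal sketch (my own illustration, not from the comment): the polynomial-time guarantee for the realizable case comes from linear programming, but the classic perceptron algorithm, shown below on made-up toy data, is a simpler procedure that also finds a separating hyperplane whenever one with positive margin exists.

```python
# Perceptron: finds a separating hyperplane when the data is linearly
# separable (the "perfect discrimination" case). The polynomial-time
# guarantee mentioned in the text comes from linear programming; the
# perceptron is shown here only as a simple illustration, and it
# converges whenever a separator with positive margin exists.

def perceptron(points, labels, max_epochs=1000):
    """points: list of feature tuples; labels: +1 or -1."""
    d = len(points[0])
    w, b = [0.0] * d, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified: nudge toward x
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: done
            return w, b
    return None  # not separated within the epoch budget

# Toy separable data: label is the sign of x0 - x1
pts = [(2, 0), (3, 1), (0, 2), (1, 3)]
ys = [1, 1, -1, -1]
w, b = perceptron(pts, ys)
# every point now lies strictly on the correct side
assert all(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) > 0
           for x, y in zip(pts, ys))
```

In the agnostic setting the comment describes, where no perfect separator exists, this loop never terminates with zero mistakes, which is one face of the hardness result.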

As an aside, I lost interest in CoLT for two reasons: (1) nearly every interesting problem seemed to be NP-hard, so in practice you always seemed to end up using computational methods that lack time or performance guarantees, and (2) I learned about Bayesian methods, which struck me as far more flexible and as a more intellectually productive way of thinking about inference problems.

Can you explain why asymptotic consistency is needed? In the PAC model, I specify epsilon and delta, and I only need a confidence interval of width epsilon with confidence 1 − delta. As a machine learning practitioner (not particularly fluent in the theoretical analysis of PAC algorithms), it would seem that once I have the required sample size (which is polynomial in 1/epsilon and 1/delta), I don’t care how the estimator behaves for larger samples.

Is the issue that I can specify epsilon arbitrarily close to zero, and hence drive the required sample size arbitrarily high? This would never be done in practice. I think most machine learning folks would be happy to bound epsilon away from zero.
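The sample-size dependence the commenter mentions can be made explicit. A standard Hoeffding-style calculation (my illustration, not part of the comment) says that to estimate a [0,1]-bounded mean to within epsilon with probability at least 1 − delta, it suffices to take n ≥ ln(2/delta) / (2·epsilon²) samples, which indeed blows up as epsilon approaches zero:

```python
import math

# Hoeffding-style sample-size bound (an illustration): to estimate a
# mean of [0,1]-valued observations to within epsilon, with failure
# probability at most delta, it suffices that
#     n >= ln(2 / delta) / (2 * epsilon**2).
# Note the 1/epsilon^2 growth: driving epsilon toward zero drives the
# required sample size arbitrarily high, as the comment suggests.

def pac_sample_size(epsilon, delta):
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

print(pac_sample_size(0.05, 0.05))  # -> 738
print(pac_sample_size(0.01, 0.05))  # -> 18445
```

Halving epsilon quadruples the bound, while tightening delta only costs a logarithmic factor, which is why bounding epsilon away from zero keeps things practical.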

That was the one I had in mind.

May I then suggest this paper for that discussion: http://arxiv.org/abs/1304.0828? The paper also received the best paper award at this year’s COLT.

It’s true that in statistics we tend to ignore the computational complexity of the estimator. We define an estimator to be any measurable function, which really isn’t too useful. There have been a few recent results about minimax statistical bounds when one restricts the complexity of the estimator. I hope to blog about that in the future.

It depends on how you define “classical statistics”. As a graduate student in the mid-1980s, I learned about VC dimension albeit simply in the context of empirical process theory. But it is true that statisticians have generally been quite slow on the uptake in this area.
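For readers who haven’t met VC dimension, a tiny brute-force illustration (my addition, not the commenter’s): one-sided thresholds h_t(x) = 1[x ≥ t] on the real line shatter any single point but no pair, so their VC dimension is 1; by contrast, halfspaces in R^d have VC dimension d + 1.

```python
# Brute-force check of "shattering" for threshold classifiers
# h_t(x) = 1 if x >= t else 0 on the real line. A point set is
# shattered if every possible 0/1 labeling of it is realized by
# some threshold t.

def achievable(points, thresholds):
    # set of label vectors realizable over the candidate thresholds
    return {tuple(int(x >= t) for x in points) for t in thresholds}

def is_shattered(points):
    xs = sorted(points)
    # thresholds below all points, between consecutive points, and
    # above all points cover every distinct behavior of h_t
    thresholds = ([xs[0] - 1.0]
                  + [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
                  + [xs[-1] + 1.0])
    return len(achievable(points, thresholds)) == 2 ** len(points)

assert is_shattered([0.0])            # one point: both labels reachable
assert not is_shattered([0.0, 1.0])   # (1, 0) is never realized
```

No threshold can label the left point 1 and the right point 0, which is exactly why the VC dimension of this class is 1.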

Not all were done using asymptotics.

Remember, Hoeffding was a statistician!

I agree about VC dim.