Statisticians are woefully ignorant about computer science (CS).
And computer scientists are woefully ignorant about statistics.
O.k. I am exaggerating. Nonetheless, it is worth asking: what important concepts should every statistician know from computer science?
I have asked several friends from CS for a list of the top three things from CS that statisticians should know. While there wasn’t complete agreement, here are the three that came up:
- Computational complexity classes. In particular, every statistician should understand what P and NP mean (and why you get $1,000,000 from the Clay Mathematics Institute if you prove that $P \neq NP$). Understanding the fact that searching through all submodels in a variable selection problem is NP-hard will convince you that solving a convex relaxation (a.k.a. the lasso) is a really good idea (a small sketch follows the list).
- Estimating computing time. In CS and machine learning, it is expected that one will estimate the number of operations needed to carry out an algorithm. You write a paper with an algorithm and then say something like: this will take $O(n \log n)$ computing time. In statistics, we are pretty loose about this. Some do it, some don’t. (An empirical check of such a claim is sketched after the list.)
- Hashing. One colleague assured me that hashing is critical if statisticians really want to be part of the “big data” world. (Last week, Siva was kind enough to give me a quick tutorial on hashing; a toy example follows the list.)
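To make the lasso point concrete, here is a minimal sketch in Python (my illustration, not any particular paper's implementation) of cyclic coordinate descent for the lasso. Exhaustive best-subset search would have to examine all $2^p$ submodels; one full sweep below costs only $O(np)$ operations.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal map of the l1 penalty: shrink toward zero by t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_sweeps=200):
    """Cyclic coordinate descent for
       (1/(2n)) ||y - X b||^2 + alpha ||b||_1.
    Assumes the columns of X are standardized; intercept omitted."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta                          # residual, kept up to date
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]            # put feature j back in
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, alpha)  # exact 1-d minimizer
            r -= X[:, j] * beta[j]            # take updated feature j out
    return beta

# Toy example: 50 candidate variables, only the first 3 matter.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / X.std(0)                # standardize columns
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.standard_normal(n)

beta_hat = lasso_cd(X, y, alpha=0.1)
print("selected variables:", np.flatnonzero(np.abs(beta_hat) > 1e-8))
```

Best-subset search over these 50 variables would mean fitting $2^{50} \approx 10^{15}$ models; the convex relaxation finds the sparse fit in a fraction of a second.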
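On the second point, one informal way to back up a claimed rate is to time the algorithm at several input sizes and estimate the exponent from the slope of log time against log $n$. The pairwise-distance routine here is just a hypothetical stand-in for whatever algorithm is being analyzed:

```python
import time
import numpy as np

def median_time(f, n, reps=3):
    # Median wall-clock time of f(n) over a few repetitions.
    times = []
    for _ in range(reps):
        t0 = time.perf_counter()
        f(n)
        times.append(time.perf_counter() - t0)
    return float(np.median(times))

def pairwise(n):
    # Naive O(n^2) computation of all pairwise distances.
    x = np.random.rand(n, 2)
    return np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))

ns = np.array([250, 500, 1000, 2000])
ts = np.array([median_time(pairwise, int(n)) for n in ns])

# Slope of log(time) vs log(n) estimates k in O(n^k).
k = np.polyfit(np.log(ns), np.log(ts), 1)[0]
print(f"estimated exponent: {k:.2f} (theory says 2)")
```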
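And a toy version of one thing hashing buys you in the “big data” world, the so-called hashing trick (my sketch; the dimension and the sign trick are standard but arbitrary choices): arbitrarily many feature names get mapped into a fixed-length vector, so the feature space never has to be enumerated in advance.

```python
import numpy as np
from collections import Counter

def hashed_features(tokens, dim=2**10):
    # Map token counts into a fixed-length vector via a hash function.
    # Note: Python's built-in string hash is salted per process, so
    # hashed vectors are only comparable within a single run.
    v = np.zeros(dim)
    for tok, count in Counter(tokens).items():
        h = hash(tok)
        idx = h % dim                              # which bucket
        sign = 1.0 if (h >> 1) % 2 == 0 else -1.0  # random sign reduces collision bias
        v[idx] += sign * count
    return v

doc1 = "the quick brown fox jumps over the lazy dog".split()
doc2 = "the quick brown fox".split()
v1, v2 = hashed_features(doc1), hashed_features(doc2)
print("cosine similarity:",
      v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
```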
Is there anything I am missing? What would you put in the top three?
And how about the other way around? For a computer scientist with an interest in statistical learning, what are the three most important concepts that they should know from statistics?
Here are my nominations (small numerical sketches of the first two follow the list):
- Concentration of measure.
- Maximum likelihood.
- Minimax theory.
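Here is the flavor of the first nomination as a simulation (my illustration): Hoeffding's inequality says that for $[0,1]$-valued variables, $P(|\bar{X}_n - p| \ge t) \le 2e^{-2nt^2}$, so sample means concentrate exponentially fast, and the bound is easy to check numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
p, t = 0.5, 0.05
for n in [100, 400, 1600]:
    # 100,000 independent sample means of n Bernoulli(p) draws.
    xbar = rng.binomial(n, p, size=100_000) / n
    empirical = np.mean(np.abs(xbar - p) >= t)
    bound = 2 * np.exp(-2 * n * t**2)
    print(f"n={n:5d}  P(|mean-p| >= {t}) ~ {empirical:.5f}  Hoeffding: {bound:.5f}")
```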
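And maximum likelihood in miniature (again my illustration, using scipy): for a nice parametric model, numerically maximizing the log-likelihood recovers the closed-form estimator.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.exponential(scale=1 / 2.5, size=500)   # true rate lambda = 2.5

def neg_log_lik(lam):
    # Exponential(lambda) log-likelihood: n*log(lam) - lam*sum(x).
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50), method="bounded")
print(f"numerical MLE: {res.x:.3f}   closed form 1/mean(x): {1 / x.mean():.3f}")
```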
I think this list is not what most statisticians would come up with.
What would you put on the list?