## What To Teach?

The new term is almost here. This year I am (once again) teaching *Intermediate Statistical Theory*, which is a first-year graduate course in mathematical statistics.

This course used to be populated mainly by students in the first year of the Statistics Ph.D. program (about 5-10 students). Last year I had over 100 students! Most were from Computer Science and related fields. The world has changed. Statistics is sexy.

In the olden days I followed Casella and Berger. I covered the traditional topics like: basic probability, convergence, sufficiency, estimation, testing, confidence intervals, large sample theory.

In response to changes in our field I have been gradually changing the topics. I threw out: unbiased estimation, most of sufficiency, ancillarity, completeness, the Rao-Blackwell theorem, most powerful tests, etc. I added: Hoeffding’s inequality, VC theory, minimax theory, nonparametric smoothing, the bootstrap, prediction and classification, model selection and causation.

The field of statistics is changing quickly. If we don’t change our courses accordingly we run the risk of becoming irrelevant. On the other hand, we don’t want to abandon teaching core statistical principles and just teach the latest fads.

I am wondering what other people do. Have you changed your courses? Did I throw out too much? Are there other things I should include?

## 10 Comments

Do your statistics graduate students still pick up the ‘thrown out’ material you mentioned in another course? It seems fine to teach such a course as you’ve indicated if it is attended by a high percentage of out-of-department students, but shouldn’t a stats PhD student from your own department still be exposed to the classics?

I have mixed feelings about that. I have never used completeness or ancillarity. Also, they can always look it up and learn it on their own if they need it.

LW

Might also depend on what other courses they are taking and what subsequent courses would rely on as pre-requisite material (what is covered in your Advanced Statistical Theory courses, for example)?

I think everyone should have exposure to the classical theory (but I’m an academic so don’t live in the real world). I can’t argue too much with reduced emphasis on sufficiency, ancillarity and completeness (but that may also be more of a personal bias).

I think that nonparametrics, prediction and classification, and model selection are also valuable topics (and each could be a course in its own right).

Glad to hear “statistics is sexy”, even if it’s mostly from CS. But, dumping most powerful tests? I thought that was the sexiest part of statistics (when I first took it, and now still). I wonder, are you convinced that the ability to discover successful results for the new CS arenas isn’t advanced by a theoretical understanding of the older optimality results and their difficulties?

The problem for me is that uniformly most powerful tests don’t exist except in very special cases. So it’s a lot of machinery to teach and then you can’t use it.

—LW

I think that it depends on the field of each researcher (or student); for me, most of the topics you threw out are very important for modern statistical theory. For instance, the concepts of sufficiency and ancillarity are very important for performing Skovgaard’s adjustment for likelihood ratio statistics; see Skovgaard (2001).

All the best,

Alexandre

I forgot the reference:

Skovgaard, I.M., Likelihood asymptotics, Scandinavian Journal of Statistics, 28, 3–32 (2001)

How long is the course supposed to be, how many contact hours per week, etc.?

Speaking as someone who has just graduated from a first degree in Statistics (albeit from the UK), I would appreciate having sufficiency, ancillarity, completeness and the Rao-Blackwell Theorem in a first-year grad school course. Again, as Mark said above, it is probably worth looking at what the prerequisite courses cover and the background of the students coming in.

3 hours/week for 15 weeks

I am very glad for having been introduced to a broad range of concepts that I can think about in the same theoretical framework. I think the nonparametric stuff is a big part of it, but it’s a more complex entry into the framework than parametrics and classical theory.

As for utility, I have trouble applying both classical and new. The ops world I’m in is still at “how to summarize data” and “drawing good graphs.” EDA is really important. The smoothing method is less important: It is more important to figure out what should go on the Y axis, show our engineers how to program it as an application that fits in with our architecture, apply it efficiently to thousands of assets in real time, design diagnostics for when the model is going south, figure out what to do with thousands of outlier alerts, and train others to interpret the results. Now I’m moving on to trying to do all of that, but with graphs with millions of nodes, where you are trying to find interpretable relationships and throw out spurious ones according to judgement that is really hard to automate. So it’s really difficult to take any academic paper or concept and directly apply it to real-world data, especially if it’s going to take enough effort that you have to prove it’s going to work before you implement it.

In the intermediate stat course based on Casella and Berger that I took (taught by Valerie), the most useful things I learned (conceptually as opposed to procedurally) were navigating multivariate distributions and transformations, hierarchical models, Bayes estimators and an intro to Bayesian concepts and methods and decision theory, MLEs and invariance/asymptotic properties thereof, and sufficiency. I had little to no stat background when I took that class. I would vote for teaching more about GLMs before moving on to additive models and the more distribution-free stuff: whereas the social sciences seem to know of the existence of GLMs, they seem to be skipped over on the ML/Compsci side of things except for basic logistic regression? Are all of the Big Data questions really that big?

Ancillarity and completeness are things I wouldn’t miss. Sufficiency… there is a geometric interpretation in here tied to identifiability, I think, though I’ve not really gone through the math. Minimal Sufficient Statistics tied to Maximal Identifiable Parameters, in the sense of ones you can distinguish in data without the help of priors. So you can kick it out but it might show up again in model selection or causation. In relating models to experiments or for testing of theories, it is helpful to know from the model what you need to be able to observe in order to accumulate information about all of the inner structure that you care about.