As for utility, I have trouble applying both classical and new methods. The ops world I'm in is still at "how to summarize data" and "drawing good graphs." EDA is really important. The smoothing method is less important. It matters more to figure out what should go on the Y axis, show our engineers how to program it as an application that fits our architecture, apply it efficiently to thousands of assets in real time, design diagnostics for when the model is going south, figure out what to do with thousands of outlier alerts, and train others to interpret the results. Now I'm moving on to trying to do all of that with graphs that have millions of nodes, where you are trying to find interpretable relationships and throw out spurious ones based on judgment that is really hard to automate. So it's really difficult to take any academic paper or concept and apply it directly to real-world data, especially if it will take enough effort that you have to prove it will work before you implement it.

In the intermediate stat course based on Casella and Berger that I took (taught by Valerie), the most useful things I learned, conceptually as opposed to procedurally, were: navigating multivariate distributions and transformations; hierarchical models; Bayes estimators, along with an introduction to Bayesian concepts, methods, and decision theory; MLEs and their invariance and asymptotic properties; and sufficiency. I had little to no stat background when I took that class. I would vote for teaching more about GLMs before moving on to additive models and the more distribution-free material. The social sciences at least seem to know that GLMs exist, but on the ML/CS side they seem to be skipped over apart from basic logistic regression. Are all of the Big Data questions really that big?

Ancillarity and completeness are things I wouldn't miss. Sufficiency is different: I think there is a geometric interpretation tied to identifiability, though I haven't really gone through the math. Minimal sufficient statistics would correspond to maximal identifiable parameters, meaning the parameters you can distinguish in the data without the help of priors. So you can kick sufficiency out, but it might show up again in model selection or causation. When relating models to experiments, or testing theories, it is helpful to know from the model what you need to be able to observe in order to accumulate information about all of the inner structure you care about.

3 hours/week for 15 weeks

Speaking as someone who has just graduated with a first degree in Statistics (albeit from the UK), I would appreciate having sufficiency, ancillarity, completeness and the Rao-Blackwell Theorem in a first-year grad school course. As Mark said above, it is probably worth looking at how the prerequisite courses cover this material and at the background of the incoming students.

Skovgaard, I.M., Likelihood asymptotics, Scandinavian Journal of Statistics, 28, 3–32 (2001)

All the best,

Alexandre

The problem for me is that uniformly most powerful tests don't exist except in very special cases. So it's a lot of machinery to teach, and then you can't use it.

—LW

I have mixed feelings about that. I have never used completeness or ancillarity. Also, students can always look them up and learn them on their own if they need them.

LW

I think everyone should have exposure to the classical theory (but I'm an academic, so I don't live in the real world). I can't argue too much with reduced emphasis on sufficiency, ancillarity and completeness (but that may also be more of a personal bias).

I think that nonparametrics, prediction and classification, and model selection are also valuable topics to be exposed to (and each could be a course in its own right).
