## The JSM, Minimaxity and the Language Police

I am back from the JSM. For those who don’t know, the JSM is the largest statistical meeting in the world. This year there were nearly 6,000 people.

Some people hate the JSM because it is too large. I love JSM. There is so much going on: lots of talks, receptions, tutorials and, of course, socializing. It’s impossible to walk 10 feet in any direction without bumping into another old friend.

1. Highlights

On Sunday I went to a session dedicated to the 70th birthday of my colleague Steve Fienberg. This was a great celebration of a good friend and remarkable statistician. Steve has more energy at 70 than I did when I was 20.

On a sadder note, I went to a memorial session for my late friend George Casella. Ed George, Bill Strawderman, Roger Berger, Jim Berger and Marty Wells talked about George’s many contributions and his huge impact on statistics. To borrow Ed’s catchphrase: what a good guy.

On Tuesday, I went to Judea Pearl’s medallion lecture, with discussions by Jamie Robins and Eric Tchetgen Tchetgen. Judea gave an unusual talk, mixing philosophy, metaphors (“eagles and snakes can’t build microscopes”) and math. Judea likes to argue that graphical models/structural equation models are the best way to view causation. Jamie and Eric argued that graphs can hide certain assumptions and that counterfactuals need to be used in addition to graphs.

Digression. There is another aspect of causation that deserves mention. It’s one thing to write down a general model to describe causation (using graphs or counterfactuals). It’s another matter altogether to construct estimators for the causal effects. Jamie Robins is well-known for his theory of “G-estimation”. At the risk of gross oversimplification, this is an approach to sequential causal modeling that eventually leads to semiparametric efficient estimators. The construction of these estimators is highly non-trivial. You can’t get them by just writing down a density for the graph and estimating the distribution. Nor can you just throw down some parametric models for the densities in the factorization implied by the graph. If you do, you will end up with a model in which there is no parameter that equals 0 when there is no causal effect. I call this the “null paradox.” And Jamie’s estimators are useful. Indeed, on Monday, there was a session called “Causal Inference in Observational Studies with Time-Varying Treatments.” All the talks were concerned with applying Jamie’s methods to develop estimators in real problems such as organ transplantation studies. The point is this: when it comes to causation, just writing down a model using graphs or counterfactuals is only a small part of the story. Finding estimators is often much more challenging.
End Digression.
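
To make the null paradox from the digression concrete, here is a sketch under assumptions that are purely illustrative: the two-period setup, the variable names and the parametric forms are not taken from the talks. Suppose there are two treatments ${A_1}$ and ${A_2}$, a binary covariate ${L}$ measured between them, and an outcome ${Y}$. Under sequential randomization, the g-formula gives the mean outcome when treatment is set to ${(a_1,a_2)}$:

$\displaystyle \psi(a_1,a_2)=\sum_{l} E[Y\mid A_1=a_1,L=l,A_2=a_2]\,P(L=l\mid A_1=a_1)$

Now plug in a linear model ${E[Y\mid a_1,l,a_2]=\beta_0+\beta_1 a_1+\beta_2 l+\beta_3 a_2}$ and a logistic model ${P(L=1\mid a_1)={\rm expit}(\alpha_0+\alpha_1 a_1)}$:

$\displaystyle \psi(a_1,a_2)=\beta_0+\beta_1 a_1+\beta_3 a_2+\beta_2\,{\rm expit}(\alpha_0+\alpha_1 a_1)$

No causal effect means ${\psi}$ is constant in ${(a_1,a_2)}$, and no single coefficient encodes that. With a continuous ${a_1}$, it forces ${\beta_1=\beta_3=0}$ and ${\beta_2\alpha_1=0}$. That fails whenever ${A_1}$ affects ${L}$ and ${L}$ predicts ${Y}$, which can happen even when the treatments do nothing (for example, if ${L}$ and ${Y}$ share an unmeasured common cause).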

On Wednesday, I had the honor of giving the Rietz Lecture (named after Henry Rietz, the first president of the Institute of Mathematical Statistics). I talked about Topological Inference. I will post the slides on my web page soon. I was fortunate to be introduced by my good friend Ed George.

2. Minimaxity

One thing that came up at my talk, and also in several discussions I had with people at the conference, is the following problem. Recall that the minimax risk is

$\displaystyle R=\inf_{\hat \theta}\sup_{P\in {\cal P}}E_P[ L(\theta(P),\hat\theta)]$

where ${L}$ is a loss function, ${{\cal P}}$ is a set of distributions and the infimum is over all estimators. Often we can define an estimator that achieves the minimax rate for a problem. But the estimator that achieves the risk ${R}$ may not be computable in any practical sense. This happens in some of the topological problems I discussed. It also happens in sparse PCA problems. I suspect we are going to come across this issue more and more.
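
A small numerical sketch can make the definition concrete. It is only an illustration under assumptions that are not in the post: the model is Binomial(n, p), the loss is squared error, and the function names (`risk`, `worst_case_risk`, and so on) are hypothetical. It compares the worst-case risk of the MLE with that of the classical constant-risk estimator, approximating the supremum over p by a maximum over a grid.

```python
from math import comb, sqrt


def risk(estimator, n, p):
    """Exact squared-error risk E_p[(estimator(X, n) - p)^2] for X ~ Binomial(n, p)."""
    return sum(
        comb(n, x) * p**x * (1 - p) ** (n - x) * (estimator(x, n) - p) ** 2
        for x in range(n + 1)
    )


def mle(x, n):
    # Sample proportion.
    return x / n


def minimax_estimator(x, n):
    # Classical constant-risk estimator for the binomial under squared error.
    return (x + sqrt(n) / 2) / (n + sqrt(n))


def worst_case_risk(estimator, n, grid_size=1001):
    # Approximate the sup over p in [0, 1] by a maximum over an evenly spaced grid.
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(risk(estimator, n, p) for p in grid)


n = 20  # illustrative sample size
mle_worst = worst_case_risk(mle, n)
minimax_worst = worst_case_risk(minimax_estimator, n)
print(f"worst-case risk, MLE:               {mle_worst:.6f}")
print(f"worst-case risk, minimax estimator: {minimax_worst:.6f}")
```

In this model the MLE's worst-case risk is ${1/(4n)}$, attained at ${p=1/2}$, while the second estimator has constant risk ${1/(4(1+\sqrt{n})^2)}$, which is the minimax risk. Here both the estimator and its risk are trivial to compute, which is exactly what fails in the harder problems mentioned above.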

I suggested to a few people that we should really define the minimax risk by restricting ${\hat\theta}$ to be an estimator that can be computed in polynomial time. But how would we then compute the minimax risk? Some people, like Mike Jordan and Martin Wainwright, have made headway in studying the tradeoff between optimality and computation. But a general theory will be hard. I did mention this to Aad van der Vaart at dinner one night. If anyone can make headway with this problem, it is Aad.
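
One way to write the restricted quantity down, as a sketch only: let ${{\cal C}_{\rm poly}}$ denote the class of estimators computable in time polynomial in the sample size. The symbols ${{\cal C}_{\rm poly}}$ and ${R_{\rm poly}}$ are notation assumed here for illustration, and making "polynomial time" precise for real-valued data is itself part of the difficulty. Then

$\displaystyle R_{\rm poly}=\inf_{\hat \theta\in{\cal C}_{\rm poly}}\sup_{P\in {\cal P}}E_P[ L(\theta(P),\hat\theta)]$

Since the infimum is taken over a smaller class of estimators, ${R_{\rm poly}\geq R}$. The gap ${R_{\rm poly}-R}$ is the statistical price of insisting on tractability, and bounding it from below requires ruling out every polynomial-time estimator, not just the ones we happen to know.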

3. Montreal

The conference was in Montreal. If you have never been there, Montreal is a very pleasant city: vibrant and full of good restaurants. We had a very nice time, including good dinners and cigars at a classy cigar bar.

Rant: The only bad thing about Montreal is the tyrannical language police. (As a dual citizen, I think I am allowed to criticize Canada!) Put a sign on your business only in English and go to jail. Very tolerant. This raises lots of interesting problems: Is “pasta” English? What if you are using it as a proper name, as in naming your business: Pasta! And unless you live in North Korea, why should anyone be able to tell you what language to put on your private business? We heard one horror story that is almost funny. A software company opened in Montreal. The language police checked and approved their signage. (Yes. They really pay people to do this.) Then they were told they had to write their code in French. I’m not even sure what this means. Is C code English or French? Anyway, the company left Montreal.

To be fair, it’s not just Montreal that lives under the brutal rule of bureaucrats and politicians. Every country does. But Canada does lean heavily toward arbitrary (i.e., favoring special groups) rules. Shane Jensen pointed me to Can’tada, which lists stuff you can’t use in Canada.
End Rant.

Despite my rant, I think Montreal is a great city. I highly recommend it for travel.

4. Summary

The JSM is lots of fun. Yes, it is a bit overwhelming, but where else can you find 6,000 statisticians in one place? So next year, when I say I don’t want to go, someone remind me that I actually had a good time.

1. Posted August 9, 2013 at 4:53 pm

I was there too. (Intended to introduce myself to you at your talk, but unfortunately wound up not attending.)

Surprisingly, I wound up enjoying the talk by Nate Silver, with whom I tend to disagree. I was disappointed that you didn’t challenge his claim to be a Bayesian, though. He seemed rather nervous, probably due to speaking to an audience of statisticians.

Speaking of minimaxness, in Peter Bickel’s history talk, he mentioned an old result of Chuck Stone’s that seemed to make the Curse of Dimensionality even gloomier. However, I continue to believe that as long as math stat solves problems that are tractable rather than practical, there are going to be a lot of theorems that are of dubious relevance. In this case, it was exacerbated by involving the number of derivatives the regression function has, a property I also consider irrelevant. (All the world is discrete, isn’t it?) The problem also may go away if one adopts the approach I mentioned at the end of my own talk (just random musings at this point, though).

The conference was extremely well organized. My only complaint is that they often scheduled “marquee” speakers in parallel sessions in the same time slots.

I agree with your point that Montreal is a thoroughly enjoyable city (at least with the weather there this week). However, I disagree with your rant on the language police. I have been critical of the governments of Taiwan and China for trying to stamp out the regional Chinese languages (“dialects”), so I applaud the Quebec government’s attempts to preserve the use of French. My wife and I went to a number of the neighborhoods, away from the tourist spots, and had no trouble. (My long-dormant and never-strong-to-begin-with reading knowledge of scientific French did help a little.)

• Posted August 9, 2013 at 5:32 pm

I know what you mean. I was speaking against stiff competition (Brad Efron and Iain Johnstone.)
Regarding language. Forcibly stamping out a language is bad; but so is forcing people to use
a language. If the people of Quebec love French, it will survive without the force of government.
In fact, isn’t the Quebec government guilty of the same thing the Chinese government is: they’re
suppressing a language (English)?

• Posted August 12, 2013 at 10:58 am

Norm: I am going to have to agree with Larry (on French, not on the dubious relevance of non-finite constructions).

It’s a bit like encouraging water to flow uphill – French as a language in Canada and even Quebec seems to require an unequal playing field.

For instance, my kids went to the French School system in Ottawa (next to Quebec) that was open only to children of French speaking parents (usually both) and they would get a detention if caught speaking English inside the school (outside an English course). But I almost never heard any of the kids speaking French in the school yard or on the way home – they just flowed back to speaking English.

But providing this unfair playing field, no matter how well motivated, creates opportunities for being unfair, ludicrous and even outright mean.

Also, Montreal is about the most English friendly city in Quebec – so you have seen a biased sample.

2. Posted August 9, 2013 at 6:04 pm

I believe that schoolkids in Quebec are required to learn English from the beginning, so the government can’t be said to be suppressing English.

By the way, I didn’t attend the Efron or Johnstone talks either. Is your talk on your Web page? I see a paper there with the word “homology” on it.

• Posted August 9, 2013 at 6:23 pm

The talk is now on my website
http://www.stat.cmu.edu/~larry

• AL
Posted August 10, 2013 at 10:26 am

This is completely off-topic, but what do you use to draw those wonderful graphics? (Which help a lot BTW)

• Posted August 10, 2013 at 11:28 am

LaTeX

• Ken

At the International Biometrics in 2006 in Montreal I asked one of the postgrads about this, and she said most don’t learn English at school. It is something that isn’t popular. She had learnt most of her English during her masters. I also noticed that some of the waiters were learning it as they worked.

• Posted August 10, 2013 at 1:17 am

From what I understand, the kids are required to take English as an academic subject (like math or geography). That doesn’t mean that they actually get practice in using English, though, and the student you met probably was referring to lack of practical skill in English.

As I said, we made a point of not hanging around the tourist areas of the city, and when we asked people on the street for directions, everyone was able to communicate, if only haltingly. This contrasts with our experience in Japan last September, where the level of English was startlingly low (though more than compensated by the genuine desire to help us strangers).

3. Judea Pearl

Larry,
Your post may give the impression that I am against the use of counterfactuals.
This is not the case.

1. I repeatedly say that counterfactuals are the building
block of rational behavior and scientific thoughts.
see: http://ftp.cs.ucla.edu/pub/stat_ser/R269.pdf
http://ftp.cs.ucla.edu/pub/stat_ser/r360.pdf

2. I showed that ALL counterfactuals can be
encoded parsimoniously in one structural equation model,
and can be read easily from any such model.
see: http://ftp.cs.ucla.edu/pub/stat_ser/r370.pdf

3. I showed how the graphical-counterfactual symbiosis
can work to unleash the merits of both,
and I emphasized that mediation analysis would
still be in its infancy if it were not for the algebra
of counterfactuals (as it emerges from structural semantics).

4. I am aware of voiced concerns about graphs hiding
assumptions, but I prefer to express these concerns in terms of
“hiding opportunities”, rather than “hiding assumptions”,
because the latter is unnecessarily alarming.

A good analogy would be Dawid’s notation X||Y for independence
among variables, which states that every event of the form
X = x_i is independent of every event of the form Y=y_j.
There may therefore be hundreds of assumptions conveyed
by the innocent and common statement X||Y.
Is this a case of hiding assumptions?
I do not believe so.
Now imagine that we are not willing to defend the assumption
“X = x_k is independent of Y=y_m” for some specific k and m.
The notation forces us to write “variable X is not
independent of variable Y” thus hiding all the
(i,j) pairs for which the independence is defensible.
This is a loss of opportunity, not a hiding of assumptions,
because refraining from assuming independence is
a more conservative strategy; it prevents unwarranted
conclusions from being drawn.

Thanks for commenting on my lecture.

• Posted August 9, 2013 at 9:10 pm

Thanks for the clarifications Judea.

4. Jacques René Giguère.