Estimating Undirected Graphs Under Weak Assumptions

Mladen Kolar, Alessandro Rinaldo and I have uploaded a paper to arXiv entitled “Estimating Undirected Graphs Under Weak Assumptions.”

As the name implies, the goal is to estimate an undirected graph ${G}$ from random vectors ${Y_1,\ldots, Y_n \sim P}$ . Here, each ${Y_i = (Y_i(1),\ldots, Y_i(D))\in\mathbb{R}^D}$ is a vector with ${D}$ coordinates, or features.

The graph ${G}$ has ${D}$ nodes, one for each feature. We put an edge between nodes ${j}$ and ${k}$ if the partial correlation ${\theta_{jk}\neq 0}$ . The partial correlation ${\theta_{jk}}$ is

$\displaystyle \theta_{jk} = - \frac{\Omega_{jk}}{\sqrt{\Omega_{jj}\Omega_{kk}}}$

where ${\Omega = \Sigma^{-1}}$ and ${\Sigma}$ is the ${D\times D}$ covariance matrix for ${Y_i}$ .

At first sight, the problem is easy. We estimate ${\Sigma}$ with the sample covariance matrix

$\displaystyle S = \frac{1}{n}\sum_{i=1}^n (Y_i - \overline{Y})(Y_i - \overline{Y})^T.$

Then we estimate ${\Omega}$ with ${\hat\Omega = S^{-1}}$ . We can then use the bootstrap to get confidence intervals for each ${\theta_{jk}}$ and then we put an edge between nodes ${j}$ and ${k}$ if the confidence interval excludes 0.

But how close is the bootstrap distribution ${\hat F}$ to the true distribution ${F}$ of ${\hat\theta_{jk}}$ ? Our paper provides a finite sample bound on ${\sup_t | \hat F(t) - F(t)|}$ . Not surprisingly, the bounds are reasonable when ${D < n}$ .

What happens when ${D>n}$ ? In that case, estimating the distribution of ${\hat\theta_{jk}}$ is not feasible unless one imposes strong assumptions. With these extra assumptions, one can use lasso-style technology. The problem is that, the validity of the inferences then depends heavily on strong assumptions such as sparsity and eigenvalues restrictions, which are not testable if ${D_n > n}$ . Instead, we take an atavistic approach: we first perform some sort of dimension reduction followed by the bootstrap. We basically give up on the original graph and instead estimate the graph for a dimension-reduced version of the problem.

If we were in a pure prediction framework I would be happy to use lasso-style technology. But, since we are engaged in inference, we take this more cautious approach.

One of the interesting parts of our analysis is that it leverages recent work on high dimensional Berry-Esseen theorems namely the results by Victor Chernozhukov, Denis Chetverikov and Kengo Kato which can be found here.

The whole issue of what assumptions are reasonable in high-dimensional inference is quite interesting. I’ll have more to say about the role of assumptions in high dimensional inference shortly. Stay tuned. In the meantime, if I have managed to spark your interest, please have a look at our paper.

2 Comments

Abnormal

Posted October 1, 2013 at 5:32 pm | Permalink

I am glad to see a graphical model not strongly relying on the divine normality assumption.
Jake

Posted October 4, 2013 at 1:53 am | Permalink

Hi Larry,

I think this paper would have helped me a lot about six months ago, and I look forward to getting a chance to sit down and read it.

Just from reading your summary of it here in the blog, though, it seems as if you’re stuck with doing a ton of matrix inverses, which is going to put a limit on the maximum size of D (both in the time required to compute them and the space required to store them (much less the space to store the sample covariances or even Y-Ybar in the large-n case!)

Normal Deviate