Hunting For Manifolds
Larry Wasserman
In this post I’ll describe one aspect of the problem of estimating a manifold, or manifold learning, as it is called.
Suppose we have data and suppose that the data lie on, or close to, a manifold of dimension . There are two different goals we might have in mind.
The first goal is to use the fact that the data lie near the manifold as a way of doing dimension reduction. The popular methods for solving this problem include isomap, local linear embedding, diffusion maps, and Laplacian eigenmaps among others.
The second goal is to actually estimate the manifold .
I have had the good fortune to work with two groups of people on this problem. With the first group (Chris Genovese, Marco Perone-Pacifico, Isa Verdinelli and me) we have focused mostly on certain geometric aspects of the problem. With the second group (Sivaraman Balakrishnan, Don Sheehy, Aarti Singh, Alessandro Rinaldo and me) we have focused on topology.
It is this geometric problem (my work with the first group) that I will discuss here. In particular, I will focus on these two papers: here and here.
1. Statistical Model
To make some progress, we introduce a statistical model for the data. Here are two models.
Model I (Clutter Noise): With probability we draw from a distribution supported on . With probability we draw from a uniform distribution on some compact set . To be concrete, let’s take . The distribution of is
where is uniform on .
Model II (Additive Noise): We draw from a distribution supported on . Then we let where is a -dimensional Gaussian distribution. In this case, has density given by
where is a Gaussian. We can also write this as
where denotes convolution.
There are some interesting complications. First, note that is a singular distribution. That is, there are sets with positive probability but 0 Lebesgue measure. For example, suppose that is a circle in and that is uniform on the circle. Then the circle has Lebesgue measure 0 but . Model II has the complication that it is a convolution and convolutions are not easy to deal with.
Our goal is to construct an estimate .
2. Hausdorff Distance
We will compare to using the Hausdorff distance. If and are sets, the Hausdorff distance between and is
where
and is a ball of radius centered at .
Let’s decode what this means. Imagine we start to grow the set by placing a ball of size around each point in . We keep growing until it engulfs . Let be the size of needed to do this. Now grow until it engulfs . Let be the size of needed to do this. Then . Here is an example:
There is another way to think of the Hausdorff distance. If is a set and is a point, let be the distance from to :
We call the distance function. Then
Using as our loss function, our goal is to find the minimax risk:
where the infimum is over all estimators and the supremum is over a set of distributions . Actually, we will consider a large set of manifolds and we will associate a distribution with each manifold . This will define the set .
3. Reach
We will consider all -dimensional manifolds with positive reach. The reach is a property of a manifold that appears in many places in manifold estimation theory. I believe it first appeared in Federer (1959).
The reach is is the largest number such that each point in has a unique projection on . A set of non-zero reach has two nice properties. First, it is smooth and second, it is self-avoiding.
Consider a circle in the plane with radius . Every point on the plane has a unique projection on with one exception: the center of the circle is distance from each point on . There is no unique closest point on to . So the reach is .
A plane has infinite reach. A line with corner has 0 reach. Here is a example of a curve in two-dimensions. Suppose you put a line, perpendicular to , at each point on . If the length of the lines is less than reach(M), then the lines won’t cross. If the length of the lines is larger than reach(M), then the lines will cross.
Requiring to have positive reach is a strong condition. There are ways to weaken the condition based on generalization so reach, but we won’t go into those here.
The set of manifolds we are interested in is the set of -manifolds contained in with reach at least . Here, is some arbitrary, fixed positive number.
4. The Results
Recall that that the minimax risk is
In the case of clutter noise it turns out that
Note that the rate depends on but not on the ambient dimension . But in the case of additive Gaussian noise,
The latter result says that approaches 0 very slowly. For all practical purposes, it is not possible to estimate . This might be surprising. But, in fact, the result is not unexpected.
We can explain this by analogy with another problem. Suppose you want to estimate from observations . if satisfies standard smoothness assumptions, then the minimax risk has the form
which is a nice, healthy, polynomial rate. But if we add some Gaussian noise to each observation, the minimax rate of convergence for estimating is logarithmic (Fan 1991). The same thing happens if we add Gaussian noise to the covariates in a regression problem. A little bit of Gaussian noise makes these problems essentially impossible.
Is there a way out of this? Yes. Instead of estimating we estimate a set that is close to . In other words, we have to live with some bias. We can estimate , which we call a surrogate for , with a polynomial rate.
We are finishing a paper on this idea and I’ll write a post about it when we are done.
5. Conclusion
Estimating a manifold is a difficult statistical problem. With additive Gaussian noise, it is nearly impossible. And notice that I assumed that the dimension of the manifold was known. Things are even harder when is not known.
However, none of these problems are insurmountable as long as we chnage the goal from “estimating ” to “estimating an approximation of ” which, in practice, is usually quite reasonable.
6. References
Fan, J. (1991). On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist. 19 1257-1272.
Federer, H. (1959). Curvature measures. Trans. Amer. Math. Soc. 93 418-491.
Genovese, C., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2012). Minimax Manifold Estimation. Journal of Machine Learning Research, 13, 1263–1291.
Genovese, C., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2012). Manifold estimation and singular deconvolution under Hausdorff loss. The Annals of Statistics, 49, 941-963.
11 Comments
Just wanted to let you know that I really enjoy your blog.
Thanks!
Agreed! The blog has been mostly digestible by ordinary tech people (at least with one example to go off of). I had an easier time thinking of the stuff above in terms of dilation and erosion (like in computer vision), but since I was thinking of stuff as “growing” had to flip the definition of reach in my head (maximum \epsilon such that points do not have a unique projection).
Thanks!
How does an approach such as given in M. Soltanolkotabi and E. J. Candès. A geometric analysis of subspace clustering with outliers. compare with the approaches here?
If I remember correctly, they are looking for clusters
around linear subspaces.
(But I would need to check)
Larry, closely related two-dimensional problem was considered in the paper “Estimating trajectories” by Y. Chow in Annals of Statistics in 1986…
Thanks for the reference
LW
…just wondering if those slow rates have anything to do with the metric…I mean, Hausdorff is a good guy but it seems Wasserstein is gaining momentum…
–Pierp
Yes I think you are right.
The rate in Wasserstein would probably be faster.
LW
I’m wondering if you can provide a reference for your “errors in variables” comment on adding Gaussian noise to covariates. Thanks.
Fan and Truong (1993)
Annals of Statistics
p 1900-1925
http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/1176349402
2 Trackbacks
[…] Hunting for Manifolds – Normal Deviate […]
[…] Wasserman writes on Hunting for Manifolds. Given data that are close to some manifold, how do we estimate the underlying […]