Hunting For Manifolds
Larry Wasserman
In this post I’ll describe one aspect of the problem of estimating a manifold, or manifold learning, as it is called.
Suppose we have data $Y_1, \ldots, Y_n \in \mathbb{R}^D$ and suppose that the data lie on, or close to, a manifold $M$ of dimension $d < D$. There are two different goals we might have in mind.
The first goal is to use the fact that the data lie near the manifold as a way of doing dimension reduction. The popular methods for solving this problem include Isomap, locally linear embedding, diffusion maps, and Laplacian eigenmaps, among others.
The second goal is to actually estimate the manifold $M$ itself.
I have had the good fortune to work with two groups of people on this problem. With the first group (Chris Genovese, Marco Perone-Pacifico, Isa Verdinelli and me) we have focused mostly on certain geometric aspects of the problem. With the second group (Sivaraman Balakrishnan, Don Sheehy, Aarti Singh, Alessandro Rinaldo and me) we have focused on topology.
It is this geometric problem (my work with the first group) that I will discuss here. In particular, I will focus on the two Genovese, Perone-Pacifico, Verdinelli and Wasserman (2012) papers listed in the references below.
1. Statistical Model
To make some progress, we introduce a statistical model for the data. Here are two models.
Model I (Clutter Noise): With probability $\pi$ we draw $Y_i$ from a distribution $G$ supported on the manifold $M$. With probability $1-\pi$ we draw $Y_i$ from a uniform distribution on some compact set $K \subset \mathbb{R}^D$. To be concrete, let’s take $K = [0,1]^D$. The distribution $P$ of $Y_i$ is

$$P = (1-\pi)\, U + \pi\, G$$

where $U$ is uniform on $K$.
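To make Model I concrete, here is a minimal simulation sketch (my own illustration, not code from the papers): the manifold $M$ is a circle in the plane, $G$ is uniform on the circle, and the clutter is uniform on $K = [0,1]^2$; the mixing weight, radius and center are arbitrary illustrative choices.

```python
import numpy as np

def sample_clutter(n, pi=0.7, radius=0.3, center=(0.5, 0.5), rng=None):
    """Draw n points from the clutter-noise model: with probability pi a point
    uniform on a circle (the manifold M), with probability 1 - pi a point
    uniform on the unit square K = [0, 1]^2."""
    rng = np.random.default_rng(rng)
    on_manifold = rng.random(n) < pi
    # Uniform draws from the circle (the signal G) ...
    theta = rng.uniform(0, 2 * np.pi, size=n)
    circle = np.c_[center[0] + radius * np.cos(theta),
                   center[1] + radius * np.sin(theta)]
    # ... and uniform draws from the square (the clutter U).
    clutter = rng.uniform(0, 1, size=(n, 2))
    return np.where(on_manifold[:, None], circle, clutter)

Y = sample_clutter(500, rng=0)   # 500 observations from Model I
```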
Model II (Additive Noise): We draw $X_i$ from a distribution $G$ supported on the manifold $M$. Then we let

$$Y_i = X_i + \epsilon_i$$

where $\epsilon_i$ has a $D$-dimensional Gaussian distribution. In this case, $Y_i$ has density $p$ given by

$$p(y) = \int_M \phi_\sigma(y - x)\, dG(x)$$

where $\phi_\sigma$ is a Gaussian density. We can also write this as

$$P = G \star \Phi$$

where $\star$ denotes convolution and $\Phi$ is the Gaussian noise distribution.
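And a matching sketch for Model II, again purely illustrative: $X_i$ is uniform on a circle and $\epsilon_i$ is isotropic Gaussian noise; the value of $\sigma$ and the other parameters are made up for the example.

```python
import numpy as np

def sample_additive(n, sigma=0.05, radius=0.3, center=(0.5, 0.5), rng=None):
    """Draw n points from the additive-noise model Y = X + eps, where X is
    uniform on a circle (the manifold M) and eps is N(0, sigma^2 I_2)."""
    rng = np.random.default_rng(rng)
    theta = rng.uniform(0, 2 * np.pi, size=n)
    X = np.c_[center[0] + radius * np.cos(theta),
              center[1] + radius * np.sin(theta)]
    eps = sigma * rng.standard_normal((n, 2))   # D-dimensional Gaussian noise (D = 2 here)
    return X + eps

Y = sample_additive(500, rng=1)   # 500 observations from Model II
```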
There are some interesting complications. First, note that $G$ is a singular distribution. That is, there are sets with positive probability but zero Lebesgue measure. For example, suppose that $M$ is a circle in $\mathbb{R}^2$ and that $G$ is uniform on the circle. Then the circle has Lebesgue measure 0 but $G(M) = 1$. Model II has the further complication that it involves a convolution, and convolutions are not easy to deal with.
Our goal is to construct an estimate $\hat{M}$ of $M$.
2. Hausdorff Distance
We will compare $\hat{M}$ to $M$ using the Hausdorff distance. If $A$ and $B$ are sets, the Hausdorff distance between $A$ and $B$ is

$$H(A,B) = \inf\Bigl\{ \epsilon :\ A \subset B \oplus \epsilon \ \text{and}\ B \subset A \oplus \epsilon \Bigr\}$$

where

$$A \oplus \epsilon = \bigcup_{x \in A} B(x,\epsilon)$$

and $B(x,\epsilon)$ is a ball of radius $\epsilon$ centered at $x$.
Let’s decode what this means. Imagine we start to grow the set $A$ by placing a ball of radius $\epsilon$ around each point in $A$. We keep growing $\epsilon$ until the enlarged set engulfs $B$; let $\epsilon_1$ be the size of $\epsilon$ needed to do this. Now grow $B$ until it engulfs $A$, and let $\epsilon_2$ be the size of $\epsilon$ needed to do this. Then

$$H(A,B) = \max\{\epsilon_1, \epsilon_2\}.$$
There is another way to think of the Hausdorff distance. If $A$ is a set and $x$ is a point, let $d(x,A)$ be the distance from $x$ to $A$:

$$d(x,A) = \inf_{y \in A} \| x - y \|.$$

We call $d(\cdot, A)$ the distance function. Then

$$H(A,B) = \sup_x \bigl| d(x,A) - d(x,B) \bigr|.$$
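For finite point sets both formulations reduce to a max-min computation, which makes the definition easy to play with numerically. Here is a small sketch (my own, not from the papers; the function name and the toy point sets are made up) that computes $H(A,B)$ for two point clouds.

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between finite point sets A and B (rows are points).
    eps1 = how far A must be 'grown' to engulf B; eps2 = vice versa."""
    # Pairwise Euclidean distances: D[i, j] = ||A[i] - B[j]||.
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    eps1 = D.min(axis=0).max()   # sup over b in B of d(b, A)
    eps2 = D.min(axis=1).max()   # sup over a in A of d(a, B)
    return max(eps1, eps2)

# Tiny check: two point sets on the x-axis.
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.0], [1.5, 0.0]])
print(hausdorff(A, B))   # 0.5
```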
Using $H$ as our loss function, our goal is to find the minimax risk

$$R_n = \inf_{\hat{M}} \sup_{P \in \mathcal{P}} \mathbb{E}_P\bigl[ H(\hat{M}, M) \bigr]$$

where the infimum is over all estimators $\hat{M}$ and the supremum is over a set of distributions $\mathcal{P}$. Actually, we will consider a large set of manifolds $\mathcal{M}$ and we will associate a distribution $P_M$ with each manifold $M \in \mathcal{M}$. This will define the set $\mathcal{P}$.
3. Reach
We will consider all $d$-dimensional manifolds $M$ with positive reach. The reach is a property of a manifold that appears in many places in manifold estimation theory. I believe it first appeared in Federer (1959).

The reach is the largest number $r$ such that each point within distance $r$ of $M$ has a unique projection onto $M$. A set with non-zero reach has two nice properties: first, it is smooth and, second, it is self-avoiding.
Consider a circle $M$ in the plane with radius $r$. Every point in the plane has a unique projection onto $M$, with one exception: the center $c$ of the circle is at distance $r$ from every point on $M$, so there is no unique closest point on $M$ to $c$. So the reach is $r$.
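The circle example is easy to check numerically. In this small sketch (my own illustration, with made-up names and values), a point strictly closer to the circle than $r$ has a single nearest point on a dense sample of the circle, while at the center every sample ties, so the projection is not unique.

```python
import numpy as np

# Dense sample of a circle of radius r centered at the origin; reach(M) = r.
r = 1.0
theta = np.linspace(0.0, 2 * np.pi, 2000, endpoint=False)
circle = np.c_[r * np.cos(theta), r * np.sin(theta)]

def num_nearest(point, tol=1e-6):
    """Count circle samples (essentially) tied for nearest to `point`.
    A count of 1 means the projection onto the circle is unique."""
    d = np.linalg.norm(circle - point, axis=1)
    return int(np.sum(d < d.min() + tol))

print(num_nearest(np.array([0.5, 0.0])))  # 1: distance to M is 0.5 < r, unique projection
print(num_nearest(np.array([0.0, 0.0])))  # 2000: the center ties with every sample
```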
A plane has infinite reach. A line with a corner has reach 0. Here is an example for a curve in two dimensions: suppose you put a line segment, perpendicular to $M$, at each point on $M$. If the length of the segments is less than $\mathrm{reach}(M)$, then the segments won’t cross. If the length of the segments is larger than $\mathrm{reach}(M)$, then the segments will cross.
Requiring $M$ to have positive reach is a strong condition. There are ways to weaken the condition based on generalizations of reach, but we won’t go into those here.
The set of manifolds we are interested in is the set of $d$-dimensional manifolds contained in a fixed compact set $K \subset \mathbb{R}^D$ with reach at least $\kappa$. Here, $\kappa$ is some arbitrary, fixed positive number.
4. The Results
Recall that the minimax risk is

$$R_n = \inf_{\hat{M}} \sup_{P \in \mathcal{P}} \mathbb{E}_P\bigl[ H(\hat{M}, M) \bigr].$$

In the case of clutter noise it turns out that

$$R_n \asymp \left(\frac{1}{n}\right)^{\frac{2}{2+d}}.$$

Note that the rate depends on the manifold dimension $d$ but not on the ambient dimension $D$. But in the case of additive Gaussian noise,

$$R_n \asymp \frac{1}{\log n}.$$

The latter result says that $R_n$ approaches 0 very slowly. For all practical purposes, it is not possible to estimate $M$. This might be surprising. But, in fact, the result is not unexpected.
We can explain this by analogy with another problem. Suppose you want to estimate a density $f$ from $n$ observations $Y_1, \ldots, Y_n \sim f$. If $f$ satisfies standard smoothness assumptions, then the minimax risk has the form

$$R_n \asymp \left(\frac{1}{n}\right)^{\frac{2\beta}{2\beta + d}},$$

where $\beta$ measures the assumed smoothness, which is a nice, healthy, polynomial rate. But if we add some Gaussian noise to each observation, the minimax rate of convergence for estimating $f$ is logarithmic (Fan 1991). The same thing happens if we add Gaussian noise to the covariates in a regression problem (errors in variables). A little bit of Gaussian noise makes these problems essentially impossible.
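The effect is easy to see in a quick simulation (my own sketch, with made-up parameter values): a kernel density estimate built from noisy observations converges to the convolution of $f$ with the noise density, not to $f$ itself, so its error as an estimate of $f$ stays bounded away from zero no matter how large $n$ is. Undoing the convolution is the hard part, and with Gaussian noise that can only be done at a logarithmic rate (Fan 1991).

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(0)
n, sigma = 100_000, 1.0                  # lots of data, unit Gaussian measurement noise
X = rng.normal(0.0, 0.2, size=n)         # true density f: N(0, 0.2^2), quite concentrated
Y = X + sigma * rng.standard_normal(n)   # observed data: distributed as f convolved with N(0, sigma^2)

grid = np.linspace(-3, 3, 601)
f_true = norm.pdf(grid, 0.0, 0.2)
f_conv = norm.pdf(grid, 0.0, np.sqrt(0.2**2 + sigma**2))

kde = gaussian_kde(Y)(grid)              # kernel density estimate from the noisy sample
print(np.max(np.abs(kde - f_conv)))      # small: the KDE tracks the convolution
print(np.max(np.abs(kde - f_true)))      # large: as an estimate of f it is badly biased
```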
Is there a way out of this? Yes. Instead of estimating $M$ we estimate a set $M'$ that is close to $M$. In other words, we have to live with some bias. We can estimate $M'$, which we call a surrogate for $M$, with a polynomial rate.
We are finishing a paper on this idea and I’ll write a post about it when we are done.
5. Conclusion
Estimating a manifold is a difficult statistical problem. With additive Gaussian noise, it is nearly impossible. And notice that I assumed that the dimension $d$ of the manifold was known. Things are even harder when $d$ is not known.

However, none of these problems is insurmountable as long as we change the goal from “estimating $M$” to “estimating an approximation of $M$,” which, in practice, is usually quite reasonable.
6. References
Fan, J. (1991). On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist. 19 1257-1272.
Federer, H. (1959). Curvature measures. Trans. Amer. Math. Soc. 93 418-491.
Genovese, C., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2012). Minimax Manifold Estimation. Journal of Machine Learning Research, 13, 1263-1291.
Genovese, C., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2012). Manifold estimation and singular deconvolution under Hausdorff loss. The Annals of Statistics, 40, 941-963.
Comments
Just wanted to let you know that I really enjoy your blog.
Thanks!
Agreed! The blog has been mostly digestible by ordinary tech people (at least with one example to go off of). I had an easier time thinking of the stuff above in terms of dilation and erosion (like in computer vision), but since I was thinking of things as “growing” I had to flip the definition of reach in my head (maximum $\epsilon$ such that points do not have a unique projection).
Thanks!
How does an approach such as the one given in M. Soltanolkotabi and E. J. Candès, “A geometric analysis of subspace clustering with outliers,” compare with the approaches here?
If I remember correctly, they are looking for clusters around linear subspaces. (But I would need to check.)
Larry, a closely related two-dimensional problem was considered in the paper “Estimating trajectories” by Y. Chow in the Annals of Statistics in 1986…
Thanks for the reference
LW
…just wondering if those slow rates have anything to do with the metric…I mean, Hausdorff is a good guy but it seems Wasserstein is gaining momentum…
–Pierp
Yes I think you are right.
The rate in Wasserstein would probably be faster.
LW
I’m wondering if you can provide a reference for your “errors in variables” comment on adding Gaussian noise to covariates. Thanks.
Fan and Truong (1993), Annals of Statistics, pp. 1900-1925.
http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/1176349402