“C. Hennig and T. F. Liao, How to find an appropriate clustering for mixed type variables with application to socioeconomic stratification. Journal of the Royal Statistical Society, Series C (Applied Statistics) 62, 309–369 (2013), with discussion”

we observed that maximising certain cluster validation indices (average silhouette width, Calinski–Harabasz; the phenomenon may apply to others), as recommended in the literature, led to an estimate of the number of clusters for which the clustering was not significantly better than random data simulated from a null model for “no clustering but some other realistic structure”. However, other supposedly non-optimal numbers of clusters led to significantly better clusterings.

Comparing the validation index values of clusterings to what happens under such null models is generally helpful for cluster validation.

Generally, when one is interested in some kind of structure, it may be worthwhile to compare against a null model that incorporates the other kinds of structure one could legitimately expect in such data, instead of testing against an unrealistically simplistic null model such as iid normal.
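This kind of comparison can be sketched in a few lines. The following is a hypothetical illustration (not the authors' code): a plain-NumPy k-means and Calinski–Harabasz index, with independent permutation of each variable as one possible null model that destroys joint cluster structure while keeping realistic marginals. All function names here are my own.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    # Minimal Lloyd's algorithm; centers initialised from k random points.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

def calinski_harabasz(X, labels):
    # CH = (between-cluster SS / (k-1)) / (within-cluster SS / (n-k)).
    n, uniq = len(X), np.unique(labels)
    k, overall = len(uniq), X.mean(0)
    W = B = 0.0
    for j in uniq:
        Xj = X[labels == j]
        W += ((Xj - Xj.mean(0)) ** 2).sum()
        B += len(Xj) * ((Xj.mean(0) - overall) ** 2).sum()
    return (B / (k - 1)) / (W / (n - k))

def null_index(X, k, n_sim=20, seed=1):
    # Null model: permute each column independently, which breaks the
    # joint (cluster) structure but preserves every marginal exactly.
    rng = np.random.default_rng(seed)
    vals = []
    for s in range(n_sim):
        Xn = np.column_stack([rng.permutation(X[:, j])
                              for j in range(X.shape[1])])
        vals.append(calinski_harabasz(Xn, kmeans(Xn, k, seed=s)))
    return np.array(vals)

# Demo: three well-separated clusters; the observed index at k=3 should
# sit far above its distribution under the permutation null.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0.0, 3.0, 6.0)])
obs = calinski_harabasz(X, kmeans(X, 3))
null_vals = null_index(X, 3)
print("observed CH:", obs, " null max:", null_vals.max())
```

The point of the permutation null (rather than, say, iid normal) is exactly the one made above: it retains realistic marginal distributions, so a clustering only looks significant if it captures structure beyond the marginals.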

Obviously, though, adding randomness here is only required because I am (“we all are?”) too stupid to compute such things from sufficiently flexible null models theoretically. (But this applies to some of the other examples listed here, too.)

And Meinshausen’s subsampling-based stability selection method for model selection.

I’m not sure what you are asking, Phil.

Randomization turns a non-identifiable parameter into an identifiable parameter.

But I said this in the post so perhaps I am not understanding your question.

Yes, I was able to sneak that by.

Btw, I loved your biosketch in Amstat News: “In his spare time, he enjoys mountain climbing, parachuting, and big game hunting.”
