Mode-detection via median-shift

Median-shift is a mode seeking algorithm that relies on computing the median of local

neighborhoods, instead of the mean. We further combine median-shift with Locality

Sensitive Hashing (LSH) and show that the combined algorithm is

suitable for clustering large scale, high dimensional data sets. In particular,

we propose a new mode detection step that greatly accelerates performance. In

the past, LSH was used in conjunction with mean shift only to accelerate nearest

neighbor queries. Here we show that we can analyze the density of the LSH bins

to quickly detect potential mode candidates and use only them to initialize the

median-shift procedure. We use the median, instead of the mean (or its discrete

counterpart - the medoid) because the median is more robust and because the

median of a set is a point in the set. A median is well defined for scalars but

there is no single agreed upon extension of the median to high dimensional

data. We adopt a particular extension, known as the Tukey median, and show that

it can be computed efficiently using random projections of the high dimensional

data onto $1D$ lines, just like LSH, leading to a tightly integrated and

efficient algorithm.

In  Computer Vision, 2009 IEEE 12th International Conference on

Details

TypeInproceedings
URL10.1109/ICCV.2009.5459423
Pages1909 -1916
> Publications > Mode-detection via median-shift