Simple practical methods for estimating distances in large and sparse datasets like the web

Speaker  Ping Li

Affiliation  Stanford

Host  Chris Burges

Duration  01:08:37

Date recorded  14 December 2006

Random projections are a popular method for estimating distances. Instead of computing L2 distances over a big data matrix, we multiply the big matrix with a very sparse random matrix (over [-1 0 1]) to produce a smaller matrix. Distances in the smaller matrix are close to distances in the big matrix, but the smaller matrix is easier to work with. We then generalize very sparse random projections by introducing a novel data structure for estimating L1 and Lp distances as opposed to L2 distances only. Finally, we end with a new method, conditional random sampling. The new method estimates a variety of distance functions (L1, L2, Lp, Chi-square and more) from a single data structure. A single data structure is more convenient than different data structures for different distance functions.

©2006 Microsoft Corporation. All rights reserved.
> Simple practical methods for estimating distances in large and sparse datasets like the web