MSRA-CFW: Data Set of Celebrity Faces on the Web

MSRA-CFW is a data set of celebrity face images collected from the web. Starting from any face image, we retrieve its near-duplicate images and the texts surrounding them. We then detect the dominant person names in those texts by matching them against a large list of celebrity names gathered from public websites such as Wikipedia. A classifier is applied to further identify the celebrities appearing in the web images. The final data set contains 202,792 faces of 1,583 people.
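
The name-matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `dominant_names`, the whole-word matching rule, and the sample texts and names are all hypothetical, chosen only to show how counting name mentions across the texts of near-duplicate images could surface a dominant name.

```python
import re
from collections import Counter


def _normalize(text):
    # Lowercase and keep only word tokens, joined by single spaces.
    return " ".join(re.findall(r"\w+", text.lower()))


def dominant_names(texts, celebrity_names, top_k=1):
    """Return the top_k names ranked by how many texts mention them.

    texts: surrounding texts of the near-duplicate images (assumed input).
    celebrity_names: candidate name list (assumed input).
    """
    counts = Counter()
    for text in texts:
        padded = f" {_normalize(text)} "
        for name in celebrity_names:
            key = _normalize(name)
            # Whole-word, case-insensitive match; each text counts once per name.
            if key and f" {key} " in padded:
                counts[name] += 1
    return counts.most_common(top_k)


# Hypothetical example data, not taken from the data set.
texts = [
    "Photo of Jane Doe at the premiere",
    "jane doe and John Roe arrive",
    "Jane Doe, 2009",
]
names = ["Jane Doe", "John Roe", "Ann Poe"]
print(dominant_names(texts, names, top_k=2))
```

Counting each text at most once per name keeps a single page that repeats a name many times from dominating the vote. A real pipeline would likely also need name aliases and a way to resolve ambiguous names, which this sketch leaves out.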
Lei Zhang (张磊)

Senior Researcher

Microsoft Research
One Microsoft Way
Redmond, WA 98052
United States

Email: leizhang AT