Inferring the Demographics of Search Users

Bin Bi, Milad Shokouhi, Michal Kosinski, and Thore Graepel

Abstract

Knowing users' views and demographic traits offers great potential for personalizing web search results and related services such as query suggestion and query completion. Such signals, however, are often available only for a small fraction of search users, namely those who log in with their social network account and allow its use for personalization of search results. In this paper, we offer a solution to this problem by showing how user demographic traits such as age and gender, and even political and religious views, can be efficiently and accurately inferred from search query histories. This is accomplished in two steps: we first train predictive models on the publicly available myPersonality dataset, which contains users' Facebook Likes and their demographic information. We then match Facebook Likes with search queries using Open Directory Project categories. Finally, we apply the model trained on Facebook Likes to large-scale query logs of a commercial search engine while explicitly taking into account the difference between the trait distributions of the two datasets. We find that the accuracy of classifying age and gender, expressed by the area under the ROC curve (AUC), is 77% and 84% respectively for predictions based on Facebook Likes, and degrades only to 74% and 80% when based on search queries. On a US state-by-state basis we find a Pearson correlation of 0.72 for political views between the predicted scores and Gallup data, and 0.54 for affiliation with Judaism between predicted scores and data from the US Religious Landscape Survey. We conclude that it is indeed feasible to infer important demographic data of users from their query history based on labelled Likes data, and believe that this approach could provide valuable information for personalization and monetization even in the absence of demographic data.
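
The abstract mentions correcting for differing trait distributions between the two datasets and reports results as AUC and Pearson correlation, but it does not say how the correction is done. The following is a minimal sketch, not the authors' implementation: it assumes the standard prior-shift re-weighting of a binary classifier's posterior, and the function names (`adjust_for_prior_shift`, `auc`, `pearson`) are hypothetical.

```python
import math
from typing import Sequence


def adjust_for_prior_shift(p_source: float, prior_source: float,
                           prior_target: float) -> float:
    """Re-weight P(y=1|x) learned under prior_source for a population
    whose positive-class prior is prior_target (assumed correction,
    not taken from the paper)."""
    pos = p_source * prior_target / prior_source
    neg = (1.0 - p_source) * (1.0 - prior_target) / (1.0 - prior_source)
    return pos / (pos + neg)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    positive example is scored above a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

In this sketch, a score of 0.5 from a model trained on a balanced sample would be lowered when the target population has a smaller share of the positive class. The `pearson` helper corresponds to the kind of state-level comparison described above, where per-state mean predicted scores would be correlated with survey figures.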

Details

Publication type: Proceedings
Published in: 22nd International World Wide Web Conference
Publisher: ACM