In my talk, I'm going to discuss Spectrum, the approach to search result diversification currently used at Yandex. A large number of queries sent to Yandex are highly ambiguous and mention a specific entity or a class of entities. A query might refer to several objects with the same name: for example, [apple] might mean either the fruit or the consumer electronics company. More importantly, a query might reflect any of a broad spectrum of underlying intents: someone searching for [pizza] might want a restaurant that delivers, a recipe, or even images of pizza.
Spectrum is based on analyzing click-through statistics. The system first identifies objects in queries. Each object is then classified into one or more categories, e.g. "cities", "humans", "cars", "medicines", etc. Based on the object's category, our mined knowledge about typical information needs related to the object, and the relevant pages available on the Web, Spectrum determines what share of users searching for this object has each of the potential intents. The search engine then uses these shares to rank results for ambiguous queries with a probabilistic model of how users perceive the SERP. The target ranking is the one that maximizes the user's chance of finding a relevant answer.
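The abstract does not spell out Spectrum's SERP perception model, so the following is only a minimal sketch of the general idea under simplifying assumptions. It assumes the intent shares P(intent | query) are already known, that each document satisfies a user with a given intent independently with some probability, and that the goal is the probability that a user finds at least one relevant result in the top k. Position effects are ignored. The selection step is a swapped-in technique: greedy intent-aware selection in the spirit of IA-Select, which approximates rather than exactly finds the best ranking. The function names `p_found` and `diversify`, the intent labels, and all numbers are hypothetical.

```python
from typing import Dict, List

# Hypothetical inputs: P(intent | query) and P(doc satisfies user | intent).
IntentShares = Dict[str, float]
Relevance = Dict[str, Dict[str, float]]


def p_found(ranking: List[str], shares: IntentShares, relevance: Relevance) -> float:
    """Probability that a user finds at least one relevant result in `ranking`.

    Assumes each document satisfies a user with intent i independently,
    with probability relevance[doc][i] (0 if the intent is missing).
    """
    total = 0.0
    for intent, share in shares.items():
        miss = 1.0  # probability that no document so far satisfies this intent
        for doc in ranking:
            miss *= 1.0 - relevance[doc].get(intent, 0.0)
        total += share * (1.0 - miss)
    return total


def diversify(candidates: List[str], shares: IntentShares,
              relevance: Relevance, k: int) -> List[str]:
    """Greedily pick up to k documents, each time adding the one with the
    largest marginal gain in p_found."""
    ranking: List[str] = []
    remaining = list(candidates)
    # Mass of users per intent who are still unsatisfied: share * P(miss so far).
    unsatisfied = dict(shares)
    while remaining and len(ranking) < k:
        best = max(
            remaining,
            key=lambda d: sum(mass * relevance[d].get(i, 0.0)
                              for i, mass in unsatisfied.items()),
        )
        ranking.append(best)
        remaining.remove(best)
        # Users whose intent `best` satisfied no longer count as unsatisfied.
        for i in unsatisfied:
            unsatisfied[i] *= 1.0 - relevance[best].get(i, 0.0)
    return ranking


# Illustrative (made-up) numbers for the query [pizza].
shares = {"delivery": 0.6, "recipe": 0.3, "images": 0.1}
rel = {
    "delivery_a": {"delivery": 0.9},
    "delivery_b": {"delivery": 0.8},
    "recipe_a": {"recipe": 0.9},
    "images_a": {"images": 0.9},
}

ranking = diversify(list(rel), shares, rel, k=3)
print(ranking)                                  # ['delivery_a', 'recipe_a', 'images_a']
print(round(p_found(ranking, shares, rel), 3))  # 0.9
print(round(p_found(["delivery_a", "delivery_b", "recipe_a"], shares, rel), 3))  # 0.858
```

In this toy example, ranking purely by expected relevance would put both delivery pages in the top three and leave the images intent unserved, which lowers the chance of a relevant find from 0.9 to 0.858. Because each additional delivery page mostly helps users who are already satisfied, the greedy step prefers covering another intent instead.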
Andrey Plakhov is a Senior Search Engineer at Yandex. Among other things, he is responsible for the diversification of web search results. He has a PhD in computer science from the Keldysh Institute of Applied Mathematics. His interests range from model-based query understanding to applying natural language acquisition theories to web search.