Data-driven Research at Web Scale

Speakers  Evelyne Viegas, Haym Hirsh, and Serge Sharoff

Affiliation  MSR, Rutgers University, University of Leeds

Host  Judith Bishop

Duration  01:35:28

Date recorded  14 April 2011

This session brings together leaders in information retrieval and language modeling to discuss open challenges in information retrieval and how language-modeling approaches may help address them. The focus is on using n-gram models to advance research in areas such as document representation and content analysis (e.g., clustering, classification, information extraction), query analysis (e.g., query suggestion, query reformulation), retrieval models and ranking, and spelling. A further focus is access to n-grams as an enabler of experimental design. Previous efforts to deliver n-grams to the research community took a data-release approach with a cutoff on n-gram counts, which obscures long-tail effects; a service-based approach makes those effects available for further study. Moreover, previous efforts focused only on the document body, whereas the Web N-gram service includes richer types of textual content that can spur new research. The Web N-gram service provides access to petabytes of data through web services, up to two orders of magnitude more than currently available offerings. Finally, by refreshing its data regularly, the Web N-gram service can open new research directions in fields where the lack of dynamic data has locked academic researchers into working with static, stale data sets.
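To make the long-tail point concrete, here is a minimal Python sketch of what a count cutoff does to an n-gram table. It is not the Web N-gram service's API: the function names (`count_ngrams`, `apply_cutoff`, `tail_loss`), the toy sentence, and the threshold of 2 are illustrative assumptions. Released corpora use much larger thresholds on much larger text.

```python
from collections import Counter
from typing import List, Tuple


def count_ngrams(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def apply_cutoff(counts: Counter, min_count: int) -> Counter:
    """Keep only n-grams seen at least min_count times, as a static release
    with a count threshold would (hypothetical helper for illustration)."""
    return Counter({gram: c for gram, c in counts.items() if c >= min_count})


def tail_loss(counts: Counter, min_count: int) -> Tuple[float, float]:
    """Return (fraction of distinct n-grams dropped,
    fraction of total n-gram occurrences dropped) under the cutoff."""
    if not counts:
        return 0.0, 0.0
    kept = apply_cutoff(counts, min_count)
    distinct_dropped = 1 - len(kept) / len(counts)
    mass_dropped = 1 - sum(kept.values()) / sum(counts.values())
    return distinct_dropped, mass_dropped


if __name__ == "__main__":
    # Toy corpus: only the bigram ("the", "cat") occurs more than once.
    tokens = "the cat sat on the mat the cat ate the rat".split()
    bigrams = count_ngrams(tokens, 2)
    distinct, mass = tail_loss(bigrams, min_count=2)
    print(round(distinct, 3), round(mass, 3))  # prints 0.889 0.8
```

Even in this tiny example, a threshold of 2 discards 8 of the 9 distinct bigrams and 80% of bigram occurrences. Rare n-grams dominate the vocabulary of any natural-language corpus, which is why a cutoff hides exactly the long-tail behavior that unrestricted access would expose.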

©2011 Microsoft Corporation. All rights reserved.