Microsoft Web N-gram Services
Access petabytes of data via the Public Beta Web N-gram Services.
We invite the whole community to use the Web N-gram services, made available via a cloud-based platform, to drive discovery and innovation in web search, natural language processing, speech, and related areas. The services let you conduct research on real-world, web-scale data, and regular data updates benefit projects that depend on dynamic data.
The Web N-gram services provide you access to:
- Content types: Document Body, Document Title, Anchor Texts
- Model types: Smoothed models
- N-gram availability: unigram, bigram, trigram, and N-grams with N = 4 and 5
- Training size (Body): All documents indexed by Bing in the en-us market
- Access: Hosted Services by Microsoft
- Updates: Periodic updates
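For readers new to the terminology, the sketch below illustrates what a smoothed N-gram model computes, using a bigram (N = 2) on a toy corpus. It is only an illustration: the additive (add-alpha) smoothing, the function names `train_bigram` and `bigram_prob`, and the sample sentence are all assumptions made for this example. It does not call the Web N-gram services, and it is not a description of the smoothing method the hosted models actually use.

```python
from collections import Counter

def train_bigram(tokens):
    """Collect the vocabulary, context counts, and bigram counts (hypothetical helper)."""
    vocab = set(tokens)
    contexts = Counter(tokens[:-1])             # how often each word has a successor
    bigrams = Counter(zip(tokens, tokens[1:]))  # how often each adjacent pair occurs
    return vocab, contexts, bigrams

def bigram_prob(prev, word, vocab, contexts, bigrams, alpha=1.0):
    """Estimate P(word | prev) with additive smoothing (an assumed, simple scheme)."""
    return (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * len(vocab))

# Toy corpus, invented for illustration only.
tokens = "the cat sat on the mat the cat ran".split()
vocab, contexts, bigrams = train_bigram(tokens)

p_seen = bigram_prob("the", "cat", vocab, contexts, bigrams)    # pair seen in the corpus
p_unseen = bigram_prob("the", "ran", vocab, contexts, bigrams)  # pair never seen
print(p_seen, p_unseen)
```

Smoothing matters because an unsmoothed model assigns zero probability to any word sequence missing from the training data. Here the unseen pair still receives a small, non-zero probability, and the probabilities over the vocabulary for a given context still sum to one.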
Late last year, we introduced a private beta of the Web N-gram Services. With the Public Beta, we are now expanding access to include professors, students, and researchers from around the world.
Web N-gram is brought to you by Microsoft Research in partnership with Microsoft Bing.
Events
- Spelling Alteration for Web Search Workshop
Bellevue, WA, U.S. · 19 July 2011
- Web N-gram Workshop
Geneva, Switzerland · 23 July 2010
Papers
- SIGIR 2011
- Exploring Web Scale Language Models for Search Query Processing, WWW 2010
- An Overview of Microsoft Web N-gram Corpus and Applications, NAACL-HLT 2010
