WhatNext: A Prediction System for Web Requests using N-gram Sequence Models

  • Zhong Su
  • Qiang Yang
  • Ye Lu
  • Hong-Jiang Zhang

As an increasing number of users access information on the web, server logs offer a great opportunity to learn about users’ probable future actions. In this paper, we present an n-gram based model that uses path profiles of users from very large data sets to predict the users’ future requests. Because this is a prediction system, recall cannot be measured in the traditional sense. We therefore introduce the notion of applicability as a measure of the ability to predict the next document. Our model is a simple extension of existing point-based models for such predictions, but our results show that for n-gram based prediction with n greater than three, we can increase precision by 20% or more for two realistic web logs. We also present an efficient method that compresses our model to 30% of its original size so that it can be loaded in main memory. Our results can potentially be applied to a wide range of web applications, including pre-sending, pre-fetching, enhancement of recommendation systems, and web caching policies. Our tests are based on three realistic web logs. Our algorithm is implemented in a prediction system called WhatNext, which shows a marked improvement in precision and applicability over previous approaches.
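To make the idea concrete, the following is a minimal sketch of n-gram next-request prediction, not the WhatNext implementation itself. It assumes that each session is a list of document identifiers, that the prediction is the most frequent follower of the last n requests, that precision is the fraction of predictions made that are correct, and that applicability is the fraction of requests for which the model makes any prediction at all. The function names (`train_ngram`, `predict_next`, `evaluate`) and these exact definitions are illustrative assumptions; the abstract does not specify them.

```python
from collections import Counter, defaultdict


def train_ngram(sessions, n):
    """Count which document follows each length-n path in the training sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - n):
            context = tuple(session[i:i + n])
            counts[context][session[i + n]] += 1
    return counts


def predict_next(counts, recent, n):
    """Return the most frequent follower of the last n requests, or None."""
    context = tuple(recent[-n:])
    if len(context) < n or context not in counts:
        return None  # path never seen in training: no prediction is made
    return counts[context].most_common(1)[0][0]


def evaluate(counts, sessions, n):
    """Return (precision, applicability) under the assumed definitions above."""
    total = made = correct = 0
    for session in sessions:
        for i in range(n, len(session)):
            total += 1
            guess = predict_next(counts, session[:i], n)
            if guess is None:
                continue
            made += 1
            if guess == session[i]:
                correct += 1
    precision = correct / made if made else 0.0
    applicability = made / total if total else 0.0
    return precision, applicability


# Hypothetical toy logs, for illustration only.
train_sessions = [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "d"]]
test_sessions = [["a", "b", "c"], ["x", "y", "z"]]
model = train_ngram(train_sessions, n=2)
precision, applicability = evaluate(model, test_sessions, n=2)
```

In this sketch, a longer path (larger n) matches fewer test requests, so it tends to trade applicability for precision, which is the trade-off the two measures are meant to expose.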