Classifying Data Streams with Skewed Class Distributions and Concept Drifts

Jing Gao, Bolin Ding, Wei Fan, Jiawei Han, and Philip S. Yu

Abstract

Classification is an important data analysis tool that uses a model built from historical data to predict class labels for new observations. A growing number of applications produce data streams rather than finite stored data sets, and such streams are a challenge for traditional classification algorithms. Concept drifts and skewed class distributions, two common properties of data stream applications, make learning in streams difficult. The authors develop a new approach to classifying skewed data streams: an ensemble of models, each trained on a sample that under-samples the negatives and repeatedly samples the positives, so that the training distribution is better balanced.
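The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the base learner (`CentroidModel`), the function names, the number of models `k`, and the choice to split negatives into disjoint slices are all assumptions made for illustration. The sketch only shows positives being carried across chunks and reused by every model, while each model sees a small slice of the current chunk's negatives.

```python
import random
from statistics import mean


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [mean(col) for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidModel:
    """Toy base learner (an assumption, not from the paper).

    Scores a point by its relative closeness to the positive centroid.
    """

    def __init__(self, positives, negatives):
        self.pos = centroid(positives)
        self.neg = centroid(negatives)

    def score(self, x):
        dp, dn = sq_dist(x, self.pos), sq_dist(x, self.neg)
        return 0.5 if dp + dn == 0 else dn / (dp + dn)


def build_ensemble(positives, negatives, k, rng):
    """Train up to k models: each sees every positive (repeated use)
    and one disjoint slice of the negatives (under-sampling)."""
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    slices = [shuffled[i::k] for i in range(k)]
    return [CentroidModel(positives, s) for s in slices if s]


def process_stream(chunks, k=3, seed=0):
    """Rebuild the ensemble on each chunk of (features, label) pairs.

    Positives are accumulated across chunks; negatives come only from
    the most recent chunk, so the models track the current concept.
    """
    rng = random.Random(seed)
    kept_positives, models = [], []
    for chunk in chunks:
        kept_positives += [x for x, y in chunk if y == 1]
        negatives = [x for x, y in chunk if y == 0]
        if kept_positives and negatives:
            models = build_ensemble(kept_positives, negatives, k, rng)
    return models


def ensemble_score(models, x):
    """Average the per-model scores; higher means more likely positive."""
    return mean(m.score(x) for m in models)
```

Averaging the scores of several models, each trained on a different slice of negatives, is one plausible way to use most of the negative examples without letting them dominate any single training set.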

Details

Publication type: Article
Published in: IEEE Internet Computing
URL: http://www.computer.org/portal/web/csdl/doi/10.1109/MIC.2008.119
Pages: 37-49
Volume: 12
Number: 6
Publisher: IEEE Computer Society