Share on Facebook Tweet on Twitter Share on LinkedIn Share by email
Fast Learning from Sparse Data

D.M. Chickering and D. Heckerman


We describe two techniques that significantly improve the running time of several standard machine-learning algorithms when data is sparse. The first technique is an algorithm that efficiently extracts one-way and two-way counts — either real or expected — from discrete data. Extracting such counts is a fundamental step in learning algorithm for constructing a variety of models including decision trees, decision graphs, Bayesian networks, and naive-Bayes clustering models. The second technique is an algorithm that efficiently performs the E-step of the EM algorithm (i.e., inference) when applied to a naive-Bayes clustering model. Using real-world data sets, we demonstrate a dramatic decrease in running time for algorithms that incorporate these techniques.


Publication typeInbook
InstitutionMicrosoft Research
PublisherMorgan Kaufmann
> Publications > Fast Learning from Sparse Data