Share this page
Share this page E-mail this page Print this page RSS feeds
Home > Publications > Fast Learning from Sparse Data
Fast Learning from Sparse Data

We describe two techniques that significantly improve the running time of several standard machine-learning algorithms when data is sparse. The first technique is an algorithm that efficiently extracts one-way and two-way counts — either real or expected — from discrete data. Extracting such counts is a fundamental step in learning algorithm for constructing a variety of models including decision trees, decision graphs, Bayesian networks, and naive-Bayes clustering models. The second technique is an algorithm that efficiently performs the E-step of the EM algorithm (i.e., inference) when applied to a naive-Bayes clustering model. Using real-world data sets, we demonstrate a dramatic decrease in running time for algorithms that incorporate these techniques.

tr-2000-15.ps
PostScript file

Publisher: Morgan Kaufmann

Details

Type: Chapter
Pages: 109-115
Number: MSR-TR-2000-15
Institution: Microsoft Research