Connections between mining frequent itemsets and learning generative models

Frequent itemsets mining is a popular framework for pattern discovery. In this framework, given a database of customer transactions, the task is to unearth all patterns in the form of sets of items appearing in a sizable number of transactions. We present a class of models called Itemset Generating Models (or IGMs) that can be used to formally connect the process of frequent itemsets discovery with the learning of generative models. IGMs are specified using simple probability mass functions (over the space of transactions), peaked at specific sets of items and uniform everywhere else. Under such a connection, it is possible to rigorously associate higher frequency patterns with generative models that have greater data likelihoods. This enables a generative model-learning interpretation of frequent itemsets mining. More importantly, it facilitates a statistical significance test which prescribes the minimum frequency needed for a pattern to be considered interesting. We illustrate the effectiveness of our analysis through experiments on standard benchmark data sets.
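
As a rough illustration of the mining task described in the abstract, the sketch below counts how many transactions contain each small itemset and keeps those that meet a frequency threshold. It is a brute-force toy, not the method of the report: the function name `frequent_itemsets`, the parameters `min_support` and `max_size`, and the sample transactions are all assumptions made for this example, and the threshold here is chosen by hand rather than derived from the report's significance test.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: count} for itemsets found in at least
    `min_support` transactions (hypothetical helper, brute force)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        # Enumerate every subset of this transaction up to max_size items.
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    # Keep only itemsets whose frequency reaches the hand-picked threshold.
    return {s: c for s, c in counts.items() if c >= min_support}


# Made-up transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

pairs_and_singles = frequent_itemsets(transactions, min_support=2, max_size=2)
singles_only = frequent_itemsets(transactions, min_support=3, max_size=2)
```

Raising `min_support` from 2 to 3 drops every pair in this toy data, which is the kind of cutoff decision the abstract says the significance test is meant to inform.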

fitemsets-techreport.pdf
PDF file

Details

Type: TechReport
Number: MSR-TR-2007-100
Pages: 10
Institution: Microsoft Research