Connections between mining frequent itemsets and learning generative models

Frequent itemsets mining is a popular framework for pattern discovery. In this framework, given a database of customer transactions, the task is to unearth all patterns in the form of sets of items appearing in a sizable number of transactions. We present a class of models called Itemset Generating Models (or IGMs) that can be used to formally connect the process of frequent itemsets discovery with the learning of generative models. IGMs are specified using simple probability mass functions (over the space of transactions), peaked at specific sets of items and uniform everywhere else. Under such a connection, it is possible to rigorously associate higher frequency patterns with generative models that have greater data likelihoods. This enables a generative model-learning interpretation of frequent itemsets mining. More importantly, it facilitates a statistical significance test which prescribes the minimum frequency needed for a pattern to be considered interesting. We illustrate the effectiveness of our analysis through experiments on standard benchmark data sets.
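As a rough illustration of the mining task described above (not the report's own algorithm, and not an implementation of IGMs), the following minimal sketch counts how many transactions contain each small itemset and keeps those that meet a frequency threshold. The function name `frequent_itemsets`, the parameters `min_count` and `max_size`, and the toy transactions are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size=2):
    """Brute-force sketch: return {itemset: count} for every itemset of at
    most max_size items that appears in at least min_count transactions.
    (Hypothetical helper for illustration; real miners prune the search.)"""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # de-duplicate items in a transaction
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


# Toy transaction database (made-up data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

frequent = frequent_itemsets(transactions, min_count=3)
for itemset, n in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), n)
```

In this sketch the threshold `min_count` is chosen by hand; the significance test mentioned in the abstract is what would prescribe such a minimum frequency in a principled way.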

fitemsets-techreport.pdf
PDF file

Details

Type: TechReport
Number: MSR-TR-2007-100
Pages: 10
Institution: Microsoft Research