Yongxin Tong, Lei Chen, and Bolin Ding
In recent years, many new applications, such as sensor network monitoring and moving object search, show a growing amount of importance of uncertain data management and mining. In this paper, we study the problem of discovering threshold-based frequent closed itemsets over probabilistic data. Frequent itemset mining over probabilistic database has attracted much attention recently. However, existing solutions may lead an exponential number of results due to the downward closure property over probabilistic data. Moreover, it is hard to directly extend the successful experiences from mining exact data to a probabilistic environment due to the inherent uncertainty of data. Thus, in order to obtain a reasonable result set with small size, we study discovering frequent closed itemsets over probabilistic data. We prove that even a sub-problem of this problem, computing the frequent closed probability of an itemset, is #P-Hard. Therefore, we develop an efficient mining algorithm based on depth-first search strategy to obtain all probabilistic frequent closed itemsets. To reduce the search space and avoid redundant computation, we further design several probabilistic pruning and bounding techniques. Finally, we verify the effectiveness and efficiency of the proposed methods through extensive experiments.
In Proceedings of the 28th IEEE International Conference on Data Engineering (ICDE 2012)
Publisher IEEE Computer Society