Finding Top-k Min-Cost Connected Trees in Databases

Bolin Ding, Jeffrey Xu Yu, Shan Wang, Lu Qin, Xiao Zhang, and Xuemin Lin

2007

It is widely realized that the integration of database and information retrieval techniques will provide users with a wide range of high quality services. In this paper, we study processing an l-keyword query, p1, p2, ..., pl, against a relational database which can be modeled as a weighted graph, G(V, E). Here V is a set of nodes (tuples) and E is a set of edges representing foreign key references between tuples. Let Vi ⊆ V be a set of nodes that contain the keyword pi. We study finding top-k minimum cost connected trees that contain at least one node in every subset Vi, and denote our problem as GST-k. When k = 1, it is known as a minimum cost group Steiner tree problem which is NP-Complete. We observe that the number of keywords, l, is small, and propose a novel parameterized solution, with l as a parameter, to find the optimal GST-1, in time complexity O(3^{l} n + 2^{l} ((l + log n)n + m)), where n and m are the numbers of nodes and edges in graph G. Our solution can handle graphs with a large number of nodes. Our GST-1 solution can be easily extended to support GST-k, which outperforms the existing GST-k solutions over both weighted undirected/directed graphs. We conducted extensive experimental studies, and report our finding.

Publication type | Inproceedings |

Published in | Proceedings of the 23rd IEEE International Conference on Data Engineering (ICDE 2007) |

Pages | 836-845 |

Publisher | IEEE Computer Society |

- Classifying Data Streams with Skewed Class Distributions and Concept Drifts
- Discovering Threshold-based Frequent Closed Itemsets over Probabilistic Data
- Stream Mining

> Publications > Finding Top-k Min-Cost Connected Trees in Databases