Finding Top-k Min-Cost Connected Trees in Databases

Bolin Ding, Jeffrey Xu Yu, Shan Wang, Lu Qin, Xiao Zhang, and Xuemin Lin

2007

It is widely realized that the integration of database and information retrieval techniques will provide users with a wide range of high quality services. In this paper, we study processing an l-keyword query, p1, p2, ..., pl, against a relational database which can be modeled as a weighted graph, G(V, E). Here V is a set of nodes (tuples) and E is a set of edges representing foreign key references between tuples. Let Vi ⊆ V be a set of nodes that contain the keyword pi. We study finding top-k minimum cost connected trees that contain at least one node in every subset Vi, and denote our problem as GST-k. When k = 1, it is known as a minimum cost group Steiner tree problem which is NP-Complete. We observe that the number of keywords, l, is small, and propose a novel parameterized solution, with l as a parameter, to find the optimal GST-1, in time complexity O(3^{l} n + 2^{l} ((l + log n)n + m)), where n and m are the numbers of nodes and edges in graph G. Our solution can handle graphs with a large number of nodes. Our GST-1 solution can be easily extended to support GST-k, which outperforms the existing GST-k solutions over both weighted undirected/directed graphs. We conducted extensive experimental studies, and report our finding.

Publication type | Inproceedings |

Published in | Proceedings of the 23rd IEEE International Conference on Data Engineering (ICDE 2007) |

Pages | 836-845 |

Publisher | IEEE Computer Society |

- Cost-Based Query Optimization for Multi Reachability Joins
- Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers
- Text Cube: Computing IR Measures for Multidimensional Text Database Analysis

> Publications > Finding Top-k Min-Cost Connected Trees in Databases