Federated search (also known as federated information retrieval or
distributed information retrieval) is a technique for searching multiple
text collections
simultaneously. Queries are submitted to a subset of collections that
are most likely to return relevant answers. The results returned by
selected collections are integrated and merged into a single list. Federated
search is preferred over centralized search alternatives in many
environments. For example, commercial search engines such as Google
cannot easily index uncrawlable hidden web collections, whereas federated
search systems can search the contents of hidden web collections
without crawling. In enterprise environments, where each organization
maintains an independent search engine, federated search techniques
can provide parallel search over multiple collections.
There are three major challenges in federated search. For each query,
a subset of collections that are most likely to return relevant documents
is selected. This creates the collection selection problem. To
be able to select suitable collections, federated search systems need to
acquire some knowledge about the contents of each collection, creating
the collection representation problem. The results returned from the
selected collections are merged before the final presentation to the user.
This final step is the result merging problem.
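The three steps can be illustrated with a minimal Python sketch. It is not a method taken from the surveyed literature: the toy collections, the function names, the term-frequency selection score and the max-normalised merging are all assumptions chosen for brevity. It also assumes full access to each collection's documents, whereas a real system may only be able to sample them.

```python
from collections import Counter

# Hypothetical toy collections: name -> list of (doc_id, text).
COLLECTIONS = {
    "news": [("n1", "election results and economy news"),
             ("n2", "sports news and weather")],
    "medical": [("m1", "flu vaccine trial results"),
                ("m2", "heart disease and diet")],
    "recipes": [("r1", "chicken soup recipe"),
                ("r2", "diet friendly salad recipe")],
}


def build_representation(docs):
    """Collection representation: term counts over a collection's documents."""
    counts = Counter()
    for _, text in docs:
        counts.update(text.lower().split())
    return counts


def select_collections(query, representations, k):
    """Collection selection: rank by relative frequency of the query terms."""
    terms = query.lower().split()
    scores = {}
    for name, rep in representations.items():
        total = sum(rep.values())
        scores[name] = sum(rep[t] for t in terms) / total if total else 0.0
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    # Keep at most k collections and drop those with no matching term.
    return [n for n in ranked[:k] if scores[n] > 0]


def search_collection(query, docs):
    """Stand-in for a collection's own search engine (term-overlap score)."""
    terms = set(query.lower().split())
    hits = []
    for doc_id, text in docs:
        score = len(terms & set(text.lower().split()))
        if score > 0:
            hits.append((doc_id, float(score)))
    return sorted(hits, key=lambda h: (-h[1], h[0]))


def merge_results(result_lists):
    """Result merging: max-normalise each list, then sort into one list."""
    merged = []
    for hits in result_lists:
        if not hits:
            continue
        top = max(score for _, score in hits)
        merged.extend((doc_id, score / top) for doc_id, score in hits)
    return sorted(merged, key=lambda h: (-h[1], h[0]))


def federated_search(query, collections=COLLECTIONS, k=2):
    """Run representation, selection, per-collection search and merging."""
    reps = {name: build_representation(docs)
            for name, docs in collections.items()}
    selected = select_collections(query, reps, k)
    return merge_results(
        [search_collection(query, collections[name]) for name in selected]
    )
```

Calling federated_search("diet recipe") in this sketch queries only the two collections whose term statistics match the query and returns a single ranked list of (document id, normalised score) pairs. Normalising by each list's top score is only one simple way to make scores from independent engines comparable.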
The goal of this work is to provide a comprehensive summary of the
previous research on the federated search challenges described above.
M. Shokouhi and L. Si