Building a Web Thesaurus from Web Link Structure

Thesauri have been widely used in many applications, including information retrieval, natural language processing, and question answering. In this paper, we propose a novel approach to automatically constructing a domain-specific thesaurus from the Web using link structure information. The result can be considered a live thesaurus for various concepts and knowledge on the Web, and an important component toward the Semantic Web. First, a set of high-quality, representative websites for a specific domain is selected. After navigational links are filtered out, a link analysis technique is applied to each website to obtain its content structure. Finally, the thesaurus is constructed by merging the content structures of the selected websites. Experiments on automatic query expansion based on the thesaurus show a 20% improvement in search precision compared to the baseline.
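The three steps in the abstract (filter navigational links, derive a content structure per website, merge the structures) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the report: the rule that a link target referenced from more than half of a site's pages is navigational, the use of anchor text as the child term, the `page_terms` mapping, the function names, and the example sites.

```python
from collections import Counter, defaultdict


def filter_navigational(links, num_pages, max_share=0.5):
    """Drop links whose target is linked from more than max_share of the
    site's pages (assumed heuristic for menus, "home" links, and similar)."""
    sources_per_target = defaultdict(set)
    for source, target, _anchor in links:
        sources_per_target[target].add(source)
    return [
        (s, t, a) for s, t, a in links
        if len(sources_per_target[t]) / num_pages <= max_share
    ]


def content_structure(links, page_terms):
    """Turn each remaining link into a (parent term, child term) pair.
    The parent is the term assigned to the source page; the child is the
    lowercased anchor text (both are illustrative assumptions)."""
    pairs = set()
    for source, _target, anchor in links:
        parent = page_terms.get(source)
        child = anchor.strip().lower()
        if parent and child and parent != child:
            pairs.add((parent, child))
    return pairs


def merge_structures(structures, min_sites=2):
    """Keep term pairs that occur in at least min_sites site structures."""
    counts = Counter(pair for s in structures for pair in s)
    thesaurus = defaultdict(set)
    for (parent, child), n in counts.items():
        if n >= min_sites:
            thesaurus[parent].add(child)
    return dict(thesaurus)


# Two hypothetical three-page websites: (source page, target page, anchor text).
site_a = [
    ("a/home", "a/laptops", "Laptops"),
    ("a/laptops", "a/home", "Home"),
    ("a/laptops", "a/ultrabooks", "Ultrabooks"),
    ("a/ultrabooks", "a/home", "Home"),
]
terms_a = {"a/home": "computers", "a/laptops": "laptops",
           "a/ultrabooks": "ultrabooks"}

site_b = [
    ("b/index", "b/notebooks", "Laptops"),
    ("b/notebooks", "b/index", "Back to top"),
    ("b/notebooks", "b/gaming", "Gaming laptops"),
    ("b/gaming", "b/index", "Back to top"),
]
terms_b = {"b/index": "computers", "b/notebooks": "laptops",
           "b/gaming": "gaming laptops"}

structures = [
    content_structure(filter_navigational(site_a, num_pages=3), terms_a),
    content_structure(filter_navigational(site_b, num_pages=3), terms_b),
]
thesaurus = merge_structures(structures)
print(thesaurus)
```

In this toy data only the pair ("computers", "laptops") appears on both sites, so it is the only relation that survives the merge; the `min_sites` threshold stands in for whatever agreement criterion the report actually uses.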

tr-2003-10.doc
Word document

Details

Type: TechReport
Number: MSR-TR-2003-10
Pages: 11
Institution: Microsoft Research