Understanding Tables on the Web

Jingjing Wang, Bin Shao, Haixun Wang, and Kenny Zhu

Abstract

This paper presents a framework that attempts to harvest useful knowledge from the rich corpus of relational data on the Web: HTML tables. Through a multi-phase algorithm, and with the help of a universal probabilistic taxonomy called Probase, the framework is capable of understanding the entities, attributes and values in many tables on the Web. With this knowledge, we built two interesting applications: a semantic table search engine which returns relevant tables from keyword queries, and a tool to further expand and enrich Probase. Our experiments indicate generally high performance in both table search results and taxonomy expansion. This shows that the proposed framework practically benefits knowledge discovery and semantic search.

Details

Publication type: TechReport
Number: MSR-TR-2011-29
Publisher: Microsoft Technical Report