Probase: A Probabilistic Taxonomy for Text Understanding

Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu


Knowledge is indispensable to understanding. The ongoing information explosion highlights the need to enable machines to better understand electronic text in human language. Much work has been devoted to creating universal ontologies or taxonomies for this purpose. However, none of the existing ontologies has the depth and breadth needed for “universal understanding”. In this paper, we present a universal, probabilistic taxonomy that is more comprehensive than any existing one. It contains 2.7 million concepts harnessed automatically from a corpus of 1.68 billion web pages. Unlike traditional taxonomies that treat knowledge as black and white, it uses probabilities to model the inconsistent, ambiguous, and uncertain information it contains. We present details of how the taxonomy is constructed, its probabilistic modeling, and its potential applications in text understanding.


Publication type: Inproceedings
Published in: ACM International Conference on Management of Data (SIGMOD)