DF-ITF, Topic Search

When dealing with a query such as "... except florida", we know that "florida" is an entity in Probase, so we treat "except florida" as if it were a concept. That concept is the most representative concept containing "florida", with "florida" itself removed from it. To find the most representative concept for an entity, we use a DF-ITF score, which is in effect an inverse of the well-known TF-IDF score.

For a given entity e and a concept c that contains e, the DF-ITF score is the product of a "document frequency" and an "inverse term frequency":

where tf(c, e) is the number of times e occurs as an instance of c in Probase, C_e is the set of all concepts that contain e, E is the set of all entities in Probase, and |c| is the number of unique entities in concept c.
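
To make the scoring concrete, here is a minimal Python sketch. The exact formula is not given in the text above, so the sketch assumes one form that uses only the symbols just defined: df(c, e) = tf(c, e) / Σ tf(c', e) over all c' in C_e, and itf(c) = log(|E| / |c|). Both expressions, the toy counts in `TF`, and the helper names `df_itf`, `most_representative`, and `except_entity` are illustrative assumptions, not Probase's actual data or API.

```python
import math

# Hypothetical toy data standing in for Probase: tf(c, e) counts keyed by
# (concept, entity). The numbers are invented for illustration only.
TF = {
    ("us state", "florida"): 90,
    ("us state", "texas"): 80,
    ("us state", "ohio"): 70,
    ("place", "florida"): 30,
    ("place", "texas"): 25,
    ("place", "ohio"): 20,
    ("place", "paris"): 40,
    ("place", "london"): 35,
    ("place", "tokyo"): 30,
}

# Build the two lookups the definitions need:
#   CONCEPTS_OF[e] is C_e, the set of concepts that contain e
#   ENTITIES_OF[c] is the set of unique entities in c, so its size is |c|
CONCEPTS_OF = {}
ENTITIES_OF = {}
for concept, entity in TF:
    CONCEPTS_OF.setdefault(entity, set()).add(concept)
    ENTITIES_OF.setdefault(concept, set()).add(entity)


def df_itf(c, e):
    """Score concept c for entity e under the ASSUMED df and itf forms."""
    # Assumed df: share of e's occurrences that fall under concept c.
    total = sum(TF[(ci, e)] for ci in CONCEPTS_OF[e])
    df = TF.get((c, e), 0) / total
    # Assumed itf: log(|E| / |c|), which penalises very broad concepts.
    itf = math.log(len(CONCEPTS_OF) / len(ENTITIES_OF[c]))
    return df * itf


def most_representative(e):
    """Return the concept in C_e with the highest DF-ITF score for e."""
    return max(CONCEPTS_OF[e], key=lambda c: df_itf(c, e))


def except_entity(e):
    """Treat 'except e' as the most representative concept of e, minus e."""
    c = most_representative(e)
    return c, ENTITIES_OF[c] - {e}


concept, rest = except_entity("florida")
print(concept, sorted(rest))
```

With these toy counts, "place" contains every entity, so its assumed itf is log(1) = 0 and the narrower "us state" wins for "florida"; "except florida" then resolves to the remaining entities of "us state". A different choice of df or itf would change the scores but not the overall flow.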