Tasks and Applications

Knowledge Layer Building

Data sources

  • NeedleSeek
  • Probase
  • Satori
  • Freebase
  • Raw web pages (from RetroIndex)
  • Wikipedia
  • ...

Classification of knowledge extraction tasks

  • By knowledge type: entities, types, semantic classes, peer similarity, synonymy, antonymy, acronyms, is-a, part-of, attribute names, attribute values, general relations, events...
  • By data source: plain sentences, web page contents, page URLs, anchor text, query set, click-through, dictionaries, databases...
  • By source scope:
    1. for web pages: all pages, pages in a subset of domains (e.g. Wikipedia pages)...
    2. for query set: all queries, queries containing "list of"...
  • By technique element: lexical pattern matching, tag pattern matching, co-occurrence calculation, wrapper induction, template generation, context analysis, distributional similarity, graph analysis, bootstrapping, classification, clustering...
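The four axes above can be treated as independent dimensions, so any single extraction task can be described by one value per axis. The following is a minimal Python sketch of such a descriptor, assuming a hypothetical `ExtractionTask` record; the class names and the small subset of enum values are illustrative only and are not part of the system described here.

```python
from dataclasses import dataclass, field
from enum import Enum


class KnowledgeType(Enum):
    # Hypothetical subset of the knowledge types listed above.
    IS_A = "is-a"
    ATTRIBUTE_NAME = "attribute name"
    SYNONYMY = "synonymy"


class DataSource(Enum):
    # Hypothetical subset of the data sources listed above.
    PLAIN_SENTENCES = "plain sentences"
    WEB_PAGE_CONTENTS = "web page contents"
    QUERY_SET = "query set"


class Technique(Enum):
    # Hypothetical subset of the technique elements listed above.
    LEXICAL_PATTERN_MATCHING = "lexical pattern matching"
    BOOTSTRAPPING = "bootstrapping"
    DISTRIBUTIONAL_SIMILARITY = "distributional similarity"


@dataclass(frozen=True)
class ExtractionTask:
    """Hypothetical descriptor placing one task on each classification axis."""
    knowledge_type: KnowledgeType
    data_source: DataSource
    source_scope: str  # free text, e.g. "all pages" or "queries containing 'list of'"
    techniques: frozenset = field(default_factory=frozenset)  # set of Technique values

    def uses(self, technique: Technique) -> bool:
        """Return True if the task combines the given technique element."""
        return technique in self.techniques


# Example: an IsA task over all web pages using patterns plus bootstrapping.
isa_task = ExtractionTask(
    knowledge_type=KnowledgeType.IS_A,
    data_source=DataSource.WEB_PAGE_CONTENTS,
    source_scope="all pages",
    techniques=frozenset({Technique.LEXICAL_PATTERN_MATCHING, Technique.BOOTSTRAPPING}),
)
```

Keeping techniques as a set rather than a single value reflects that one task usually combines several technique elements (for example, pattern matching followed by bootstrapping).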

Knowledge extraction task items

  • IsA extraction from web pages using lexical patterns, iteratively reusing the knowledge produced in earlier rounds
  • Attribute extraction from Wikipedia and other data sources
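The IsA task item, which combines lexical patterns with knowledge already produced, can be illustrated with a simplified Python sketch. It assumes a single Hearst-style "such as" pattern, treats content words among the last three tokens before the pattern as candidate hypernyms, and defers ambiguous sentences to later rounds, where the candidate that already shares known hyponyms with the list is preferred. These parsing and scoring rules, and the names `parse_such_as` and `extract_isa`, are illustrative assumptions, not the project's actual method.

```python
import re
from collections import Counter

# Hypothetical stopword list used to drop function words from candidate hypernyms.
STOPWORDS = {"in", "of", "the", "a", "an", "many", "some", "and", "other", "from"}


def parse_such_as(sentence):
    """Split on the first 'such as'; return (candidate hypernyms, hyponyms) or None."""
    left, sep, right = sentence.lower().rstrip(".").partition(" such as ")
    if not sep:
        return None
    # Candidate hypernyms: content words among the last three words before the pattern.
    candidates = [w for w in left.split()[-3:] if w not in STOPWORDS]
    # Hyponyms: items of the comma/"and"/"or" separated list after the pattern.
    hyponyms = [item.strip() for item in re.split(r",|\band\b|\bor\b", right) if item.strip()]
    return candidates, hyponyms


def extract_isa(sentences, max_rounds=5):
    """Iteratively extract (hypernym, hyponym) pairs, reusing pairs found earlier."""
    kb = Counter()  # (hypernym, hyponym) -> number of supporting sentences
    parsed = [p for p in map(parse_such_as, sentences) if p and p[0] and p[1]]
    resolved = set()  # indices of sentences already attributed to a hypernym
    for _ in range(max_rounds):
        progress = False
        for i, (candidates, hyponyms) in enumerate(parsed):
            if i in resolved:
                continue
            if len(candidates) == 1:
                hyper = candidates[0]
            else:
                # Ambiguous: score each candidate by how many of the listed
                # hyponyms it is already known to have.
                scores = {c: sum(kb[(c, h)] > 0 for h in hyponyms) for c in candidates}
                hyper = max(scores, key=scores.get)
                if scores[hyper] == 0:
                    continue  # no supporting knowledge yet; retry next round
            for h in hyponyms:
                kb[(hyper, h)] += 1
            resolved.add(i)
            progress = True
        if not progress:
            break
    return kb
```

For example, given "Companies in cities such as Seattle and Austin." followed by "Cities such as Seattle and Boston.", the first sentence is ambiguous (companies or cities) and is skipped in round one; in round two it is attributed to "cities", because the unambiguous second sentence has already contributed "seattle" as a known hyponym.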

Target knowledge layers

  • Type system
  • Attribute system
  • Relatedness system


Common Utility Modules