Corpus-based Semantic Class Mining: Distributional vs. Pattern-Based Approaches

  • Shuming Shi
  • Huibin Zhang
  • Xiaojie Yuan
  • Ji-Rong Wen

Proceedings of COLING 2010

The main approaches to corpus-based semantic class mining are distributional similarity (DS) and pattern-based (PB) methods. In this paper, we empirically compare the two on a publicly available dataset of 500 million web pages, using various categories of queries. We further propose a frequency-based rule for selecting the appropriate approach for different types of terms.