NeedleSeek

What is it?

NeedleSeek is a project in Microsoft Research Asia for open-domain semantic mining and serving. In this project, we

    • Mine open-domain semantic knowledge from web-scale data sources;
    • Answer and serve user requests based on the mined semantic knowledge.

Our fundamental goal in this project is to explore how, and to what extent, a computer system can understand the world and the meaning of text in order to better meet the information needs of users.

In this online research prototype, a web interface is provided for users to

    • Search and browse the semantic knowledge base we built (via the “Semantic Card” and “Semantic Map” tabs)
    • Input keywords or natural language queries to get semantic “answers” (via the “Answer” tab)

Semantic Card:

On the “Semantic Card” tab, we show the mapping from the input word or phrase to one or more concepts, with each card representing one concept. For example, the word “apple” can be mapped to the company Apple, the fruit apple, and the tree apple. As another example, the phrase “Harry Potter” can represent a movie, a book, a character, a game, etc. At the moment, our prototype shows only the top three concepts of a term. Each card shows the following information about its concept: labels, attributes, key sentences, and related concepts.
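
To make the card structure concrete, here is a minimal sketch of how a term-to-concept mapping like the one described above could be represented. It is not NeedleSeek's actual data model: the class name `SemanticCard`, the function `top_cards`, the `score` field, and all score values are assumptions made up for illustration. Only the four card fields and the “Harry Potter” senses come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SemanticCard:
    """One concept a term can refer to (hypothetical structure)."""
    concept: str
    labels: List[str] = field(default_factory=list)
    attributes: List[str] = field(default_factory=list)
    key_sentences: List[str] = field(default_factory=list)
    related_concepts: List[str] = field(default_factory=list)
    score: float = 0.0  # assumed ranking score; not described in the source


def top_cards(index: Dict[str, List[SemanticCard]], term: str,
              limit: int = 3) -> List[SemanticCard]:
    """Return the highest-scoring cards for a term (top three by default)."""
    cards = index.get(term.lower(), [])
    return sorted(cards, key=lambda card: card.score, reverse=True)[:limit]


# Toy index; the scores are invented purely to demonstrate the ranking.
TOY_INDEX = {
    "harry potter": [
        SemanticCard("Harry Potter (game)", labels=["game"], score=0.6),
        SemanticCard("Harry Potter (movie)", labels=["movie"], score=0.9),
        SemanticCard("Harry Potter (character)", labels=["character"], score=0.7),
        SemanticCard("Harry Potter (book)", labels=["book"], score=0.8),
    ]
}

for card in top_cards(TOY_INDEX, "Harry Potter"):
    print(card.concept, card.labels)
```

With four candidate senses in the toy index, only the three highest-scoring ones are returned, mirroring the prototype's top-three limit.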

Semantic Map:

On the “Semantic Map” tab, concepts are organized into semantic categories (or semantic classes), and the semantic relations between concepts are shown. For example, {apple, orange, banana…} is a semantic class of fruits. We pay special attention to semantic classes because the concepts in one semantic class tend to share similar semantic characteristics (for example, they have similar attributes). At the moment, our prototype shows only the top three semantic categories for (the concepts of) a term.
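
The observation that members of a semantic class tend to share attributes can be illustrated with a small sketch. This is an illustration only, not the mining method used in the project: the function `shared_attributes`, the `min_fraction` threshold, and the attribute lists in the toy data are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List


def shared_attributes(concept_attributes: Dict[str, List[str]],
                      members: Iterable[str],
                      min_fraction: float = 0.5) -> List[str]:
    """Return attributes held by at least `min_fraction` of the class members."""
    members = list(members)
    counts = Counter()
    for member in members:
        # Use a set so each member counts an attribute at most once.
        counts.update(set(concept_attributes.get(member, [])))
    threshold = min_fraction * len(members)
    return sorted(attr for attr, n in counts.items() if n >= threshold)


# Toy attribute lists, invented for illustration.
TOY_ATTRIBUTES = {
    "apple": ["color", "taste", "variety"],
    "orange": ["color", "taste", "peel"],
    "banana": ["color", "taste", "length"],
}

fruits = ["apple", "orange", "banana"]
print(shared_attributes(TOY_ATTRIBUTES, fruits))
```

In this toy data, `color` and `taste` appear for every member, so they are the attributes reported as characteristic of the class.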

Answer:

On the “Answer” tab, we treat the input keywords as a query or question and return one or more groups of concepts (typically entities) that are either related to the query or are answers to the question. For example, for the query “beautiful languages”, we return a list of human languages and a list of programming languages.

Research prototype (language: English):
    http://needleseek.msra.cn/

Related Groups

    Web Search & Mining group, Microsoft Research Asia

Related Research Papers

1. Ensemble Semantics for Large-scale Unsupervised Relation Extraction.
    Bonan Min, Shuming Shi, Ralph Grishman, and Chin-Yew Lin
    EMNLP-CoNLL 2012.

2. Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining.
    Fan Zhang, Shuming Shi, Jing Liu, Shuqi Sun, and Chin-Yew Lin
    ACL'11.

3. Corpus-based Semantic Class Mining: Distributional vs. Pattern-Based Approaches.
    Shuming Shi, Huibin Zhang, Xiaojie Yuan, and Ji-Rong Wen
    In the 23rd International Conference on Computational Linguistics (COLING'10).

4. Comparable Entity Mining from Comparative Questions.
    Shasha Li, Chin-Yew Lin, Young-In Song, and Zhoujun Li
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL'10), 2010.

5. Employing Topic Models for Pattern-based Semantic Class Discovery.
    Huibin Zhang, Mingjie Zhu, Shuming Shi, and Ji-Rong Wen
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL'09), Singapore, August 2009.

6. Pattern-based Semantic Class Discovery with Multi-Membership Support.
    Shuming Shi, Xiaokang Liu, and Ji-Rong Wen
    In ACM 17th Conference on Information and Knowledge Management (CIKM'08), Napa Valley, California, USA, 2008 (Poster).

People
Ji-Rong Wen