Zhongyuan Wang (王仲远)
Email: zhy.wang @ microsoft . com (without any space in the email address)
zhywangchina @ 163 . com, wzhy AT outlook . com
Tel: +86-138-109-72076, +86-10-59174328
Zhongyuan Wang is a Researcher at Microsoft Research Asia (MSRA) and a PhD candidate at Renmin University of China (RUC), where his PhD advisors are Haixun Wang and Ji-Rong Wen. He received his master's degree (advised by Xiaofeng Meng) and his bachelor's degree in computer science from Renmin University in 2010 and 2007, respectively. While at the university, he won the Wu Yuzhang Scholarship (the top-level scholarship at Renmin University), the Kwang-Hua Scholarship, and the ACM SIGMOD07 Undergraduate Scholarship (one of seven winners worldwide). After graduating from RUC, he joined MSRA as a Research Software Development Engineer. To date, he has published several papers at leading international conferences such as VLDB and ICDE. He is also the translator of the book “Windows Phone 7 Programming for Android and iOS Developers”, published in 2012. His research interests include knowledge bases, web data mining, online advertising, machine learning, and natural language processing.
Currently, Zhongyuan Wang leads the Probase project. He focuses on acquiring web tables, attributes, and knowledge facts from more than 7 billion web documents on the MS Cloud platform; addressing entity disambiguation and attribute synonyms in Probase; understanding web documents by reasoning over uncertain data; and building cool applications (such as short text understanding, ads matching, and query recommendation) on top of the knowledge base.
Probase: a knowledgebase that knows our mental world
My personal blog: 仲子说
- I lead two important projects at MSRA: Probase and Enterprise Dictionary, which were reviewed by Bill Gates, Harry Shum, Peter Lee, etc. The demo of Enterprise Dictionary was a candidate for MGX 2014.
- I have published 10+ papers at top international conferences
- I have 2 US patents and 1 Chinese patent
- I'm the co-author/translator of 2 books: “Windows Phone 7 Programming for Android and iOS Developers” and “Web Data Management: Concepts and Techniques”
- I won the Wu Yuzhang Scholarship (the top-level scholarship at Renmin University) and the ACM SIGMOD07 Undergraduate Scholarship (one of seven winners worldwide)
- Short Text Understanding / Conceptualization
The goal of this project is to provide better text understanding.
A large variety of applications need to handle short texts such as search queries, ads keywords, tweets, and image captions. Understanding short texts is a big challenge for machines. Unlike long texts and documents, which can be analyzed with “bag of words” based statistical approaches, short texts do not contain enough information or statistical signals to make such analysis meaningful. Furthermore, short texts are usually not well-formed sentences. For example, queries submitted to search engines usually do not follow grammar rules. Consequently, approaches based on sentence structure analysis do not work well either. Human beings are good at deriving meaning from noisy, ambiguous, and sparse input. We understand short texts because knowledge in our mind enriches the input to produce meaning. Thus, in order for machines to understand short texts, we need to supply such knowledge to machines so that the gap between insufficient input and understanding can be bridged.
We have been continuously improving our conceptualization mechanism, which is at the core of our short text understanding services. We leverage the co-occurrence network to enhance sense disambiguation. We also generate mappings between auxiliary words and concept clusters, which help disambiguate senses using auxiliary words in the context.
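As a rough illustration of how is-a knowledge can enrich a short text, the sketch below maps terms to concepts and lets context terms disambiguate one another. It is a minimal, naive-Bayes-style toy and not the Probase implementation: the `ISA_COUNTS` table, its numbers, and the function names are all hypothetical and exist only for this example.

```python
import math

# Hypothetical toy is-a counts (term -> concept -> count); not real Probase data.
ISA_COUNTS = {
    "apple": {"fruit": 60, "company": 30, "food": 10},
    "microsoft": {"company": 90, "brand": 10},
    "banana": {"fruit": 90, "food": 10},
}


def concept_distribution(term):
    """Return P(concept | term) estimated from the toy counts."""
    counts = ISA_COUNTS.get(term, {})
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def conceptualize(text, smoothing=1e-3):
    """Rank concepts for a short text by combining per-term distributions.

    Each known term votes with log(P(concept | term) + smoothing), so a
    concept supported by every term beats one supported by a single term.
    Returns (concept, normalized score) pairs, best first.
    """
    terms = [t for t in text.lower().split() if t in ISA_COUNTS]
    if not terms:
        return []
    dists = [concept_distribution(t) for t in terms]
    concepts = {c for d in dists for c in d}
    log_scores = {
        c: sum(math.log(d.get(c, 0.0) + smoothing) for d in dists)
        for c in concepts
    }
    # Normalize in a numerically stable way so the scores sum to 1.
    best = max(log_scores.values())
    weights = {c: math.exp(s - best) for c, s in log_scores.items()}
    z = sum(weights.values())
    return sorted(((c, w / z) for c, w in weights.items()), key=lambda kv: -kv[1])


if __name__ == "__main__":
    for query in ("apple", "apple microsoft", "apple banana"):
        print(query, "->", conceptualize(query)[0][0])
```

In this toy table, "apple" alone leans toward the fruit sense, while "apple microsoft" shifts to the company sense because only that concept is supported by both terms. A co-occurrence network or auxiliary-word mappings, as mentioned above, would add further evidence that this sketch does not model.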
- Knowledgebase, Graph
- Database, Data Mining
- Machine Learning
- Web Search and Mining
- Natural Language Processing
- Short text conceptualization and its applications (CCF ADL 32 - "Natural Language Processing and Machine Learning")
- Program Committee, WAIM 2013
- Program Committee, CIKM 2012
- Program Committee, WAIM 2011
Tech Transfers to Products:
- Bing Ads System
– Added semantic features based on semantic similarity between queries and ads keywords
– Shipped to the Bing Ads system, Oct. 2012
- Query Recommendation on MSN US
– Used the article titles of each channel to train a classifier based on conceptualization techniques
– Compared with the previous QAS-based approach, our model increased CTR by 36.8% and 80.0% in the US Movie and US Music channels, respectively
- Related Topics for Bing Image Search
– Used is-a data to improve related topics in Bing Image Search
– Constructed and weighted an entity linkage graph to improve the related topics
– Shipped to Bing Image Search in June 2013, with ~200% gains in total query share
- Microsoft Power Query for Excel
– Microsoft Power Query is an Excel add-in that enhances the self-service Business Intelligence experience in Excel by simplifying data discovery and access. Power Query enables users to easily discover, combine, and refine data for better analysis in Excel. Power Query includes a public search feature that is currently intended for use in the United States only.
- 2009 Wu Yuzhang Scholarship (Top-level Scholarship of Renmin University of China. Top 10/22000)
- 2008/2009 Kwang-Hua Scholarship (Twice)
- 2008 HP Distinguished Chinese Student Scholarship
- 2007 Excellent Graduate Student Award of Renmin University
- 2007 ACM SIGMOD07 Undergraduate Scholarship (one of the seven winners all over the world)
- 2006 China Computer World Scholarship
- 2005~2006 The Outstanding Students Scholarship
- 2005 First Prize in the Beijing Contest District of the China Undergraduate Mathematical Contest in Modeling (CUMCM2005)
- 2005~2006 First-Class Scholarship
- 2003~2004 Fan Zhi’an Scholarship
- 2003~2004 Excellent League Member of RUC
- Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou, Short Text Understanding Through Lexical-Semantic Analysis, in International Conference on Data Engineering (ICDE), April 2015.
- Fang Wang, Zhongyuan Wang, Senzhang Wang, and Zhoujun Li, Exploiting Description Knowledge for Keyphrase Extraction, in PRICAI, December 2014.
- Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen, Concept-based Short Text Classification and Ranking, in ACM International Conference on Information and Knowledge Management (CIKM), October 2014.
- Zhongyuan Wang, Haixun Wang, and Zhirui Hu, Head, Modifier, and Constraint Detection in Short Texts, in International Conference on Data Engineering (ICDE), 2014.
- Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, and Zhongyuan Wang, A Distributed Graph Engine for Web Scale RDF Data, in PVLDB, August 2013.
- Taesung Lee, Zhongyuan Wang, Haixun Wang, and Seung-won Hwang, Attribute Extraction and Scoring: A Probabilistic Approach, in International Conference on Data Engineering (ICDE), 2013.
- Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu, Computing Term Similarity by Large Probabilistic isA Knowledge, in ACM International Conference on Information and Knowledge Management (CIKM), 2013.
- Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu, Understanding Tables on the Web, in International Conference on Conceptual Modeling, October 2012.
- Bolin Ding, Haixun Wang, Ruomin Jin, Jiawei Han, and Zhongyuan Wang, Optimizing Index for Taxonomy Keyword Search, in ACM International Conference on Management of Data (SIGMOD), May 2012.
- Masumi Shirakawa, Haixun Wang, Yangqiu Song, Zhongyuan Wang, Kotaro Nakayama, and Takahiro Hara, Entity Disambiguation based on a Probabilistic Taxonomy, Technical Report no. MSR-TR-2011-125, November 2011.
- Taesung Lee, Zhongyuan Wang, Haixun Wang, and Seung-won Hwang, Web Scale Taxonomy Cleansing, in VLDB, September 2011.
- Yangqiu Song, Haixun Wang, Zhongyuan Wang, Hongsong Li, and Weizhu Chen, Short Text Conceptualization using a Probabilistic Knowledgebase, in IJCAI, 2011.