Tel: +86 (10) 58963188
I am currently a senior researcher at Microsoft Research Asia and research manager of the Web Data Management Group. I received my B.S. and M.S. degrees from the School of Information, Renmin University of China, and my Ph.D. degree in 1999 from the Institute of Computing Technology, Chinese Academy of Sciences. My main research interests are Web data management, information retrieval (especially Web IR), data mining and machine learning.
1. Go Beyond Page-level Web Search
Nowadays, major commercial search engines treat the Web page as the basic information unit and return a list of pages to the user as the search result. I became obsessed with a question: is the page the only, or the best, atomic unit for information search on the Web? I have since spent the majority of my time exploring new search technologies that go beyond the current page-level paradigm.
- Block-based search: a VIPS (Vision-based Page Segmentation) algorithm partitions a page into semantic blocks, and a block importance model assigns each block an importance value. The semantic blocks and their importance values are then used to build block-level link analysis and block-based ranking (1, 2) algorithms, which improve the relevance of search results. The same technology is also used to cluster Web image search results.
- Deep Web search: while most search engines focus on crawling, indexing and searching static HTML pages, a vast amount of Web data is hidden inside Web databases. This data is high-quality and structured, but it is very difficult to obtain and integrate, so searching it poses a big challenge to today's search technology. We are working on techniques to crawl and integrate data from the deep Web.
- Object-level search: structured information about real-world objects is embedded in Web pages and online databases. We explored a new paradigm that enables Web search at the object level, developing a set of technologies to automatically classify, extract (1, 2), integrate and rank (1, 2) Web objects, and then building powerful object-level search engines for specific domains such as shopping, academia, restaurants and travel. With object-level search, people can get more accurate and better-organized information in one place instead of browsing through a long list of pages. We describe the whole system in a recent CIDR paper.
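To make the block-level link analysis idea more concrete, here is a minimal sketch. It is not the published algorithm: it simply assumes each page has already been segmented into blocks with importance values (the kind of output a VIPS segmentation plus a block importance model would produce) and runs a PageRank-style iteration in which every link is weighted by the importance of the block that contains it. The page names, importance values, and the function `block_weighted_pagerank` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: page -> list of (block importance, outlink targets).
Pages = Dict[str, List[Tuple[float, List[str]]]]


def block_weighted_pagerank(pages: Pages, damping: float = 0.85,
                            iterations: int = 50) -> Dict[str, float]:
    """PageRank variant where each link inherits the importance of its block."""
    names = sorted(pages)
    n = len(names)
    # Weighted out-edges: a link in a low-importance block (navigation bar,
    # footer, ad) passes on less rank than a link in the main content.
    out_edges: Dict[str, Dict[str, float]] = {p: {} for p in names}
    for page, blocks in pages.items():
        for importance, targets in blocks:
            for target in targets:
                if target in pages and target != page:
                    out_edges[page][target] = out_edges[page].get(target, 0.0) + importance
    rank = {p: 1.0 / n for p in names}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in names}
        for page in names:
            total = sum(out_edges[page].values())
            if total == 0:
                # Dangling page: spread its rank uniformly over all pages.
                for p in names:
                    new_rank[p] += damping * rank[page] / n
                continue
            for target, weight in out_edges[page].items():
                new_rank[target] += damping * rank[page] * weight / total
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Page "a" links to "b" from its main content and to "c" from its footer.
    example = {
        "a": [(0.9, ["b"]), (0.1, ["c"])],
        "b": [(1.0, ["a"])],
        "c": [(1.0, ["a"])],
    }
    for page, score in sorted(block_weighted_pagerank(example).items()):
        print(page, round(score, 4))
```

Because the link to `b` sits in a high-importance block, `b` ends up ranked above `c`, even though each receives exactly one link from `a`.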
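For deep Web crawling, one simple way to picture query selection is a crawler that can reach a Web database only through its query form and must choose which queries to issue. The sketch below is an illustrative assumption, not the technique from the ICDE 2006 paper: `HiddenDatabase` is a hypothetical stand-in for a form-based source with a capped result page, and `greedy_crawl` repeatedly issues the not-yet-tried (attribute, value) pair that has appeared most often in the records harvested so far.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]
Query = Tuple[str, str]  # (attribute, value) submitted through the form


class HiddenDatabase:
    """Hypothetical Web database reachable only through a query form."""

    def __init__(self, records: List[Record], result_limit: int = 2):
        self._records = records
        self.result_limit = result_limit  # assumed cap on results per query
        self.queries_issued = 0

    def query(self, attr: str, value: str) -> List[Record]:
        self.queries_issued += 1
        hits = [r for r in self._records if r.get(attr) == value]
        return hits[: self.result_limit]


def greedy_crawl(db: HiddenDatabase, seed: Query, budget: int) -> List[Record]:
    """Greedy frequency heuristic: query the most frequently seen unused value."""
    harvested: Dict[Tuple[Tuple[str, str], ...], Record] = {}
    issued: Set[Query] = set()
    candidates: Counter = Counter({seed: 1})
    while candidates and len(issued) < budget:
        query, _ = candidates.most_common(1)[0]
        del candidates[query]
        issued.add(query)
        for record in db.query(*query):
            key = tuple(sorted(record.items()))
            if key in harvested:
                continue  # already seen; contributes no new candidate values
            harvested[key] = record
            for pair in record.items():
                if pair not in issued:
                    candidates[pair] += 1
    return list(harvested.values())


if __name__ == "__main__":
    books = [
        {"title": "Book A", "author": "Smith", "publisher": "P1"},
        {"title": "Book B", "author": "Smith", "publisher": "P2"},
        {"title": "Book C", "author": "Lee", "publisher": "P1"},
        {"title": "Book D", "author": "Lee", "publisher": "P3"},
    ]
    db = HiddenDatabase(books, result_limit=2)
    found = greedy_crawl(db, ("author", "Smith"), budget=10)
    print(len(found), "records harvested with", db.queries_issued, "queries")
```

Starting from one seed author, values revealed in earlier results (here, a shared publisher) lead the crawler to records the seed query alone could not return; a real crawler would also have to weigh query cost and result overlap.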
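Object-level ranking differs from page ranking because links connect objects of different types (for example papers and authors), and each link type may propagate popularity differently. The following minimal sketch follows that idea under clearly stated assumptions: the link types, the per-type propagation factors in `factors`, the Web popularity prior, and the function name `object_rank` are all hypothetical choices for illustration, not the published model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Link = Tuple[str, str, str]  # (source object, target object, link type)


def object_rank(objects: List[str], links: List[Link], factors: Dict[str, float],
                web_popularity: Dict[str, float], epsilon: float = 0.15,
                iterations: int = 50) -> Dict[str, float]:
    """Iterative ranking over typed links with per-type propagation factors."""
    # Split each source's contribution for a link type evenly among its targets.
    out_count: Dict[Tuple[str, str], int] = defaultdict(int)
    for src, _, ltype in links:
        out_count[(src, ltype)] += 1
    # Normalized prior popularity (e.g. from page-level signals); assumed input.
    total_pop = sum(web_popularity.get(o, 0.0) for o in objects) or 1.0
    base = {o: web_popularity.get(o, 0.0) / total_pop for o in objects}
    score = dict(base)
    for _ in range(iterations):
        propagated = {o: 0.0 for o in objects}
        for src, dst, ltype in links:
            share = factors.get(ltype, 0.0) / out_count[(src, ltype)]
            propagated[dst] += share * score[src]
        # Mix the prior with propagated popularity, then renormalize to sum 1.
        mixed = {o: epsilon * base[o] + (1 - epsilon) * propagated[o] for o in objects}
        norm = sum(mixed.values()) or 1.0
        score = {o: v / norm for o, v in mixed.items()}
    return score


if __name__ == "__main__":
    objs = ["paper1", "paper2", "paper3", "authorX", "authorY"]
    typed_links = [
        ("paper2", "paper1", "cites"), ("paper3", "paper1", "cites"),
        ("paper3", "paper2", "cites"),
        ("paper1", "authorX", "written_by"), ("paper2", "authorY", "written_by"),
        ("paper3", "authorY", "written_by"),
        ("authorX", "paper1", "writes"), ("authorY", "paper2", "writes"),
        ("authorY", "paper3", "writes"),
    ]
    type_factors = {"cites": 0.7, "written_by": 0.2, "writes": 0.1}  # assumed
    prior = {o: 1.0 for o in objs}
    for obj, s in sorted(object_rank(objs, typed_links, type_factors, prior).items()):
        print(obj, round(s, 4))
```

With a uniform prior, `paper1` (cited twice) outranks the uncited `paper3`, and authors inherit popularity from the papers they wrote in proportion to the assumed `written_by` factor.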
2. Infrastructure for Web Innovations and New Search Engine Architecture
I am currently working on a project called WebStudio. The goal is to provide researchers with an easy-to-use Web development environment in which to implement and test their ideas at Web scale. We are also building a new search engine architecture based on WebStudio; the central idea is to make it very easy for researchers to try out new ideas. More information will follow.
- Zhicheng Dou, Ruihua Song and Ji-Rong Wen, A Large-scale Evaluation and Analysis of Personalized Search Strategies, The 16th International World Wide Web Conference (WWW 2007)
- Zaiqing Nie, Yunxiao Ma, Shuming Shi, Ji-Rong Wen and Wei-Ying Ma, Web Object Retrieval, The 16th International World Wide Web Conference (WWW 2007)
- Zaiqing Nie, Ji-Rong Wen and Wei-Ying Ma, Object-level Vertical Search, The Third Biennial Conference on Innovative Data Systems Research (CIDR 2007), Asilomar, CA, USA, January 7-10, 2007
- Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang and Hsiao-Wuen Hon, Webpage Understanding: An Integrated Approach, The 13th International Conference on Knowledge Discovery and Data Mining (SIGKDD 2007)
- Shuyi Zheng, Ruihua Song, Di Wu and Ji-Rong Wen, Joint Optimization of Wrapper Generation and Template Detection, The 13th International Conference on Knowledge Discovery and Data Mining (SIGKDD 2007)
- Jun Zhu, Zaiqing Nie, Bo Zhang and Ji-Rong Wen, Dynamic Hierarchical Markov Random Fields and their Application to Web Data Extraction, The 24th International Conference on Machine Learning (ICML 2007)
- Shuyi Zheng, Ruihua Song and Ji-Rong Wen, Template-Independent News Extraction Based on Visual Consistency, The 22nd AAAI Conference on Artificial Intelligence (AAAI-07)
- Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang and Wei-Ying Ma, Simultaneous Record Detection and Attribute Labeling in Web Data Extraction, The 12th International Conference on Knowledge Discovery and Data Mining (SIGKDD 2006), Philadelphia, USA, August 20-23, 2006
- Chun Yuan, Ni Lao, Ji-Rong Wen, Jiwei Li, Zheng Zhang, Yi-Min Wang and Wei-Ying Ma, Automated Known Problem Diagnosis with Event Traces, EuroSys 2006
- Ping Wu, Ji-Rong Wen, Huan Liu and Wei-Ying Ma, Query Selection Techniques for Efficient Structured Web Source Crawling, The 22nd International Conference on Data Engineering (ICDE 2006), Atlanta, GA, USA, April 3-7, 2006
- Shuming Shi, Ji-Rong Wen, Qing Yu, Ruihua Song and Wei-Ying Ma, Gravitation-Based Model for Information Retrieval, The 28th Annual International ACM SIGIR Conference (SIGIR 2005), August 2005
- Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang and Wei-Ying Ma, 2D Conditional Random Fields for Web Information Extraction, The 22nd International Conference on Machine Learning (ICML 2005), Bonn, Germany, August 2005
- Zaiqing Nie, Yuanzhi Zhang, Ji-Rong Wen and Wei-Ying Ma, Object-Level Ranking: Bringing Order to Web Objects, The 14th International World Wide Web Conference (WWW 2005), Chiba, Japan, May 10-14, 2005
- Jiying Wang, Ji-Rong Wen, Fred Lochovsky and Wei-Ying Ma, Instance-based Schema Matching for Web Databases by Domain-specific Query Probing, The 30th International Conference on Very Large Data Bases (VLDB 2004), Toronto, Ontario, Canada, August 2004
- Ji-Rong Wen, Ni Lao and Wei-Ying Ma, Probabilistic Model for Contextual Retrieval, The 27th Annual International ACM SIGIR Conference (SIGIR 2004), July 2004
- Deng Cai, Shipeng Yu, Ji-Rong Wen and Wei-Ying Ma, Block-based Web Search, The 27th Annual International ACM SIGIR Conference (SIGIR 2004), July 2004
- Deng Cai, Xiaofei He, Ji-Rong Wen and Wei-Ying Ma, Block-Level Link Analysis, The 27th Annual International ACM SIGIR Conference (SIGIR 2004), July 2004
- Ruihua Song, Haifeng Liu, Ji-Rong Wen and Wei-Ying Ma, Learning Block Importance Models for Web Pages, The 13th International World Wide Web Conference (WWW 2004), pp. 203-211, New York, USA, May 2004
- Cong Li, Ji-Rong Wen and Hang Li, Text Classification Using Stochastic Keyword Generation, The 20th International Conference on Machine Learning (ICML 2003), pp. 464-471, Washington, DC, USA, August 21-24, 2003
- Shipeng Yu, Deng Cai, Ji-Rong Wen and Wei-Ying Ma, Improving Pseudo-Relevance Feedback in Web Information Retrieval Using Web Page Segmentation, The 12th International World Wide Web Conference (WWW 2003), pp. 11-18, Budapest, Hungary, May 2003
- Hang Cui, Ji-Rong Wen, Jian-Yun Nie and Wei-Ying Ma, Probabilistic Query Expansion Using Query Logs, The 11th International World Wide Web Conference (WWW 2002), pp. 325-332, Honolulu, Hawaii, USA, May 2002 (a long version appears in IEEE TKDE)
- Ji-Rong Wen, Jian-Yun Nie and Hong-Jiang Zhang, Clustering User Queries of a Search Engine, The 10th International World Wide Web Conference (WWW10), pp. 162-168, Hong Kong, May 2001 (a long version appears in ACM TOIS)