Zhicheng Dou is a researcher in the Web Search and Data Management Group at Microsoft Research Asia. He joined Microsoft in July 2008. He received his Ph.D. and B.S. degrees in computer science and technology from Nankai University, China, in 2008 and 2003, respectively. His research interests span several topics in web search and data mining, including personalized web search, anchor text and click-through data mining, query understanding, and search result diversification. More recently, he has become interested in the temporal Web and is now working on extracting and managing time-series data from the Web.


Besides research, Zhicheng Dou is also a skilled developer who enjoys turning cool ideas into real systems.


Projects

  • Intent and Diversity (INDI)
    Users who submit the same query may have different intents. For an ambiguous query, users may seek different interpretations; for a faceted topic, they may be interested in different subtopics. In this project, we investigate how many queries in real search logs are ambiguous, propose methods to diversify search results, and experiment with new metrics for measuring diversity. We also organize the NTCIR INTENT and IMINE tasks to provide common data for the IR community.
  • Project Q
    Search has a long document-centric tradition, in which “searching for information” is equivalent to “searching for documents”. Project Q is our recent effort to explore a new, query-centric search paradigm that treats queries as objects and shifts search from “searching documents” to “searching queries”. We have developed several effective query mining technologies and have shown that mining queries deeply, “without” time constraints, can greatly improve search relevance and user experience.
  • Web Page Analysis (WEPA)
    A Web page is not atomic; it is rich in structure. In this project, we take advantage of the HTML DOM structure and associated visual features, such as the font size, width, and height of a DOM element, to understand the author's purpose in creating a page. We model the importance of blocks within a page, extract structured data from pages across websites, learn templates from a set of mixed pages from a website, and identify article titles, bodies, and images to improve the reading experience.
  • WebSensor (InformationSensor)
    With the rapid growth of the web, making sense of web data poses grand challenges: big volume, high velocity, high variety, and unknown veracity. In the physical world, a sensor is a converter that measures a physical quantity and converts it into a signal that can be read by an observer or by an instrument (today, mostly electronic). Analogously, this project creates a virtual WebSensor layer atop the web.
  • WebStudio
    WebStudio is an end-to-end (E2E) experimental search system that facilitates search experiments on specific web data collections. WebStudio provides default implementations of its components, and users can customize the major operations of the search engine (document parsing, page classification, index building, index serving, and front-end processing) by adding their own experimental logic to test ideas.

Professional Activities

  • PC member, SIGIR 2013, CIKM 2013, IEEE BigData 2013, OAIR 2013, WWW 2013, SDM 2013, KDD 2012, WIDM 2009
  • Organizer, NTCIR-10 INTENT-2 task
  • Reviewer, TKDE, KAIS, KDD 2008, WWW 2007, KDD 2007, APWeb 2007, ICDM 2006


