- 2013.10.24: Yu Zheng gave a keynote speech at the 8th International Conference on Intelligent Systems and Knowledge Engineering (ISKE2013).
- 2013.10.10: Yu Zheng was invited to a panel on the Technology for connected cities, EmTech'13.
- 2013.10.9: Dr. Zheng gave a talk at EmTech 2013, “When Air Quality Meets Big Data”.
- 2013.10.8: Dr. Zheng gave an invited talk at MIT, CEES (Slides).
- Yu Zheng won the TR35 Award (Top Innovators under 35) for his research on Urban Computing.
- Dr. Zheng is organizing the 2nd international conference on Urban Computing (UrbComp 2013).
- Dr. Zheng gave a keynote speech at the World Geospatial Developers Conference (WGDC 2013).
- 2012.11.29: Dr. Yu Zheng gave an invited lecture on urban computing at the Beijing urban planning institute. (Slides)
- 2012.9.17: Dr. Yu Zheng gave an invited lecture on "Urban Computing with City Dynamics" at Cornell University. (Slides)
- 2012.9.11: Dr. Yu Zheng gave an invited lecture on "Urban Computing with City Dynamics" at Carnegie Mellon University (CMU). (Slides)
- 2012.4.10: Dr. Yu Zheng gave an invited talk about Urban Computing at the MIT Media Lab. (see more)
Urban computing is a process of acquisition, integration, and analysis of big and heterogeneous data generated by a diversity of sources in urban spaces, such as sensors, devices, vehicles, buildings, and humans, to tackle the major issues that cities face, e.g., air pollution, increased energy consumption, and traffic congestion. Urban computing connects unobtrusive and ubiquitous sensing technologies, advanced data management and analytics models, and novel visualization methods to create win-win-win solutions that improve the urban environment, human life quality, and city operation systems. Urban computing also helps us understand the nature of urban phenomena and even predict the future of cities.
Urban computing is also a research project at Microsoft Research Asia, led by Dr. Yu Zheng since March 2009. By analyzing the big data generated in urban spaces, the project has enabled the following series of urban computing applications.
1. Infer fine-grained air quality throughout a city
2. Discover regions of different functions
3. Large-scale dynamic taxi ridesharing
4. Real-time sensing of urban energy consumption
5. Finding smart driving directions for end users
6. Glean the underlying problems in road networks
7. A passenger-cabbie recommender system
8. Detecting anomalous events in a city
9. Constructing popular routes from check-ins
Infer Fine-Grained Air Quality in a City Using Big Data
Goal: We infer real-time and fine-grained air quality information throughout a city, based on the (historical and real-time) air quality data reported by existing monitoring stations and a variety of data sources we observed in the city, such as meteorology, traffic flow, human mobility, the structure of road networks, and points of interest (POIs).
Motivation: Information about urban air quality, e.g., the concentration of PM2.5, is of great importance for protecting human health and controlling air pollution. A city has only a limited number of air quality monitoring stations, yet air quality varies non-linearly across urban spaces and depends on multiple factors, such as meteorology, traffic volume, and land use. As a result, we do not know the air quality of a location without a station, nor do we know where new stations should be built.
Methodology: We propose a semi-supervised learning approach based on a co-training framework that consists of two separate classifiers. One is a spatial classifier based on an artificial neural network (ANN), which takes spatially-related features (e.g., the density of POIs and the length of highways) as input to model the spatial correlation between the air quality of different locations. The other is a temporal classifier based on a linear-chain conditional random field (CRF), which uses temporally-related features (e.g., traffic and meteorology) to model the temporal dependency of air quality at a location.
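The co-training idea can be illustrated with a minimal sketch. It is not the U-Air implementation: a simple nearest-centroid classifier stands in for both the ANN and the CRF, and the names `fit_centroids`, `predict`, and `co_train`, as well as the feature values in the usage note, are hypothetical. The sketch only shows how two classifiers trained on different feature views take turns labeling the unlabeled locations they are most confident about.

```python
from math import dist

def fit_centroids(samples, labels):
    """Stand-in classifier (assumption): one mean feature vector per label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Return (label, confidence); confidence is the distance margin
    between the nearest and second-nearest centroid."""
    ranked = sorted((dist(x, c), y) for y, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return ranked[0][1], margin

def co_train(spatial, temporal, labels, rounds=5, per_round=1):
    """labels[i] is a class name, or None for a location without a station.
    Each view in turn labels its most confident unlabeled locations."""
    labels = list(labels)
    for _ in range(rounds):
        for view in (spatial, temporal):
            known = [i for i, y in enumerate(labels) if y is not None]
            unknown = [i for i, y in enumerate(labels) if y is None]
            if not unknown:
                return labels
            model = fit_centroids([view[i] for i in known],
                                  [labels[i] for i in known])
            scored = sorted(((predict(model, view[i]), i) for i in unknown),
                            key=lambda item: -item[0][1])
            for (label, _), i in scored[:per_round]:
                labels[i] = label
    return labels  # may still contain None if rounds ran out
```

For example, with two labeled locations and four unlabeled ones, `co_train([(0.0, 0.0), (1.0, 1.0), (0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9)], [(0.0,), (1.0,), (0.2,), (0.7,), (0.1,), (0.9,)], ["good", "unhealthy", None, None, None, None])` fills in every missing label, alternating between the spatial and the temporal view.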
 Yu Zheng, Furui Liu, Hsun-Ping Hsieh. U-Air: When Urban Air Quality Inference Meets Big Data. 19th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2013).
 “Analyzing newly available data about the intricacies of urban life could make cities better.” MIT Technology Review. 2013.8.21
 Interviewed by IFeng.com. Big data can predict air quality. 2013.11.29 (in Chinese)
Discovering Regions of Different Functions in a City Using Human Mobility and POIs
Goal: We propose a framework (titled DRoF) that Discovers Regions of different Functions, such as educational areas and business districts, in a city using both human mobility among regions and points of interest (POIs) located in a region. The results generated by our framework can benefit a variety of applications, including urban planning, choosing a location for a business, and social recommendations.
Insight: We segment a city into disjoint regions according to major roads, such as highways and urban expressways. We infer the functions of each region using a topic-based inference model, which regards a region as a document, a function as a topic, categories of POIs (e.g., restaurants and shopping malls) as metadata (like authors, affiliations, and keywords), and human mobility patterns (when people reach/leave a region and where people come from and leave for) as words. As a result, a region is represented by a distribution of functions, and a function is characterized by a distribution of mobility patterns.
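To make the document analogy concrete, the following sketch builds one bag of mobility-pattern "words" per region from a list of trips. It only prepares input for a topic model and does not implement the inference model itself; the function name `mobility_documents`, the trip tuple layout, and the 6-hour time bins are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def mobility_documents(trips, hour_bin=6):
    """Map each region (a 'document') to a bag of mobility-pattern 'words'.

    trips: iterable of (origin, destination, depart_hour, arrive_hour).
    A word records the direction, the other region, and a coarse time bin.
    """
    docs = defaultdict(Counter)
    for origin, dest, depart_hour, arrive_hour in trips:
        # Leaving pattern: when people leave the origin and where they go.
        docs[origin][("leave", dest, depart_hour // hour_bin)] += 1
        # Arriving pattern: when people reach the destination and from where.
        docs[dest][("arrive", origin, arrive_hour // hour_bin)] += 1
    return docs
```

With trips such as `[("r1", "r2", 8, 9), ("r1", "r2", 9, 10), ("r2", "r1", 18, 19)]`, region `r1` ends up with two occurrences of the word `("leave", "r2", 1)` and one of `("arrive", "r2", 3)`. Word counts of this kind are what a topic model would consume, with POI categories attached to each region as metadata.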
 Jing Yuan, Yu Zheng, Xing Xie. Discovering regions of different functions in a city using human mobility and POIs. 18th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2012).
Large-Scale Dynamic Taxi Ridesharing Service
Abstract: We present a large-scale taxi ridesharing service, which efficiently serves real-time requests sent by taxi users and generates ridesharing schedules that reduce the total travel distance significantly. We first propose a taxi searching algorithm using a spatio-temporal index to quickly retrieve candidate taxis that could satisfy a user query. A schedule allocation algorithm is then proposed to check each candidate taxi so as to insert the user’s trip into the schedule of the taxi. Our service can serve 40% additional taxi users while saving 15% travel distance over no ridesharing on average.
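The schedule allocation step can be sketched as an insertion check: for each candidate taxi, try every way of inserting the new pickup and drop-off into its existing schedule and keep the cheapest one. This is a simplified illustration rather than the T-Share algorithm: it uses straight-line distances, ignores time windows and seat capacity, and skips the spatio-temporal index. The names `route_length`, `best_insertion`, and `pick_taxi` are hypothetical.

```python
from math import dist

def route_length(start, stops):
    """Straight-line length of the route start -> stops[0] -> stops[1] ..."""
    points = [start, *stops]
    return sum(dist(a, b) for a, b in zip(points, points[1:]))

def best_insertion(start, schedule, pickup, dropoff):
    """Try every position pair (pickup before dropoff) without reordering
    the existing stops. Returns (added_distance, new_schedule)."""
    base = route_length(start, schedule)
    best = None
    for i in range(len(schedule) + 1):
        for j in range(i, len(schedule) + 1):
            candidate = (schedule[:i] + [pickup] + schedule[i:j]
                         + [dropoff] + schedule[j:])
            added = route_length(start, candidate) - base
            if best is None or added < best[0]:
                best = (added, candidate)
    return best

def pick_taxi(taxis, pickup, dropoff):
    """taxis: {taxi_id: (position, schedule)}.
    Returns (added_distance, taxi_id, new_schedule) or None."""
    options = []
    for taxi_id, (position, schedule) in taxis.items():
        added, new_schedule = best_insertion(position, schedule, pickup, dropoff)
        options.append((added, taxi_id, new_schedule))
    return min(options) if options else None
```

A taxi at `(0, 0)` already heading to `(10, 0)` can serve a trip from `(2, 0)` to `(8, 0)` with no extra distance, so `pick_taxi` prefers it over an idle taxi that would have to detour. A production service would additionally reject insertions that violate any passenger's pickup or arrival deadline.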
 Shuo Ma, Yu Zheng, Ouri Wolfson. T-Share: A Large-Scale Dynamic Taxi Ridesharing Service. IEEE International Conference on Data Engineering (ICDE 2013) Best Paper Runner-up Award.
Real-Time Sensing Urban Energy Consumption
Abstract: We propose a step toward real-time sensing of refueling behavior and citywide petrol consumption. We use reported trajectories from a fleet of GPS-equipped taxicabs to detect gas station visits, measure the time spent, and estimate overall demand. For times and stations with sparse data, we use collaborative filtering to estimate conditions. Our system provides real-time estimates of gas stations’ wait times, from which recommendations could be made, an indicator of overall gas usage, from which macro-scale economic decisions could be made, and a geographic view of the efficiency of gas station placement.
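As an illustration of detecting gas station visits and measuring the time spent, the sketch below scans one taxi trajectory for stays near a station. The source does not describe the actual detection method, so this is a hypothetical stand-in: the function name `station_visits`, the planar coordinates, and the `radius` and `min_stay` thresholds are all assumptions.

```python
from math import dist

def station_visits(trajectory, station, radius, min_stay):
    """Find stays of at least min_stay seconds within radius of a station.

    trajectory: time-ordered list of (timestamp_seconds, x, y) in a planar
    coordinate system (assumption). Returns a list of (enter_time, duration).
    """
    visits = []
    entered = None      # timestamp of the first point inside the radius
    last_inside = None  # timestamp of the latest point inside the radius
    for t, x, y in trajectory:
        if dist((x, y), station) <= radius:
            if entered is None:
                entered = t
            last_inside = t
        elif entered is not None:
            # The taxi has left the station area; keep the stay if long enough.
            if last_inside - entered >= min_stay:
                visits.append((entered, last_inside - entered))
            entered = None
    if entered is not None and last_inside - entered >= min_stay:
        visits.append((entered, last_inside - entered))
    return visits
```

The `min_stay` threshold filters out taxis that merely drive past a station. Durations collected this way across a fleet could then feed wait-time estimates, with a separate technique such as collaborative filtering covering times and stations where observations are sparse.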
 Fuzhen Zhang, David Wilkie, Yu Zheng, Xing Xie. Sensing the Pulse of Urban Refueling Behavior. 15th ACM International Conference on Ubiquitous Computing (UbiComp 2013)
Constructing Popular Routes from User Check-in Data
Abstract: We present a Route Inference framework based on Collective Knowledge (RICK) to construct popular routes from uncertain trajectories, e.g., a user's check-in sequence in Foursquare, geo-tagged photos in Flickr, or the migratory trails of a bird. Specifically, given a location sequence and a time span, RICK constructs the top-k routes that sequentially pass through the locations within the specified time span, by aggregating such uncertain trajectories in a mutually reinforcing way (i.e., uncertain + uncertain → certain). Our work can benefit trip planning, traffic management, and animal movement studies.
 Ling-Yin Wei, Yu Zheng, Wen-Chih Peng, Constructing Popular Routes from Uncertain Trajectories. 18th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2012). (Data)
 Hechen Liu, Ling-Yin Wei, Yu Zheng, Markus Schneider, Wen-Chih Peng. Route Discovery from Mining Uncertain Trajectories. Demo Paper, in IEEE International Conference on Data Mining (ICDM 2011).
Smart Driving Directions Based on Taxi Trajectories
Goal: In this research, we aim to mine the time-dependent and practically quickest driving route for end users using the trajectories of GPS-equipped taxicabs traveling in a city.
Insight: The time a driver needs to traverse a route depends on three aspects: 1) the physical features of the route, such as distance and the number of traffic lights and turns; 2) the time-dependent traffic flow on the route; 3) the user’s driving behavior. Thus, a good routing service should consider these three aspects (routes, traffic, and drivers), which are far beyond the scope of shortest path computing.
GPS-equipped taxis can be regarded as mobile sensors probing traffic flows on road surfaces, and taxi drivers are usually experienced in finding the fastest (quickest) route to a destination based on their knowledge. Consequently, the trajectories of taxicabs already have the knowledge of experienced drivers, physical routes and traffic conditions.
At the beginning of this work, we mine smart driving directions from the historical GPS trajectories of a large number of taxis, and provide a user with the practically fastest route to a given destination at a given departure time. We build our system based on a trajectory dataset generated by over 33,000 taxis in a period of 3 months. According to extensive synthetic experiments and in-the-field evaluations, this system saves 5 minutes per 30-minute trip. See details in the following publications.
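The notion of a fastest route that depends on departure time can be sketched with a time-dependent variant of Dijkstra's algorithm, in which the travel time of each road segment is looked up by the time slot in which the driver enters it. This is a generic textbook-style sketch, not the T-Drive routing algorithm. The graph layout, the function name `fastest_route`, and the one-hour slots are assumptions, and the search is only guaranteed optimal when leaving a segment later never lets a driver arrive earlier (the FIFO property).

```python
import heapq

def fastest_route(graph, source, target, depart, slot_len=3600):
    """Earliest-arrival search over time-dependent travel times.

    graph: {node: [(neighbor, [travel_seconds_in_slot_0, slot_1, ...]), ...]}
    depart: departure time in seconds. Returns (arrival_time, path) or None.
    """
    best = {source: depart}
    parent = {}
    heap = [(depart, source)]
    while heap:
        now, node = heapq.heappop(heap)
        if now > best.get(node, float("inf")):
            continue  # stale queue entry
        if node == target:
            path = [node]
            while path[-1] != source:
                path.append(parent[path[-1]])
            return now, path[::-1]
        for neighbor, times in graph.get(node, []):
            # Travel time depends on the slot in which the edge is entered.
            slot = int(now // slot_len) % len(times)
            arrival = now + times[slot]
            if arrival < best.get(neighbor, float("inf")):
                best[neighbor] = arrival
                parent[neighbor] = node
                heapq.heappush(heap, (arrival, neighbor))
    return None
```

In a toy graph where the segment from A to B takes 600 seconds in the first hour but 1800 seconds in the second, the route through B is fastest for a departure at time 0, while a departure one hour later is better served by an alternative route. In practice, the per-slot travel times would be estimated from historical taxi trajectories.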
 Jing Yuan, Yu Zheng, et al. T-Drive: Driving Directions Based on Taxi Trajectories. In ACM SIGSPATIAL GIS 2010, The Best Paper Runner-Up Award.
 Jing Yuan, Yu Zheng, et al. T-Drive: Enhancing Driving Directions with Taxi Drivers' Intelligence. Transactions on Knowledge and Data Engineering (TKDE).
 Adding cabbie know-how to online maps, MIT Technology Review, 2010.11.6
 Follow that cab! Racing Google Maps on city streets, NewScientist, 2010.11.5
Further Research: Later, we expanded this research by considering driving behavior and traffic prediction as well as other factors affecting driving, such as weather conditions. Specifically, we proposed a model incorporating day of the week, time of day, weather conditions, and individual driving strategies (both of the taxi drivers and of the end user for whom the route is being computed). Using this model, our system predicts the traffic conditions at a future time (when the computed route is actually driven) and provides a self-adaptive driving direction service for a particular user. This service gradually learns a user’s driving behavior from the user’s GPS logs and customizes the fastest route for the user. Refer to the following publication for details.
 Jing Yuan, Yu Zheng, et al. Driving with Knowledge from the Physical World. 17th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2011).
 "A driving route made just for you", MIT Technology Review, 2011.8.30.
Glean the Underlying Problems in a City's Road Network
Abstract: Urban computing for city planning is one of the most significant applications of ubiquitous computing. In this paper, we detect flawed urban planning using the GPS trajectories of taxicabs traveling in urban areas. The detected results consist of 1) pairs of regions with salient traffic problems and 2) the linking structure as well as the correlation among them. These results can be used to evaluate the effectiveness of planning that has already been carried out, such as newly built roads and subway lines in a city, and to remind city planners of problems that went unrecognized when they conceive future plans. We apply our method to the trajectories generated by 30,000 taxis from March to May in 2009 and 2010 in Beijing, and evaluate our results against the real urban planning of Beijing.
 Yu Zheng, Yanchi Liu, Jing Yuan, Xing Xie, Urban Computing with Taxicabs, 13th ACM International Conference on Ubiquitous Computing (UbiComp 2011), Beijing, China, Sep. 2011. The best paper nominee.
 A technical report describing the map segmentation and trajectory projection details.
 "Taxicab data helps ease traffic". Future of Technology on MSNBC.com. 2011.9.30
 "GPS Data on Beijing Cabs Reveals the Cause of the Traffic Jams". MIT Technology Review, 2011.9.27. Featured on the first page.
 "Urban computing based on taxicabs". Reported by ACM TechNews. 2011.9.27
Crowd Sensing of Traffic Anomalies in a City
 Detecting Traffic Anomalies: We detect anomalies in a city from taxi trajectories. An anomaly can be caused by unexpected or sudden events, such as traffic control, protests, concerts, parades, celebrations, and large-scale sales promotions. In many cases, the anomaly appears before the corresponding event actually takes place. If we detect the unusual mobility pattern of people in the region in advance, we can address the problem early and prevent a tragedy.
[a] Wei Liu, Yu Zheng, Sanjay Chawla, Jing Yuan and Xing Xie. Discovering Spatio-Temporal Causal Interactions in Traffic Data Streams. In KDD 2011.
[b] Linsey Xiaolin Pang, Sanjay Chawla, Wei Liu, and Yu Zheng. On Mining Anomalous Patterns in Road Traffic Streams. In the 7th International Conference on Advanced Data Mining and Applications (ADMA 2011). The best paper award
 Diagnose and Describe Traffic Anomalies: In publication [c], we identify the source traffic flow that results in an anomaly. In publication [d], we address the problem of detecting and describing traffic anomalies using crowd sensing with two forms of data, human mobility and social media.
[c] Sanjay Chawla, Yu Zheng, and Jiafeng Hu. Inferring the root cause in road traffic anomalies, IEEE International Conference on Data Mining (ICDM 2012).
[d] Bei Pan, Yu Zheng, David Wilkie, and Cyrus Shahabi. Crowd Sensing of Traffic Anomalies based on Human Mobility and Social Media. ACM SIGSPATIAL GIS 2013
A Passenger-Cabbie Recommender System
Abstract: We present a recommender for taxi drivers and people expecting to take a taxi, using the knowledge of 1) passengers’ mobility patterns and 2) taxi drivers’ pick-up behaviors learned from the GPS trajectories of taxicabs. First, this recommender provides taxi drivers with some locations (and the routes to these locations) where they are more likely to pick up passengers quickly (along the routes or at the parking places) and maximize their profit. Second, it recommends to people some locations (within walking distance) where they can easily find vacant taxis. In our method, we propose a parking place detection algorithm and learn the above knowledge (represented by probabilities) from trajectories. Then, we feed the knowledge into a probabilistic model that estimates the profit of a parking place for a particular driver based on where and when the driver requests the recommendation. We validate our recommender using trajectories generated by 12,000 taxis in 110 days.
 Jing Yuan, Yu Zheng, Liuhang Zhang, Xing Xie. Where to Find My Next Passenger? 13th ACM International Conference on Ubiquitous Computing (UbiComp 2011).
 Nicholas Jing Yuan, Yu Zheng, Liuhang Zhang, Xing Xie. T-Finder: A Recommender System for Finding Passengers and Vacant Taxis. accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE).
Some Released Datasets
 T-Drive Taxi Trajectories: This is a sample of the T-Drive taxi trajectory dataset, which was generated by over 10,000 taxis in a period of one week in Beijing.
 GeoLife Trajectory Dataset: This is a GPS trajectory dataset collected in the (Microsoft Research Asia) GeoLife project by 167 users in a period of over two years (from April 2007 to Dec. 2010). This trajectory dataset can be used for many research themes, such as mobility pattern mining, user activity recognition, location-based social networks, location privacy, and location recommendation.
 Taxi request simulator: This simulator can generate people's requests for taxicabs on different road segments, using the knowledge mined from large-scale real taxi trajectories. Each query consists of an origin, a destination, and a timestamp.
 Check-in data from Foursquare: Each check-in includes a venue ID, the category of the venue, a timestamp, and a user ID.
A slide deck for a 1-hour presentation
Yu Zheng, Project Lead, Lead Researcher
Jing Yuan, Associate Researcher 2
Qiwei Jin, Software Developer Engineer
Xing Xie, Lead Researcher
Many research interns have worked with us on this project. We truly appreciate their contributions, though we cannot list each of them on this page.