Using diverse big data to infer and predict fine-grained air quality throughout a city, and ultimately to tackle air pollution.
Many countries suffer from air pollution. Many cities have built a few air quality monitoring stations to inform people of urban air quality every hour. Influenced by multiple complex factors, however, urban air quality is highly skewed within a city: it varies significantly by location and changes over time differently in different places. Thus, we do not know the air quality of a location without a monitoring station. Nor do we know what the air quality at a place will be tomorrow, let alone the root cause of the air pollution.
This project aims to infer the current fine-grained air quality throughout a city and to forecast the future air quality at each monitoring station. We also expect to identify the root causes of air pollution. For example, what proportion of the PM2.5 in the environment is derived from vehicular emissions? What is the spatio-temporal causal interaction between the air pollution of different cities?
The research has been made publicly available through a "cloud + client" framework, where the cloud continuously collects real-time data, such as meteorological data and air quality data. A user can access the air quality information through a mobile client or a web client.
Step 1: Infer Fine-Grained Air Quality
The first step of this project is to infer the real-time, fine-grained air quality of arbitrary locations by using two kinds of data. One is the real-time and historical air quality data from existing monitoring stations. The other is five additional data sources observed in a city: meteorological data, traffic, human mobility, POIs, and road network data. We propose a semi-supervised learning approach based on a co-training framework that consists of two separate classifiers. One is a spatial classifier based on an artificial neural network (ANN), which takes spatially-related features (e.g., the density of POIs and the length of highways) as input to model the spatial correlation between the air quality of different locations. The other is a temporal classifier based on a linear-chain conditional random field (CRF), which uses temporally-related features (e.g., traffic and meteorology) to model the temporal dependency of air quality at a location. Read the related publications for more details.
 Yu Zheng, Furui Liu, Hsun-Ping Hsieh. U-Air: When Urban Air Quality Inference Meets Big Data. 19th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2013). (Data) (Website) (Mobile App)(Video)
 Yu Zheng, Xuxu Chen, Qiwei Jin, Yubiao Chen, Xiangyun Qu, Xin Liu, Eric Chang, Wei-Ying Ma, Yong Rui, Weiwei Sun. A Cloud-Based Knowledge Discovery System for Monitoring Fine-Grained Air Quality. MSR-TR-2014-40.
A dataset has been released for research purposes: download the data.
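The co-training idea in Step 1 can be illustrated with a minimal Python sketch. It is not the project's implementation: a toy nearest-centroid classifier stands in for both the ANN and the CRF, the feature values and class names are made up, and multiplying the two classifiers' probabilities at inference time is an assumption of this sketch. What it shows is the loop itself: two classifiers, each trained on its own feature view, take turns pseudo-labeling the unlabeled location they are most confident about.

```python
import math


class CentroidClassifier:
    """Stand-in classifier (hypothetical): one centroid per class,
    softmax over negative distances. Not the ANN or CRF from the text."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_proba(self, x):
        scores = {c: -math.dist(x, m) for c, m in self.centroids.items()}
        top = max(scores.values())
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}


def fit_views(spatial, temporal, labels):
    """Train one classifier per feature view on the currently labeled locations."""
    known = [i for i, label in enumerate(labels) if label is not None]
    y = [labels[i] for i in known]
    spatial_clf = CentroidClassifier().fit([spatial[i] for i in known], y)
    temporal_clf = CentroidClassifier().fit([temporal[i] for i in known], y)
    return spatial_clf, temporal_clf


def co_train(spatial, temporal, labels, rounds=10):
    """labels[i] is a class name, or None where no station reading exists."""
    labels = list(labels)
    for _ in range(rounds):
        if all(label is not None for label in labels):
            break
        classifiers = fit_views(spatial, temporal, labels)
        # Each view pseudo-labels the unlabeled location it is most sure about.
        for clf, view in zip(classifiers, (spatial, temporal)):
            unlabeled = [i for i, label in enumerate(labels) if label is None]
            if not unlabeled:
                break
            i = max(unlabeled, key=lambda j: max(clf.predict_proba(view[j]).values()))
            probs = clf.predict_proba(view[i])
            labels[i] = max(probs, key=probs.get)
    return fit_views(spatial, temporal, labels)


def infer(spatial_clf, temporal_clf, s_feat, t_feat):
    """Combine both views by multiplying class probabilities (sketch assumption)."""
    ps = spatial_clf.predict_proba(s_feat)
    pt = temporal_clf.predict_proba(t_feat)
    joint = {c: ps[c] * pt[c] for c in ps}
    return max(joint, key=joint.get)


# Toy data: four labeled locations and two without a station (None).
spatial = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0], [0.2, 0.1], [0.8, 0.9]]
temporal = [[0.1], [0.0], [0.9], [1.0], [0.15], [0.95]]
labels = ["good", "good", "unhealthy", "unhealthy", None, None]

sc, tc = co_train(spatial, temporal, labels)
print(infer(sc, tc, [0.1, 0.2], [0.1]))
```

Keeping the two feature views in separate classifiers is what lets each one contribute labels the other could not have produced on its own, which is the point of co-training when labeled stations are scarce.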
Step 2: Forecast Air Quality at Each Station
The second step is to predict the fine-grained air quality of the next 48 hours. Specifically, for each of the next 6 hours, we predict a real-valued AQI for each kind of air pollutant at each station. For the next 7-12, 12-24, and 24-48 hours, we predict a max-min range of the AQIs over the corresponding time interval. Our predictive model consists of four major components: 1) a linear regression-based temporal predictor that models the local factors of air quality, 2) a neural network-based spatial predictor that models the global factors, 3) a dynamic aggregator that combines the predictions of the spatial and temporal predictors according to the meteorological data, and 4) an inflection predictor that captures sudden changes in air quality.
 Yu Zheng, Xiuwen Yi, Ming Li, Ruiyuan Li, Zhangqing Shan, Eric Chang, Tianrui Li. Forecasting Fine-Grained Air Quality Based on Big Data. In Proceedings of the 21st SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2015).
A portion of the data used in the research has been released here.
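The structure of the Step 2 forecast can be sketched in Python as follows. This is an illustration under stated assumptions, not the published model: the temporal predictor is reduced to a least-squares linear trend, the spatial predictor to a weighted average of neighboring stations (standing in for the neural network), the wind-speed blending rule and its thresholds are hypothetical, the inflection predictor is omitted, and the interval boundaries are taken as non-overlapping slices.

```python
def temporal_forecast(history, horizon):
    """Least-squares linear trend over recent hourly AQI readings,
    extrapolated `horizon` hours past the last reading."""
    n = len(history)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    return mean_y + slope * ((n - 1 + horizon) - mean_x)


def spatial_forecast(neighbor_aqi, weights):
    """Weighted average of neighboring stations' AQI.
    Stand-in for the neural network-based spatial predictor."""
    return sum(a * w for a, w in zip(neighbor_aqi, weights)) / sum(weights)


def aggregate(temporal, spatial, wind_speed, calm=2.0, windy=8.0):
    """Blend the two predictions by weather. Hypothetical rule: the stronger
    the wind, the more weight the spatial (global) prediction receives."""
    alpha = min(1.0, max(0.0, (wind_speed - calm) / (windy - calm)))
    return (1 - alpha) * temporal + alpha * spatial


def summarize_48h(hourly):
    """Turn 48 hourly predictions into the output format described above:
    point values for hours 1-6, then (min, max) ranges for later intervals."""
    assert len(hourly) == 48
    spans = {"7-12": hourly[6:12], "13-24": hourly[12:24], "25-48": hourly[24:48]}
    return {
        "hourly": hourly[:6],
        "ranges": {name: (min(v), max(v)) for name, v in spans.items()},
    }


# Example with made-up readings for one station and two neighbors.
t = temporal_forecast([100, 110, 120, 130], horizon=1)
s = spatial_forecast([80, 100], weights=[1, 3])
print(aggregate(t, s, wind_speed=5.0))
print(summarize_48h(list(range(101, 149)))["ranges"])
```

Reporting ranges rather than point values for the later intervals reflects the growing uncertainty of longer horizons.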
Step 3: Suggest Locations for Monitoring Stations
Given a limited budget to build a few additional air quality monitoring stations, where shall we put them? The research solves this problem from the perspective of maximizing the inference accuracy and stability.
 Hsun-Ping Hsieh*, Shou-De Lin, Yu Zheng. Inferring Air Quality for Station Location Recommendation Based on Big Data. In Proceedings of the 21st SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2015).
Step 4: Identify the Root Cause of Air Pollution
1) Study the correlation between vehicular emissions and air quality.
2) Identify the spatio-temporal causality between air pollutants of different cities.
Idea: Find co-evolving patterns in air quality data from different stations and then apply causality models to these patterns for root cause discovery.
 Chao Zhang*, Yu Zheng, Xiuli Ma, Jiawei Han. Assembler: Efficient Discovery of Spatial Coevolving Patterns in Massive Geosensory Data. In Proceedings of the 21st SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2015).
 Julie Yixuan Zhu, Yu Zheng, Xiuwen Yi, Victor O.K. Li. A Gaussian Bayesian Model to Identify Spatiotemporal Causalities for Air Pollution based on Urban Big Data. The International Workshop on Smart Cities, in conjunction with INFOCOM 2016.
3) Suggest locations for building additional monitoring stations.
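As a rough illustration of the idea under item 2 (stations whose readings evolve together, possibly with a delay), the Python sketch below correlates hour-to-hour AQI changes at two stations over a range of time lags. It is a deliberately simple lagged-correlation check with made-up readings, not the Assembler algorithm or the Gaussian Bayesian model cited above, and a high lagged correlation is only a hint of a lead-lag relationship, not evidence of causality.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def diffs(series):
    """Hour-to-hour changes, so that shared trends do not dominate the correlation."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def best_lag(leader, follower, max_lag=6):
    """Return (lag, r): the lag in hours at which changes at `leader`
    correlate most strongly with later changes at `follower`."""
    d_lead, d_follow = diffs(leader), diffs(follower)
    best = (0, pearson(d_lead, d_follow))
    for lag in range(1, max_lag + 1):
        r = pearson(d_lead[:-lag], d_follow[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Made-up hourly AQI: station B repeats station A's readings two hours later.
station_a = [50, 60, 90, 140, 120, 80, 60, 55, 70, 110, 150, 100]
station_b = [50, 50] + station_a[:-2]

lag, r = best_lag(station_a, station_b)
print(lag, round(r, 3))
```

A pattern like this would only be a candidate for the causality models mentioned in the idea above, which also need to account for confounders such as weather.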
Step 5: Study the Impact of Air Pollution on People's Health
- China Daily: Microsoft, IBM eye Technology to forecast air pollution in China. 2016.1.19
- China Science and Technology News (中国科技报): Environment moves from governance to intelligent management, 2016.1.11
- NBC News: Microsoft, IBM Eye Big Business Opportunity in China's Air Pollution. 2015.12.28
- Reuters: Tech giants spot opportunity in forecasting China's smog, 2015.12.28
- GeekWire Reporter: What Microsoft Research is doing to help Beijing air pollution. 2015.11.30
- ComputerWorld: Microsoft predicts China's air pollution with data analysis, 2015.6.11
- Ming Pao (Hong Kong): Microsoft big data analysis monitors Hong Kong's air in real time, 2015.6.10
- See the CEO face-to-face interview on Urban Air.
- A story about Urban Air has been featured by GCR news.
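- Xinhuanet: Microsoft's Yu Zheng: Big data solves big challenges in cities. 2015.1.13
- ifeng.com (exclusive interview): Microsoft's Yu Zheng: Big data can predict air pollution, and everyone is a mobile sensor. 2013.11.29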
We appreciate our partners from Microsoft Product Teams who have been working closely with us on this project.
We also appreciate partners such as Stella Ye and Sandy Qi (from Bing), who made Urban Air available on Bing Maps http://cn.bing.com/ditu/.
A number of interns have worked with us on the Urban Air project. We may not be able to list all of them here.
Yubiao Chen, Xuxu Chen, Hsun-Ping Hsieh, Furui Liu, Zhenni Feng, Zhangqing Shan, Ruiyuan Li, Xiuwen Yi.
- Hsun-Ping Hsieh, Shou-De Lin, and Yu Zheng, Inferring Air Quality for Station Location Recommendation Based on Urban Big Data, ACM – Association for Computing Machinery, 12 August 2015.
- Xuxu Chen, Yu Zheng, Yubiao Chen, Qiwei Jin, Weiwei Sun, Eric Chang, and Wei-Ying Ma, Indoor Air Quality Monitoring System for Smart Buildings, in UbiComp 2014, ACM, September 2014.
- Yu Zheng, Xuxu Chen, Qiwei Jin, Yubiao Chen, Xiangyun Qu, Xin Liu, Eric Chang, Wei-Ying Ma, Yong Rui, and Weiwei Sun, A Cloud-Based Knowledge Discovery System for Monitoring Fine-Grained Air Quality, no. MSR-TR-2014-40, March 2014.
- Yu Zheng, Furui Liu, and Hsun-Ping Hsieh, U-Air: When Urban Air Quality Inference Meets Big Data, in KDD 2013, ACM, August 2013.