Cloud Research Projects

Microsoft Azure for Research

Congratulations to the selected projects from the Microsoft Azure for Research Award program! We look forward to learning how the cloud will help you collect, filter, analyze, and share data across themes such as life sciences, urban sciences, environmental sciences, ecology, and geosciences. You can read abstracts of the selected projects below, which are organized by region.

Asia Pacific

Srikumar Venugopal, University of New South Wales, Australia
Scalable Protein Sequence Similarity Search for Metagenomics

Abstract: Metagenomics is the study of uncultured microorganisms from their habitats. In recent years, so-called next-generation sequencers have boosted the speed at which genomes can be sequenced from environmental samples. This in turn has led to a deluge in the amount of available metagenomic data. A key step in the study of metagenomic data is sequence alignment, which is computationally intensive over large datasets. Tools such as BLAST require large-scale dedicated computing infrastructure for such analysis. We introduce ScalLoPS, a new tool designed to scale protein sequence alignment across cloud resources. This project proposes evaluating ScalLoPS against BLAST on Microsoft Azure using metagenomic datasets.

Soeren Balko, Queensland University of Technology, Australia
relocate.js

Abstract: With "relocate.js", we aim to address fundamental challenges of today's data centre-based cloud computing: poor average hardware utilisation, the negative environmental footprint, and privacy concerns of remotely processing confidential data. We do so by introducing a virtualization approach which dynamically identifies resource-intensive components of a cloud-based application and transparently relocates and executes these on the client. Our R&D work addresses associated challenges of statically analysing and cross-compiling legacy code; optimally identifying relocatable application components in real time, subject to multiple optimisation goals; and preserving security of backend application code that runs on the client.

Jian Zhang, University of Technology, Sydney, Australia
Friends Recommendation Based on Graph Correlation

Abstract: In social multimedia, people’s online friendships are related to their behaviours. This project aims to correlate their contact graph and their image tag graph to make further recommendations in a large dataset.

Yuedong Yang, Griffith University, Australia
Cloud-Based Platform for Genome-Scale Prediction of Protein Functional Complex Structures at Experimental Quality

Abstract: The exponential increase in the number of proteins with unknown functions calls for bioinformatics databases with high-quality function prediction to guide experimental studies. A commonly used method is to infer function by sequence homology, but it can cover only a small fraction of proteins. Hence, it is critically important to identify protein functions with high confidence in the absence of sequence homology. Here, we plan to expand our successful template-based approach for function prediction so that it goes beyond RNA/DNA-binding functions, and to produce genome-scale, high-quality functional annotation. The resulting database will be freely available to the scientific community of molecular biology and bioinformatics.

Chao Wang, Tsinghua University, China
Intelligent Sustainable Navigation Services (ISUNS)

Abstract: The objective of this research is to improve the eco-efficiency of overall urban transport decision-making by maximizing the access gained and the information support obtained from a given level of vehicular travel, while providing citizens with a safe, healthy, pleasant, and well-informed urban travelling experience. The researchers will examine factors such as high urban residential density, travel styles, alternative options, costs, time consumption, and travel experiences to develop a navigation system based on behavior patterns. This proposal might also be associated with various policy changes.

Conghui Zhu, Harbin Institute of Technology, China
Chinese Minority Ethnic Languages Translation

Abstract: We plan to develop a statistical translation system for Chinese minority ethnic languages, supporting at least three languages, in cooperation with Microsoft Research Asia. The first step is resource gathering, which includes seed gathering and crawling, extracting, and evaluating parallel sentences from the internet. The next step is training and tuning the statistical translation system, which requires a huge amount of computation. Finally, a translation API will be provided to help people exchange information equally.

Meng Xianhai, Beihang University, China
A cloud based platform for virtual geologic Earth

Abstract: Geologic data and models have the typical characteristics of big data. In this project, we aim to develop a cloud-based platform for a virtual geologic Earth, which can be used to collect geologic data and models, index them with a unified spatial data model, and share them through visualization. The spatial data model is designed to manage different types of raw geologic data and various models, and to define the corresponding spatial index and file format. Microsoft Azure cloud computing technology is used to implement distributed data conversion, file storage, and model release.

Jiamin Xu, Shanghai Jiao Tong University, China
Unsteady Aerodynamics and Aeroacoustics of Slat Morphing Trailing Edge

Abstract: The objective of this proposal is to investigate issues related to flow control optimization based on large-scale unsteady aerodynamic simulations combined with multi-level optimization strategies using cloud resources. The research issues are representative of the challenges faced by the aerospace design community when high-fidelity unsteady simulation models are considered.

Lianwen Jin, South China University of Technology, China
FaceMore: An Innovative Facial Beautification Web Service Based on Windows Azure

Abstract: This project aims to build FaceMore, an online, elastically scalable, quickly deployable system that can handle personalized face beautification on face data sets of various scales by taking advantage of special features provided by Microsoft Azure and integrating advanced face beautification technology. The system is expected to make full use of large amounts of data, support highly concurrent processing, and provide unique functions such as personalized face beautification and various special face effects based on the data-driven average face hypothesis and the region-aware mask-based facial beautification algorithm.

Xueming Qian, Xi'an Jiaotong University, China
Schedule Travel Life by Exploring Spectrums of Social User and City Services

Abstract: Making a schedule for a short- to medium-term traveler by exploring users' social communities and the services available in destination cities is very important. In this proposal, 1) we propose to mine social users' preferences/life spectrum using community-contributed information from their travel history and their check-in history in their places of residence; 2) we propose to mine the city services/activity spectrum from crowd-sourced data contributed by users worldwide, including the check-in data of local residents (comments, geo-locations, and so forth) and the travel information from users' shared photos and travelogues; 3) we propose to recommend users' preferred services/activities according to temporal-geo-social spectrum similarities; and 4) we provide an objective overall summary of local services to improve service quality. The problems we need to solve are as follows: 1) how to mine users' preferences; 2) how to mine the city services spectrum; 3) how to recommend personalized services/activities to users in an unfamiliar city; and 4) how to recommend personalized events (sequential services) to users. The outputs of this project are as follows: 1) provide a solution for scheduling personalized travel life by exploring the spectrums of social users and city services; 2) provide objective feedback that helps local services departments improve service quality, by summarizing the comments/evaluations from users worldwide; 3) publish several papers and show demos at important international conferences; 4) share our datasets with researchers in this area; and 5) train 5 MSD/PhD students in this research area.

Yingchi Mao, Hohai University, China
Safety and efficient utilization of hydropower development in Lancang River based on Windows Azure

Abstract: Based on the dam safety analysis and evaluation results of the XiaoWan and Nuozhadu hydropower stations on the Lancang River, we will analyze the relationship between the ecological environment and the dam safety of the XiaoWan hydropower station project, and indicate the impact of the dam on the ecological environment. We will apply data mining techniques and theories to the analysis of the safety of the dam and the environment. Moreover, we will establish a library of data mining methods. The project will implement a prototype data mining system for comprehensive analysis of the safety and environment of the XiaoWan hydropower station.

Zheping Xu, Institute of Botany, Chinese Academy of Sciences, China
Dynamic Biodiversity Protect and Monitor in a Cloud Environment

Abstract: We are experiencing the sixth mass extinction of plants and animals. We should make every effort to protect the biodiversity of our planet. Besides occurrences from specimens and observations, much temporal distribution information for many kinds of species can be extracted and processed from the scientific literature. However, we need a high-performance environment to store and process our huge data set (>20 TB), namely more than 112 million records from 42 million pages, which may generate more than 500 million distribution points. Some new techniques should also be introduced, including machine learning, natural language processing, and GIS.

Zhumin Chen, Shandong University, China
Urban Lifestyles Detection from Big Heterogeneous Human Behavioral Data Using Windows Azure

Abstract: Lifestyle is the typical way of life of an individual, group, or culture in different nations. Understanding lifestyles is significant. This project studies how to use Microsoft Azure to discover lifestyles based on heterogeneous human behavioral data in Web news, Twitter, Weibo, etc. We will use Worker Role, HDInsight, SQL Database, Virtual Machine, Mobile Service, etc. of Microsoft Azure to identify, collect, store, and process lifestyle-related data, and then detect and evaluate lifestyles. We hope the results of this project can help people find the historical changes in their lifestyles and help organizations find possible crises in lifestyles.

Guangjun Zhang, Peking University, China
Machine learning—parameter estimation for groundwater flow and transport models based on Windows Azure Cloud

Abstract: Machine learning for parameter estimation, together with interactive analysis of various multi-scale field data and visualization of modeling results in the Microsoft Azure cloud-computing environment, is an exciting research area. This project will implement a parallel parameter estimation (PPEST) approach on the Microsoft cloud for a regional groundwater flow model and mass transport model. The sensitivity and uncertainty of the distributed model parameters will be quantified. The optimized modeling results will then be visualized, and scenario analysis will be conducted interactively. This study will provide useful information and insights for earth science model researchers.

Huayi Wu, Wuhan University, China
Collaborative Geoprocessing on Windows Azure

Abstract: With the trend towards distributed computing, a geoprocessing algorithm is often encapsulated into a geoprocessing service, providing a standardized interface for web invocation. To build large-scale geoprocessing applications as complex geospatial simulation models, several geoprocessing services can be integrated into a geoprocessing service chain, which enables interoperable, distributed, and collaborative geoprocessing that significantly enhances the capacity to derive geoinformation and knowledge over a network. We designed a collaborative geoprocessing framework, GeoSquare, for geoscientific data/algorithms/models sharing and geoprocessing orchestration. The basic architecture of GeoSquare is service oriented. GeoSquare demonstrates a feasible way to achieve a geoprocessing computing paradigm for the future.

Jitao Sang, Chinese Academy of Sciences, China
Cyber-Physical Footprint Association: Cloud Storage and Computing

Abstract: User behavior is one of the most important sensors for urban computing. In this project, we propose to exploit the user footprint in both the cyber and physical worlds to address the data collection problem in urban computing. In particular, this project has two main features: (1) cross-platform user behavior data aggregation, and (2) cyber-physical footprint association. We first aggregate users' online activities across different social platforms to derive a cross-platform cyber footprint, and then associate the cyber footprint with the physical footprint constructed from real-world sensors. Microsoft Azure fits our project well because of its powerful storage and computing capabilities. We are applying for Microsoft Azure resources to store the raw heterogeneous user behavior data, associate the cross-network behavior data to obtain social media footprints, and deploy the urban computing demo system.

Junjie Wu, Beihang University, China
A System for Heterogeneous Social Media Big Data Analytics in Azure Cloud

Abstract: The explosive growth of user-generated content (UGC) is of great commercial and research value. This project aims at mining UGC and providing high-quality information services to various kinds of organizations. Starting from a prototype system built in former national scientific projects, we expect to bring it up to a business standard with the help of services provided by Microsoft Azure.

Lei Zou, Peking University, China
Graph Data Management in Urban Computing

Abstract: An inherent challenge in urban computing is large-scale heterogeneous data management. In this project, we focus on "graph data." As one of the most popular and powerful representations, graphs have been used to model many kinds of application data in urban computing, such as traffic networks and location-based social networks. We first design a general-purpose graph engine for urban computing, and then demonstrate the usefulness of graph data management in urban computing through two interesting applications in traffic networks and social networks.

Xinbo Gao, School of Electronic Engineering, China
Videos analysis and recommendation for online learning

Abstract: With the significant development of the Internet and multimedia-related techniques, online learning has become an important supplement to traditional classroom education. Videos have become the most popular study materials. There is, however, a lack of connection between these videos, and learners have no convenient way to find similar and closely related videos for learning. To overcome these problems, we plan to build a platform to process and analyze education videos, build tables of contents for video searching and recommendation, and design an online learning community for recording and sharing knowledge. This platform needs a large amount of storage and computational resources, which is what we expect from Microsoft Azure.

Yan Xu, Beihang University, China
Large-scale histopathology image analysis for colon cancer in Azure

Abstract: Histopathology images are a critically important tool in colorectal cancer diagnosis and treatment. A standard histopathology slice can be scanned at a resolution as high as, for example, 200,000 × 200,000, which makes most existing tools for histopathology image segmentation, clustering, and classification infeasible to apply on a single machine. We developed a parallel multiple instance learning algorithm on Microsoft Azure. We used manual features and parallel multiple instance learning (P-MIL) in the previous framework. To further improve the performance, we will study automatic extraction of fine-grained information from coarse-grained labels using deep learning.

Yuan Juli, Zou Hengming, Shanghai Jiao Tong University, China
A Distributed Algorithm for File Distribution and Replication on Cloud Platform

Abstract: Caching and replication of popular data objects contribute significantly to reducing network bandwidth usage and the overall access time to data. Our focus is to improve the efficiency of object replication within a given distributed replication group. Given the request rates for the objects and the server capacities, the goal is to find the replica allocation that minimizes the access time over all servers and objects. We design a distributed approximation algorithm that solves this problem and prove that it provides a 2-approximation solution. We also show that the communication and computational complexity of the algorithm is polynomial.

Long Quan, The Hong Kong University of Science and Technology, Hong Kong
Large-Scale Three-Dimensional Urban Reconstruction

Abstract: In this project, we propose a fully automatic method, built on Microsoft Azure, for large-scale 3D reconstruction of urban scenes based on input images captured both at ground level and from the air at low altitude.

Tony Tung, Kyoto University, Japan
Life Maps

Abstract: Information is collected in various forms from all over the world and beyond. However, it is still difficult for each individual to access, get exposed to, or grasp all existing resources. Hence, we propose to globally enrich the knowledge one could have locally using state of the art map tools, and scalable 3D reconstruction and visualization techniques. Particularly, in addition to existing satellite images and static 3D object representation (e.g., provided by Bing Maps and its competitors), we propose to introduce dynamic information that can be either reconstructed in real-time (e.g., from user inputs), or simulated based on live measurements (e.g., using weather forecast, sun and moon positions, traffic information, etc.). The project outcome could potentially outclass existing 2D/3D map representation by providing to users informative real dynamic 3D snapshots of any place on Earth. For example, one could observe traffic congestion in New York City in real-time, see London on a rainy day, or enjoy—privately—the sight of the Eiffel Tower by night, all from real or simulated 2D/3D objects. Additional information about local air quality, available resources (e.g., housing, energy, network, etc.), social behavior, and trends could also be provided. We believe the recent advances in cloud computing technology can handle the challenges of dynamic big data management, multimodal sensing, and scalable real-time 3D vision.

Hwasoo Yeo, Korea Advanced Institute of Science and Technology, Korea
Cloud Sensing Based Urban Travel Time Prediction with Online Traffic Simulator

Abstract: The objective of this project is to develop a software module that provides information on the life patterns of people in urban areas. The project has three major focuses. The first is developing a methodology for categorizing all time-dependent smartphone location data by transportation mode. The second is developing a methodology for predicting the future demand for each transportation mode using the categorized data. The third is developing a methodology for predicting the travel time of highway and arterial road links using the predicted future demand.

Hyunju Lee, Gwangju Institute of Science and Technology, Korea
Text mining for identifying disease-gene-biological relationships

Abstract: Genes usually contribute to the development of diseases through biological events such as gene expression, regulation, phosphorylation, localization, and protein catabolism. Our disease-gene search engine, DigSee (http://gcancer.org/digsee), serves sentences from MEDLINE abstracts with identified triple relations showing ‘which genes’ are involved in the development of ‘which cancer’ through ‘which biological events’. Since the current version of DigSee supports only cancer, our goal is to incorporate disease types other than cancer into the system. This new system will allow researchers working on various types of diseases to search for which genes are related to a disease through which biological events.

Joon Heo, Yonsei University, Korea
Does ‘Gangnam Style’ really exist? Answers from data science perspective

Abstract: Based on open datasets from the city of Seoul, we conduct two major analyses: (1) among numerous polygonal spatial clustering algorithms, which one is the most suitable for such high-dimensional (several thousand dimensions) datasets? and (2) based on the findings of the above analysis, which attributes are the governing factors that differentiate Gangnam from other districts? Both analyses can be applied to any other district (Gu) to characterize its differentiating attributes, which would require a humongous amount of computation time because it might require combinatorial optimization while maximizing dissimilarity measures between clusters.

Muhammad Bilal Amin, Kyung Hee University, Korea
Enabling Data Parallelism for Large-Scale Biomedical Ontology Matching over Multicore Cloud Instances

Abstract: Ontology matching is among the core techniques used for integration and interoperability resolution between biomedical systems. However, due to the ever-evolving nature of biomedical data, ontologies are becoming large-scale and complex, consequently leading to performance bottlenecks during matching. We present a parallel ontology matching system for large-scale biomedical ontologies that implements data parallelism over a multicore cloud platform for performance benefits. Our system decomposes these ontologies into smaller and simpler subsets depending on the matching algorithms. The matching process over subsets is divided, from coarser to finer levels of abstraction, into independent matching requests, matching jobs, and matching tasks, running in parallel over a cloud instance. Matched results are aggregated to generate a mediation bridge ontology.

Tai-Quan Peng, Nanyang Technological University, Singapore
Tracking Social Happiness on Twitter: A Multi-Level Study

Abstract: Observing and explaining individuals’ happiness is a long-standing interest among the public and researchers. The project will mine the rich time-stamped information stored on Twitter, which provides objective, real-time records of users’ subjective feelings, to fulfill three objectives: (1) developing and testing a time-variant and domain-specific Happiness Index of Twitter (HIT) at both individual and societal levels; (2) modeling the dynamics of HIT at both individual and societal levels; and (3) uncovering causal mechanisms underlying the dynamics of HIT.

Chu Hong Steven Hoi, Nanyang Technological University, Singapore
Cloud-Based Mobile Recommender Systems by Online Collaborative Filtering Techniques

Abstract: This project aims to investigate a novel framework of on-the-fly mobile recommender systems in the cloud for enhancing the human living experience by mining GPS trajectory data. Unlike the conventional mobile recommender systems that use batch collaborative filtering techniques, we propose a family of novel online collaborative filtering techniques, which are far more efficient and scalable than the existing techniques for building the on-the-fly mobile recommender systems. The Microsoft Azure platform helps us to resolve the cloud-based storage and computation tasks in our experimental studies and system implementations.

James M Hogan, Queensland University of Technology, Australia
Bing for Genomes—Information Retrieval Approaches to Genomic Search and Comparison

Tomasz Bednarz, CSIRO, Australia
Image Analysis in Azure Clouds

Guangzhong Sun, University of Science and Technology, China
Smart Campus Construction Based on Rich Campus Datasets

Victor O.K. Li, The University of Hong Kong, China
A Big Data Stream Processing Solution for Hidden Causality Detection of Urban Dynamics

Wenjun Wu, Beihang University, China
Cloud-Based MOOC Platform for Self-Organized Learning

Yanmin Zhu, Shanghai Jiao Tong University, China
NoiseSense: Crowdsourcing-Based Urban Noise Mapping with Smartphones

Peng Gong, Tsinghua University, China
Satellite Remote Sensing for Urban Computing—40 Year Dynamic Information on Land Use for Beijing City from Time Series Landsat Data and Computer Simulation

Aoying Zhou, East China Normal University, China
COBA: Sensing Urban Lifestyle Based on Collective Online Behavior Analysis with Windows Azure

Yuguo Li, University of Hong Kong, Hong Kong
SmartComfort—Use of Smartphone and Cloud Technologies for Building Thermal Comfort, and Ventilation and Health Studies in Megacities

Hansol Lee, Korea Military Academy, Korea
Employing a Customized Web-Based Corpus Program for Language Learning

Hojung Cha, Yonsei University, Korea
Development of a Crowd Sensing Framework for Inducing User Participation in Urban Environments

Europe and Russia

Didier Donsez, Université Joseph Fourier - Grenoble 1, France
CIRUS: A Cloud Infrastructure for Real-Time Ubilytics

Abstract: The Internet of Things (IoT) has become a reality with the availability of chatty embedded devices. The huge amount of data generated by things must be analyzed with the models and technologies of Big Data analytics, deployed on cloud platforms. The CIRUS project aims to deliver a self-adaptive cloud-based infrastructure for real-time ubilytics (ubiquitous big data analytics). The CIRUS infrastructure collects and analyzes IoT data for M2M services using COTS components such as M2M gateways (OpenHAB, PeerGreen), message brokers (Mosquitto, RabbitMQ, JORAM) or Message-as-a-Service providers, and analytics frameworks (Hadoop, Storm, S4, Samza), deployed and reconfigured dynamically with RoboConf.

Romain Rouvoy, University Lille 1, France
ApiSwarm: Elastic processing of crowd-based datasets

Abstract: This project intends to leverage the APISENSE crowd-sensing platform to support the real-time processing of "big" scientific datasets collected in the physical world from a "large" crowd of smartphones. Examples of case studies covered in this project include the automatic inference of roadmaps, the continuous cartography of network coverage quality, or even the detection and the dynamic analysis of earthquakes. As the volume of data to be collected in the wild is unpredictable, APISENSE requires the adoption of elastic computation models and infrastructures offered by Microsoft Azure to continuously provision the processing capabilities to fit the feeds of data reports.

Gabriel Antoniu, INRIA, France
Z-CloudFlow: Advanced Data Storage and Processing for Multi-Site Cloud Workflows

Abstract: This joint project is carried out within the Microsoft Research–INRIA Joint Centre. It addresses the problem of advanced data storage and processing for supporting scientific workflows in the cloud. The goal is to design and implement a framework for efficient processing of scientific workflows executed on several geographically distributed datacenters in the cloud. The main objectives are to reduce the data access latency and minimize the workflow makespan. We are trying to give reasonable answers to the following research questions: How can communication and/or computation overhead be optimized? How should tasks and datasets be grouped together to minimize data transfers? How can workload be balanced between data centers to avoid bottlenecks? Which strategies should be used for efficient data transfers, and how?

Patrick Hénaff, Université Paris 1, France
Euclide Quant Network

Abstract: Zanadu is a computational finance platform. It provides benchmark data and standard models of mathematical finance, implemented in a real-life environment. The models are presented as IPython notebooks that combine a mathematical description of each model, the code, and calculation results. The platform will be tested by scientists in three research labs, and by graduate students at four universities throughout Europe. The platform is complemented by an online forum and reference library, in order to build a scientific community around the project. The quarterly “Parisian Model Validation Seminar” will provide a venue to discuss the platform.

Frederic Magoules, Ecole Centrale, Paris, France
Advanced Linear Algebra Libraries for the Cloud

Abstract: Our research aims to offer advanced linear algebra libraries to research communities in academia and industry. The goal of this proposal is to turn our advanced linear algebra libraries into usable applications on Microsoft Azure.

Role, Paris Descartes University, France
Azure-Based Text Mining Tools for Genome-Wide Association Studies

Abstract: The goal of the project is to develop and deploy on the cloud a set of robust, easy-to-deploy text mining tools to assist researchers in the analysis and interpretation of large-scale results coming from genome-wide association studies (GWAS).

Jean-Charles Régin, University Nice-Sophia Antipolis, France
Using Windows Azure for High Performance Computing

Abstract: We propose to use cloud computing to solve complex optimization problems and to give the community the opportunity to automatically use a cloud-based solver for their own problems.

Frank Hutter, Freiburg University, Germany
CloudEval: Towards Community-Based Performance Testing and Optimization

Abstract: Applications ranging from software verification to industrial optimization require effective solutions to hard combinatorial problems. Modern heuristic algorithms solve these problems well in practice, but it is not well understood which heuristics and which parameter settings work best on which types of problem instances. As progress is mostly driven by empirics, CloudEval aims to provide a shared platform for the continuous evaluation of solvers on benchmarks. CloudEval will use state-of-the-art automated algorithm configuration tools to customize each solver for each benchmark and will maintain and advertise a leader board of solvers for each category.

Eoin O'Grady, Marine Institute, Ireland
Irish Digital Ocean – SMART Marine Research Platform

Abstract: The Irish Digital Ocean – SMART Marine Research Platform is a cloud environment tailored for data-intensive collaborative marine research and innovation. The platform has the potential to significantly improve the effectiveness of marine research at team, organizational, national, and international levels, and lead to the development of a vibrant marine research and innovation ecosystem. The platform will focus on collaborative research, the reuse of marine digital assets, and the translation of research outputs into new products and services. The platform will be a key component in the research stream of the wider Irish Digital Ocean (IDO) framework.

Liliana Pasquale, University of Limerick, Ireland
Minority Report: Using the Cloud to Enable Proactive Digital Forensic Investigations

Abstract: The Minority Report project aims to develop an open source toolset and a demonstrative data set to support proactive digital forensic investigations in large and distributed systems. This tool can be very useful for system administrators and investigators since it can suggest in advance the likelihood of potential hypotheses of a crime. To exhibit good performance, even for large sets of events, our tool will leverage the HPC capabilities provided by Microsoft Azure to decompose each crime hypothesis into smaller and more tractable sub-hypotheses. We will perform the analysis of crime hypotheses by using the Event Calculus and SMT solvers (e.g., Z3).

Elisabetta Di Nitto, Politecnico di Milano, Italy
sAfe CitiEs through clouD and Internet-of-Things (ACED-IoT)

Abstract: ACED-IoT will develop a scalable and reliable prototype of a safety planner for the management of incidents, for example, a fire or a flood occurring in a large area. The safety planner will exploit an IoT infrastructure and social networks to gather data, and Microsoft Azure as the cloud infrastructure for back-end data processing and for planning the actions to support safety squads during the emergency.

A densely populated area will be considered as the reference scenario. The system will be self-adaptive and, in particular, will detect and predict the potential failure of the data sources in use and take proper recovery actions.

Dariusz Mrozek, Silesian University of Technology, Gliwice, Poland
Cloud4Psi. Cloud Computing in the Service of 3D Protein Structure Similarity Searching

Abstract: 3D protein structures exhibit high conservation in the evolution of organisms, and even if protein sequences have diverged significantly, finding structural similarities allows conclusions to be drawn about the functional similarity of proteins in various, sometimes evolutionarily distant, organisms. However, popular methods for searching for protein structure similarities are still very time-consuming. Similarity searching against large repositories of structural data requires increased computational resources that are not available to everyone. Our project addresses this problem. We are going to develop a cloud-based system that will be a highly scalable and high-performance solution for protein similarity searching and for protein function identification.

Marek Stanislaw Wiewiorka, Warsaw University of Technology, Poland
Towards an interactive secondary analysis of RNA sequencing data service in Windows Azure cloud with Apache Spark framework

Abstract: We would like to continue our current research on the application of new MapReduce frameworks, mainly Apache Spark and Stratosphere, in genomics and transcriptomics. Using Microsoft Azure with the requested resources would give us a chance to find an optimal, fast, scalable, and interactive RNA sequencing analysis service that can be used by the whole bioinformatics community. Our project is complementary to the work done by other researchers on the SeqInCloud solution (powered by Microsoft Hadoop on Azure [HoA]), which provides a genome analysis pipeline for DNA.

Sergey Chernov, New Economic School, Russia
Enabling Large-Scale Social Network Analysis Using VK Data

Abstract: The social media ecosystem has attracted a great deal of research attention in the past decade. Numerous research projects aim at large-scale analysis of available online networks, including popular worldwide resources like Facebook and Twitter. Still, these studies fall short of addressing countries in which the aforementioned networks are not the most widespread. In particular, the Russian social network VK has about 220 million active accounts, but it is mostly ignored in the research literature. We would like to collect and process VK data into a publicly available research dataset to provide interested scientists with easy access to social network data on Russia.

Evgeny Rogaev, Vavilov Institute of General Genetics, Russian Academy of Science, Russia
Alzheimer Bio Project

Abstract: The goal of the project is to analyze human sequence data related to Alzheimer's disease in Microsoft Azure (the Institute has public open data available) and to analyze the annotations of various effects for Alzheimer's disease. The project will allow prediction and drug analysis. Our partners are eager to collaborate and to provide their bio expertise and their unique bio software pipeline for analysis.

Heiko Schuldt, University of Basel, Switzerland
ADAM+ – A Large-Scale Distributed Image and Video Retrieval System

Abstract: ADAM+ focuses on new strategies and methods towards a modern distributed system that stores, organizes, and retrieves multimedia data. To that end, it subsumes three tasks: the extraction of features from image and video collections, the execution and analysis of queries, and the adaptation of features and parameters using machine-learning algorithms. With these tasks, the goal of the project is to create a large-scale distributed image and video retrieval search engine that jointly uses various query paradigms, such as keyword search, query-by-example, and query-by-sketch, for query formulation and is able to adapt to various collections and features.

Derrick Crook, University of Oxford, United Kingdom
Modernizing Medical Microbiology

Abstract: The Modernizing Medical Microbiology Consortium, which is at the forefront of translating pathogen whole genome sequencing into clinical practice, is seeking to develop new rapid methods for analyzing genomic sequences linked to clinical record data on a very large scale. Access to Microsoft Azure technologies will enable first-in-class experiments to be successfully completed. These will involve processing more than 10,000 pathogen genomic sequences to unravel the evolution and spread of organisms fast enough to use the data clinically. New methods will also be developed to enable routine use in hospitals and medical services.

Michael Epitropakis, University of Stirling, United Kingdom
Efficient Regression Test Optimization in Windows Azure

Abstract: The goal of the proposed research project is to develop novel search-based optimization methodologies that can improve the regression testing phase of a software project in an automatic and cost-effective way. To adequately cater for real-world regression testing cases, a multi-objective formulation is utilized, which enables us to test and study different properties of the system under test. Using the Microsoft Azure platform allows us to tackle regression testing scenarios on large-scale open source software projects. Any developments in this direction will help the software testing community to deal with regression testing scenarios as efficiently as possible.

Nando de Freitas, University of Oxford, United Kingdom
Deep Learning on the Cloud

Abstract: We plan to capitalize on a recent breakthrough in machine learning to build efficient parallel algorithms to train massive deep neural networks. If successful, this project will enable users to build deep learning applications on the cloud, thus significantly advancing AI.

Said Kharbouche, University College London, United Kingdom
Online GlobAlbedo's Data Analysis and Visualization

Abstract: In our GlobAlbedo project for daily mapping of the Earth's surface albedo, we have developed a unique method for extracting and visualizing regions of interest (ROIs) on a single server on a first-come, first-served basis. We would like to exploit the Microsoft Azure cloud computing environment to minimize the system's response time, so that our end users can visualize, analyze, and download the data in the shortest possible time regardless of the ROI size or the number of simultaneous requests.

Ian Gent, University of St Andrews, United Kingdom
Recomputation of Scientific Experiments

Abstract: The discipline of Computer Science has never treated scientific replication seriously. Not only are experiments rarely replicated, they are rarely even replicable in a meaningful way. Computer science must embrace recomputation (exact replication of a previous experiment) as standard practice. This proposal is to help build the tools and repositories that will make recomputation a credible route for researchers in computer science to make their scientific experiments available online, via the Microsoft Azure platform and elsewhere. It is also to help build the community of researchers contributing both experiments and expertise and tools to this effort.

Paolo Missier, Newcastle University, United Kingdom
Analysis and Interpretation of Human Exome Sequencing for Clinical Diagnosis and EHR Integration in the Cloud

Abstract: Cloud e-genome is a two-year pilot project for the UK National Health Service. Its aim is to facilitate the adoption of systematic genetic testing in clinical practice, and its integration with Electronic Health Record (EHR) management at population scale. The project explores the use of a cloud computing infrastructure, algorithms and tools to enable clinical diagnosis based on “second generation” human whole-exome sequencing. Our use cases are specifically in the area of rare neurological diseases.

The project is a collaboration between the Institute of Genetic Medicine and the School of Computing Science at Newcastle University, UK. Partial funding for the project comes from NIHR (National Institute for Health Research, UK), as well as from a gift from Microsoft.

Philip Kershaw, STFC Rutherford Appleton Laboratory, United Kingdom
JASMIN

Abstract: JASMIN is a national environmental data computing facility based at STFC Rutherford Appleton Laboratory in the UK. Established in 2012, it will see significant expansion over the next two years, funded through NERC’s Big Data Initiative. NERC is the United Kingdom’s main government agency for funding and managing research, training, and knowledge exchange for the environmental sciences. The challenges of Big Data are affecting this community in terms of increased projected data volumes, complexity of data, and timeliness of results. Cloud computing provides a way to address some of these challenges. To this end, the JASMIN development team is working with Microsoft to explore a hybrid cloud model augmenting JASMIN’s private cloud with resources from Microsoft Azure.

Blesson Varghese, University of St Andrews, United Kingdom
Real-Time Catastrophe Risk Management on Windows Azure

Abstract: Currently, simulations employed in the financial industry do not support real-time catastrophe risk management due to their memory, data, and computational demands. In this novel research, we propose to develop real-time risk management simulations on Microsoft Azure by implementing techniques to scale out when the memory, data, and computational loads increase. An interactive online tool built on HDInsight and Hive, which incorporates the simulation and facilitates real-time catastrophe risk management, will be developed and deployed. This tool will be used to make timely decisions when a catastrophe unfolds and thereby reduce damage to the financial market.

Julio Hernandez-Castro, University of Kent, United Kingdom
ChessWitan: Mining chess data to distinguish human from computer play

Abstract: Our aim is to develop a model able to distinguish between computer and unassisted human play through the analysis of the moves of a chess game. The originating motivation is that the widespread availability of powerful chess-playing programs is tainting chess play at all levels, and new high-profile cases are arising continuously in which players receive computer assistance in tournament play, either directly or via an intermediary. There are currently no tools available that can give an objective assessment of the likelihood of cheating, and the only recourse is time-consuming expert opinion on each individual case.

Nadarajen Veerapen, University of Stirling, United Kingdom
Automated Bug Fixing

Abstract: The goal of this project is to help programmers automatically fix software bugs. This requires detecting and extracting bug fixing patterns found in open source repositories. This information will then be used to generate potential fixes when a bug occurs. After testing the possible solutions, the most likely ones will be presented to the programmer. These steps are computationally expensive and will benefit from running within Microsoft Azure.

Vassilis Glenis, Newcastle University, United Kingdom
Modelling Flood Risk in Urban Areas

Abstract: Using a unique, fully-coupled surface-subsurface urban flood modelling system that is transferable to other urban areas, the project will use the cloud resource to implement multiple simulations of the model in the cities of Newcastle-upon-Tyne and London, driven by multiple realisations of future climate and rainfall, to determine flood risk and, subsequently, implement multiple simulations with a range of mitigation solutions designed in collaboration with local stakeholders. Given the complexity of urban drainage components (sewer networks, drainage paths) and the high-resolution data (land elevation, surface features), the simulations will benefit from the computational resources available in Microsoft Azure.

Peter Coveney, University College London, United Kingdom
Collaborative Computational Project in Systems Medicine

Adam Farquhar, The British Library, United Kingdom
British Library Labs in the Cloud

Latin America

Eduardo Alves do Valle Jr., School of Electrical and Computer Engineering - FEEC, UNICAMP, Brazil
Medical Image Classification for Computer Aided Diagnosis with Deep Learning and Jumbo Vectors

Abstract: Information retrieval and content-based image classification have been studied by the scientific community in many different ways. A key application of this technology is Computer-Aided Diagnosis (CAD), improving doctors’ abilities to detect or prevent several diseases. Our aim is to advance the state of the art in CAD systems for the screening of pathologies based upon medical images, focusing on the early screening of melanoma. The techniques covered by this project draw on Deep Learning Architectures and Bag of Visual Words models, which show complementary advantages.

Anna Izabel J. Tostes, Federal University of Minas Gerais, Brazil
A Collaborative Cloud-Based Business Intelligence Platform for Transportation Networks

Abstract: This project consists of developing a collaborative business intelligence platform to deal with traffic management and planning through real-time smart decision-making, preventive actions, and contingency plans. A SharePoint portal using the Microsoft Azure cloud computing platform will be developed using data sources based on sensors (road detectors and cameras along the road), geographical information systems (Bing Maps), and social networks, collecting different views of traffic data. The goal is to analyze these data sets using graph theory and social network metrics, and then perform an analysis using OLAP cubes and data fusion, producing traffic metrics and congestion indicators.

Carmem Satie Hara, Federal University of Parana, Brazil
RING Project

Abstract: The increasing amount of RDF data made available nowadays requires data to be partitioned across multiple servers. The RING project aims at providing a scalable RDF storage system and efficient query processing involving both simple and complex pattern matching. Our approach is based on a workload-aware data partitioning method that minimizes inter-server communication during query execution. We intend to store the RDF graph in its native form in order to apply graph exploration instead of costly join operations on triples. A grant from Microsoft Azure would allow us to evaluate the effectiveness of the solution on its cloud platform.

Fernando da Fonseca de Souza, Universidade Federal de Pernambuco, Brazil
Cloud Databases: A model to guarantee data consistency

Abstract: This work proposes a model for ensuring data consistency in cloud databases. More specifically, it shall define a consistency model appropriate for the context of cloud databases; develop a prototype with a mechanism to implement the created model; and validate the mechanism. To this end, an architecture will be designed based on widely recognized and studied data replication and propagation approaches, with the required adaptations. The proposed architecture should be seen as a starting point for the development of a database management system to be hosted in the cloud; its target audience will be users who require assurance of data consistency.

Luiz André Portes Paes Leme, Universidade Federal Fluminense, Brazil
Assessing Recommendation Approaches for Dataset Interlinking

Abstract: Whenever a dataset is published on the Web-of-Data, an exploratory search over the datasets already published must be performed to link resources. To tackle this problem, this project introduces and compares approaches for selecting the datasets that can most likely be linked to the one being published. The approaches use statistical and social network techniques to compute the relevance of the datasets from selected sets of their features. Moreover, the project aims at producing a full comparison between the approaches, using different feature sets. The experiments will use real-world datasets extracted from catalogues of datasets, VoID descriptions, and Linked Data crawlers in order to show the effectiveness of different approaches.

Marcelo Valadares Galdos, Brazilian Bioethanol Science and Technology Laboratory (CTBE) / Brazilian Center of Research in Energy and Materials (CNPEM), Brazil
Using Azure to run an integration of process-based environmental models and geographic information systems

Abstract: Both land use change and crop management changes under sugarcane, a key bioenergy crop, are occurring intensively on a large scale in Brazil. This project aims at using cloud computing to integrate process-based modeling and a Geographic Information System to run simulations of soil carbon stocks at the regional scale. The GEFSOC (Global Environmental Facility - Soil Organic Carbon) system utilizes a dynamic ecosystem model in combination with an MPI-like run control process written in Perl, and a basic relational database implemented in MySQL. The Microsoft cloud will be useful in scaling parallel processing demands for large-scale ecosystem modelling.

Marta Mattoso, Federal University of Rio de Janeiro, Brazil
User-Steering Phylogenetic Workflows in the Cloud

Abstract: This research aims to help monitor the large-scale parallel execution of bioinformatics phylogenetic analysis workflows in the cloud. These workflows may be deployed as services, and their real-time provenance can be queried during execution so that the scientist can take actions such as staging out part of the results, for example a phylogenetic tree, for visualization.

Milton Cezar Ribeiro, São Paulo State University, Brazil
Integrating phenological, landscape, fauna movement and remote sensing massive data and processing through e-Science and cloud computing

Abstract: e-Science and cloud computing will provide the already funded (by FAPESP and Microsoft Research Institute) e-Phenology project with the power to: (a) develop models, methods, and algorithms to support extraction, management, integration, and analysis of phenology data systems from various scales; (b) develop novel methods to understand and simulate landscape-based spatio-temporal connectivity at varying scales; (c) integrate movement of fauna with phenological and landscape data, in order to understand the interplay between space and time with fauna movements; and (d) combine highly complex data analysis and simulations of movement of fauna in simulated neutral landscapes.

Rafael Duarte Coelho dos Santos, INPE - Brazilian National Institute for Space Research, Brazil
Prototype Deployment of a Data Server for the Brazilian Weather and Climate Virtual Observatory

Abstract: We are developing tools for the Brazilian Weather and Climate Virtual Observatory. Some of the tools are based on Thematic Data Servers—databases and data-access web services that will provide information from a specific sensor network, simulation project, etc. We propose using Microsoft Azure to demonstrate how a data provider can create one thematic data server and connect it to the Virtual Observatory Register so the data and services on the cloud-based thematic data server can be discovered and used together with the virtual observatory tools.

Ricardo da Silva Torres, Institute of Computing, University of Campinas, Brazil
Big Image Data Management on the Cloud for e-Science Applications

Abstract: The objective of the proposed project is to investigate novel algorithms, tools, and solutions for the storage, retrieval, and analysis of huge volumes of image data using Big Data technologies. The versatility and relevance of this proposal will be demonstrated in Remote Sensing Image Recognition applications, which seek to interpret and enrich information obtained from satellites and aerial photos.

Alvaro Soto Arriaza, Pontificia Universidad Católica de Chile, Chile
Cloud-Based Visual Ontology for Contextual Visual Recognition

Abstract: The goal of this proposal is to build and test a visual ontology that is able to use visual information extracted from an image by state-of-the-art inductive detectors and refine it using common sense semantic networks in order to obtain meaningful descriptions of scenes and deduce contextual information. The visual ontology will work by mixing information from big visual databases and common sense semantic networks.

Andres M. Pinzon, Center for Bioinformatics and Computational Biology of Colombia, Colombia
A cloud-based system for the integration of molecular data and biodiversity information for Colombian species

Abstract: Biodiverse countries like Colombia have huge potential for finding new or better biological components to be used in different areas such as the health sector and general industry. In order to take advantage of this potential and to become truly competitive in a new worldwide bioeconomy based on biological resources and information, it is fundamental to thoroughly describe, list, and analyze our biological and molecular resources. Here we propose the development of a Microsoft Azure cloud service that links taxonomy information from all known Colombian species to already published molecular data from public databases. This system will promote and support the development of biotechnological policies, bioprospecting strategies, law enforcement against biopiracy, and national research programs for sustainable natural resources exploitation.

Matthew Bawn, University of San Martin de Porres, Peru
The Use of the Cloud as a Computational Platform for Genomic Analysis

Abstract: With the advent of next-generation sequencing (NGS), medicine is progressing towards the prevention of genetic risk factors for common diseases such as glaucoma. Glaucoma describes a group of disorders characterized by distinctive optic-nerve damage that causes progressive and irreparable defects in the optical field, which, if left untreated, result in complete blindness. We will apply NGS to investigate the genetic basis of glaucoma in Peruvian populations, which, as with other Latin American populations, are severely under-represented in current genome databases. The production of interpretable data from NGS will be achieved via the application of Microsoft Azure-based high-performance computational analysis.

Hugo Andres Neyem, Pontificia Universidad Católica de Chile, Chile
Improving the Preservation of Latin America’s Wildlife through a Cloud Shared Workspace

North America

Philippe Desjardins-Proulx, Université du Québec, Canada
Growing Intelligence with Cloud Markov Logic Networks

Abstract: Modern methods from artificial intelligence and machine learning are now widely used, from business intelligence to ecology, for their ability to build effective predictive models. These methods are at the vanguard of the data revolution, allowing us to tackle complex problems for which no simple models have been found. Yet, unlike humans, most machine-learning algorithms are unable to build on prior knowledge to find better models, a process called "transfer learning". We use the flexibility of Microsoft Azure to develop new techniques for transfer learning based on probability theory and logic, with a focus on applications to biodiversity.

C. Titus Brown, Michigan State University, United States
Open assembly and analysis of large sequencing data sets

Abstract: We propose to execute existing cloud computing pipelines for de novo sequence assembly in the Azure cloud on a substantial number of data sets. We have three goals: first, improve the breadth of our knowledge about the natural world by making useful summary analyses available for existing and new data sets; second, execute well-defined and open protocols on the data, and retain detailed provenance information; and third, drive open biological science forward by analyzing people’s data in exchange for making it open.

Chaowei Yang, GMU, United States
Spatial Cloud Computing: A Practical Approach

Abstract: Cloud computing is redefining the possibilities of many geoscience disciplines. We wrote a book to introduce cloud computing to the geoscience communities. To serve as a text, the book includes slides and hands-on examples for deploying, optimizing, and operating applications in the cloud. This proposal is to develop a Microsoft Azure version of the examples so that the geoscience communities can not only learn how to use other cloud services, but can also stand up their own geoscience applications in Azure once they have finished reading the book.

Chih-Yuan Yang, University of California, Merced, United States
Single-Image Super-Resolution: A Benchmark

Abstract: A large-scale experiment is proposed to test state-of-the-art single-image super-resolution algorithms in order to build a systematic performance benchmark of existing methods. Due to the tremendous computational load, the experiments are best executed on a scalable computing platform such as Microsoft Azure. We have implemented several algorithms to run the experiments. The generated results will be the content of a paper submitted to a top-tier computer vision conference.

Hans J Johnson, The University of Iowa, United States
Azure Cloud Testing and Algorithm Reproducibility in Medical Image Analysis

Abstract: We propose the application of 200,000 CPU hours and 30 terabytes of storage to run CDash builds of proposed patches to ITK, reproducibility tests of Insight Journal submissions, and reproducible analysis for ITK community members.

Huy T. Vo, New York University, United States
Building a 3D Model for New York City Using LiDAR Data

Abstract: The benefits of having an accurate model of New York City (NYC) are enormous in many research areas of urban informatics, which often uses spatial correlation of multiple data sources to better understand how cities work. Unfortunately, there is no 3D model available to the researchers in urban informatics. Instead, we have approximately 20 billion LiDAR points (1TB of raw disk space). The project is to construct an actual 3D model of NYC from this massive point cloud. Given the size and complexity of the computing involved, we request to use Microsoft Azure to support this computation.

Jason Slepicka, University of Southern California, United States
Big Karma

Abstract: Karma is an open source information integration tool that learns how to assist users in cleaning and normalizing data, modeling its semantics, and publishing it in a variety of forms including RDF for the Linked Data Cloud. In order to handle much larger datasets in size and complexity, we are in the process of moving Karma’s machine learning algorithms and processing to the cloud. By moving to Microsoft Azure, Karma would gain access to a scalable Hadoop environment and free users from managing scaling infrastructure. This will make Karma more capable and available for far more users than otherwise possible.

Matthew Graham, California Institute of Technology, United States
A study of quasar variability

Abstract: Quasars are one of the most important classes of astronomical objects and are highly variable. However, the mechanism of their optical variability is poorly understood. We propose a definitive study to model the time series of 200,000 quasars to determine the best-fitting stochastic description of their variability and look for correlations between variability features and physical parameters, such as black hole mass, that will help distinguish between different possible variability mechanisms.

Judy Qiu, Indiana University, United States
Extending Twister4Azure and Integration with Apache Big Data Stack

Abstract: We have shown that Iterative MapReduce is a powerful programming model for data-intensive applications on both cloud and HPC platforms. We have developed Twister4Azure, which is an implementation of these ideas on Azure. Our recent work has focused on a generalization of MapReduce to a Map-Collective model where the “reduce” phase in MapReduce is supported by a library of powerful optimized collective communication routines covering operations like (all)reduce, scatter, gather, broadcast, regroup, combine, and merge, which cover the key primitives in MapReduce and MPI. We showed that the same collectives could be added to Hadoop with a significant performance increase. Our optimized broadcast collectives for Twister enabled clustering with millions of centers. We will integrate these ideas and request Azure time to perform the cloud-based Map-Collective research.

Kui Ren, University at Buffalo, State University of New York, United States
The Power of Indoor Crowd: Indoor 3D Maps from the Crowd

Abstract: In this work, we address the critical task of reconstructing large-scale indoor 3D models from crowd-sourced images. We propose, design, and aim to implement IndoorCrowd, a smartphone-empowered crowdsourcing system for large-scale indoor 3D scene reconstruction. IndoorCrowd fills a gap in current cloud-based 3D reconstruction systems, as it ensures on the mobile side that the captured image set meets the quality required for large-scale indoor 3D scene reconstruction. On the cloud side, we deploy an automated image-based 3D reconstruction pipeline, which generates 3D models from images and sensor data.

Lawrence A. Husick, Quantum Cures Foundation, United States
Cloud Based Drug Discovery for Malaria

Abstract: Quantum Cures Foundation is a nonprofit that discovers new drugs for known disease targets and provides those new drug designs to the research community as "open source" for further development (see www.quantumcures.org). The design of new drugs is based on a cloud computing platform with high fidelity molecular modeling which is provided to Quantum Cures by TeraDiscoveries (www.teradiscoveries.com). The drug discovery platform, called Inverse Design, has been validated and runs on Azure. For this project, we propose to design a new drug to combat Malaria by inhibiting three different mutations of the known Malaria protein target pfDHFR-Ts.

Robert Boissy, University of Nebraska Medical Center, United States
Secure, timely, open, pro bono cloud-based data management and analysis services for the molecular detection and continuous international molecular surveillance of known and emerging pathogens

Abstract: Global health and agricultural and economic development would all benefit from the cloud-based deployment of secure, timely, open, pro bono data management and analysis services that are designed to help the physicians, veterinarians, agronomists, biologists, and related scientists and public officials responsible for studying and responding to known and emerging human, animal, and plant pathogens. This proposal describes the development and deployment of a limited number of such services. More importantly, the formation of a computing industry coalition is also proposed that could ensure that these and other similar cloud-based services are further developed, enhanced, and maintained.

Yang (Jon) Zhang, Cornell University, United States
BioHPC Azure Integration

Abstract: With new next-generation sequencing techniques producing ever-increasing amounts of data, demand for HPC resources for biological research has increased as well. BioHPC was created at Cornell CBSU in part to address this demand. We believe that an Azure-based cluster will further increase the flexibility of BioHPC, allowing users to access a much larger pool of computational resources, and will also help reduce the cost and simplify the process of installing future instances of BioHPC.

Zhenlin Yang, Oregon State University, United States
Forest Mortality, Economics and Climate Change

Abstract: We have a five-year project, FMEC, on modeling drought mortality and predicting vulnerability with the Community Land Model (CLM) 4.5 over western North America, and linking it with an economic model to evaluate both the carbon and economic implications of mitigation actions. We want to use 30–50 Azure cores to run CLM 4.5, which is written in Fortran, for the model spin-ups and simulations. This will greatly help the progress of the FMEC project and improve our understanding of forest mortality in the western US.

Thanh N. Truong, University of Utah, United States
Engaging Citizen Scientists in Computer-Aided Drug Discovery

Abstract: Azure cloud computing can open new opportunities for citizen scientists to engage in actual scientific discovery. The e-Science Community Laboratory, a cloud-enabled web portal under development, allows the public to contribute to computer-aided drug design research and is accessible anytime, anywhere, and by anyone. It uses crowdsourcing technology to create a social network of citizen scientists within existing social networks. This is done by turning the virtual drug screening process into a game in which interested individuals compete to pick the better drug candidate for a given disease target.

Viswanath Nandigam, University of California San Diego, United States
Integration of cloud based on-demand geospatial processing services into community earth science data facilities

Abstract: OpenTopography is an NSF-funded earth science facility that provides online access to high-resolution topography data and tools. By leveraging the underlying SOA design of the OpenTopography system, we plan to develop a pluggable services infrastructure that will allow processing routines developed by the community on external cloud resources such as Azure to be plugged into the existing OpenTopography system workflow, so that the entire community of users can benefit from the new functionality, e.g., change detection, differential analysis, and time series analysis between datasets.

Yung-Hsiang Lu, Purdue University, United States
Cloud-Based System for Continuous Analysis of Many Cameras

Abstract: Many cameras have been deployed for various purposes, such as monitoring traffic, watching natural scenes, and observing weather. In many cases, the video streams are not analyzed by computers and are simply discarded.

Analyzing videos requires significant amounts of computing resources, and cloud computing may meet this need. This project aims to develop a cloud-based software infrastructure for analyzing video streams. This infrastructure manages the cloud resources and meets the performance requirements of the analysis programs. The requirements are affected by many factors, including the videos' frame rates, resolutions, and scene contents, as well as the types and numbers of cloud virtual machines. Through this software infrastructure, analysis programs run on Microsoft Azure and process many videos simultaneously. This Microsoft Azure Award allows our team to experiment with different approaches to designing and developing the software infrastructure.

Ka Yee Yeung-Rhee, University of Washington, United States
Inference of Gene Networks Studying Human Cancers on the Cloud

Abstract: Despite the generation of big data measuring the molecular activities of all genes simultaneously, the majority of genes within most species are not mapped to any regulatory or biochemical pathways. Gene networks capture the interactions between biological entities. The inference of gene regulatory networks from big data in biology is of broad interest to scientists and poses challenging research problems. We will develop computational methods and tools for the construction of gene networks on Microsoft Azure, and subsequently apply our methods to infer gene networks from large, complex biological data from studies of human cancers.

A. Lucas Stephane, Florida Institute of Technology, United States
Life-Critical Interactive Glass Wall Integration

Abstract: This proposed research and innovation work focuses on complex life-critical information visualization for the current NASA KSC Glass Wall project for rocket launches, where every piece of information counts to ensure the safety, efficiency, and comfort of the underlying space mission. We plan to use Microsoft Azure as an Infrastructure as a Service (IaaS) to explore and anticipate the future NASA KSC service-oriented architecture and enterprise solution. Based on the use of the external Microsoft Azure solution, HCDi will make recommendations to NASA KSC for balancing private (internal) and cloud (external) infrastructures.

Alexander Vyushkov, University of Notre Dame, United States
Modeling Malaria Transmission on Windows Azure

Abstract: We request 195,000 hours of Microsoft Azure compute instance time to generate and fit baseline simulations for use in a decision support tool. The requested compute instance time will be used to fit and pre-compute historic baseline simulations that will then be used as starting points for simulations run on demand by the decision makers who are evaluating the impact of planned malaria interventions. This is a feasibility study which, if conducted at full scale, will enhance and accelerate research we are performing for the Bill & Melinda Gates Foundation under a project called VecNet.

David Hazel, University of Washington, United States
AMADEUS - Azure Marketplace of Applications for Diverse Environmental Use as a Service

Abstract: UW CWDS will use the AMADEUS platform, hosted on the Microsoft Azure stack, to foster an ecosystem of environmental data applications (both web-based and mobile) that consume our models and data sources and integrate them into a variety of modeling scenarios. We will use WWT for 1) simultaneous visualization of multiple datasets, 2) exposing correlations and detected patterns as map layers, and 3) providing tutorials and example applications to bootstrap others interested in using WWT as a discovery platform. Additionally, we will partner with domain experts to prepare narrated guided tours for teaching curriculums for university programs in urban planning, GIS, and environmental sciences.

Dhruv Batra, Virginia Tech, United States
CloudCV: Large-Scale Distributed Computer Vision as a Cloud Service

Abstract: We propose to build CloudCV, an ambitious system that will provide access to state-of-the-art distributed computer vision algorithms on the cloud, as a service to the community.

Hanspeter Pfister, Harvard University, United States
Connectome

Abstract: The goal of the Connectome project is to reverse engineer circuits in the neocortex to better understand how the brain processes information. We will analyze super-resolution volumes generated by electron microscopes to segment neurons and synapses. Data magnitude and noise make this a significant challenge.

Kelly Smith, University Corporation for Atmospheric Research (UCAR), United States
The Unidata Integrated Data Viewer (IDV) as a Cloud Service

Abstract: The Integrated Data Viewer (IDV) is an open-source application for interactive visualization and analysis of geoscience data. It is developed at Unidata, a University Corporation for Atmospheric Research (UCAR) Community Program (UCP). The IDV enables users, typically scientists or educators, to integrate, subset, and visualize complex data. The IDV is a resource-intensive application, requiring 1 GB of RAM or more on a Java Virtual Machine (JVM). With the current shift from traditional laptop- and desktop-style computers towards mobile devices, Unidata is investigating strategies for providing interactive IDV visualization via cloud-enabled technologies.

Richard Dana Loft, National Center for Atmospheric Research, United States
AzurePlanet: A cloud-based system providing access to weather and climate information

Abstract: AzurePlanet (AP) will provide integrated and engaging public access to information about past weather and future climate anywhere in the world, for the period 1979-2100. For performance reasons, AP will focus on key 2D variables such as surface temperature, pressure, precipitation, winds, and humidity. AP will use peer-reviewed data sets and proven methods from geostatistics to ensure its scientific accuracy. Streaming visualizations of global weather (e.g., of major hurricane or winter storm events) will provide an engaging “on-ramp” for students, educators, and the general public, encouraging them to delve further into weather and climate information.

Susan Borda, California Digital Library, United States
DataUp

Abstract: DataUp is now operational, but it has not reached its full potential as a tool for archiving datasets from a range of researchers and disciplines. DataUp has the potential to become a key tool in research data sharing and archiving as envisioned by the NSF DataNet program. To that end, our major project goals for DataUp v2 are to (1) enhance the tool’s user experience and add features, and (2) build the open-source community around DataUp.

Tanya Berger-Wolf, University of Illinois at Chicago, United States
Computational Behavioral Ecology on the Cloud

Abstract: The goal of our project is to develop tools and models to collect location trace data of social animals and to create predictive and descriptive models of individual and collective movement, taking into account micro and macro behavior, ecological and habitat context, and interactions with other individuals.

Yuejie Chi, The Ohio State University, United States
Online Distributed Inference of Large-Scale Data Streams in the Cloud

Abstract: The proposed research develops a novel sensing scheme, called covariance sketching, to acquire the covariance structure of high-dimensional data streams in resource-constrained environments with mathematically provable performance guarantees. Furthermore, adaptive algorithms are designed to track underlying covariance structural changes in a dynamic data stream. The usage of the Microsoft Azure cloud platform allows both theoretical and practical advancements of the proposed research by exploring fundamental trade-offs in regimes which may otherwise be impossible and by applying it to real-world large-scale video and social network data.

Animashree Anandkumar, University of California, Irvine, United States
Large-scale Unsupervised Learning via Tensor Methods: Applications in Social Networks and Text & Image Analysis

Jaime Ruiz, Colorado State University, United States
EQStratus: On-Demand Genome Assembly Using Cloud Infrastructure

Jim Nelson, Brigham Young University, United States
Flood Early Warning System in the Cloud

Kenneth H. Buetow, Arizona State University, United States
Using Data Science Approaches To Map Biologic Processes To Clinical Phenotype And Outcome

Richard P. Hooper, Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI), United States
Transforming Water Science with Feature-Based Data Access