Congratulations to the selected projects from the Windows Azure for Research Award program! We look forward to learning how the cloud will help you collect, filter, analyze, and share data across themes such as life sciences, urban sciences, environmental sciences, ecology, and geosciences. You can read abstracts of the selected projects below, which are organized by region.
Soeren Balko, Queensland University of Technology, Australia
Abstract: With "relocate.js", we aim to address fundamental challenges of today's data centre-based cloud computing: poor average hardware utilisation, the negative environmental footprint, and privacy concerns of remotely processing confidential data. We do so by introducing a virtualization approach which dynamically identifies resource-intense components of some cloud-based application and transparently relocates and executes these on the client. Our R&D work addresses associated challenges of statically analysing and cross-compiling legacy code; optimally identifying relocatable application components in real time, subject to multiple optimisation goals; and preserving security of backend application code that runs on the client.
Jian Zhang, University of Technology, Sydney, Australia
Friends Recommendation Based on Graph Correlation
Abstract: In social multimedia, people’s online friendships are related to their behaviours. This project aims to correlate users’ contact graphs with their image tag graphs to support recommendations in a large dataset.
Yuedong Yang, Griffith University, Australia
Cloud-based Platform for Genome-scale Prediction of Protein Functional Complex Structures at Experimental Quality
Abstract: The exponential increase in the number of proteins with unknown functions calls for bioinformatics databases with high-quality function prediction to guide experimental studies. A commonly used method is to infer function by sequence homology, but it can cover only a small fraction of proteins. Hence, it is critically important to identify protein functions with high confidence in the absence of sequence homology. Here, we plan to expand our successful template-based approach for function prediction so that it goes beyond RNA/DNA-binding functions, and to produce genome-scale, high-quality functional annotation. The resulting database will be freely available to the scientific community of molecular biology and bioinformatics.
Guangjun Zhang, Peking University, China
Machine learning – parameter estimation for groundwater flow and transport models based on Windows Azure Cloud
Abstract: Machine learning – parameter estimation and interactive analysis of various multi-scale field data, with visualization of modeling results in the Windows Azure cloud-computing environment, is an exciting research area. This project will implement a parallel parameter estimation (PPEST) approach on the Microsoft cloud for a regional groundwater flow model and mass transport model. The sensitivity and uncertainty of the distributed model parameters will be quantified. The optimized modeling results will then be visualized, and scenario analysis will be conducted interactively. This study will provide useful information and insights for earth science model researchers.
Huayi Wu, Wuhan University, China
Collaborative Geoprocessing on Windows Azure
Abstract: With the trend towards distributed computing, a geoprocessing algorithm is often encapsulated into a geoprocessing service, providing a standardized interface for web invocation. To build large-scale geoprocessing applications as complex geospatial simulation models, several geoprocessing services can be integrated into a geoprocessing service chain, which enables interoperable, distributed, and collaborative geoprocessing that significantly enhances the capacity to derive geoinformation and knowledge over a network. We designed a collaborative geoprocessing framework, GeoSquare, for geoscientific data/algorithms/models sharing and geoprocessing orchestration. The basic architecture of GeoSquare is service oriented. GeoSquare demonstrates a feasible way to achieve a geoprocessing computing paradigm for the future.
Jitao Sang, Chinese Academy of Sciences, China
Cyber-Physical Footprint Association: Cloud Storage and Computing
Abstract: User behavior is one of the most important sensors for urban computing. In this project, we propose to exploit the user footprint in both the cyber and physical worlds to address the data collection problem in urban computing. In particular, this project has two main features: (1) cross-platform user behavior data aggregation, and (2) cyber-physical footprint association. We first aggregate users' online activities across different social platforms to derive a cross-platform cyber footprint, and then associate the cyber footprint with a physical footprint constructed from real-world sensors. Windows Azure fits our project well because of its powerful storage and computing capabilities. We are applying for Windows Azure resources to store the raw heterogeneous user behavior data, associate the cross-network behavior data to obtain social media footprints, and deploy the urban computing demo system.
Junjie Wu, Beihang University, China
A System for Heterogeneous Social Media Big Data Analytics in Azure Cloud
Abstract: The explosively growing volume of user-generated content (UGC) is of great commercial and research value. This project aims at mining UGC and providing high-quality information services to various kinds of organizations. Starting from a prototype system built in earlier national scientific projects, we expect to bring it up to a business standard with the help of services provided by Windows Azure.
Lei Zou, Peking University, China
Graph Data Management in Urban Computing
Abstract: An inherent challenge in urban computing is large-scale heterogeneous data management. In this project, we focus on "graph data." As one of the most popular and powerful representations, graphs have been used to model many kinds of application data in urban computing, such as traffic networks and location-based social networks. We first design a general-purpose graph engine for urban computing, and then demonstrate the usefulness of graph data management in urban computing through two applications, one in traffic networks and one in social networks.
Xinbo Gao, School of Electronic Engineering, China
Videos analysis and recommendation for online learning
Abstract: With the significant development of the Internet and multimedia-related techniques, online learning has become an important supplement to traditional classroom education. Videos have become the most popular study materials. There is, however, a lack of connection between these videos, and learners have no convenient way to find similar and closely related videos for learning. To overcome these problems, we plan to build a platform to process and analyze educational videos, build tables of contents for video searching and recommendation, and design an online learning community for recording and sharing knowledge. This platform needs a large amount of storage and computational resources, which is what we expect from Windows Azure.
Yan Xu, Beihang University, China
Large-scale histopathology image analysis for colon cancer in Azure
Abstract: Histopathology images are critically important in colorectal cancer diagnosis and treatment. A standard histopathology slice can be scanned at a resolution as high as 200,000 × 200,000, which makes most existing tools for histopathology image segmentation, clustering, and classification infeasible to apply on a single machine. We developed a parallel multiple instance learning algorithm on Windows Azure. We used manual features and parallel multiple instance learning (P-MIL) in the previous framework. To further improve the performance, we will study automatic extraction of fine-grained information from coarse-grained labels using deep learning.
Yuan Juli, Zou Hengming, Shanghai Jiao Tong University, China
A Distributed Algorithm for File Distribution and Replication on Cloud Platform
Abstract: Caching and replication of popular data objects contribute significantly to the reduction of the network bandwidth usage and the overall access time to data. Our focus is to improve the efficiency of object replication within a given distributed replication group. Given the request rates for the objects and the server capacities, the goal is to find the replica allocation that minimizes the access time over all servers and objects. We design a distributed approximation algorithm that solves this problem and prove that it provides a 2-approximation solution. We also show that the communication and computational complexity of the algorithm is polynomial.
Tony Tung, Kyoto University, Japan
Abstract: Information is collected in various forms from all over the world and beyond. However, it is still difficult for each individual to access, get exposed to, or grasp all existing resources. Hence, we propose to globally enrich the knowledge one could have locally using state-of-the-art map tools, and scalable 3D reconstruction and visualization techniques. Particularly, in addition to existing satellite images and static 3D object representation (e.g., provided by Bing Maps and its competitors), we propose to introduce dynamic information that can be either reconstructed in real-time (e.g., from user inputs), or simulated based on live measurements (e.g., using weather forecast, sun and moon positions, traffic information, etc.). The project outcome could potentially outclass existing 2D/3D map representation by providing users with informative, real, dynamic 3D snapshots of any place on Earth. For example, one could observe traffic congestion in New York City in real-time, see London on a rainy day, or enjoy, privately, the sight of the Eiffel Tower by night, all from real or simulated 2D/3D objects. Additional information about local air quality, available resources (e.g., housing, energy, network, etc.), social behavior, and trends could also be provided. We believe the recent advances in cloud computing technology can handle the challenges of dynamic big data management, multimodal sensing, and scalable real-time 3D vision.
Hwasoo Yeo, Korea Advanced Institute of Science and Technology, Korea
Cloud Sensing based Urban Travel Time Prediction with Online Traffic Simulator
Abstract: The objective of this project is to develop a software module that provides information on the life patterns of people in urban areas. The project has three major focuses. The first is developing a methodology for categorizing the entire time-dependent location data of smartphones by transportation mode. The second is developing a methodology for predicting future demand for each transportation mode using the categorized data. The third is developing a methodology for predicting the travel time of highway and arterial road links using the predicted future demand.
Hyunju Lee, Gwangju Institute of Science and Technology, Korea
Text mining for identifying disease-gene-biological relationships
Abstract: Genes usually contribute to the development of diseases through biological events such as gene expression, regulation, phosphorylation, localization and protein catabolism. Our disease-gene search engine, DigSee (http://gcancer.org/digsee), serves sentences from MEDLINE abstracts with identified triple relations showing ‘which genes’ are involved in the development of ‘which cancer’ through ‘which biological events’. Since the current version of DigSee supports only cancer, our goal is to incorporate more disease types other than cancers into the system. This new system will allow researchers working on various types of diseases to search which genes are related to the disease through which biological events.
Joon Heo, Yonsei University, Korea
Does ‘Gangnam Style’ really exist? - Answers from data science perspective
Abstract: Based on open datasets from the city of Seoul, we conduct two major analyses: (1) among numerous polygonal spatial clustering algorithms, which one would be the most suitable for such high-dimensional (several thousand dimensions) datasets?; (2) based on the finding from the above analysis, what kind of attributes are governing factors for differentiating between Gangnam and other districts? The first and the second analyses can be applied to any other district (Gu) to characterize its differentiating attributes, which would require a humongous amount of computation time because it might require a combinatorial optimization while maximizing dissimilarity measures between clusters.
Muhammad Bilal Amin, Kyung Hee University, Korea
Enabling Data Parallelism for large-scale Biomedical Ontology Matching over Multicore Cloud Instances
Abstract: Ontology matching is among the core techniques used for integration and interoperability resolution between biomedical systems. However, due to the ever-evolving nature of biomedical data, ontologies are becoming large-scale and complex, which leads to performance bottlenecks during matching. We present a parallel ontology matching system for large-scale biomedical ontologies that implements data parallelism over a multicore cloud platform for performance benefits. Our system decomposes these ontologies into smaller and simpler subsets depending on the matching algorithms. The matching process over the subsets is divided, from coarse to fine granularity, into independent matching requests, matching jobs, and matching tasks, which run in parallel over cloud instances. Matched results are aggregated to generate a mediation bridge ontology.
Chu Hong Steven Hoi, Nanyang Technological University, Singapore
Cloud-Based Mobile Recommender Systems by Online Collaborative Filtering Techniques
Abstract: This project aims to investigate a novel framework of on-the-fly mobile recommender systems in the cloud for enhancing the human living experience by mining GPS trajectory data. Unlike the conventional mobile recommender systems that use batch collaborative filtering techniques, we propose a family of novel online collaborative filtering techniques, which are far more efficient and scalable than the existing techniques for building the on-the-fly mobile recommender systems. The Windows Azure platform helps us to resolve the cloud-based storage and computation tasks in our experimental studies and system implementations.
James M Hogan, Queensland University of Technology, Australia
Bing for Genomes – Information Retrieval Approaches to Genomic Search and Comparison
Tomasz Bednarz, CSIRO, Australia
Image Analysis in Azure Clouds
Guangzhong Sun, University of Science and Technology, China
Smart Campus Construction Based On Rich Campus Datasets
Victor O.K. Li, The University of Hong Kong, China
A Big Data Stream Processing Solution for Hidden Causality Detection of Urban Dynamics
Wenjun Wu, Beihang University, China
Cloud based MOOC Platform for Self-organized Learning
Yanmin Zhu, Shanghai Jiao Tong University, China
NoiseSense: Crowdsourcing-based Urban Noise Mapping with Smartphones
Peng Gong, Tsinghua University, China
Satellite Remote Sensing for Urban Computing—40 Year Dynamic Information on Land Use for Beijing City from Time Series Landsat Data and Computer Simulation
Aoying Zhou, East China Normal University, China
COBA: Sensing Urban Lifestyle based on Collective Online Behavior Analysis with Windows Azure
Yuguo Li, University of Hong Kong, Hong Kong
SmartComfort - Use of Smartphone and Cloud Technologies for Building Thermal Comfort, and Ventilation and Health Studies in Megacities
Hansol Lee, Korea Military Academy, Korea
Employing a Customized Web-Based Corpus Program For Language Learning
Hojung Cha, Yonsei University, Korea
Development of a Crowd Sensing Framework for Inducing User Participation in Urban Environments
Europe and Russia
Romain Rouvoy, University Lille 1, France
ApiSwarm : Elastic processing of crowd-based datasets
Abstract: This project intends to leverage the APISENSE crowd-sensing platform (http://www.apisense.fr) to support the real-time processing of «big» scientific datasets collected in the physical world from a «large» crowd of smartphones. Examples of case studies covered in this project include the automatic inference of roadmaps, the continuous cartography of network coverage quality, or even the detection and the dynamic analysis of earthquakes. As the volume of data to be collected in the wild is unpredictable, APISENSE requires the adoption of elastic computation models and infrastructures offered by Windows Azure to continuously provision the processing capabilities to fit the feeds of data reports.
Gabriel Antoniu, INRIA, France
Z-CloudFlow: Advanced Data Storage and Processing for Multi-site Cloud Workflows
Abstract: This joint project is carried out within the Joint Inria – Microsoft Research Centre. It addresses the problem of advanced data storage and processing for supporting scientific workflows in the cloud. The goal is to design and implement a framework for efficient processing of scientific workflows executed on several geographically distributed datacenters in the cloud. The main objectives are to reduce the data access latency and minimize the workflow makespan. We are trying to give reasonable answers to the following research questions: How to optimize communication and/or computation overhead? How to group tasks and datasets together to minimize data transfers? How to balance workload between data centers to avoid bottlenecks? What strategies to use, and how, for efficient data transfers?
Patrick Hénaff, Université Paris 1, France
Euclide Quant Network
Abstract: Zanadu is a computational finance platform. It provides benchmark data and standard models of mathematical finance, implemented in a real-life environment. The models are presented as IPython notebooks that combine a mathematical description of each model, the code, and calculation results. The platform will be tested by scientists in three research labs, and by graduate students at four universities throughout Europe. The platform is complemented by an on-line forum and reference library, in order to build a scientific community around the project. The quarterly “Parisian Model Validation Seminar” will provide a venue to discuss the platform.
Frederic Magoules, Ecole Centrale, Paris, France
Advanced Linear Algebra Libraries for the Cloud
Abstract: Our research aims to offer advanced linear algebra libraries to research communities in academia and industry. The goal of this proposal is to turn our advanced linear algebra libraries into usable applications on Windows Azure.
Jean-Charles Régin, University Nice-Sophia Antipolis, France
Using Windows Azure for High Performance Computing
Abstract: We propose to use cloud computing for solving complex optimization problems and to give the community the opportunity to use a cloud-based solver that solves their problems automatically.
Frank Hutter, Freiburg University, Germany
CloudEval: Towards Community-Based Performance Testing and Optimization
Abstract: Applications ranging from software verification to industrial optimization require effective solutions to hard combinatorial problems. Modern heuristic algorithms solve these problems well in practice, but it is not well understood which heuristics and which parameter settings work best on which types of problem instances. As progress is mostly driven by empirics, CloudEval aims to provide a shared platform for the continuous evaluation of solvers on benchmarks. CloudEval will use state-of-the-art automated algorithm configuration tools to customize each solver for each benchmark and will maintain and advertise a leader board of solvers for each category.
Liliana Pasquale, University of Limerick, Ireland
Minority Report: Using the Cloud to Enable Proactive Digital Forensic Investigations
Abstract: The Minority Report project aims to develop an open source toolset and a demonstrative data set to support proactive digital forensic investigations in large and distributed systems. This tool can be very useful for system administrators and investigators since it can suggest in advance the likelihood of potential hypotheses of a crime. To exhibit good performance, even for large sets of events, our tool will leverage the HPC capabilities provided by Windows Azure to decompose each crime hypothesis into smaller and more tractable sub-hypotheses. We will perform the analysis of crime hypotheses by using the Event Calculus and SMT solvers (e.g., Z3).
Elisabetta Di Nitto, Politecnico di Milano, Italy
sAfe CitiEs through clouD and Internet-of-Things (ACED-IoT)
Abstract: ACED-IoT will develop a scalable and reliable prototype of a safety planner for the management of incidents, for example, a fire or a flood occurring in a large area. The safety planner will exploit an IoT infrastructure and social networks to gather data, and Windows Azure as cloud infrastructure for the back-end data processing and for planning the actions to support safety squads during the emergency.
An area with a high-density population will be considered as the reference scenario. The system will be self-adaptive and, in particular, will detect and predict the potential failure of the sources it uses and take proper recovery actions.
Marek Stanislaw Wiewiorka, Warsaw University of Technology, Poland
Towards an interactive secondary analysis of RNA sequencing data service in Windows Azure cloud with Apache Spark framework
Abstract: We would like to continue our current research on the application of new MapReduce frameworks, mainly Apache Spark and Stratosphere, in genomics and transcriptomics. Using Windows Azure with the requested resources would give us a chance to find an optimal, fast, scalable, and interactive RNA sequencing analysis service that can be used by the whole bioinformatics community. Our project is complementary to the work done by other researchers on the SeqInCloud solution (powered by Microsoft Hadoop on Azure [HoA]), which provides a genome analysis pipeline for DNA.
Evgeny Rogaev, Vavilov Institute of General Genetics, Russian Academy of Science, Russia
Alzheimer Bio Project
Abstract: The goal of the project is to analyze human sequence data related to Alzheimer's disease in Windows Azure (the Institute has public open data available) and to analyze the annotations of various effects for Alzheimer's disease. The project will allow prediction and drug analysis. Our partners are eager to collaborate and to provide their bio expertise and their unique bio software pipeline for analysis.
Heiko Schuldt, University of Basel, Switzerland
ADAM+ - A Large-Scale Distributed Image and Video Retrieval System
Abstract: ADAM+ focuses on new strategies and methods towards a modern distributed system that stores, organizes and retrieves multimedia data. To that end it subsumes three tasks: the extraction of features from image and video collections, the execution and analysis of queries, and the adaptation of features and parameters using machine-learning algorithms. With these tasks, the goal of the project is to create a large-scale distributed image and video retrieval search engine that jointly uses various query paradigms, such as keyword search, query-by-example and query-by-sketch, for query formulation and is able to adapt to various collections and features.
Ian Gent, University of St Andrews, United Kingdom
Recomputation of Scientific Experiments
Abstract: The discipline of Computer Science has never treated scientific replication seriously. Not only are experiments rarely replicated, they are rarely even replicable in a meaningful way. Computer science must embrace recomputation (exact replication of a previous experiment) as standard practice. This proposal is to help build the tools and repositories that will make recomputation a credible route for researchers in computer science to make their scientific experiments available online, via the Windows Azure platform and elsewhere. It is also to help build the community of researchers contributing both experiments and expertise and tools to this effort.
Paolo Missier, Newcastle University, United Kingdom
Analysis and Interpretation of Human Exome Sequencing for Clinical Diagnosis and EHR Integration in the Cloud
Abstract: Cloud e-genome is a two-year pilot project for the UK National Health Service. Its aim is to facilitate the adoption of systematic genetic testing in clinical practice, and its integration with Electronic Health Record (EHR) management at population scale. The project explores the use of a cloud computing infrastructure, algorithms and tools to enable clinical diagnosis based on “second generation” human whole-exome sequencing. Our use cases are specifically in the area of rare neurological diseases.
The project is a collaboration between the Institute of Genetic Medicine and the School of Computing Science at Newcastle University, UK. Partial funding for the project comes from NIHR (National Institute for Health Research, UK), as well as from a gift from Microsoft.
Philip Kershaw, STFC Rutherford Appleton Laboratory, United Kingdom
Abstract: JASMIN is a national environmental data computing facility based at STFC Rutherford Appleton Laboratory in the UK. Established in 2012, over the next two years it will see significant expansion funded through NERC’s Big Data Initiative. NERC is the UK’s main government agency for funding and managing research, training and knowledge exchange for the environmental sciences. The challenges of Big Data are affecting this community in terms of increased projected data volumes, complexity of data, and timeliness of results. The cloud provides a way to address some of these challenges. To this end, the JASMIN development team is working with Microsoft to explore a hybrid cloud model augmenting JASMIN’s private cloud with resources from Windows Azure.
Blesson Varghese, University of St Andrews, United Kingdom
Real-time Catastrophe Risk Management on Windows Azure
Abstract: Currently, simulations employed in the financial industry do not support real-time catastrophe risk management due to their memory, data and computational demands. In this novel research, we propose to develop real-time risk management simulations on Windows Azure by implementing techniques to scale out when the memory, data and computational load increases. An interactive online tool built on HDInsight and Hive, which incorporates the simulation and facilitates real-time catastrophe risk management, will be developed and deployed. This tool will be used to make timely decisions when a catastrophe unfolds and thereby reduce damage to the financial market.
Julio Hernandez-Castro, University of Kent, United Kingdom
ChessWitan: Mining chess data to distinguish human from computer play
Abstract: Our aim is to develop a model able to distinguish between computer and unassisted human play through the analysis of the moves of a chess game. The originating motivation is that the widespread availability of powerful chess-playing programs is tainting chess play at all levels, and new high profile cases are arising continuously, where players receive computer assistance in tournament play, either directly or via an intermediary. There are currently no tools available able to give an objective assessment of the likelihood of cheating, and the only recourse is to time-consuming expert opinion on each individual case.
Nadarajen Veerapen, University of Stirling, United Kingdom
Automated Bug Fixing
Abstract: The goal of this project is to help programmers automatically fix software bugs. This requires detecting and extracting bug fixing patterns found in open source repositories. This information will then be used to generate potential fixes when a bug occurs. After testing the possible solutions, the most likely ones will be presented to the programmer. These steps are computationally expensive and will benefit from running within Windows Azure.
Vassilis Glenis, Newcastle University, United Kingdom
Modelling Flood Risk in Urban Areas
Abstract: Using a unique, fully-coupled surface-subsurface urban flood modelling system that is transferable to other urban areas, the project will use the cloud resource to implement multiple simulations of the model in the cities of Newcastle-upon-Tyne and London, driven by multiple realisations of future climate and rainfall, to determine flood risk and subsequently implement multiple simulations with a range of mitigation solutions designed in collaboration with local stakeholders. The complexity of urban drainage components (sewer networks, drainage paths) and the high-resolution data (land elevation, surface features), will benefit from the computational resources available in Windows Azure.
Peter Coveney, University College London, United Kingdom
Collaborative Computational Project in Systems Medicine
Adam Farquhar, The British Library, United Kingdom
British Library Labs in the Cloud
Anna Izabel J. Tostes, Federal University of Minas Gerais, Brazil
A Collaborative Cloud-based Business Intelligence Platform for Transportation Networks
Abstract: This project consists of developing a collaborative business intelligence platform to deal with traffic management and planning through real-time smart decision-making, preventive actions and contingency plans. A SharePoint portal using the Windows Azure cloud computing platform will be developed using data sources based on sensors (road detectors and cameras along the road), geographical information systems (Bing Maps), and social networks, collecting different views of traffic data. The goal is to analyze these data sets using graph theory and social network metrics, and then perform an analysis using OLAP cubes and data fusion, producing traffic metrics and congestion indicators.
Carmem Satie Hara, Federal University of Parana, Brazil
Abstract: The increasing amount of RDF data made available nowadays requires data to be partitioned across multiple servers. The RING project aims at providing a scalable RDF storage system and efficient query processing involving both simple and complex pattern matchings. Our approach is based on a workload-aware method for data partitioning for minimizing inter-server communication during query executions. We intend to store the RDF graph in its native form in order to apply graph exploration instead of costly join operations on triples. A grant from Windows Azure would allow us to evaluate the effectiveness of the solution on its cloud platform.
Fernando da Fonseca de Souza, Universidade Federal de Pernambuco, Brazil
Cloud Databases: A model to guarantee data consistency
Abstract: This work proposes a model for ensuring data consistency in cloud databases. More specifically, it shall define a consistency model appropriate for the context of cloud databases; develop a prototype with a mechanism to implement the created model; and validate the mechanism. Thus, an architecture will be designed considering widely recognized and studied data replication and propagation approaches considering the required adaptations. The proposed architecture should be seen as a starting point for the development of a database management system to be hosted in the cloud; its target audience will be composed of users who require assurance of data consistency.
Luiz André Portes Paes Leme, Universidade Federal Fluminense, Brazil
Assessing Recommendation Approaches for Dataset Interlinking
Abstract: Whenever a dataset is published on the Web-of-Data, an exploratory search over the datasets already published must be performed to link resources. To tackle this problem, this project introduces and compares approaches for selecting datasets that most likely can be linked to that being published. The approaches use statistical and social network techniques to compute the relevance of the datasets from selected sets of their features. Moreover, the project aims at producing a full comparison between the approaches, using different feature sets. The experiments will use real-world datasets extracted from catalogues of datasets, VoID descriptions and Linked Data crawlers in order to show the effectiveness of different approaches.
Marcelo Valadares Galdos, Brazilian Bioethanol Science and Technology Laboratory (CTBE) / Brazilian Center of Research in Energy and Materials (CNPEM), Brazil
Using Azure to run an integration of process-based environmental models and geographic information systems
Abstract: Both land use change and crop management changes under sugarcane, a key bioenergy crop, are occurring intensively and on a large scale in Brazil. This project aims at using cloud computing to integrate process-based modeling and a Geographic Information System to run simulations of soil carbon stocks at the regional scale. The GEFSOC (Global Environmental Facility - Soil Organic Carbon) system utilizes a dynamic ecosystem model in combination with an MPI-like run control process written in Perl, and a basic relational database implemented in MySQL. The Microsoft cloud will be useful in scaling parallel processing demands for large-scale ecosystem modelling.
Marta Mattoso, Federal University of Rio de Janeiro, Brazil
User-Steering Phylogenetic Workflows in the Cloud
Abstract: This research aims to help monitor the large-scale parallel execution of bioinformatics phylogenetic analysis workflows in the cloud. These workflows may be deployed as services, and their real-time provenance can be queried during execution so that the scientist can take actions such as staging out part of the results, for example a phylogenetic tree, for visualization.
Milton Cezar Ribeiro, Sao Paulo State University, Brazil
Integrating phenological, landscape, fauna movement and remote sensing massive data and processing through e-Science and Cloud computing
Abstract: The power that e-Science and cloud computing will provide will allow the already funded (by FAPESP and Microsoft Research Institute) e-Phenology project to: (a) develop models, methods and algorithms to support extraction, management, integration, and analysis of phenology data systems from various scales; (b) develop novel methods to understand and simulate landscape-based spatio-temporal connectivity at varying scales; (c) integrate movement of fauna with phenological and landscape data, in order to understand the interplay between space and time in fauna movements; (d) combine highly complex data analysis and simulations of movement of fauna in simulated neutral landscapes.
Rafael Duarte Coelho dos Santos, INPE - Brazilian National Institute for Space Research, Brazil
Prototype Deployment of a Data Server for the Brazilian Weather and Climate Virtual Observatory
Abstract: We are developing tools for the Brazilian Weather and Climate Virtual Observatory. Some of the tools are based on Thematic Data Servers—databases and data-access web services that will provide information from a specific sensor network, simulation project, etc. We propose using Windows Azure to demonstrate how a data provider can create one thematic data server and connect it to the Virtual Observatory Register so the data and services on the cloud-based thematic data server can be discovered and used together with the virtual observatory tools.
Ricardo da Silva Torres, Institute of Computing, University of Campinas, Brazil
Big Image Data Management on the Cloud for e-Science Applications
Abstract: This project aims to investigate novel algorithms, tools, and solutions for the storage, retrieval, and analysis of huge volumes of image data using Big Data technologies. The versatility and relevance of this proposal will be demonstrated in Remote Sensing Image Recognition applications, which seek to interpret and enrich information obtained from satellites and aerial photos.
Andres M. Pinzon, Center for Bioinformatics and Computational Biology of Colombia, Colombia
A cloud-based system for the integration of molecular data and biodiversity information for Colombian species
Abstract: Biodiverse countries like Colombia have huge potential for finding new or better biological components to be used in different areas such as the health sector and general industry. In order to take advantage of this potential and to become truly competitive in a new worldwide bioeconomy based on biological resources and information, it is fundamental to thoroughly describe, list and analyze our biological and molecular resources. Here we propose the development of a Windows Azure cloud service that links taxonomy information from all known Colombian species to already published molecular data from public databases. This system will promote and support the development of biotechnological policies, bioprospecting strategies, law enforcement against biopiracy, and national research programs for sustainable exploitation of natural resources.
Matthew Bawn, University of San Martin de Porres, Peru
The Use of the Cloud as a Computational Platform for Genomic Analysis
Abstract: With the advent of next-generation sequencing (NGS), medicine is progressing towards the prevention of genetic risk factors for common diseases such as glaucoma. Glaucoma describes a group of disorders characterized by distinctive optic-nerve damage that causes progressive and irreparable defects in the optical field, which, if left untreated, result in complete blindness. We will apply NGS to investigate the genetic basis of glaucoma in Peruvian populations, who, as with other Latin American populations, are severely under-represented in current genome databases. The production of interpretable data from NGS will be achieved via the application of Windows Azure-based high-performance computational analysis.
Hugo Andres Neyem, Pontificia Universidad Católica de Chile, Chile
Improving the Preservation of Latin America’s Wildlife through a Cloud Shared Workspace
Philippe Desjardins-Proulx, Université du Québec, Canada
Growing Intelligence with Cloud Markov Logic Networks
Abstract: Modern methods from artificial intelligence and machine learning are now widely used, from business intelligence to ecology, for their ability to build effective predictive models. These methods are at the vanguard of the data revolution, allowing us to tackle complex problems for which no simple models have been found. Yet, unlike humans, most machine-learning algorithms are unable to build on prior knowledge to find better models, a process called "transfer learning". We use the flexibility of Windows Azure to develop new techniques for transfer learning based on probability theory and logic, with a focus on applications to biodiversity.
Yung-Hsiang Lu, Purdue University, United States
Cloud-Based System for Continuous Analysis of Many Cameras
Abstract: Many cameras have been deployed for various purposes, such as monitoring traffic, watching natural scenes, and observing weather. In many cases, the video streams are not analyzed by computers and are simply discarded. Analyzing videos requires significant amounts of computing resources, and cloud computing may meet this need. This project aims to develop a cloud-based software infrastructure for analyzing video streams. This infrastructure manages the cloud resources and meets the performance requirements of the analysis programs. The requirements are affected by many factors, including the videos' frame rates, resolutions, and scene contents, as well as the types and numbers of cloud virtual machines. Through this software infrastructure, analysis programs run on Windows Azure and process many videos simultaneously. This Windows Azure Award allows our team to experiment with different approaches to designing and developing the software infrastructure.
Ka Yee Yeung-Rhee, University of Washington, United States
Inference of Gene Networks Studying Human Cancers On The Cloud
Abstract: Despite the generation of big data measuring molecular activities of all genes simultaneously, the majority of genes within most species are not mapped to any regulatory or biochemical pathways. Gene networks capture the interactions between biological entities. The inference of gene regulatory networks from big data in biology is of broad interest to scientists, and poses challenging research problems. We will develop computational methods and tools for the construction of gene networks on Windows Azure, and the subsequent application of our methods to infer gene networks using large complex biological data studying human cancers.
A. Lucas Stephane, Florida Institute of Technology, United States
Life-Critical Interactive Glass Wall Integration
Abstract: This proposed research and innovation work focuses on complex life-critical information visualization for the current NASA KSC Glass Wall project, which focuses on rocket launches, where every piece of information counts to ensure the safety, efficiency and comfort of the underlying space mission. We plan to use Windows Azure as an Infrastructure as a Service (IaaS) to explore and anticipate the future NASA KSC service-oriented architecture and enterprise solution. Based on the use of the external Windows Azure solution, HCDi will make recommendations to NASA KSC for balancing private (internal) and cloud (external) infrastructures.
Alexander Vyushkov, University of Notre Dame, United States
Modeling Malaria Transmission on Windows Azure
Abstract: We request 195,000 hours of Windows Azure compute instance time to generate and fit baseline simulations for use in a decision support tool. The requested compute instance time will be used to fit and pre-compute historic baseline simulations that will then be used as starting points for simulations run on demand by the decision makers who are evaluating the impact of planned malaria interventions. This is a feasibility study, which—if conducted at full scale—will enhance and accelerate research we are performing for the Bill & Melinda Gates Foundation under a project called VecNet, www.vecnet.org.
David Hazel, University of Washington, United States
AMADEUS - Azure Marketplace of Applications for Diverse Environmental Use as a Service
Abstract: UW CWDS will use the AMADEUS platform, hosted on the Windows Azure stack, to foster an ecosystem of environmental data applications (both web-based and mobile) that consume our models and data sources to integrate them into a variety of modeling scenarios. We will use WWT for 1) simultaneous visualization of multiple datasets, 2) exposing correlations and detected patterns as map layers, and 3) providing tutorials and example applications to bootstrap others interested in using WWT as a discovery platform. Additionally, we will partner with domain experts to prepare narrated guided tours for teaching curricula in university programs in urban planning, GIS and environmental sciences.
Dhruv Batra, Virginia Tech, United States
CloudCV: Large-Scale Distributed Computer Vision as a Cloud Service
Abstract: We propose to build CloudCV, an ambitious system that will provide access to state-of-the-art distributed computer vision algorithms on the cloud, as a service to the community.
Hanspeter Pfister, Harvard University, United States
Abstract: The goal of the Connectome project is to reverse engineer circuits in the neocortex to better understand how the brain processes information. We will analyze super-resolution volumes generated by electron microscopes to segment neurons and synapses. Data magnitude and noise make this a significant challenge.
Kelly Smith, University Corporation for Atmospheric Research (UCAR), United States
The Unidata Integrated Data Viewer (IDV) as a Cloud Service
Abstract: The Integrated Data Viewer (IDV) is an open-source application for interactive visualization and analysis of geoscience data. It is developed at Unidata, a University Corporation for Atmospheric Research (UCAR) Community Program (UCP). The IDV enables users, typically scientists or educators, to integrate, subset, and visualize complex data. The IDV is a resource-intensive application, requiring 1 GB of RAM or more on a Java Virtual Machine (JVM). With the current shift from traditional laptop- and desktop-style computers towards mobile devices, Unidata is investigating strategies for providing interactive IDV visualization via cloud-enabled technologies.
Richard Dana Loft, National Center for Atmospheric Research, United States
AzurePlanet: A cloud-based system providing access to weather and climate information
Abstract: AzurePlanet (AP) will provide integrated and engaging public access to information about past weather and future climate anywhere in the world, for the period 1979-2100. For performance reasons, AP will focus on key 2D variables such as surface temperature, pressure, precipitation, winds and humidity. AP will use peer-reviewed data sets and proven methods from geostatistics to ensure its scientific accuracy. Streaming visualizations of global weather (e.g. of major hurricane or winter storm events) will provide an engaging “on-ramp” for students, educators and the general public, encouraging them to delve further into weather and climate information.
Susan Borda, California Digital Library, United States
Abstract: DataUp is now operational, but it has not reached its full potential as a tool for archiving datasets from a range of researchers and disciplines. DataUp has the potential to become a key tool in research data sharing and archiving as envisioned by the NSF DataNet program. To that end, our major project goals for DataUp v2 are to (1) enhance the tool’s user experience and add features, and (2) build the open-source community around DataUp.
Tanya Berger-Wolf, University of Illinois at Chicago, United States
Computational Behavioral Ecology on the Cloud
Abstract: The goal of our project is to develop tools and models to collect location trace data on social animals and to create predictive and descriptive models of individual and collective movement, taking into account micro and macro behavior, ecological and habitat context, and interactions with other individuals.
Yuejie Chi, The Ohio State University, United States
Online Distributed Inference of Large-Scale Data Streams in the Cloud
Abstract: The proposed research develops a novel sensing scheme, called covariance sketching, to acquire the covariance structure of high-dimensional data streams in resource-constrained environments with mathematically provable performance guarantees. Furthermore, adaptive algorithms are designed to track underlying covariance structural changes in a dynamic data stream. The usage of the Windows Azure cloud platform allows both theoretical and practical advancements of the proposed research by exploring fundamental trade-offs in regimes which may otherwise be impossible and by applying it to real-world large-scale video and social network data.
Animashree Anandkumar, University of California, Irvine, United States
Large-scale Unsupervised Learning via Tensor Methods: Applications in Social Networks and Text & Image Analysis
Jaime Ruiz, Colorado State University, United States
EQStratus: On-Demand Genome Assembly using Cloud Infrastructure
Jim Nelson, Brigham Young University, United States
Flood Early Warning System in the Cloud
Kenneth H. Buetow, Arizona State University, United States
Using Data Science Approaches To Map Biologic Processes To Clinical Phenotype And Outcome
Richard P. Hooper, Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI), United States
Transforming Water Science with Feature-Based Data Access