Agenda

Latin American eScience Workshop 2013 | sponsored jointly by Microsoft Research and FAPESP
May 13–15, 2013 | São Paulo, Brazil

Pre-Event Activities

Monday, May 13

Time

Event/Topic

8:00

Registration

9:00

Tutorial: Data Visualization in Layerscape I | video
Presenter: Rob Fatland (Microsoft Research)

 

Abstract: This tutorial will focus on data, specifically on visualization toward insight, in relation to the Layerscape toolkit. We will begin with the basic navigation and structural concepts of the WorldWide Telescope (WWT) application, including time, viewing modes, data types, and data capacity. We then move to example datasets: how they are typically represented and how they can be imported into WWT for rendering. Next, we explore storytelling around the data through the construction of WWT tours, and we will discuss tour publication at http://layerscape.org, including the preservation of metadata. We then discuss the integration of data services such as a generic web mapping service and FetchClimate. Throughout, we will use Microsoft Excel and the Excel Add-in for WorldWide Telescope. Finally, we turn to the WWT API and the Developer’s Toolkit, “Narwhal,” which facilitates more complex rendering of data. By the end of this tutorial, attendees should have a good grasp of how the various parts of Layerscape fit together around the WWT visualization engine, and of how an imaginative translation of geospatial and/or abstract data into pixels can give new understanding of what the numbers really mean.

10:30

Coffee Break

11:00

Tutorial: Data Visualization in Layerscape II | video
Presenter: Rob Fatland (Microsoft Research)

 

Abstract: same as above

12:30

Lunch

14:00

Tutorial: Data-Constrained Environmental Modelling: FetchClimate, Filzbach, and Distribution Modeller I | video
Presenter: Drew Purves (Microsoft Research)

 

Abstract: There is an obvious and urgent need to build predictive models of important environmental phenomena. Such models need to describe how variation in different aspects of the environment, such as climate and soil, affects the phenomenon of interest: for example, primary productivity of plants, agricultural yield, or even land-use change. But to date, building such predictive models has been held back by a host of technical barriers, placing it outside the reach of many environmental scientists (and making it annoyingly difficult and slow for the rest!). In the first half of this tutorial, we will concentrate on FetchClimate, an HTML5 browser application that makes it quick and easy to get the environmental information needed to drive the models. You’ll learn how to use FetchClimate to perform several important classes of query, including grids of climatology statistics, collections of time series (whether year-to-year or day-to-day), and some future climate predictions too. Next, we’ll move on to Filzbach, a generic Bayesian parameter estimation engine that allows you to define an arbitrary model and then parameterize that model against data. We’ll give examples of using Filzbach from C++, R, and Matlab, and explain the key statistical concepts that Filzbach embodies. Finally, you will be among the first to try Distribution Modeller, a new browser application that ties FetchClimate, Filzbach, and other pieces together into an end-to-end environment for rapidly building and parameterizing models and then pushing them into FetchClimate so that they can be run on demand by anyone, anywhere.

16:00

Coffee Break

16:30

Tutorial: Data-Constrained Environmental Modelling: FetchClimate, Filzbach, and Distribution Modeller II | video
Presenter: Drew Purves (Microsoft Research)

 

Abstract: same as above

17:30

Tutorial: Expanding Your Horizons | video
Presenter: Dan Fay (Microsoft Research)

 

Abstract: Increasing our professional visibility beyond what was possible in the past is becoming easier, as information sharing becomes faster and wider. This interactive discussion looks beyond traditional academic approaches to sharing results, examining the role of online connections and communication channels such as social media. Participants are encouraged to share their thoughts and best practices on how to expand the exposure of their research.

18:00

Closing for the day

 

Latin American eScience Workshop 2013

Tuesday, May 14

Time

Event/Topic

8:00

Registration

9:00

Opening Session

9:30

Keynote: Making a Difference for Science and Society | video
Speaker: Tony Hey (Microsoft Research)
Chair: Claudia Bauzer Medeiros (IC-Unicamp)

10:30

Coffee Break

11:00

Keynote: Predicting the Future of All Life on Earth | video
Speaker: Drew Purves (Microsoft Research)
Chair: Rosane Minghim (ICMC-USP)

 

Abstract: If humanity is going to safeguard its collective future, then it will need to predict how different aspects of the Earth system (for example, the carbon cycle, biodiversity, food production, deforestation, wood production, fertilizer use, and of course climate) might change in the future under various scenarios. At the very center of each of these aspects is life, so the challenge is nothing less than to predict the future of all life on Earth! At the Computational Ecology and Environmental Science group, we carry out novel, fundamental ecological research that is necessary for building predictive models of ecosystems, and we develop the novel software required to do so. In this talk, I will give examples of our work pertaining specifically to the Amazon basin, including predictive models of carbon cycling and carbon storage, biodiversity (and its response to land-use change), leaf phenology, deforestation, and agricultural productivity. I will explain how we built these models using our in-house tools (including FetchClimate, Filzbach, and Distribution Modeller), and I will be keen to discuss how we can combine the predictions of such models to support effective decision making in the Amazon and more broadly.

12:00

Student Presentations
Coordination: Claudia Bauzer Medeiros (IC-Unicamp)

12:30

Lunch

Parallel Sessions

14:00

Data Visualization | video
Chair: Harold Javid (Microsoft Research)

 

Presentations:

  • Improving the User Experience of Big Data Analytics
    Speaker: Rob DeLine (Microsoft Research)
    Abstract: Data science today is like software development in the mainframe era: data scientists twiddle their thumbs waiting for big batch jobs to complete and shuffle data around between multiple independent tools, often through tedious clerical work. A typical workflow might include map/reduce systems (Hadoop), database management systems (MySQL), spreadsheets, scripting environments (Python), statistical programs (R, Matlab), and machine learning tools (Weka). These bureaucratic workflows have several disadvantages, including the barrier of learning all these tools, the vigilance needed to prevent mistakes, the difficulty of preserving provenance and reproducibility, and the extra effort required to share data sets and analyses. To address these problems, I'll present a demo of our prototype environment for data science, called “Stat!”. The goal of “Stat!” is to allow a data scientist to accomplish an entire workflow, from raw data to final presentations, in one environment. This integration creates the opportunity for high productivity, automated checking, and preservation of data provenance. The project's long-term goal is to democratize data analysis so that, say, the average spreadsheet user can use statistics and machine learning to draw valid conclusions about a data set of her choice.
  • Challenging Multidimensional Data
    Speaker: Maria Cristina Ferreira de Oliveira (ICMC-USP)
    Abstract: In this talk, I will present some illustrative examples of the difficulties faced daily by many professionals, scientists or not, when trying to make sense of data (their own or others'), which is often high-dimensional. We all need better tools for data analysis, and I hope to make the case that including visualization in our repertoire of tools for data analysis can make a big difference. However, using and developing visualizations demands several kinds of expertise, and many challenges remain in increasing the availability and usability of current techniques. I would like to motivate you to contribute to visualization research by discussing some of these challenges from the perspective of my experience with a particular category of techniques for visualizing high-dimensional data.
  • Layerscape
    Speaker: Rob Fatland (Microsoft Research)
    Abstract: A cloud-based user experience, Layerscape employs powerful, everyday tools to analyze and visualize complex Earth and oceanic datasets, enabling scientists to gain insight into Earth's environment. Users can create and share 3-D virtual tours based on their discoveries and collaborate with the Earth-science community in ways that previously seemed impossible. Build your own virtual tours and experience the possibilities.

Digital Humanities | video
Chair: Marta Arretche (FFLCH-USP)

 

Presentations:

  • GIS Applications for the Social Sciences and for Urban Studies in Brazil
    Speaker: Eduardo César Marques (USP)
    Abstract: Quantitative geographical analysis of social and urban processes is quite common internationally. In Brazil, however, the use of these tools began late, mainly due to the absence of publicly available databases. Since the 1990s, the situation has improved substantially, with data provided free of charge not only by official agencies but also by research institutions. This presentation explores examples of different uses of spatialized quantitative information on São Paulo and other Brazilian cities to characterize and investigate social processes, as well as to support decision-making processes in social and urban policies.
  • Speaker: Claudio Pinhanez (IBM)

15:30

Coffee Break

16:00

eScience in the Cloud | video
Chair: Kris Tolle (Microsoft Research)

 

Presentations:

  • Cloud Computing for Fourth Paradigm Challenges in Scientific Research
    Speaker: Dennis Gannon (Microsoft Research)
    Abstract: Cloud computing provides us with a new paradigm for addressing the computing and data analysis challenges confronting many scientific disciplines. Unlike traditional supercomputers, the cloud can support different styles of computation that are well suited for collaboration and data analysis. For the last three years, we have been working with academic researchers to explore the potential of this new platform. We now have more than 90 research projects using Windows Azure and we have learned a lot. This talk will describe the discoveries we have made and outline the potential we see for future research using the cloud.
  • eScience in the Cloud
    Speaker: Dan Fay (Microsoft Research)
  • User-Steering on Cloud Workflows
    Speaker: Marta Mattoso (COPPE-UFRJ)
    Abstract: Scientific workflow management systems have successfully demonstrated their capabilities in several scientific areas, evidencing their contribution to eScience. Executing workflows with high-performance computing in cloud environments has many advantages, but it also raises open issues such as user steering of workflows. Many workflow users demand steering features such as real-time monitoring, analysis, and interference with the execution. The workflow execution should respond dynamically to such interference to support the experimentation process, especially in the cloud. In this talk, I will discuss the research challenges of steering by scientists and present some ideas on how these issues may be supported in current workflow technologies. Querying workflow provenance data at run time plays an important role in user steering, as shown by our own approach to some of these challenges.

Computer Vision and Bioacoustics | video
Chair: Rob Fatland (Microsoft Research)

 

Presentations:

  • Detecting Remote Phenology Patterns: A Computer Vision Approach
    Speaker: Ricardo da Silva Torres (IC-UNICAMP)

    Abstract: e-phenology: The application of new technologies to monitor plant phenology and track climate changes in the tropics.
    Environmental changes are becoming an important issue worldwide, and phenology studies offer one example of these problems. Phenology, the study of natural recurring phenomena and their relation to climate, is a traditional science of observing the cycles of plants and animals and relates mainly to local meteorological data, as well as to biotic interactions and phylogeny.

    Recently, phenology has gained importance as the simplest and most reliable indicator of the effects of climate change on plants and animals. The strongest results connecting, for instance, changes in the timing of first flowering, leafing, and bird migration to recent global warming have come from long-term phenological series from the Northern Hemisphere, where historical data sets have been collected for decades.

    The scarcity of information and monitoring systems in tropical regions, in particular South America, has motivated several research centers. Among the many efforts under way, e-phenology stands out: a multidisciplinary project combining research in computing and phenology to solve practical and theoretical problems involved in using new technologies for the remote observation of phenology.

  • Analysing Bio-Acoustic Data for Faunal Monitoring
    Speaker: Paul Roe (Queensland University of Technology)
  • A Cloud-Based Evaluation Infrastructure for Medical Image Analysis and Search
    Speaker: Allan Hanbury (Vienna University of Technology)
    Abstract: The cloud-based infrastructure for evaluating machine learning and information retrieval algorithms on terabytes of medical images, being built in the EU VISCERAL project, is presented. Instead of downloading data and running evaluations locally, participants will find the data centrally available in the cloud, and the algorithms to be evaluated will be run in the cloud, effectively bringing the algorithms to the data. The design of the VISCERAL infrastructure is presented, concentrating on the components for coordinating the participants in the benchmark and managing the ground truth creation. The medical imaging benchmarks that will be run on this infrastructure are also described.

17:30

Demofest—Presentations and Demos
Chair: Juliana Salles (Microsoft Research)

 

  • GeoFlow
    Presenter: Alexandre Da Veiga
  • Life in Andes
    Presenter: Cristián Bonacic
  • Web Based Bioinformatics
    Presenter: Jarek Pillard
  • CLEO: Cultivating the Long Tail of eScience Observations
    Presenter: Jie Liu
  • DataUp
    Presenter: Kristin Tolle
  • ePhenology
    Presenter: Ricardo da Silva Torres
  • Improving the User Experience of Big Data Analytics
    Presenter: Rob DeLine
  • FetchClimate and Layerscape
    Presenter: Rob Fatland
  • ChronoZoom
    Presenter: Roman Snytsar
  • Medical Imaging Initiative Demo
    Presenter: Simon Mercer
  • VidWiki: Crowd-Enhanced, Online Educational Videos
    Presenter: Vidya Natampally

19:00

Closing for the day

 

Wednesday, May 15

Time

Event/Topic

8:30

Registration

9:30

Keynote: The Use of Computing Sciences in Plant Systems Biology: How can this association help us produce more and better food and bioenergy and, at the same time, cope with the impacts of global climate change?
Speaker: Marcos Buckeridge (USP & CTBE)
Chair: Roberto Marcondes Cesar Jr. (IME-USP)

 

Abstract: Food and biofuel production (F&BP) and global climate change (GCC) are modern transdisciplinary issues in which scientists are key contributors. Because plants are the primary source of biomass production on the planet, owing to their ability to perform photosynthesis, and are also the largest consumers of CO2, one of the most important scientific aspects that permeate these major issues for humanity is our level of understanding of how plants function. In this presentation, I will talk about how apparently simple results of plant physiology, obtained for native plant species around 20 years ago, turned out to reflect extremely complex and interconnected network systems, which led me to conclude that the only way to go further would be to integrate plant biology with mathematics and computing sciences. I will present one project (Microsoft-FAPESP) that is now being developed in collaboration with computer scientists to build new tools that integrate different scales within the functioning plant, in other words, transcriptomics, metabolomics, and physiology. One long-term goal is to develop tools that could help evaluate plant behavior in the environment in a systemic way, so that we could better evaluate the impacts of GCC. The other concerns crop plants: we expect that knowledge and control of how systems integrate at different scales will make it possible to apply synthetic biology techniques to plants in order to produce more and better food and bioenergy.

10:30

Coffee Break

11:00

Keynote: The Reality of Reproducibility of Computational Science
Speaker: Carole Goble (University of Manchester)
Chair: Marta Mattoso (UFRJ)

 

Abstract: Reproducibility in principle underpins the scientific method. For an experimental finding to be reproducible its materials must be available and its methods clear, accurate, and transparent. In this talk, I will explore the reality of reproducibility in computational science, focusing on methods that use workflows. I will discuss what we mean by reproducibility; differentiate between the preservation and conservation of workflows; sketch the role provenance has to play; and point to a growing number of initiatives that aim to make in silico science reproducible, including our own first steps towards a reproducibility framework based on Research Objects. Although technical infrastructure helps towards the utopian ideal of truly reproducible science, it is social factors that define the reality.

12:00

Student Presentations
Coordination: Claudia Bauzer Medeiros (IC-Unicamp)

12:30

Lunch

Parallel Sessions

14:00

Data Mining, Management, and Computational Ecology | video
Chair: Juliana Salles (Microsoft Research)

 

Presentations:

  • Location Sensing on Mobile Devices
    Speaker: Jie Liu (Microsoft Research)
    Abstract: Location-based services have become ubiquitous, thanks to sensors like GPS and Wi-Fi in our smartphones and other mobile devices. They have the potential to change how scientists collect data and conduct experiments. However, continuous location sensing, such as logging, tracking, and geo-fencing, consumes too much energy and shortens device battery life. In this talk, we take a fresh look at location sensing in both outdoor and indoor settings. For outdoor locations, we dive into the principles of GPS receivers and show that by offloading GPS processing to the cloud, we can reduce device-side energy consumption by three orders of magnitude. For indoor locations, we find that commercial FM signals are good sources of location signatures: they work better than Wi-Fi signatures on their own, and even better when combined with Wi-Fi signatures. These low-energy alternatives enable always-available location services without users paying a battery-life penalty.
  • Advanced Network for the Distribution of Endangered Species
    Speaker: Cristian Bonacic (PUC – Chile)
    Abstract: Many of Latin America’s endangered species are insufficiently studied; more information is needed to help preserve them. Scientists at Pontifical Catholic University of Chile, in coordination with Microsoft Research, have developed a tool that they believe will help: LiveANDES (Advanced Network for the Distribution of Endangered Species). The LiveANDES platform stores and parses data points about wildlife and natural areas by using photographs, audio and video recordings, and location and sighting information. Researchers use the data to identify species, where they currently dwell, and possible threats to their future. The tool is also capable of parsing huge volumes of recorded data so that it is manageable for researchers.
  • The Use of Database and Information Systems to Improve Public Policies of Biodiversity Conservation and Restoration
    Speaker: Carlos Joly (IB-UNICAMP & BIOTA-FAPESP)

Health and Wellbeing
Chair: Carlos Alberto Moreira Filho (FM-USP)

 

Presentations:

  • Medical Imaging Initiative Demo
    Speaker: Simon Mercer (Microsoft Research)
    Abstract: Medical imaging is the world’s most prolific generator of big data today. Intelligent machine-based analysis is the only practical way to process such vast amounts of data to produce medically useful information in a timely manner. We believe that medical image analysis is being held back by a lack of high-quality, sharable, and well-annotated image datasets for training and benchmarking algorithms. We plan a range of activities intended to reduce these bottlenecks, including a new Microsoft Research open source platform built on Windows Azure in association with Stanford University. This platform will be demonstrated.
  • Complex-Network Driven View of Genomic Mechanisms Underlying Common Diseases
    Speakers: Carlos Alberto Moreira Filho (USP) & Luciano da Fontoura Costa (IFSC-USP)
    Abstract: Common diseases (CD), like cancer, heart disease, diabetes, or acquired epilepsy, are caused by the interplay of genomic and environmental factors. Genes underlying these non-transmissible chronic diseases usually display a higher number of connections in transcriptional networks; they are called hubs because they connect many genes (nodes) that would not be connected otherwise. Hubs may coordinate or link together specific cellular processes. Therefore, to acquire a better understanding of the molecular mechanisms involved in a particular CD and its pathophenotypes, it is necessary to develop methods to study the complete set of valid transcripts in the disease’s target tissue or cell population. The mathematical and computational tools for this task lie in the field of complex network analysis.

    Here we present a methodology for complex network visualization (3-D) and analysis that allows network nodes to be categorized according to distinct hierarchical levels of gene-gene connections, or node degree (hubs), and of interconnection between node neighbors, or concentric node degree (VIPs, high hubs). This methodology was applied to the investigation of genomic mechanisms underlying febrile (FS) and afebrile (NFS) forms of drug-resistant epilepsy by comparatively studying CA3 hippocampal co-expression networks of FS and NFS cases. This approach enabled us to identify the distinct roles of the most highly connected hubs in each form of the disease and proved to be a useful tool for systems biology-based antiepileptic drug discovery. The network centrality observed for the hubs, VIPs, and high hubs of CO networks is consistent with the network disease model, in which a group of nodes whose perturbation leads to a disease phenotype forms a disease module occupying a central position in the network. This result shows that the probability of exerting a therapeutic effect through the modulation of single genes is higher if these genes are highly interconnected in transcriptional networks.
  • Speakers: Mauricio Rodriguez & Andres Pinzon (Centro de Bioinformática y Biología Computacional de Colombia – CBBC)

15:30

Coffee Break

16:00

Keynote: Big Data, Digital Humanities and the New Knowledge Environments of the 21st Century | video
Speaker: Chad Gaffield (Social Sciences and Humanities Research Council of Canada – SSHRC)
Chair: Claudio Pinhanez (IBM)

 

Abstract: The past year proved to be the time when digital scholarship moved to the top of the agenda for schooling at all levels as well as for research fields across all disciplines. From Big Data to Massive Open Online Courses, intense debate among educators, scholars, and policy makers captured public attention around the world and has set the stage for concerted efforts to embrace the digital age. This presentation will focus on the key role that scholars across the social sciences and humanities are now playing to advance digital scholarship, especially in interdisciplinary and international research teams. Examples will include those funded through the Image, Text, Sound and Technology program, the Canada Research Chairs program, the Networks of Centres of Excellence program, and the international Digging into Data initiative. Special attention will also be given to the upcoming World Social Science Forum entitled "Social Transformations and the Digital Age," to be held in Montreal, October 13–16, as well as the proposed TransAtlantic Platform for the Social Sciences and Humanities, of which Brazil is a partner.

17:00

Closing