Microsoft Research Latin American Faculty Summit 2011
Cartagena, Colombia | May 18–20, 2011
On This Page
- The Role of Basic Research in Technology
- Open Science, Open Data, and Open Source
- Science, Technology, and Innovation Strategies for Promoting Competitiveness
- Computing and the Future
- The Path to Open Science with Illustrations from Computational Biology
- High-Fidelity Augmented Reality Interactions
- Semantic Computing for eScience
- Scaling Science in the Cloud: From Satellite to Science Variables at the Global Scale with MODISAzure
- Natural User Interface—A New Frontier for Human Centric Computing
- Live Andes (Advanced Network for the Distribution of Endangered Species): A New Tool for Wildlife Conservation
- ANURA: Sensor Networks for Classifying and Monitoring Frogs Based on Their Vocalizations as an Early Indicator for Ecological Stress in Rain Forests
- The Brazilian Biodiversity Database and Information System SinBIOTA: The 10 Years Integration with the BIOTA/FAPESP Program and the Re-Development for the Next Decade
- Scientific Computing Using Windows Azure
- Bridging the Gap Between Formal and Informal Learning with Mobile Devices
- Counting People Waiting in Service Lines Using Computer Vision and Machine Learning Techniques
- Transforming Scholarly Communication—Overview of Recent Projects from Microsoft Research
- Tools of the Trade: Cluster and Cloud Computing On the Operating System That Is Not Linux!
- The Microsoft .NET Gadgeteer Hardware Prototyping Platform
- Sharing Research Worldwide: Advances in Automatic Translation
- The Microsoft Biology Foundation
- Cloud Computing for Science in Europe and Venus-C
- Towards Exaflop Supercomputers
- Future Trends in Software Engineering
- Audio and Video Research for Kinect
- Water from the Mountains, the Fourth Paradigm, and the Color of Snow
The Role of Basic Research in Technology
The role of basic research in the technology field is often misunderstood. In this talk I will look at what basic research in technology is and is not and what role it plays in the overall innovation process. I will draw on my own experiences both at Carnegie Mellon University and at Microsoft Research and show how research insights can have a dramatic impact both on corporations and on society as a whole. I will talk about the process of moving ideas from research to product and showcase a number of recent technologies and how they evolved.
Open Science, Open Data, and Open Source
This talk will examine the implications of the current explosion of scientific data and the need for new and more powerful tools to visualize and explore this data. We begin with a survey of some of the open source tools that Microsoft Research is creating in collaboration with the scientific community. These are now donated to the Outercurve Foundation, an open source foundation supported by Microsoft. The combination of open tools and services with open data is leading to major opportunities for new ways of organizing and exploring data, and holds great promise for delivering scientific discoveries in new and exciting ways.
Science, Technology, and Innovation Strategies for Promoting Competitiveness
Colciencias, Colombia’s Department of Science, Technology and Innovation, is the government agency charged with formulating, designing and coordinating the national policy to promote research and innovation. The Science, Technology and Innovation Law (Law 1286 of 2009) directs Colciencias to lead the National System of Science, Technology and Innovation, which unites the efforts of state, academia, industry and civil society to build, in Colombia and its regions, a development model based on knowledge generation and use. Through its promotion instruments, Colciencias supports research, technological development and innovation projects; human resource training for science, technology and innovation; strengthening of the regional science, technology and innovation systems; international mobility of researchers; and social appropriation of knowledge.
The Department’s vision for 2014 is that of a Colombia with a state policy on science, technology, and innovation consolidated through the increase in output, usage, integration, and appropriation of knowledge within the productive apparatus and society at large. The ultimate goal is to contribute to social progress, economic dynamism, sustained and sustainable growth, and greater prosperity for everyone. It is expected that, by 2019, Colombia will have reached a level of human, social, and economic development grounded in the production, dissemination, and use of knowledge, and that this knowledge will be a fundamental component of productivity and international competitiveness, as well as of the prevention and resolution of national and regional challenges.
The objectives of the Department of Science, Technology, and Innovation are:
- Consolidate the National System of Science, Technology, and Innovation: science, technology, and innovation management in Colombia and its regions will be supported by clear, widely concerted policies with strong institutions, stable and sufficient resources, effective articulation mechanisms, and the active participation of all of society.
- Increase and involve human resources for research and innovation: the development of skills and competences for science, technology, and innovation will begin in the early childhood stage and will be promoted throughout all educational levels. The support for doctoral training will increase, and the incorporation of doctoral graduates into the ranks of industry will be promoted to support research and innovation.
- Promote knowledge and innovation for the productive and social transformation of the country: the scientific community will have the necessary tools to produce and apply knowledge that contributes to the solution of the problems of the country and its regions. Social innovation and collaborative work among research groups, community, and industry will be promoted. A new culture of innovation will be generated and strengthened among entrepreneurs, the productive sector and the regions, and the capabilities to incorporate knowledge within productive processes will be increased. Science, technology, and innovation will fuel the engines that will pull the country’s growth.
Computing and the Future
The last 40 years have seen computer science evolve as a major academic discipline. Today, the field is undergoing a fundamental change. Some of the drivers of this change are the Internet, the World Wide Web, large quantities of information in digital form, and widespread use of computers for accessing information. The change is requiring universities to revise the content of computer science programs. This talk will cover the changes in the theoretical foundations of computer science that are needed to support the future.
The Path to Open Science with Illustrations from Computational Biology
Science is increasingly conducted in a digital medium, whether it is observationally or hypothesis driven. However, I will argue that we are very ill-equipped to conduct science efficiently and in a way that can maximize discovery in this digital medium. While significant attention has been given to large data, the thousands of scientists who produce and manage laboratory data have been neglected. I will describe some of the issues with the current scientific digital workflow—from idea to publication—and what we might do to better the situation. I will draw examples from my own experience in computational biology, which is progressive in efforts to deal with the digital roadblocks to new understanding.
High-Fidelity Augmented Reality Interactions
This talk explores one simple question: how do you catch a virtual ball with your real hand? To illustrate this concept, I present a series of projects that demonstrate how projectors and depth-sensing cameras (for example, Kinect) can be used to create augmented reality experiences far richer than previously imagined. I discuss how such experiences can be authored and created, and how to achieve high-fidelity interactions with virtual content without requiring the user to wear any additional gear. Ultimately, this talk shows how, with today’s depth-sensing technology, one can blur the line between physical and virtual worlds and make it possible to use your hands and the knowledge of manipulation of the physical world to interact with virtual content. I draw from my experiences in designing several highly publicized projects such as LightSpace, MirageBlocks, and Pinch-the-Sky Dome to illustrate the concepts.
Semantic Computing for eScience
As the volume of data continues to grow, it has not become any easier to find a needle in the data haystack. Data-driven research has fundamentally changed how people interact with data and each other by exposing analyses, discoveries, or recommendations which were invisible to the human eye. For instance, data-driven technologies enable search engines to retrieve relevant documents from billions of documents very efficiently; likewise, these technologies are helping machine translation to break down natural language barriers effectively, while also making online gaming more challenging and interesting by automatically matching players of the same level. As we look for the next generation of interactions with data to deepen our understanding of our environment, semantics emerges as one of the building blocks to start making sense of data in context. In the talk, we will explore eScience scenarios where semantic computing technologies can contribute to moving research forward along the path from data to information, knowledge, and intelligence.
Scaling Science in the Cloud: From Satellite to Science Variables at the Global Scale with MODISAzure
The perfect storm of digital sensors, remote sensing, and commodity computing has created a data deluge of unprecedented scale. At the same time, science questions such as climate change require analysis at global spatial scales and decadal or longer temporal scales. As such, it is increasingly difficult for scientists to obtain, organize, and mine such large datasets. Moreover, many scientists have only limited access to the larger scale computational resources necessary for such large datasets. Cloud computing promises an attractive alternative for hosting eScience data and applications, particularly for those applications which have outgrown the science desktop yet don’t need a supercomputer. In these early days of cloud computing, the cloud also presents unique computing challenges. This talk will present both the eScience potential and near term computer science challenges of cloud computing, drawing upon experiences with MODISAzure over the past two years.
Natural User Interface—A New Frontier for Human Centric Computing
What is “natural” when it comes to computing? What separates a useful UI from an easy-to-use UI, and what are the tradeoffs? This talk will focus on how interactions with computers can be more natural, intuitive, and context oriented. This vision talk will explore the differences between what is usable and what is useful with regard to human and computer interactions. It will also include some examples of truly ubiquitous computing and demonstrate how it may be possible to push the envelope of scientific discovery by showing a Kinect gesture interaction with the WorldWide Telescope.
Live Andes (Advanced Network for the Distribution of Endangered Species): A New Tool for Wildlife Conservation
The aim of this project is to develop software to input, analyze, and share location data from wildlife sightings in nature. The resulting georeferenced data will be available for display in Bing Maps. The value of sharing wildlife location data lies mainly in the need to map the presence and distribution of endangered species in native ecosystems. It is also useful for recording alien species and any important information about wildlife conservation (for example, illegal hunting, new species distribution range, seasonality, and migration routes). Live Andes is intended to become a simple way to include citizen science in wildlife conservation in the region, where ecologists, scientists, park rangers, and even tourists can upload, share, and analyze georeferenced data that are stored in the cloud. Currently, a web-based portal and a mobile phone application are under development. Also, background data for a series of species in Chile are under construction, and query criteria for data analysis are still in progress.
ANURA: Sensor Networks for Classifying and Monitoring Frogs Based on Their Vocalizations as an Early Indicator for Ecological Stress in Rain Forests
In this presentation, we introduce the project ANURA, which aims at using sensor networks for classifying and monitoring anura (frogs and toads), based on their calls, as an early indicator for ecological stress in rain forests. We also present the promising results of our preliminary study, in which bioacoustic signals are segmented into smaller units called “syllables”. Then, these units are processed by a pre-emphasis filter and a Hamming window that prepare the signal for feature extraction. We used Mel-Frequency Cepstral Coefficients (MFCCs) to represent the acoustic signals, and two classifiers were evaluated: kNN and SVM. In our experiments, we achieved a classification rate of 98.97 percent, which shows that MFCCs, usually used in speech recognition, can be used for recognizing anuran species as well. The anuran recognition rate improved by 16.09 percent, using SVM and MFCCs, compared with the state of the art.
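The preprocessing and classification steps named in this abstract can be sketched roughly as follows. This is a minimal illustration, not the ANURA implementation: the pre-emphasis coefficient (0.97), the frame length and hop, the value of k, and all function names are assumptions, and the MFCC step itself is omitted, so the kNN function simply expects feature vectors that would come from it.

```python
import math
from collections import Counter


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample passes through.

    alpha=0.97 is a commonly used value, assumed here (the abstract gives none).
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def hamming_window(length):
    """Return Hamming window coefficients w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def windowed_frames(signal, frame_len, hop):
    """Split a syllable into overlapping frames and taper each with a Hamming window."""
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`.

    `train` is a list of (feature_vector, label) pairs; in the abstract's
    setting the feature vectors would be MFCCs (not computed in this sketch).
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In use, each segmented syllable would pass through `pre_emphasis` and `windowed_frames`, a feature extractor would turn the frames into one vector per syllable, and `knn_predict` would assign a species label from labeled examples.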
The Brazilian Biodiversity Database and Information System SinBIOTA: The 10 Years Integration with the BIOTA/FAPESP Program and the Re-Development for the Next Decade
The BIOTA/FAPESP program was created to help São Paulo State achieve the targets of the Convention on Biological Diversity. FAPESP (State Foundation for Research Funding) could stimulate specific research in biodiversity among high-level researchers, but that required the implementation of an organized program. One of the essential elements of the BIOTA/FAPESP program was the development of the information system called SinBIOTA. Over the past 10 years, this system has aggregated the data collated by every research project funded under the program and created informational outputs for education or decision-making processes. However, almost 10 years after its development, the system has seen few upgrades, and with the renewal of the BIOTA/FAPESP program a new system is needed, based on new technology and with new tools.
Scientific Computing Using Windows Azure
Cloud-based computing provides access to a utility style, on-demand compute resource, billed on a pay-as-you-use basis. There are four key scenarios where cloud computing can prove advantageous:
- Burst capability
- Super scalability
- Data dissemination
- Algorithm development
We examine each of these scenarios with the use of ongoing projects, demonstrating the benefits gained by using Windows Azure.
The Clouds in Space project provides a cloud-based plug-in framework for satellite trajectory propagation and conjunction analysis and is aimed at improving Space Situational Awareness (SSA) by predicting potential satellite collisions. We are extending this to include Near Earth Object (NEO) close approaches and to calculate debris removal strategies to guide future billion-dollar space missions. Debris removal requires balancing the removal of problematic debris against the cost and practicality of space technologies and mission capabilities.
The Atmospheric Science Through Robotic Aircraft (ASTRA) project demonstrates the use of Windows Azure as a compute resource to complement low-powered, high-altitude scientific instrumentation. ASTRA uses a Windows Phone 7 device as a high-altitude stratospheric data logger, sending location data to a Windows Azure resource to update a flight prediction model, calculating payload landing locations and flight characteristics.
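To give a feel for what a conjunction check involves, here is a deliberately simplified sketch. It assumes both objects move in straight lines at constant velocity over a short interval, which real orbit propagation does not; the function names, units (km, km/s), and the distance threshold are hypothetical and are not taken from the Clouds in Space framework.

```python
import math


def closest_approach(r1, v1, r2, v2):
    """Return (t_min, d_min) for two objects in straight-line motion.

    r1, r2 are positions (km) and v1, v2 velocities (km/s) as 3-tuples.
    Only times t >= 0 are considered. Linear motion is a simplifying
    assumption for illustration, not an orbit propagator.
    """
    dr = [a - b for a, b in zip(r1, r2)]   # relative position
    dv = [a - b for a, b in zip(v1, v2)]   # relative velocity
    dv2 = sum(c * c for c in dv)
    if dv2 == 0.0:
        t_min = 0.0                        # separation never changes
    else:
        # Minimize |dr + dv * t|^2 over t, clamped to the future.
        t_min = max(0.0, -sum(a * b for a, b in zip(dr, dv)) / dv2)
    sep = [a + b * t_min for a, b in zip(dr, dv)]
    return t_min, math.sqrt(sum(c * c for c in sep))


def is_conjunction(r1, v1, r2, v2, threshold_km=1.0):
    """Flag a potential conjunction if the miss distance is within the threshold."""
    _, d_min = closest_approach(r1, v1, r2, v2)
    return d_min <= threshold_km
```

A cloud deployment would run a check of this kind over many object pairs in parallel, which is where on-demand compute becomes attractive.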
Bridging the Gap Between Formal and Informal Learning with Mobile Devices
Computers have long been used in support of instructional activities. At first, they were used to support the individual learner sitting in front of a desktop computer. Today, computers take different shapes varying from big electronic whiteboards to small mobile devices, all of them interconnected with networks, allowing computer technology to support various learning scenarios. Although the literature claims they have been very successful supporting one particular (and often isolated) type of learning activity, they can seldom be combined with other systems to enable more consistent and comprehensive support. Researchers have already expressed the importance of learning methodologies and tools to support learners along a continuous flow of different instructional settings. In this work, we present a mobile computer tool supporting a learning methodology that integrates learning activities inside and outside the classroom in a coherent way, by learning how patterns of different kinds emerge in the natural environment. With this tool, students learn about design, architectural, geological, or even biological patterns by first learning about them in the classroom and then finding instances of them in the field. The tool also helps learners recognize patterns that were previously unknown to them.
Counting People Waiting in Service Lines Using Computer Vision and Machine Learning Techniques
This talk presents our main achievements under LACCIR project R1208LAC500, a Real Time System Based on Computer Vision Techniques to Supervise and Allocate Cash Register at Grocery Stores. In particular, we discuss the technical details of our approach to detecting and counting people in waiting lines under complex conditions, such as partial occlusions, different viewpoints, and changes in scale. We show encouraging results when we evaluate the performance of our system by using images coming from real retail stores.
Transforming Scholarly Communication—Overview of Recent Projects from Microsoft Research
In addition to a strong focus on eScience, Microsoft Research is also actively investigating the broader concept of eResearch, with an emphasis on Digital Humanities and eHeritage. This booth will offer demonstrations of several compelling new tools: discussion around Project Big Time (an evolution of the ChronoZoom work by Walter Alvarez at University of California, Berkeley), demos of the Digital Narratives work stemming from the Microsoft Research Lab in India, reference to the Garibaldi/LADS project under Andy van Dam at Brown University, and a deep investigation of the Microsoft Academic Search service from the Microsoft Research Asia lab in Beijing. The breadth of these projects demonstrates the significant interest Microsoft Research has in areas beyond eScience, with a goal of improving productivity and increasing innovation across the entire academy. This talk will highlight several recent projects, focusing on repository work that is currently underway with the Universidad de Bogotá Jorge Tadeo Lozano in Colombia.
Tools of the Trade: Cluster and Cloud Computing On the Operating System That Is Not Linux!
Windows is generally viewed as the operating system you run and develop “Enterprisy” applications on. However, during the past few years there’s been a renaissance of both parallel and large-data computing on the platform by using various open-source and proprietary technologies. This talk will provide an overview of which tools are available (especially from a Linux developer’s point of view) and what can be done with them, emphasizing free/open source tools. We’ll cover technologies such as MPI, OpenMP, and SOA. We will also talk about how researchers and domain specialists (who are not computer scientists) are using tools such as R, Python, and F# to take advantage of available resources on clusters and cloud. Demos will include running/debugging/profiling Python on a High Performance Computing (HPC) cluster as well as large-scale astronomy image processing on the Windows Azure cloud.
The Microsoft .NET Gadgeteer Hardware Prototyping Platform
Microsoft .NET Gadgeteer is a new prototyping platform that makes it easier to construct, program, and shape new kinds of computing objects. It comprises modular hardware, software libraries, and 3-D CAD support. Together, these elements support the key activities that are involved both in the rapid prototyping and the small-scale production of custom embedded, interactive, and connected devices. In this presentation, we will show (through live coding and hardware assembly) how to design, build, and program working devices by using .NET Gadgeteer. The aim is to give those present a feel for how Gadgeteer can be used as a tool for researchers who need to prototype and build bespoke hardware, as well as in classrooms to inspire and educate young computer scientists.
Sharing Research Worldwide: Advances in Automatic Translation
Automatic translation increasingly enables access to research data, and to everyday documents, across language boundaries. Today’s automatic translation services provide free and instant translation at a quality that has not been seen before, in an ever-increasing set of languages. However, we aren’t done yet. This talk covers many of the options for making translation seamlessly available, and points out some of the pitfalls to consider. Recent and near-term future developments make adoption and use of automatic translation services more interesting than ever before.
The Microsoft Biology Foundation
The Microsoft Biology Foundation (MBF) is a general-purpose library of useful functions for the assembly, comparison and manipulation of DNA, RNA, and protein sequences. Built on the Microsoft .NET platform, this toolkit enables the scientific programmer to rapidly develop the applications that are needed by genomics scientists to cope with extracting knowledge from the increasing volumes of data that are common in the field of genomics research. Under development in Microsoft Research for the past three years, MBF contains a core of standard functions but also enables easy access to a wide range of other Microsoft technologies, including Silverlight, DeepZoom, and Microsoft Office as well as unique research tools such as Sho and PhyloD. MBF is open source under the Apache 2 license, and is freely available for commercial and academic use. Developers are encouraged to adapt and extend the basic library, making their modifications available for others to use at the MBF CodePlex project site.
Cloud Computing for Science in Europe and Venus-C
We will review some Microsoft Research engagements in cloud computing in Europe, describe the European project VENUS-C where Microsoft is making a major investment to support development of scientific and industrial applications on an interoperable computing platform, and explain a recent extension of the VENUS-C platform to Latin America in collaboration with Brazilian scientists.
Towards Exaflop Supercomputers
Having recently surpassed the Petascale barrier, supercomputer designers and users are now facing the next challenge. A thousand-fold performance increase will be reached around 2018 if the improvement rate of the last decades continues. Power is the main constraint, and there are many hardware challenges, but software is probably the biggest challenge of all. Worldwide cooperative initiatives are being started to pursue research toward these objectives. The Barcelona Supercomputing Center is involved in such initiatives and carries out the MareIncognito research project, which aims at developing some of the technologies that we consider will be of key relevance on the way to Exascale. The talk will briefly discuss relevant issues, foreseen architectures, and software approaches that will have to be developed in order to successfully install and operate such machines.
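The thousand-fold projection can be checked with simple arithmetic. The sketch below assumes the petaflop mark was passed in 2008 (the abstract says only "recently") and that performance grows by a constant factor each year; the function names are illustrative.

```python
import math


def annual_growth_factor(total_factor, years):
    """Constant yearly multiplier that compounds to `total_factor` over `years`."""
    return total_factor ** (1.0 / years)


def doubling_time_years(yearly_factor):
    """Years needed for performance to double at a constant yearly multiplier."""
    return math.log(2.0) / math.log(yearly_factor)


# Assumed span: petaflop in 2008, exaflop around 2018 (a 1000x increase).
span_years = 2018 - 2008
factor = annual_growth_factor(1000.0, span_years)
print(f"yearly factor: {factor:.3f}")
print(f"doubling time: {doubling_time_years(factor):.2f} years")
```

Under these assumptions the yearly factor comes out just under 2, so the projection amounts to performance roughly doubling every year for a decade.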
Future Trends in Software Engineering
For too long software engineers have relied not on data, but on experience and unaided judgment to make decisions about what process to adopt, method to deploy, or tool to use. In the future, software engineers will use software analytics (in other words, the data resulting from the development process and its artifacts) to make better decisions and to take the right actions. I will showcase this for branch analytics.
Formal methods have always held the promise of changing the way that software systems are developed. Enabled by the recent increase in the capabilities of automatic theorem provers, their time has finally come. The future will see a new generation of logic-enabled tools, from design time verification to automatic test case generation, and from design space optimization to automatic program synthesis. As an example, I will introduce software equivalence checking.
Twenty years ago, the world had no web; today more than 1.8 billion people are connected via the Internet. And the next shift is already on the way: in 2011, mobile devices (smartphones, tablets) will outsell PCs. With these platform changes, new software engineering challenges emerge: how to share data and programs, how to guarantee security and privacy, and how to program the cloud and/or smartphone. I will introduce coding games for the web and a new touch-enabled development environment for the phone, which try to address some of these problems.
Audio and Video Research for Kinect
In this talk, I will describe some research technologies for Microsoft Kinect. I will describe audio processing techniques such as microphone design, multi-channel echo cancelation, array beam forming, and speech recognition. I will also talk about face and pose tracking from video. Finally, I will illustrate with a few scenarios where these technologies are useful, such as Avatar Kinect and voice control.
Visualizing Scientific Data: Yesterday, Today, and Tomorrow
Data visualization is an important stage on the path to scientific discovery. Methods and tools for visualization are constantly evolving—offering exciting new mechanisms for mining information and knowledge from data. This talk will cover recent Microsoft Research investments in this area and offer an in-depth look at the new geo-spatial data visualization capabilities available when Microsoft Excel and WorldWide Telescope | Earth are used together.
Addressing Societal Challenges Through Innovation and Partnerships—Microsoft Research in India
The Microsoft Research India Lab was established in January 2005 in Bangalore, India, with the same basic goals as all other Microsoft Research labs worldwide: to advance the state of the art in all areas of research in which we engage and to improve Microsoft products (and have a positive effect on the world) through our research. In addition, as part of the worldwide Microsoft Research Connection efforts, we work with our academic and scientific partners in India (and elsewhere) to enhance the computer science research ecosystem, and to address societal challenges through research partnerships.
In this talk, I will summarize the progress made in Microsoft Research India during our first six years in all our activities, and describe some of our key efforts and progress towards addressing societal challenges through collaborative innovation.
Water from the Mountains, the Fourth Paradigm, and the Color of Snow
In the 21st century, a crucial question for seasonal mountain snow worldwide is: How do we reliably predict snowmelt runoff and associated demand as climate changes, populations grow, land use evolves, and individual and societal choices are made? Our traditional forecasting methods are based on statistical relations developed while climate is changing. The long-term data that we have document trends already, but uncertainty will get worse without more physically based approaches. At the same time, we can take advantage of two emerging trends:
- Data-intensive science, The Fourth Paradigm, which goes beyond computational modeling to foster discoveries and analyses from large datasets
- An ability to remotely sense snow properties suitable for energy balance models at a spatial scale appropriate for mountain regions
Were our eyes sensitive to radiation through the whole solar spectrum, snow would be one of nature’s most “colorful” surface covers. We can map snow and its reflectivity, but to get at snow-water equivalent, the amount of water that the melt will produce, two independent estimates are possible:
- Interpolations from ground measurements, which are available soon after acquisition, constrained by measurements of snow-covered area
- An energy balance snow-depletion calculation, which is more accurate but possible only after the snow has melted
The two methods give different answers, but reconciling them through data mining could improve the accuracy of snowmelt runoff forecasts, even in basins with sparse river gauging.