Kristin M. Tolle is the Director of the Data Science Initiative in Microsoft Research Outreach, Redmond, WA.
Since joining Microsoft in 2000, Dr. Tolle has acquired numerous patents and worked for several product teams, including the Natural Language Group, Visual Studio, and the Microsoft Office Excel Team. Since joining Microsoft Research’s outreach program in 2006, she has run several major initiatives, ranging from biomedical computing and environmental science to more traditional computer and information science programs around natural user interactions and data curation. She also directed the development of the Microsoft Translator Hub and the Environmental Science Services Toolkit.
Dr. Tolle is an editor, along with Tony Hey and Stewart Tansley, of one of the earliest books on data science, The Fourth Paradigm: Data-Intensive Scientific Discovery. Her current focus is developing an outreach program to engage academics in data science in general and, more specifically, in using data to create meaningful and useful user experiences across devices and platforms.
Prior to joining Microsoft, Tolle was an Oak Ridge Science and Engineering Research Fellow for the National Library of Medicine and a Research Associate at the University of Arizona Artificial Intelligence Lab managing the group on medical information retrieval and natural language processing. She earned her Ph.D. in Management of Information Systems with a minor in Computational Linguistics.
Dr. Tolle's present research interests include global public health, climate change, mobile computing to enable field scientists and inform the public, sensors used to gather ecological and environmental data, and integration and interoperability of large heterogeneous environmental data sources. She collaborates with several major research groups in Microsoft Research including the natural language processing group, eScience, computational science laboratory, computational ecology and environmental science, and the sensing and energy research group.
- Environmental Science Services Initiative: Development Director for this project, which creates a common, easy-to-use infrastructure and user experience that gives ready access to existing and new Microsoft Research tools targeted at climate change and environmental science.
- National Flood Interoperability Experiment: Working with researchers at several major institutions, as well as the government agencies that provide much of the data, to develop a real-time flood mapping system that improves national disaster response.
- DataUp: Project Director and Development Manager for a data curation tool that enables environmental scientists to preserve and share their datasets (largely in Microsoft Excel) with the broader community.
- Microsoft Translator Hub: Product and Program Director for a tool that enables businesses and language communities to create custom translation models.
- Cultural Preservation Initiative: Primary member of this group focused on language preservation.
- Devices, Sensors, and Mobility for Healthcare: Founder and Program Director of this program. It was founded in 2006, seeded the development of over 25 mHealth projects, and culminated in the launch of an annual mHealth Summit run by the Foundation for the National Institutes of Health.
- Computational Challenges in Genome-Wide Association Studies: Program Manager for this program, which was designed to advance computer science research in support of scientific discovery in GWAS.
- Strata + Hadoop World - San Jose
29-31 March 2016
San Jose, CA, USA
11-14 April 2016
Montreal, Quebec, Canada
- Intl Conf on Big Data Management and Analytics
25-26 April 2016
- Strata + Hadoop World - London
22-25 February 2016
San Francisco, CA
- ICML @ NYC
19-24 June 2016
New York City, New York
13-17 August 2016
San Francisco, CA
- Data science at Microsoft Research
Researchers who can best capture, process, predict, and visualize data will be able to accelerate their work and push the boundaries of what is possible. We want to help you take advantage of our research and development efforts so that you can spend more time doing data science. Besides a range of tools and services, we provide valuable datasets across many areas of interest. You can download more than 40 datasets from this website and nearly 150 from the Azure Marketplace.
- Distribution Modeller
Distribution Modeller (temporary name only!) is CEES' end-to-end tool that lets researchers rapidly import data, supplement that data with environmental information from FetchClimate, specify an arbitrary model by point-and-click or in code, parameterize the model against the data using Filzbach, and make and visualize predictions with full propagation of parameter uncertainty. Researchers can then package and share everything in a way that is inspectable, repeatable, and modifiable.
- Project CLEO
The goal of Project CLEO is to develop devices and services that encourage and enable participatory sensing and citizen science. A core technology developed in the project makes location sensing energy efficient, so devices can be small and light, sample more frequently, and cost less. The approach is called Cloud-Offloaded GPS (CO-GPS).
- Microsoft Translator Hub
Microsoft Translator Hub empowers businesses and communities to build, train, and deploy customized automatic language translation systems. This brings better, specialized translation quality to established languages, as well as to the many languages of the world that are not yet supported by major translation providers.
- Microsoft Layerscape
A cloud-based user experience, Microsoft Layerscape makes it easy for the Earth-sciences community to visualize and analyze large, complex datasets and to discover new environmental insights. Working with powerful, everyday tools like Microsoft Excel, Layerscape users can explore new ways of looking at Earth and oceanic data and build predictive models in areas such as climate change, health epidemics, and oceanic shifts.
- FetchClimate
Retrieve global environmental information with the click of a button or a few lines of code. FetchClimate is a fast, free, intelligent environmental information retrieval service that operates over the cloud to return only the environmental data you need. FetchClimate can be accessed either through a simple web interface or via a few lines of code inside any program.
- SenseCam
SenseCam is a wearable camera with a wide-angle lens that takes photos periodically without user intervention. This simple device turns out to have many valuable applications.
- Kukjin Lee, Arnd Christian König, Vivek Narasayya, Bolin Ding, Surajit Chaudhuri, Brent Ellwein, Alexey Eksarevskiy, Manbeen Kohli, Jacob Wyant, Praneeta Prakash, Rimma Nehme, Jiexing Li, and Jeff Naughton, Operator and Query Progress Estimation in Microsoft SQL Server Live Query Statistics, in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2016), ACM – Association for Computing Machinery, 26 June 2016.
- Anshumali Srivastava, Arnd Christian König, and Misha Bilenko, Time Adaptive Sketches (Ada-Sketches) for Summarizing Data Streams, in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2016), ACM – Association for Computing Machinery, 26 June 2016.
- Bolin Ding, Silu Huang, Surajit Chaudhuri, Kaushik Chakrabarti, and Chi Wang, Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee, in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2016), ACM – Association for Computing Machinery, June 2016.
- Yeye He, Kaushik Chakrabarti, Tao Cheng, and Tomasz Tylenda, Automatic Discovery of Attribute Synonyms Using Query Logs and Table Corpora, WWW – World Wide Web Consortium (W3C), April 2016.
- Mohan Yang, Bolin Ding, Surajit Chaudhuri, and Kaushik Chakrabarti, Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers, VLDB – Very Large Data Bases, August 2015.