Voices in the Valley

Microsoft Research Silicon Valley Distinguished Speaker Series 2014

Shafi Goldwasser

Speaker: Shafi Goldwasser

Date: 11 August 2014, 11:00 am-12:00 noon
(refreshments at 10:30am)
[please note the unusual time for this talk]

Title: The Cryptographic Lens

Abstract: Going beyond the basic challenge of private communication, in the last 35 years, cryptography has become the general study of correctness and privacy of computation in the presence of a computationally bounded adversary, and as such has changed how we think of proofs, reductions, randomness, secrets, and information.

In this talk I will discuss some beautiful developments in the theory of computing through this cryptographic lens, and the role cryptography can play in the next successful shift from local to global and remote computation.

About the speaker: Goldwasser is the RSA Professor of Electrical Engineering and Computer Science at MIT, and a professor of computer science and applied mathematics at the Weizmann Institute of Science in Israel. Goldwasser received a BS degree in applied mathematics from Carnegie Mellon University in 1979, and MS and PhD degrees in computer science from the University of California, Berkeley, in 1984. Goldwasser was the recipient of the Gödel Prize in 1993 and another in 2001 for her work on interactive proofs and connections to approximation. She was awarded the ACM Grace Murray Hopper award, the RSA award in mathematics, the ACM Athena award for women in computer science, the Benjamin Franklin Medal in Computer and Cognitive Science, the IEEE Emanuel R. Piore award, and the ACM Turing Award for 2012. She is a member of the AAAS, NAS and NAE.


John Gustafson

Speaker: John Gustafson 

Date: 22 July 2014, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: The End of Numerical Error

Abstract: It is time to overthrow a century of methods based on floating point arithmetic. Current technical computing is based on the acceptance of rounding error using numerical representations that were invented in 1914, and acceptance of sampling error using algorithms designed for a time when transistors were very expensive. By sticking to an antiquated storage format (now codified as an IEEE standard) well into the exascale era, we are wasting power, energy, storage, bandwidth, and programmer effort. The pursuit of exascale floating point is ridiculous, since we do not need to be making 10^18 sloppy rounding errors per second; we need instead to get provable, valid results for the first time, by turning the speed of parallel computers into higher quality answers instead of more junk per second.

We introduce the ‘unum’ (universal number), a superset of IEEE Floating Point, that contains extra metadata fields that actually save storage, yet give more accurate answers that do not round, overflow, or underflow. The potential they offer for improved programmer productivity is enormous. They also provide, for the first time, the hope of a numerical standard that guarantees bitwise identical results across different computer architectures. Unum format is the basis for the ‘ubox’ method, which redefines what is meant by “high performance” by measuring performance in terms of the knowledge obtained about the answer and not the operations performed per second. Examples are given for practical application to structural analysis, radiation transfer, the n-body problem, linear and nonlinear systems of equations, and Laplace’s equation. This is a fresh approach to scientific computing that allows proper, rigorous representation of real number sets for the first time.
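The rigorous-bounds flavor of this approach can be illustrated with ordinary interval arithmetic, a much older cousin of unums. This sketch is not the unum format itself, and it omits the directed (outward) rounding a production implementation would need to make the bounds provably safe:

```python
from dataclasses import dataclass

@dataclass
class Interval:
    """A set of reals [lo, hi]; results bound the true answer
    instead of silently rounding it."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The product range is spanned by the four endpoint products.
        ps = [self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi]
        return Interval(min(ps), max(ps))

    def contains(self, x: float) -> bool:
        return self.lo <= x <= self.hi
```

Any point result computed from values inside the operand intervals is guaranteed to lie inside the result interval, which is the "provable, valid results" property the abstract is after.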

About the speaker: Dr. John Gustafson is an applied physicist and mathematician best known for his work in High Performance Computing, having introduced cluster computing in 1985 and having first demonstrated scalable parallel performance on real applications in 1988, for which he won the inaugural Gordon Bell Award. ‘Gustafson’s Law’ is taught in computer science courses everywhere. He was Director of the Extreme Research Lab at Intel, where he led their exascale computing program and other energy-efficient computing research. At Sun Microsystems, he led a $50 million supercomputer project centered on breaking "the memory wall". He is the recipient of the IEEE Computer Society’s Golden Core Award. An honors graduate of Caltech and Iowa State University, he has held a number of CTO and CEO positions at publicly held and startup companies, and most recently was Chief Product Architect at AMD, before accepting the CTO role at Ceranovo Inc. in November 2013.
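Gustafson's Law states that, for a workload scaled up to fill the machine, the speedup on N processors is S(N) = N + (1 - N)·s, where s is the fraction of the run that is serial. A one-line sketch (the 10% serial fraction below is just an illustrative number):

```python
def gustafson_speedup(n_procs: int, serial_fraction: float) -> float:
    """Scaled speedup per Gustafson's Law: S = N + (1 - N) * s."""
    return n_procs + (1 - n_procs) * serial_fraction

# e.g., 64 processors with a 10% serial fraction -> speedup of 57.7
print(gustafson_speedup(64, 0.1))
```

Unlike Amdahl's Law, which fixes the problem size, this model assumes the parallel part of the work grows with the processor count, which is why the speedup stays nearly linear.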


Microsoft Research Silicon Valley Distinguished Speaker Series 2013

Frans Kaashoek

Speaker: Frans Kaashoek, MIT

Date: 8 October 2013, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: The Multicore Evolution and Operating Systems

Abstract: Multicore chips with hundreds of cores will likely be available soon. Although many applications have significant inherent parallelism (e.g., mail servers), their scalability on many cores can be limited by the underlying operating system. We have built or modified several kernels (Corey, Linux, and sv6) to explore OS designs that scale with an increasing number of cores. This talk will summarize our experiences by exploring questions such as: What is the impact of kernel scalability on application scalability? Is a revolution in kernel design necessary to achieve kernel scalability? And how can we build perfectly scalable operating systems?
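One recurring trick in this line of work is to replace a single shared counter, whose cache line ping-pongs between cores on every update, with per-core counters that are only summed when read. A toy sketch of the idea (the class and names are mine, not taken from the Corey or sv6 code):

```python
class PerCoreCounter:
    """Each core increments only its own slot, so hot updates never
    contend; reads are rarer and tolerate momentary staleness."""

    def __init__(self, ncores: int):
        self.slots = [0] * ncores

    def inc(self, core: int, delta: int = 1) -> None:
        self.slots[core] += delta   # core-local write, no sharing

    def read(self) -> int:
        return sum(self.slots)      # aggregate across all cores
```

In a real kernel the slots would be cache-line-padded and indexed by the running CPU; the Python version only shows the split between cheap local updates and expensive global reads.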

Joint work with: S. Boyd-Wickizer, A. Clements, Y. Mao, A. Pesterev, R. Morris, and N. Zeldovich

About the speaker: M. Frans Kaashoek is a professor at MIT, where he coleads the parallel and distributed operating systems group (http://www.pdos.csail.mit.edu/). Frans is a member of the National Academy of Engineering and the recipient of the ACM SIGOPS Mark Weiser award and the 2010 ACM-Infosys Foundation award. He was cofounder of Sightpath, Inc. and Mazu Networks, Inc.


Speaker: Jeff Dean, Google

Date: 5 September 2013, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: Taming Latency Variability and Scaling Deep Learning

Today's large-scale web services operate in warehouse-sized datacenters and run on clusters of machines that are shared across many kinds of interactive and batch jobs.

In the first part of this talk, I'll describe a collection of techniques and practices for lowering response times (especially in the tail of the latency distribution) in large distributed systems whose components run on shared clusters of machines, where pieces of these systems are subject to interference by other tasks, and where unpredictable latency hiccups are the norm, not the exception.
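One representative technique from this line of work is the "hedged request": send the call to one replica, and if no answer arrives within a small delay, issue a backup copy to a second replica and take whichever answers first. A minimal sketch (the API is invented here for illustration; real systems also cancel the losing request):

```python
import threading

def hedged_request(primary, backup, hedge_after):
    """Call primary; if it hasn't answered within hedge_after seconds,
    also call backup, and return whichever result arrives first."""
    results, done = [], threading.Event()

    def run(fn):
        results.append(fn())
        done.set()

    threading.Thread(target=run, args=(primary,), daemon=True).start()
    if not done.wait(timeout=hedge_after):   # primary is slow: hedge
        threading.Thread(target=run, args=(backup,), daemon=True).start()
        done.wait()
    return results[0]
```

The hedge delay is typically set near the tail (say, the 95th percentile) of the latency distribution, so the extra load is small while the slowest responses are clipped.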

In the second part of the talk, I'll highlight some recent work on using large-scale distributed systems for training deep neural networks. I'll discuss how we can utilize both model-level parallelism and data-level parallelism in order to train large models on large datasets more quickly. I'll also highlight how we have applied this work to a variety of problems in domains such as speech recognition, object recognition, and language modeling.

About the speaker: Jeff joined Google in 1999 and is currently a Google Fellow in Google's Knowledge Group. He has co-designed/implemented five generations of Google's crawling, indexing, and query serving systems, and co-designed/implemented major pieces of Google's initial advertising and AdSense for Content systems. He is also a co-designer and co-implementor of Google's distributed computing infrastructure, including the MapReduce, BigTable and Spanner systems, protocol buffers, LevelDB, systems infrastructure for statistical machine translation, and a variety of internal and external libraries and developer tools. He is currently working on large-scale distributed systems for machine learning. He is a Fellow of the ACM, a member of the U.S. National Academy of Engineering, and a recipient of the Mark Weiser Award and the ACM-Infosys Foundation Award in the Computing Sciences. 



Speaker: Jennifer Widom, Stanford University

Date: 2 May 2013, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: Data-Centric Human Computation + From 100 Students to 100,000

Abstract: This talk will have two completely independent parts -- one related to research and the other to education.

In the first part of the talk, I'll describe our ongoing research in leveraging human computation for tasks related to data. Human computation ("crowdsourcing") augments traditional computation with the use of human abilities to solve sub-problems that are difficult for computers, e.g., object or image comparisons, information extraction, relevance judgments, and data gathering. We are addressing two different types of data-centric human computation: (1) Fundamental algorithms, such as sorting, clustering, and data cleaning, in which the basic operations (e.g., compare, filter) are performed by humans. (2) A database-system-like platform in which declarative queries are posed by users and the system orchestrates a combination of stored and crowdsourced data to answer them. Common to both areas is the need to formalize and optimize new tradeoffs among latency (humans are much slower than computers), cost (humans require real money to perform tasks), and quality (humans are inaccurate and inconsistent).
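A toy model of the first category: sorting where each comparison is answered by a small crowd of error-prone "workers" and resolved by majority vote, which makes the cost/quality tradeoff explicit (more workers per comparison means more money but fewer mistakes). This simulation is mine, not the actual Stanford system:

```python
import random
from functools import cmp_to_key

def worker_says_less(a, b, error_rate, rng):
    """One simulated worker: answers correctly with prob 1 - error_rate."""
    truth = a < b
    return truth if rng.random() >= error_rate else not truth

def crowd_less(a, b, workers, error_rate, rng):
    """Majority vote over several workers on the question 'a < b?'."""
    votes = sum(worker_says_less(a, b, error_rate, rng) for _ in range(workers))
    return votes > workers // 2

def crowd_sort(items, workers=5, error_rate=0.1, seed=0):
    rng = random.Random(seed)
    cmp = lambda a, b: -1 if crowd_less(a, b, workers, error_rate, rng) else 1
    return sorted(items, key=cmp_to_key(cmp))
```

With error_rate = 0 this degenerates to an ordinary sort; as the error rate rises, more workers per comparison are needed to keep the output quality, at proportionally higher cost.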

In the second part of the talk, I'll describe my recent experience teaching introductory databases to 60,000 students. Admittedly only 25,000 of them submitted their homework, and a mere 6500 achieved a strong final score. But even with 6500 students, I more than quadrupled the total number of students I've taught in my entire 18-year academic career. I began by "flipping" the way I teach my Stanford course and, as a side-effect, making all components of the course freely available online. But the big inflection point came when I offered the online course in a structured fashion with a schedule, automatically-graded assignments and exams, and most importantly a worldwide community of students (now known as a "MOOC", or Massive Open Online Course). I'll cover a variety of topics related to the massive online course, both logistical and social, while avoiding speculation on the future of higher education.

About the speaker: Jennifer Widom is the Fletcher Jones Professor and Chair of the Computer Science Department at Stanford University. She received her Bachelor's degree from the Indiana University School of Music in 1982 and her Computer Science Ph.D. from Cornell University in 1987. She was a Research Staff Member at the IBM Almaden Research Center before joining the Stanford faculty in 1993. Her research interests span many aspects of nontraditional data management. She is an ACM Fellow and a member of the National Academy of Engineering and the American Academy of Arts & Sciences; she received the ACM SIGMOD Edgar F. Codd Innovations Award in 2007 and was a Guggenheim Fellow in 2000; she has served on a variety of program committees, advisory boards, and editorial boards.


Speaker: Michael J. Carey, Information Systems Group, UC Irvine

Date: 8 Feb 2013, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: One Size Fits A Bunch: The ASTERIX Approach to Big Data Management

Abstract: Like most fields, the database field has gone through various eras - a.k.a. pendulum swings - and we are currently in the era of "One Size Fits All: An Idea Whose Time Has Come and Gone". This is great news for industry sectors such as the Bubble Gum industry and the international consortium of Baling Wire manufacturers, and it is also very good news for Information Integration enthusiasts. Why? Because the current state of practice related to "Big Data" involves somehow piecing together many systems whose target sizes fit different use cases. This talk will provide an overview of the ASTERIX project at UC Irvine, a counter-cultural systems project in SoCal in which we are building a new, coherent, scalable, open-source "Big Data" software stack that we hope will solve a range of problems that today require too many piece parts to solve.

About the speaker: Michael J. Carey is an ex-long-time member of the NorCal database community. Carey defected to SoCal in 2008, where he is currently a Bren Professor of Information and Computer Sciences at UC Irvine. Prior to his defection, he worked at BEA Systems in NorCal as the chief architect of (and an engineering director for) BEA's AquaLogic Data Services Platform. Carey also did time as a Professor at the University of Wisconsin-Madison, at IBM Almaden as a database researcher/manager, and as a Fellow (and briefly VP of Software) at e-commerce software startup Propel Software during the 2000-2001 Internet bubble. He is an ACM Fellow, a member of the National Academy of Engineering, and a recipient of the ACM SIGMOD E. F. Codd Innovations Award. His current research interests are centered around data-intensive computing and scalable data management.

Microsoft Research Silicon Valley Distinguished Speaker Series 2012


Speaker: Scott Shenker, UC Berkeley and ICSI

Date: 28 November 2012, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: Software-Defined Networking: Introduction and Implications


Abstract: Software-Defined Networking (SDN) has all the signs of a fad: massive hype in the trade rags, an increasing number of academic papers, and widespread confusion about what SDN really means (Isn't it just OpenFlow? Isn't it all about centralization? Isn't it all just hot air?). This talk will try to dispel some of this confusion by discussing how SDN arises from a few natural abstractions for the network control plane.

About the speaker: Scott Shenker spent his academic youth studying theoretical physics but soon gave up chaos theory for computer science. Continuing to display a remarkably short attention span, his research over the years has wandered from performance modeling and networking to game theory and economics. Unable to focus on any single topic, his current research projects include cluster programming models, genomic sequence aligners, software-defined networking, and Internet architecture. Unable to hold a steady job, he currently splits his time between the U. C. Berkeley Computer Science Department and the International Computer Science Institute. His career highlight is to have once coauthored a paper with Doug Terry.  



Speaker: Michael Kearns, UPenn

Date: 2 Feb 2012, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: Experiments in Social Computation


Abstract: What do the theory of computation, economics and related fields have to say about the emerging phenomena of crowdsourcing and social computing?

Most successful applications of crowdsourcing to date have been on problems we might consider "embarrassingly parallelizable" from a computational perspective. But the power of the social computation approach is already evident, and the road is clear for applying it to more challenging problems.

In part towards this goal, for a number of years we have been conducting controlled human-subject experiments in distributed social computation in networks with only limited and local communication. These experiments cast a number of traditional computational problems --- including graph coloring, consensus, independent set, market equilibria, biased voting and network formation --- as games of strategic interaction in which subjects have financial incentives to collectively "compute" global solutions. I will overview and summarize the many behavioral findings from this line of experimentation, and draw broad comparisons to some of the predictions made by the theory of computation and microeconomics.
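For concreteness, graph coloring in this setting asks each participant, seeing only their neighbors' current colors, to switch to a color no neighbor holds. A naive algorithmic baseline of that local rule, against which the human experiments can be compared (my sketch, not the experimental platform):

```python
def local_coloring(graph, palette):
    """graph: node -> set of neighbors. Each node repeatedly switches to
    a color unused by its neighbors; with len(palette) greater than the
    max degree, a sequential sweep settles into a proper coloring."""
    colors = {u: palette[0] for u in graph}
    changed = True
    while changed:
        changed = False
        for u in graph:
            taken = {colors[v] for v in graph[u]}
            if colors[u] in taken:
                colors[u] = next(c for c in palette if c not in taken)
                changed = True
    return colors
```

The experimental twist is that human subjects follow no prescribed rule and have financial incentives, so their collective dynamics can differ sharply from this greedy baseline.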

About the speaker: Michael Kearns is a professor of Computer and Information Science at the University of Pennsylvania, where he is the director of the new Penn program in Market and Social Systems Engineering (www.mkse.upenn.edu). His research interests include topics in machine learning, algorithmic game theory, social networks, computational finance and artificial intelligence. More information is available at www.cis.upenn.edu/~mkearns.

Microsoft Research Silicon Valley Distinguished Speaker Series 2011


Speaker: Michael Franklin, UC Berkeley

Date: 17 November 2011, 3:00-4:00 pm
(refreshments at 2:30pm) 

Title: AMPLab: Making Sense of Big Data with Algorithms, Machines and People

Abstract: Organizations of all types are trying to figure out how to extract value from their data. The key challenge is that the massive scale and diversity of the continuous flood of information we are faced with breaks existing technologies. State-of-the-art Machine Learning algorithms do not scale to massive data sets. Existing data analytics frameworks cope poorly with incomplete and dirty data and cannot process heterogeneous multi-format information. Current large-scale processing architectures struggle with diversity of programming models and job types and do not support the rapid marshalling and unmarshalling of resources to solve specific problems. All of these limitations lead to a Scalability Dilemma: beyond a point, our current systems tend to perform worse as they are given more data, more processing resources, and involve more people — exactly the opposite of what should happen.

To address these issues, a group of us from machine learning, systems, databases, and networking at Berkeley started a new five-year research effort called the AMPLab, where AMP stands for "Algorithms, Machines, and People". AMPLab envisions a world where massive data, cloud computing, communication, and people resources can be continually, flexibly, and dynamically brought to bear on a range of hard problems by huge numbers of people connected to the cloud via increasingly powerful mobile and other client devices. We are developing a new data analytics stack that implements this vision. AMPLab's research is supported in part by Microsoft and 18 other leading technology companies, including founding sponsors Google and SAP.

In this talk, I will give an overview of the AMPLab motivation and research agenda and discuss several of our initial projects. One such project, CrowdDB, is developing infrastructure to support hybrid cloud/crowd query answering systems, leveraging the very different skills, performance, reliability, and cost characteristics of large groups of machines and large groups of people. More information is at http://amplab.cs.berkeley.edu

About the speaker: Michael Franklin is a Professor of Computer Science at UC Berkeley, focusing on new approaches for data management and data analysis. At Berkeley he directs the Algorithms, Machines and People Laboratory (AMPLab). He is founder and CTO of Truviso, Inc., a real-time data analytics company that enables customers to quickly make sense of diverse, high-speed, continuous streams of information. He is a Fellow of the Association for Computing Machinery, and a recipient of the National Science Foundation CAREER award and the ACM SIGMOD "Test of Time" award. He was recently awarded the Outstanding Advisor Award from the Computer Science Graduate Student Association at Berkeley. He is currently serving as a committee member on the US National Academy of Science study on Analysis of Massive Data. He received his Ph.D. from the University of Wisconsin in 1993.



Speaker: Hal Varian, Google and UC Berkeley

Date: 11 October 2011, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: Predicting the Present with Search Engine Data


Abstract: It is now possible to acquire real time information on economic variables using various commercial sources. I illustrate how one can use Google Trends data to measure the state of the economy in various sectors, and discuss some of the ramifications for research and policy.
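The core statistical move is simple: regress the economic series of interest on a contemporaneous search-volume index, then plug in this period's index to "nowcast" the series before official figures arrive. (Varian's published examples also include autoregressive terms, which this sketch omits, and the data below are invented.)

```python
from statistics import mean

def ols(x, y):
    """Intercept and slope of a one-variable least-squares fit."""
    mx, my = mean(x), mean(y)
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
         / sum((xi - mx) ** 2 for xi in x)
    return my - beta * mx, beta

# hypothetical: search index for "unemployment office" vs. initial claims
search_index = [40, 55, 62, 70, 58]
claims       = [300, 360, 388, 420, 372]   # thousands, invented numbers
alpha, beta  = ols(search_index, claims)
nowcast      = alpha + beta * 65           # this week's search index
```

The point is timeliness rather than sophistication: the search index is available in near real time, while the official series lags by weeks.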

About the speaker: Hal R. Varian is the Chief Economist at Google. He started in May 2002 as a consultant and has been involved in many aspects of the company, including auction design, econometric analysis, finance, corporate strategy and public policy. He is also an emeritus professor at the University of California, Berkeley in three departments: business, economics, and information management. He received his SB degree from MIT in 1969 and his MA in mathematics and Ph.D. in economics from UC Berkeley in 1973. He has also taught at MIT, Stanford, Oxford, Michigan and other universities around the world. Dr. Varian is a fellow of the Guggenheim Foundation, the Econometric Society, and the American Academy of Arts and Sciences. He was Co-Editor of the American Economic Review from 1987-1990 and holds honorary doctorates from the University of Oulu, Finland and the University of Karlsruhe, Germany.

Professor Varian has published numerous papers in economic theory, industrial organization, financial economics, econometrics, and information economics. He is the author of two major economics textbooks, which have been translated into 22 languages, and the co-author of a bestselling book on business strategy, Information Rules: A Strategic Guide to the Network Economy. He wrote a monthly column for the New York Times from 2000 to 2007.




Speaker: Margo Seltzer, Harvard

Date: 11 August 2011, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: Provenance Everywhere

Abstract: Digital provenance describes the ancestry or history of a digital object. Computer science research in provenance has addressed issues in provenance capture in operating systems, command shells, languages, workflow systems and applications. However, it's time to begin thinking seriously about provenance interoperability, what it means, and how we can achieve it. We have undertaken several projects that integrate provenance across multiple platforms. Doing so introduces many challenging research opportunities.

In this talk, I'll present our Provenance-Aware Storage System, focusing on our experiences integrating provenance across different layers of abstraction. I'll present some of our use cases and discuss important issues for further research.

About the speaker: Margo I. Seltzer is a Herchel Smith Professor of Computer Science in the Harvard School of Engineering and Applied Sciences. Her research interests include provenance, file systems, databases, and transaction processing systems. She is the author of several widely-used software packages including database and transaction libraries and the 4.4BSD log-structured file system. Dr. Seltzer was a founder and CTO of Sleepycat Software, the makers of Berkeley DB, and is now an Architect for Oracle Corporation. She is the Vice-President of the USENIX Association and a member of the Computing Research Association's Computing Community Consortium. She is a Sloan Foundation Fellow in Computer Science, a Bunting Fellow, and was the recipient of the 1996 Radcliffe Junior Faculty Fellowship. She is recognized as an outstanding teacher and mentor. She received the Phi Beta Kappa teaching award in 1996, the Abrahmson Teaching Award in 1999, and the Capers and Marion McDonald Award for Excellence in Mentoring and Advising in 2010.

Dr. Seltzer received an A.B. degree in Applied Mathematics from Harvard/Radcliffe College in 1983 and a Ph. D. in Computer Science from the University of California, Berkeley, in 1992.




Speaker: Gérard Berry, INRIA

Date: 18 January 2011, 3:00-4:00pm
(refreshments at 2:30pm) 

Title: Hardware and software synchronous design and verification with Esterel v7

Abstract: Esterel v7 is a formal language for hardware design and synthesis, with software simulation and formal property verification. Esterel v7 embodies specific time- and event-based temporal design primitives that help in designing complex control behavior and lead to more efficient implementation and verification. The language also embodies datapath constructs that are type-checked in a way that supports optimal bit-level sizing. The language and semantics support single or multiple clocks, with automatic clock-gating generation in hardware linked to temporal statements. Implementation is performed by compiling into Verilog/VHDL, simulation by compiling into C/C++, and verification by compiling into the Prover SL model checker. The semantics and implementation guarantee that simulation and verification always exactly match the hardware implementation. Esterel v7 can also be used for the efficient design and verification of pure software controllers.

Esterel v7 and its Esterel Studio IDE were developed at Esterel Technologies from 2001 to 2009, following the former Esterel v5 academic development that started in 1983. The system has been used for production and research at Texas Instruments, ST Microelectronics, Intel, NXP, etc., with formal verification applied very early in the design cycle. Several designs have been realized by pairing Esterel for complex control parts with C-based solutions for the signal processing datapath. Esterel Studio is now the property of Synopsys. Its cousin synchronous technology SCADE, developed by Esterel Technologies, is heavily used for safety-critical software development in avionics, railways, etc.

In the talk, Professor Berry will discuss the design principles of Esterel v7 and present its main synthesis, compiling, and verification algorithms. He will give examples of application of the Esterel approach to systems-on-chip components design, and discuss the advantages and limitations of the approach.

About the Speaker: Gérard Berry is currently Director of Research at INRIA. His main area of research is the design and verification of hardware circuits and software embedded systems. He was Chief Scientist of the Esterel Technologies company from 2001 to 2009. He acted as Professor at Collège de France in 2007-2008 and 2009-2010. He is a member of Académie des sciences, Académie des technologies, and Academia Europaea. 


Past Speakers 


Professor and head of the Machine Learning Department at Carnegie Mellon University

Speaker: Tom Mitchell, CMU

Date: 19 October 2010, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: Never-Ending Learning



Abstract: What would it take to develop machine learners that run forever, each day improving their performance and also the accuracy with which they learn? This talk will describe our attempt to build a never-ending language learner, NELL, that runs 24 hours per day, forever, and that each day has two goals: (1) extract more structured information from the web to populate its growing knowledge base, and (2) learn to read better than yesterday, by using previously acquired knowledge to better constrain its subsequent learning.

The approach implemented by NELL is based on two key ideas: coupling the semi-supervised training of hundreds of different functions that extract different types of information from different web sources, and automatically discovering new constraints that more tightly couple the training of these functions over time. NELL has been running nonstop since January 2010 (follow it at http://rtw.ml.cmu.edu), and had extracted a knowledge base containing approximately 400,000 beliefs as of September 2010. This talk will describe NELL, its successes and its failures, and use it as a case study to explore the question of how to design never-ending learners.

About the speaker: Tom M. Mitchell is the E. Fredkin University Professor and head of the Machine Learning Department at Carnegie Mellon University. His research interests lie in machine learning, artificial intelligence, and cognitive neuroscience. Mitchell is a member of the U.S. National Academy of Engineering, a Fellow of the American Association for the Advancement of Science (AAAS), and a Fellow and Past President of the Association for the Advancement of Artificial Intelligence (AAAI). Mitchell believes the field of machine learning will be the fastest growing branch of computer science during the 21st century.

Professor of Computer Science and Electrical Engineering, Stanford University


Speaker: Hector Garcia-Molina, Stanford University

Date: 2 August 2010, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: CourseRank: A Social Site for Academic Course Planning and Evaluation




Abstract: CourseRank is a course planning tool we have developed for Stanford students, and is already in use by most undergraduates. In this talk I will give an overview of CourseRank, and some of the research that has gone into its recommendation engine, its requirements checking engine, and its browsing interface.

About the speaker: Hector Garcia-Molina is the Leonard Bosack and Sandra Lerner Professor in the Departments of Computer Science and Electrical Engineering at Stanford University, Stanford, California. He was chairman of the Computer Science Department from January 2001 to December 2004. From 1997 to 2001 he was a member of the President's Information Technology Advisory Committee (PITAC). From 1979 to 1991 he was on the faculty of the Computer Science Department at Princeton University, Princeton, New Jersey. His research interests include distributed computing systems, digital libraries, and database systems. He received a BS in electrical engineering from the Instituto Tecnologico de Monterrey, Mexico, in 1974, and, from Stanford University, an MS in electrical engineering in 1975 and a PhD in computer science in 1979. He holds an honorary PhD from ETH Zurich (2007). Garcia-Molina is a Fellow of the Association for Computing Machinery and of the American Academy of Arts and Sciences; is a member of the National Academy of Engineering; received the 1999 ACM SIGMOD Innovations Award; is a Venture Advisor for Onset Ventures; and is a member of the Board of Directors of Oracle.

Professor (Research) of Computer Science at Stanford University.


Speaker: John Ousterhout

Date: 26 August 2010, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: RAMCloud: Scalable High-Performance Storage Entirely in DRAM



Abstract: Disk-oriented approaches to online storage are becoming increasingly problematic: they do not scale gracefully to meet the needs of new large-scale Web applications, and improvements in disk capacity have outstripped improvements in access speed. In this talk I will describe a new approach to datacenter storage called RAMCloud, where information is kept entirely in DRAM and large-scale systems are created by aggregating the main memories of thousands of commodity servers. A RAMCloud can provide durable and available storage with 100-1000x the throughput of disk-based systems and 100-1000x lower access latency. By combining low latency and large scale, RAMClouds will enable a new class of applications that manipulate large datasets more intensively than has ever been possible.

About the speaker: John Ousterhout is Professor (Research) of Computer Science at Stanford University. His current research focuses on infrastructure for Web applications and cloud computing. Ousterhout's prior positions include 14 years in industry, where he founded two companies (Scriptics and Electric Cloud), preceded by 14 years as Professor of Computer Science at U.C. Berkeley. He is the creator of the Tcl scripting language and is also well known for his work in distributed operating systems and file systems.

Ousterhout received a BS degree in Physics from Yale University and a PhD in Computer Science from Carnegie Mellon University. He is a member of the National Academy of Engineering and has received numerous awards, including the ACM Software System Award, the ACM Grace Murray Hopper Award, the National Science Foundation Presidential Young Investigator Award, and the U.C. Berkeley Distinguished Teaching Award.

Paul M. Wythes ’55, P’86 and Marcia R. Wythes P’86 Professor at Princeton University


Speaker: Kai Li

Date: 15 July 2010, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: Building a Commercial Deduplication Storage System



Abstract: We all hate tapes, but they have been used to store backup and archive data in data centers for several decades. In 2001, Data Domain set out to replace tape libraries by developing deduplication storage system products. Since 2003, Data Domain has launched several deduplication storage appliances and data replication ecosystems for data centers to replace tape libraries. These products can reduce storage footprint, WAN bandwidth requirements, and power consumption by an order of magnitude. To date, Data Domain has deployed about 30,000 systems to about 5,000 data centers, and deduplication storage has now become the new standard for online data protection.

In this talk, I will give a retrospective overview of developing a commercial deduplication storage system for data centers, and a summary of the lessons we have learned in building and deploying a commercially successful deduplication storage product line.
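The core idea behind the order-of-magnitude savings described above can be illustrated with a toy sketch: store each data chunk only once, keyed by a cryptographic fingerprint, so repeated backups of mostly-unchanged data consume almost no new space. This is a minimal illustration only, not Data Domain's actual design (which uses content-defined chunking and sophisticated on-disk indexing); all names here are invented for the example.

```python
import hashlib

def dedup_store(chunks, store=None):
    """Store only chunks whose SHA-256 fingerprint is new.

    Returns (store, logical_bytes, physical_bytes) for this call:
    logical is what the client wrote, physical is what was stored.
    """
    if store is None:
        store = {}  # fingerprint -> chunk data
    logical = physical = 0
    for chunk in chunks:
        logical += len(chunk)
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in store:          # only new content costs space
            store[fp] = chunk
            physical += len(chunk)
    return store, logical, physical

# Ten nightly "backups" of mostly identical data: only the changed
# chunk each night consumes new space, so physical << logical.
base = [b"block-%03d" % i for i in range(100)]
store = {}
logical = physical = 0
for night in range(10):
    backup = base + [b"changed-%d" % night]  # one new chunk per night
    store, l, p = dedup_store(backup, store)
    logical += l
    physical += p
print(logical, "logical bytes stored as", physical, "physical bytes")
```

In this toy run the store holds 110 unique chunks while the client logically wrote ten full backups, roughly a 9x reduction; real backup workloads with weeks of retention are where the order-of-magnitude savings come from.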

About the speaker: Kai Li is the Paul M. Wythes ’55, P’86 and Marcia R. Wythes P’86 Professor at Princeton University, where he has been a faculty member since 1986. He received his PhD degree from Yale University, his MS degree from the University of Science and Technology of China, Chinese Academy of Sciences, and his BS degree from Jilin University. He has led several influential projects, including the Shared Virtual Memory (SVM) project, which proposed and prototyped the initial Distributed Shared Memory for a cluster of computers; the Scalable High-performance Really Inexpensive MultiProcessor (SHRIMP) project, which developed fast protected user-level communication mechanisms for clusters; the Scalable Display Wall project for large-scale data visualization; and the Content-Aware Search System project for large feature-rich datasets. He was elected an ACM Fellow in 1998. In 2001, he co-founded Data Domain, Inc., where he has since served as CTO and consulting Chief Scientist.


Amit Singhal - Google Fellow, Google Inc

Speaker: Amit Singhal

Date: 12 October 2009, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: Web Search: Challenges and Directions


Abstract: These are exciting times for the field of Web search. Search engines are used by millions of people every day, and the number is growing rapidly. This growth poses unique challenges for search engines: they need to operate at unprecedented scales while satisfying an incredible diversity of information needs. Furthermore, user expectations have expanded considerably, moving from "give me what I said" to "give me what I want". Finally, with the lure of billions of dollars of commerce guided by search engines, we have entered a new world of "Adversarial Information Retrieval". This talk will show that the world of algorithm and system design for commercial search engines can be described by two of Murphy's Laws: a) If anything can go wrong, it will; and b) Even if nothing can go wrong, it will anyway.

About the speaker: Amit Singhal is a Google Fellow. According to the New York Times, Mr. Singhal is the master of what Google calls its “ranking algorithm” — the formulas that decide which Web pages best answer each user’s question. A native of India, Amit received his bachelor's degree in Computer Science from IIT Roorkee in 1989. He holds an MS in Computer Science from the University of Minnesota, Duluth, and a Ph.D. in Computer Science from Cornell University in Ithaca, NY, where he studied with Gerard Salton, a pioneer in the field of Information Retrieval. Amit runs a team in Google's Search Quality group; he and his team are responsible for the Google search algorithms.

Thomas Anderson - Robert E. Dinning Professor of Computer Science and Engineering, University of Washington


Speaker: Thomas Anderson

Date: 29 September 2009, 3:00-4:00 pm
(refreshments at 2:30pm)

Title: A Case for OneSwarm



Abstract: OneSwarm is a platform for wide-scale distributed peer-to-peer applications, designed as an open-source alternative to cloud computing for sharing user-generated content. While storing and sharing data through centralized data centers offers many advantages to the system designer, it is not without its drawbacks for the rest of us: a loss of privacy, application lock-in, susceptibility to censorship, inapplicability to the long tail of unprofitable user-generated content, and limits on system scalability and reliability. Peer-to-peer systems to date suffer from many of these same issues, leaving us without much of an alternative. OneSwarm is our attempt to address these issues, starting with the fundamental assumption of no centralized trust — OneSwarm is not only open source, it is designed to resist being “owned”. This raises questions of incentives, user interface design, storage, privacy, and application management, which will be the focus of the talk.

About the speaker: Thomas Anderson is the Robert E. Dinning Professor of Computer Science and Engineering at the University of Washington. His research interests span all aspects of building practical, robust, and efficient computer systems, including distributed systems, operating systems, computer networks, multiprocessors, and security. He is an ACM Fellow, winner of the ACM SIGOPS Mark Weiser Award, winner of the IEEE Bennett Prize, and he has co-authored over a dozen award papers.

John Hopcroft - IBM Professor of Engineering and Applied Mathematics in Computer Science, Cornell University


Speaker: John Hopcroft

Date: 17 August 2009, 3:00-4:00 pm

Title: Future Research Directions in Computer Science




Abstract: The field of computer science is changing rapidly due to the increased power of computing, the size and complexity of problems we deal with, the merging of computing and communication, and the availability of vast amounts of information in digital form. Because of this, research in programming languages, compilers, operating systems, and algorithms that has been a major focus of researchers in the past will be supplemented by new areas. In the future we will deal with social networks, tracing the flow of ideas in the scientific literature, high dimensional data, searching, ranking, collaborative filtering and detecting changes in data or transactions before they become obvious. This talk will explore research topics in several of these areas: tracking communities in social networks, tracing the flow of ideas in scientific literature and understanding high dimensional data.

About the speaker: John E. Hopcroft is the IBM Professor of Engineering and Applied Mathematics in Computer Science at Cornell University. He received his BS (1961) from Seattle University and his MS (1962) and Ph.D. (1964) in electrical engineering from Stanford University. His research centers on theoretical aspects of computer science. He served as dean of Cornell University’s College of Engineering from 1994 until 2001. He is a member of the National Academy of Sciences and the National Academy of Engineering, and a fellow of the American Academy of Arts and Sciences, the American Association for the Advancement of Science, the Institute of Electrical and Electronics Engineers, the Association for Computing Machinery, and the Society for Industrial and Applied Mathematics. In 1986 he was awarded the A. M. Turing Award for his research contributions. In 1992, he was appointed by President Bush to the National Science Board, which oversees the National Science Foundation, and served through May 1998. He received the IEEE Harry Goode Memorial Award in 2005, the Computing Research Association’s Distinguished Service Award in 2007, and the ACM Karl V. Karlstrom Outstanding Educator Award in 2009. He has honorary degrees from Seattle University, the National College of Ireland, and the University of Sydney, and is an honorary professor of the Beijing Institute of Technology. He serves on the Packard Foundation’s Science Advisory Board, the Microsoft Technical Advisory Board for Research Asia, and the advisory boards of IIIT Delhi and the College of Engineering at Seattle University.

Silvio Micali - Dougald Jackson Professor of Electrical Engineering and Computer Science, MIT
Speaker: Silvio Micali

Date: 27 May 2009, 3:00-4:00 pm

Title: A New Approach to Auctions and Mechanism Design



Abstract: Mechanism design's goal is to guarantee a given property P, defined on "state information" known only to a set of players, by leveraging the players' knowledge and rationality. Traditionally, this goal is interpreted as designing a game G such that P holds at one or more of G's equilibria. But due to collusion, as well as computational complexity and privacy concerns, traditional mechanisms may be very far from guaranteeing their desired properties.

We thus put forward a new approach to mechanism design that (1) does not rely on equilibria, (2) is resilient to collusion, and (3) harmonizes incentives with computational complexity and privacy. We exemplify our approach for revenue in combinatorial auctions.
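For readers new to the area, the "traditional" equilibrium-based mechanisms the abstract contrasts against can be illustrated with the textbook sealed-bid second-price (Vickrey) auction, in which truthful bidding is a dominant strategy. The sketch below is background for that classical approach only, not the new mechanism presented in the talk; the bidder names are invented for the example.

```python
def vickrey_auction(bids):
    """Sealed-bid second-price auction: the highest bidder wins but
    pays only the second-highest bid. This pricing rule is what makes
    reporting one's true value a dominant strategy in the classical
    (equilibrium-based) analysis."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    price = ranked[1][1] if len(ranked) > 1 else 0
    return winner, price

# alice values the item at 10 and bids truthfully; she wins but
# pays bob's bid of 7, so shading her bid could not have helped her.
winner, price = vickrey_auction({"alice": 10, "bob": 7, "carol": 4})
print(winner, price)  # alice 7
```

The fragility the abstract points to is that guarantees like this one can break down under collusion (e.g., if bob and carol coordinate their bids) or in combinatorial settings where computing outcomes is itself hard, which motivates the equilibrium-free approach of the talk.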

(Joint work with Jing Chen, and part of a joint research effort with Paul Valiant)

About the speaker: Silvio Micali received his Laurea in Mathematics from the University of Rome in 1978, and his Ph.D. in Computer Science from the University of California at Berkeley in 1983. He joined MIT's faculty in 1983, where he is now the Dougald Jackson Professor of Electrical Engineering and Computer Science.

Silvio's research interests are cryptography, proof systems, zero knowledge, pseudo-random generation, secure protocols, and mechanism design.

Silvio is the recipient of the Gödel Prize (in theoretical computer science) and the RSA Prize (in cryptography). He is a member of the National Academy of Sciences, the National Academy of Engineering, and the American Academy of Arts and Sciences.

Daphne Koller - Professor of Computer Science, Stanford University


Speaker: Daphne Koller

Date: 12 November 2008, 3:00-4:00pm

Title: Probabilistic Models for Holistic Scene Understanding


Abstract: Over recent years, computer vision has made great strides towards annotating parts of an image with symbolic labels, such as object categories (things) or segment types (stuff). However, we are still far from the goal of providing a semantic description of an image, such as "a man, walking a dog on a sidewalk, carrying a backpack".

In this talk, I will describe some projects we have done that attempt to use probabilistic models to move us closer towards the goal. The first part of the talk will present methods that use a more holistic scene analysis to improve our performance at core tasks such as object detection, segmentation, or 3D reconstruction.

The second part of the talk will focus on finer-grained modeling of object shape, so as to allow us to annotate images with descriptive labels related to the object shape, pose, or activity (e.g., is a cheetah running or standing). These vision tasks rely on novel algorithms for core problems in machine learning and probabilistic models, such as efficient algorithms for probabilistic correspondence, transfer learning across related object classes for learning from sparse data, and more.

About the speaker: Dr. Daphne Koller is a professor of Computer Science at Stanford University. Her research focuses on developing and using machine learning and probabilistic methods to model and analyze complex domains, such as biological systems or the physical world. She is the author of over 150 refereed papers in venues as diverse as Science, Nature Genetics, Machine Learning, Neural Information Processing Systems, and Games and Economic Behavior. She is the recipient of numerous awards, including the ONR Young Investigator Award (1998), the Presidential Early Career Award for Scientists and Engineers (1999), the IJCAI Computers and Thought Award (2001), the Cox Medal for Fostering Excellence in Undergraduate Research (2003), the MacArthur Fellowship (2004), and the ACM/Infosys Award (2008).

Michael Stonebraker - Adjunct Professor of Computer Science, M.I.T.


Speaker: Michael Stonebraker

Date: 2 October 2008, 3:00-4:00pm

Title: Morpheus: A Deep Web Search Engine


Abstract: Much attention has been paid to searching the shallow web for information visible on pages accessible to crawlers. However, information behind web forms and other kinds of interfaces is typically invisible to current crawlers, and cannot be found using current shallow web systems. It is estimated that the deep web is 500 times as large as the shallow web. Much of the deep web is data (as opposed to text) and contains transportation schedules, pricing information, availability, tardiness, and personal data.

This talk describes Morpheus, a search system oriented toward the deep web, which we have built at M.I.T. The architecture of the system will be described, and a demo given. Open issues will be considered, and efforts to "wrap" the shallow web will be explored. Finally, some of the experiences of a startup (currently in stealth mode) working to commercialize Morpheus will be recounted.

About the speaker: Dr. Michael Stonebraker has been a pioneer of database research and technology for more than a quarter of a century. He was the main architect of the INGRES relational DBMS; the object-relational DBMS, POSTGRES; and the federated data system, Mariposa. All three prototypes were developed at the University of California at Berkeley, where Stonebraker was a Professor of Computer Science for twenty-five years. He is the founder of three successful Silicon Valley startups whose objective was to commercialize these prototypes.

Professor Stonebraker is the author of scores of research papers on database technology, operating systems, and the architecture of system software services. He was awarded the prestigious ACM Software System Award in 1992 for his work on INGRES. Additionally, he was awarded the first annual Innovation Award by the ACM SIGMOD special interest group in 1994, and has been recognized by Computer Reseller News as one of the top five software developers of the century. Moreover, Forbes magazine named him one of the 8 innovators driving the Silicon Valley wealth explosion in its 80th anniversary edition in 1998. He was elected to the National Academy of Engineering in 1998 and is presently an Adjunct Professor of Computer Science at M.I.T.

Robbert Van Renesse - Co-founder and Technical Advisor, FAST Enterprise Search; Principal Research Scientist, CS Dept., Cornell University


Speaker: Robbert Van Renesse

Date: 19 May 2008, 3:00-4:00pm

Title: Building scalable and fault-tolerant enterprise search platforms




Abstract: Enterprise search has become a critical part of an organization's infrastructure. At large organizations, documents are generated at a high rate and have to be available for search within seconds. High availability and high performance for search are both essential, while the use of specialized hardware should be avoided. In this talk I will cover some of the distribution and replication techniques that we have developed in order to meet the difficult requirements.

Joint work with Fred B. Schneider, Johannes Gehrke, and Dag Johansen.

About the speaker: Dr. Robbert Van Renesse is a Principal Research Scientist in the Department of Computer Science at Cornell University. He received his Ph.D. from the Vrije Universiteit in Amsterdam in 1989, where he developed the Amoeba Distributed Operating System. Subsequently he worked on the Plan 9 operating system at AT&T Bell Laboratories. Since joining Cornell in 1991, he has worked on fault-tolerant distributed systems. He co-founded D.A.G. Labs, which was acquired by FAST, and Reliable Network Solutions, whose technology was acquired by Amazon.com. Other companies that use technology developed by Van Renesse include Microsoft, IBM, and Stratus.

FAST is a global provider of enterprise search technologies. FAST's solutions are used by more than 2,600 global customers and partners, including America Online, Dell, IBM, Reuters, and the US Army. FAST is headquartered in Norway. The FAST Group operates globally with presence in Europe, the United States, Asia Pacific, Australia, South America, and the Middle East. For further information about FAST, please visit www.fastsearch.com.

Bruce Croft - Distinguished Professor, University of Massachusetts Amherst; Director of the Center for Intelligent Information Retrieval


Speaker: Bruce Croft

Date: 9 June 2008, 3:00-4:00pm

Title: Longer Queries, Better Answers?




Abstract: Web search engines produce effective rankings for queries consisting of a small number of keywords, within the accepted limitations of what it means for a Web page to be “relevant” to a query. On the other hand, people are perfectly capable of describing what they are looking for more precisely with a longer query. The problem is that we currently don’t know what to do with these longer queries, unless they happen to be a “factoid” query of the type used in question answering systems. Even in TREC evaluations, where query response times are not an issue, long queries generally are less effective than short queries. In this talk, I will review the approaches that have been taken with longer queries, and present two pieces of our research related to this issue: generating keyword queries from long queries, and finding answers in a community-based question and answer archive.
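The first research thread mentioned above, reducing a verbose query to a short keyword query, can be illustrated with a deliberately naive sketch: drop function words and rank the remaining terms by frequency. The stopword list and ranking rule here are invented stand-ins for the learned techniques the talk actually covers.

```python
from collections import Counter

# A tiny, illustrative stopword list (real systems use much larger
# lists, or learned term-importance models).
STOPWORDS = {"the", "a", "an", "of", "in", "for", "to", "is",
             "what", "are", "on", "and", "i", "am", "looking"}

def keyword_query(long_query, k=3):
    """Reduce a verbose query to its k most salient terms by
    dropping stopwords and ranking the rest by frequency.
    A toy stand-in for the approaches discussed in the talk."""
    terms = [t for t in long_query.lower().split()
             if t.isalpha() and t not in STOPWORDS]
    return [t for t, _ in Counter(terms).most_common(k)]

q = "what are the side effects of the new flu vaccine for children"
print(keyword_query(q))
```

Even this crude reduction hints at why the problem is hard: with all content terms occurring once, frequency alone cannot tell "flu vaccine" (essential) from "new" (marginal), which is exactly the kind of distinction the research aims to learn.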

About the speaker: W. Bruce Croft is a Distinguished Professor in the Department of Computer Science at the University of Massachusetts, Amherst, which he joined in 1979. In 1992, he became the Director of the Center for Intelligent Information Retrieval (CIIR), which combines basic research with technology transfer to a variety of government and industry partners. He has published more than 180 articles related to information retrieval. Dr. Croft was elected a Fellow of ACM in 1997, received the Research Award from the American Society for Information Science and Technology in 2000, and received the Gerard Salton Award from the ACM Special Interest Group in Information Retrieval (SIGIR) in 2003.


2007 Speakers

Brewster Kahle - Digital Librarian, Director and Co-Founder, Internet Archive

Speaker: Brewster Kahle

Date: 27 March 2007, 12:00pm

Title: Universal Access to Human Knowledge (Or Public Access to Digital Materials)


Abstract: The goal of universal access to our cultural heritage is within our grasp. With current digital technology we can build comprehensive collections, and with digital networks we can make these available to students and scholars all over the world. The current challenge is establishing the roles, rights, and responsibilities of our libraries and archives in providing public access to this information. With these roles defined, our institutions will help fulfill this epic opportunity of our digital age.

About the speaker: Brewster has built technologies, companies, and institutions to advance the goal of universal access to all knowledge. He currently oversees the non-profit Internet Archive as founder and Digital Librarian, which is now one of the largest digital archives in the world. As a digital archivist, Brewster has been active in technology, business, and law.

Fernando Pereira - Andrew and Debra Rachleff Professor and Chairman, Dept. of Computer and Information Science, U Penn


Speaker: Fernando Pereira

Date: 11 April 2007

Title: Learning to Analyze Sequences.




Abstract: Sequential data -- speech, text, genomic sequences -- floods our storage servers. Much useful information in these data is carried by implicit structure: phonemes and prosody in speech, syntactic structure in text, genes and regulatory elements in genomic sequences. Over the last six years, several of us have been investigating structured linear models, a unified discriminative learning approach to sequence analysis problems. I will review the approach and illustrate it with applications to parsing, information extraction, and gene finding. I will conclude with a summary of other applications and current research questions.

Joint work with Axel Bernal, Koby Crammer, John Lafferty, Andrew McCallum, Ryan McDonald, and Fei Sha.

About the speaker: Professor Fernando Pereira is chairman of the department of Computer and Information Science, University of Pennsylvania. He received a Ph.D. in Artificial Intelligence from the University of Edinburgh in 1982. Before joining Penn, he held industrial research and management positions at SRI International, at AT&T Labs, where he led the machine learning and information retrieval research department from September 1995 to April 2000, and at WhizBang Labs, a Web information extraction company. His main research areas are computational linguistics and machine learning, and he is a main contributor to several advances in finite-state models for speech and text processing in everyday industrial use. He has 97 research publications on computational linguistics, speech recognition, machine learning, and logic programming, and several issued and pending patents on speech and language processing, and on human-computer interfaces. He was elected Fellow of the American Association for Artificial Intelligence in 1991 for his contributions to computational linguistics and logic programming, and he is a past president of the Association for Computational Linguistics.

Noam Nisan - Professor, School of Engineering and Computer Science, Hebrew University, and Google


Speaker: Noam Nisan

Date: 26 July 2007, 12:00pm

Title: Algorithmic Mechanism Design




Abstract: One of the challenges that the Internet raises is the necessity of designing distributed protocols for settings where the participating computers are owned and operated by different owners with different goals. Over the last decade or so there has been much research that aims to address these issues using ideas taken from the micro-economic field of mechanism design. In this talk I will survey the current state of the field: how mechanism design is applied in computational settings, how far classical ideas can go, and what the challenges are for further research. Among the applications discussed will be combinatorial auctions, cost sharing, scheduling, and routing in networks.

About the speaker: Professor Noam Nisan received his Ph.D. from the University of California, Berkeley, and is now a Professor of Computer Science at the Hebrew University of Jerusalem. He has published three books and numerous research papers on algorithms, complexity theory, communication, computerized auctions, and electronic commerce. He has received several professional awards, including the 1988 ACM Distinguished Dissertation Award for his dissertation "Using Hard Problems to Create Pseudorandom Generators", and the Michael Bruno Award, granted annually by Yad Hanadiv (also known as the Rothschild Foundation) to outstanding Israelis in science and learning, for his research on electronic markets, auctions, and economic mechanisms in computation.

Adi Shamir - Paul and Marlene Borman Professor, Computer Science and Applied Mathematics, Weizmann Institute


Speaker: Adi Shamir

Date: 28 August 2007, 12:00pm

Title: A Top View of Side Channel Attacks





Abstract: Side channel attacks are powerful techniques that can bypass the mathematical security of many cryptosystems by observing the physical properties of their implementations. In this talk I will survey some new side channel attacks developed by my colleagues and me over the last couple of years on PCs, smart cards, RFID tags, etc.

About the speaker: Professor Adi Shamir obtained his MSc and PhD in Computer Science from the Weizmann Institute in 1975 and 1977 respectively. His thesis was titled "Fixed Points of Recursive Programs". After a one-year postdoc at Warwick University, he did research at MIT from 1977–1980 before returning to become a member of the faculty of Mathematics and Computer Science at the Weizmann Institute. He was one of the inventors of the RSA algorithm (along with Ron Rivest and Len Adleman), and has made numerous contributions to the fields of cryptography and computer science.

Shamir is the winner of the 2002 ACM Turing Award, jointly with Leonard M. Adleman and Ronald L. Rivest, "For their ingenious contribution for making public-key cryptography useful in practice." Shamir has also received ACM's Kanellakis Award, the Erdős Prize of the Israel Mathematical Society, the IEEE's W.R.G. Baker Prize, the UAP Scientific Prize, the Vatican's PIUS XI Gold Medal, and the IEEE Koji Kobayashi Computers and Communications Award.

Bruce Maggs - Professor, School of Computer Science, CMU, and Vice President, Research, Akamai Technologies


Speaker: Bruce Maggs

Date: 23 October 2007, 12:00pm

Title: Lessons in Engineering Self-Managed Networks





About the speaker: Dr. Bruce Maggs received the S.B., S.M., and Ph.D. degrees in computer science from the Massachusetts Institute of Technology in 1985, 1986, and 1989, respectively. His advisor was Charles Leiserson. In 1994, he joined Carnegie Mellon, where he is now a Professor in the Computer Science Department. While on a two-year leave-of-absence from Carnegie Mellon, Maggs helped to launch Akamai Technologies, serving as its Vice President for Research and Development, before returning to Carnegie Mellon. He retains a part-time role at Akamai as Vice President for Research. Maggs is spending the 2007-2008 academic year at Duke University. He has also held visiting faculty positions at M.I.T. and Princeton University.

Maggs's research focuses on networks for parallel and distributed computing systems. In 1986, he became the first winner (with Charles Leiserson) of the Daniel L. Slotnick Award for Most Original Paper at the International Conference on Parallel Processing, and in 1994 he received an NSF National Young Investigator Award. He was co-chair of the 1993-1994 DIMACS Special Year on Massively Parallel Computation.

Maggs serves on the ACM Council as a Member-at-Large, and has served on the steering committees for the ACM Symposium on Parallel Algorithms and Architectures (SPAA) and the ACM Internet Measurement Conference (IMC).

Raj Reddy - Mozah Bint Nasser Professor, School of Computer Science, CMU


Speaker: Raj Reddy

Date: 3 Dec 2007, 12:00pm

Title: Global Access to Information: Research Issues in Data Mining and Text Mining



Abstract: In this talk we will present research issues that arise when attempting to provide "Global Access to Information". To be true to this vision, one must also resolve the problems of the Language Divide and the Literacy Divide. Over 80% of the global population is not English-literate, and over 20% of the population is functionally illiterate, i.e., they cannot read and understand in any language! We will use examples from the Million Book Digital Library project and from a project in India to provide health information to illiterate people. Jaime Carbonell stated, about 10 years ago, that the CMU Language Technology Institute research mission is "getting the right information, to the right people, at the right time, on the right medium, in the right language and with the right level of detail". In spite of major advances in search technologies, we are not close to achieving the information society bill of rights of providing global access to information. This talk will provide a forum for discussion on the research agenda in Data Mining and Text Mining necessary for fulfilling this vision.

About the speaker: Dr. Raj Reddy began his academic career as an Assistant Professor at Stanford in 1966. He has been a member of the Carnegie Mellon faculty since 1969. He served as the founding Director of the Robotics Institute from 1979 to 1991 and the Dean of School of Computer Science from 1991 to 1999. Dr. Reddy's research interests include the study of human-computer interaction and artificial intelligence. His current research interests include Million Book Digital Library Project; a Multifunction Information Appliance that can be used by the uneducated; Fiber To The Village Project; Mobile Autonomous Robots; and Learning by Doing.

He is a member of the National Academy of Engineering and the American Academy of Arts and Sciences. He was president of the American Association for Artificial Intelligence from 1987 to 1989. Dr. Reddy was awarded the Legion of Honor by President Mitterrand of France in 1984. He was awarded the ACM Turing Award in 1994, the Okawa Prize in 2004, the Honda Prize in 2005, and the Vannevar Bush Award in 2006. He served as co-chair of the President's Information Technology Advisory Committee (PITAC) from 1999 to 2001 under Presidents Clinton and Bush.

Practical Information

Talks are open to the public. Events will be held in the Titan conference room in Building 6 on Microsoft’s Silicon Valley Campus (1288 Pear Avenue), and light refreshments will be served.

Mailing List

The talks are announced via a low-volume mailing list. To subscribe, send an email to listserv AT lists.research.microsoft.com with no subject and a single line containing "subscribe svc-dss <your email> <your name>" (without quotes).

Series Coordinators

  • Marcos K. Aguilera, Microsoft Research
  • Abhimanyu Das, Microsoft Research
Upcoming Speakers at a Glance