MSR NYC Data Science Seminar Series

This seminar series brings together data science researchers from Columbia University, NYU, Cornell Tech, and Microsoft Research. Our goal is to increase interaction within the broader New York data science community and to provide a new forum for discussing data science research.


The events in this series start with a formal talk session (45 minutes). Invited speakers present short talks, providing their views on the opportunities and challenges in data science research. The second part of the event (2 hours) is a wine and cheese social designed to enable researchers to exchange ideas in a relaxed setting.

Upcoming Events

December 4, 2014

Speaker: David Blei, Columbia University

Title: Topic Models and User Behavior

Abstract: Probabilistic topic models provide a suite of tools for analyzing
large document collections. Topic modeling algorithms discover the
latent themes that underlie the documents and identify how each
document exhibits those themes. Topic modeling can be used to help
explore, summarize, and form predictions about documents. Topic
modeling ideas have been adapted to many domains, including images,
music, networks, genomics, and neuroscience.

Traditional topic modeling algorithms analyze a document collection
and estimate its latent thematic structure. However, many collections
contain an additional type of data: how people use the documents. For
example, readers click on articles in a newspaper website, scientists
place articles in their personal libraries, and lawmakers vote on a
collection of bills. Behavior data is essential both for making
predictions about users (such as for a recommendation system) and for
understanding how a collection and its users are organized.

In this talk, I will review the basics of topic modeling and describe
our recent research on collaborative topic models, models that
simultaneously analyze a collection of texts and its corresponding
user behavior. We studied collaborative topic models on 80,000
scientists' libraries from Mendeley and 100,000 users' click data from
the arXiv. Collaborative topic models enable interpretable
recommendation systems, capturing scientists' preferences and pointing
them to articles of interest. Further, these models can organize the
articles according to the discovered patterns of readership. For
example, we can identify articles that are important within a field
and articles that transcend disciplinary boundaries.

More broadly, topic modeling is a case study in the large field of
applied probabilistic modeling. Finally, I will survey some recent
advances in this field. I will show how modern probabilistic modeling
gives data scientists a rich language for expressing statistical
assumptions and scalable algorithms for uncovering hidden patterns in
massive data.

Bio: David Blei is a Professor of Statistics and Computer Science at
Columbia University. His research is in statistical machine learning,
involving probabilistic topic models, Bayesian nonparametric methods,
and approximate posterior inference. He works on a variety of
applications, including text, images, music, social networks, user
behavior, and scientific data. David earned his Bachelor's degree in
Computer Science and Mathematics from Brown University (1997) and his
PhD in Computer Science from the University of California, Berkeley
(2004). Before arriving at Columbia, he was an Associate Professor of
Computer Science at
Princeton University. He has received several awards for his
research, including a Sloan Fellowship (2010), Office of Naval
Research Young Investigator Award (2011), Presidential Early Career
Award for Scientists and Engineers (2011), Blavatnik Faculty Award
(2013), and ACM-Infosys Foundation Award (2013).

Previous Events

September 23, 2014



Speaker: Deborah Estrin, Cornell Tech

Title: Small, n=me, data

Abstract: Consider a new kind of cloud-based app that would create a picture of an individual's life over time by continuously, securely, and privately analyzing the digital traces they generate 24x7. The social networks, search engines, mobile operators, online games, and e-commerce sites that they access every hour of most every day extensively use these digital traces to tailor service offerings, to improve system performance, and in some cases to target advertisements. Our premise is that these diverse and messy, but highly personalized, data can be analyzed to draw powerful inferences about an individual, and for that individual. Use of applications that are fueled by these traces could enhance, and even transform, our experiences as consumers, patients, passengers, customers, family members, as well as users of online media. This talk will discuss precedents for small data in mobile health, and the opportunities and challenges of broadening the scope of small data capture, storage, and use.

Bio: Deborah Estrin (PhD, MIT (1985); BS, UCB (1980)) is a Professor of Computer Science at Cornell Tech in New York City and a Professor of Health Policy and Research at Weill Cornell Medical College. She is a co-founder of Open mHealth. Her current focus is on mobile health and small data, leveraging the pervasiveness of mobile devices and digital interactions for health and life management (TEDMED). Estrin was the founding director of the NSF-funded Science and Technology Center for Embedded Networked Sensing (CENS) at UCLA (2002-12). Awards include: ACM Athena Lecturer (2006) and Anita Borg Institute's Women of Vision Award for Innovation (2007). She is an elected member of the American Academy of Arts and Sciences (2007) and National Academy of Engineering (2009).



April 24, 2014 - Opening Event



Opening Remarks

Speaker: Duncan Watts

Bio: Prior to joining Microsoft, Duncan Watts was a Senior Principal Research Scientist at Yahoo! Research, where he directed the Human Social Dynamics group. Prior to joining Yahoo!, he was a full professor of Sociology at Columbia University, where he taught from 2000 to 2007. His research on social networks and collective dynamics has appeared in a wide range of journals, from Nature, Science, and Physical Review Letters to the American Journal of Sociology and Harvard Business Review. He is also the author of three books, most recently Everything is Obvious (Once You Know The Answer) (Crown Business, 2011). He holds a B.Sc. in Physics from the Australian Defence Force Academy, and a Ph.D. in Theoretical and Applied Mechanics from Cornell University.

Technical Talks

Yann LeCun, NYU and Facebook

Title & Abstract: Yann presents a demo on deep learning and vision. 

Bio: Yann is Director of AI Research at Facebook, and Silver Professor of Data Science, Computer Science, Neural Science, and Electrical Engineering at New York University, affiliated with the NYU Center for Data Science, the Courant Institute of Mathematical Sciences, the Center for Neural Science, and the Electrical and Computer Engineering Department. He received the Electrical Engineer Diploma from Ecole Superieure d'Ingenieurs en Electrotechnique et Electronique (ESIEE), Paris in 1983, and a PhD in Computer Science from Universite Pierre et Marie Curie (Paris) in 1987. He is the lead faculty at NYU for the Moore-Sloan Data Science Environment, a $36M initiative in collaboration with UC Berkeley and University of Washington to develop data-driven methods in the sciences. He is the recipient of the 2014 IEEE Neural Network Pioneer Award.

Mor Naaman, Cornell Tech

Title: Data and People in Connective Media

Abstract: In five minutes or less, I will talk about how we use methods from social science, people-centered design, data science and machine learning to understand social media data large and small, and build new applications that help us make sense of the city from (public) social media data. I'll also say a word about Cornell Tech and our Connective Media hub. OK, six minutes may be needed to squeeze it all in.

Bio: Naaman is an associate professor at Cornell Tech's Jacobs Institute. He is also a co-founder and Chief Scientist at a startup founded to make sense of the real-time web and social media. Mor's research applies multidisciplinary methods to gain new insights about people and society from social media data, and to develop novel tools to make this data more accessible and usable in various settings. He gets awards, too, including the NSF Early Faculty CAREER Award, research awards from Google, Yahoo!, and Nokia, and three best paper awards.

Tony Jebara, Columbia University

Title: Learning From Network Connectivity and Mobile Phone Data

Abstract: Many real-world networks are described by both connectivity information and features for every node. While most network growth models are based on link analysis, we explore how an individual's data profile without any connectivity information can be used to infer their connectivity with other users. For example, in a class of incoming freshmen students with no known friendship connections, can we predict which pairs will become friends at the end of the year using only their profile information? Similarly, can we use co-location to predict communication? In other words, by observing only the mobile location data from users, can we predict which pairs of users are likely to communicate? To learn how to reconstruct these networks, we present structure-preserving metric learning and apply it to Facebook data, Wikipedia data, FourSquare data, and mobile phone call detail records.

Bio: Tony is Associate Professor of Computer Science at Columbia University. He chairs the Center on Foundations of Data Science and directs the Columbia Machine Learning Laboratory. His research intersects computer science and statistics to develop new frameworks for learning from data with applications in social networks, spatio-temporal data, vision and text. Jebara has founded or advised startups including Sense Networks (acquired), AchieveMint, Agolo, and Bookt (acquired by RealPage, NASDAQ:RP). He is the author of the book Machine Learning: Discriminative and Generative. In 2004, Jebara was the recipient of the CAREER award from the National Science Foundation.

Panel Discussion

Panel Topic: Opportunities and Challenges in Data Science Research
Panel Moderator: Jennifer Chayes, Managing Director, MSR New York City

Bio: Jennifer Tour Chayes is Managing Director of Microsoft Research New York City as well as the Microsoft Research New England lab in Cambridge. Before this, she was research area manager for Mathematics, Theoretical Computer Science and Cryptography at Microsoft Research Redmond. Chayes joined Microsoft Research in 1997, when she co-founded the Theory Group. Her research areas include phase transitions in discrete mathematics and computer science, structural and dynamical properties of self-engineered networks, and algorithmic game theory. She is the co-author of almost 100 scientific papers and the co-inventor of more than 20 patents.

Panel Members: Yann LeCun, Mor Naaman, Tony Jebara


