Nina Mishra

Microsoft Research, Search Labs

Background

Data mining, machine learning, and privacy-preserving algorithms

Research Interests

The design and analysis of algorithms for unearthing patterns in massively large, dynamic datasets, specifically:

  • Internet: Many new data sets are generated by virtue of the Internet.  My work investigates algorithms for mining this data, for example, to improve web search or to discover communities in social networks.
  • Scalability: Modern data sets are massively large and often streaming.  Given a known algorithm that only works on a small, static data set, I think about how best to modify the algorithm for a large, dynamic data set, while also approximately retaining its original functionality.
  • Clustering: A process that, given a collection of points, groups similar points together and places dissimilar points apart.  The points can vary from vertices in a graph to points in a metric space.  I study algorithms for efficiently discovering good clusters.
  • Privacy: Many data sets that are mined today contain confidential information.  My research seeks to strike a fine balance between simultaneously enabling the discovery of large-scale statistical patterns while disabling the recovery of private information.

Biography

Publications: Edited Proceedings

Theoretical Advances in Data Clustering.  Special Issue of the Machine Learning Journal.  With Rajeev Motwani.  2004.

Proceedings of the Twentieth International Conference on Machine Learning.  With Tom Fawcett.  2003

Publications: Journal Articles

Finding Strongly-Knit Clusters in Social Networks.With Robert Schreiber, Isabelle Stanton and Robert E. Tarjan. Internet Mathematics. To Appear

Privately Finding Specifications. With Westley Weimer. IEEE Transactions on Software Engineering. 34(1)21-32, 2008.

A New Conceptual Clustering Framework. With Dana Ron and Ram Swaminathan. Machine Learning journal.  Kluwer Academic Publishers. 56(1-3): 115-151, 2004.

Version Spaces and the Consistency Problem. With Haym Hirsh and Leonard Pitt. Artificial Intelligence Journal (AIJ). 156(2):115-138, 2004.

Clustering Data Streams: Theory and Practice. With Sudipto Guha, Adam Meyerson, Rajeev Motwani and Liadan O'Callaghan. IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE). Special issue on Online Analysis and Querying of Continuous Data Streams. 15(3): 515-528, 2003.

Efficient Read-Restricted Monotone CNF/DNF Dualization by Learning with Membership Queries. With Carlos Domingo, Leonard Pitt. Machine Learning journal. Kluwer Academic Publishers. 37(1): 89-110 (1999).

Learning from a Consistently Ignorant Teacher. With Michael Frazier, Sally A. Goldman, Leonard Pitt. Journal of Computer and System Sciences (JCSS). Special Issue on Computational Learning Theory. 52(3): 471-492 (1996).

Publications: Conference/Workshop/Other

Spatial Scan Statistics for Graph Clustering. With Bei Wang, Jeff Phillips, Robert Schreiber, Dennis Wilkinson and Robert E. Tarjan. SIAM Conference on Data Mining. 2008.

A Survey of Query Auditing Techniques for Data Privacy. With Shubha Nabar, Krishnaram Kenthapadi and Rajeev Motwani. Book chapter in Privacy-Preserving Data Mining: Models and Algorithms. To be published by Kluwer Academic Publishers. 2008.

Clustering Data Streams. With Sudipto Guha. Book chapter in Data Stream Management: Processing High-Speed Data Streams. To be published by Springer-Verlag. 2007.

Clustering Social Networks. With Robert Schreiber, Isabelle Stanton and Robert E. Tarjan. 5th Workshop On Algorithms And Models For The Web-Graph (WAW). 2007.

Towards Robustness in Query Auditing. With Shubha Nabar, Bhaskara Marthi, Krishnaram Kenthapadi and Rajeev Motwani. 32nd International Conference on Very Large Data Bases (VLDB). 2006.

When Random Sampling Preserves Privacy. With Kamalika Chaudhuri. 26th Annual International Cryptology Conference (CRYPTO). 2006.

Privacy via Pseudorandom Sketches. With Mark Sandler. 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). 2006.

Finding Closely-Related Groups of Objects in Very Large Datasets. With Robert Schreiber and Robert Tarjan. Hewlett-Packard Technical Conference (HP TechCon), 2006.

Sublinear Projective Clustering with Outliers. With Rajeev Motwani and Sergei Vassilvitskii. 15th Annual Fall Workshop on Computational Geometry and Visualization, 2005.

Simulatable Auditing. With Krishnaram Kenthapadi and Kobbi Nissim. 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2005. Earlier version "How Auditors May Inadvertently Compromise Your Privacy" appeared in PORTIA Workshop on Sensitive Data in Medical, Financial, and Content-Distribution Systems, 2004.

Enabling Privacy for the Paranoids (Vision Paper). With Gagan Aggarwal, Mayank Bawa, Prasanna Ganesan, Hector Garcia-Molina, Krishnaram Kenthapadi, Rajeev Motwani, Utkarsh Srivastava, Dilys Thomas, Jennifer Widom and Ying Xu
Proceedings of the 30th Intl Conference on Very Large Databases (VLDB), 2004.

On Identifying Stable Ways to Configure Systems. With Gagan Aggarwal, Mayur Datar and Rajeev Motwani. Proceedings of the 1st International Conference on Autonomic Computing (ICAC). 2004.

Secure Computation of the kth Ranked Element. With Gagan Aggarwal and Benny Pinkas. Advances in Cryptology - EUROCRYPT 2004: International Conference on the Theory and Applications of Cryptographic Techniques. 2004.

On Finding Large Conjunctive Clusters. With Dana Ron and Ram Swaminathan.
Proceedings of the 16th Annual Conference on Learning Theory (COLT). 2003.

Towards the Visualization of Overlapping Sets. With Xavier Boyen and Liadan O'Callaghan. DIMACS Workshop on Computational Geometry. 2002.

Large Clusters of Web Pages. With Dana Ron and Ram Swaminathan. Workshop on Algorithms and Models for the Web Graph (WAW). 2002.

Streaming Data Algorithms for High-Quality Clustering. With Liadan O'Callaghan, Adam Meyerson, Sudipto Guha, and Rajeev Motwani.
Proceedings of the 18th International Conference on Data Engineering (ICDE). 2002.

Sublinear Time Approximate Clustering. With Dan Oblinger and Leonard Pitt.
Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 2001.

Clustering Data Streams. With Sudipto Guha, Rajeev Motwani and Liadan O'Callaghan. 41st Annual Symposium on Foundations of Computer Science (FOCS), IEEE Press, 2000.

Learning from a Monotonous, Ignorant Teacher. PhD Thesis.  UIUC Technical Report, UIUCDCS-R-97-2023, October 1997.

Generating all Maximal Independent Sets of Bounded-Degree Hypergraphs.
With Leonard Pitt. Proceedings of the 10th Annual Conference on Computational Learning Theory (COLT).  pp. 211-217, ACM Press, 1997.

Version Spaces without Boundary Sets. With Haym Hirsh, Leonard Pitt.
Proceedings of the 14th National Conference on Artificial Intelligence (AAAI). pp.491-496. 1997.

Learning from a Consistently Ignorant Teacher. With Michael Frazier, Sally A. Goldman, Leonard Pitt. Proceedings of the Annual Conference on Computational Learning Theory (COLT). 1995.

Professional Activities

Editorial Boards

Program Chair

Program Committees

  • PODS’09: Principles of Database Systems
  • AAAI'08: Conference on Artificial Intelligence
  • KDD'08: Knowledge Discovery and Data Mining
  • KDD'07: Knowledge Discovery and Data Mining
  • KDD'06: Knowledge Discovery and Data Mining
  • ICML'06: International Conference on Machine Learning
  • ICML'05: International Conference on Machine Learning
  • KDD'05: Knowledge Discovery and Data Mining
  • SDM'05: SIAM International Conference on Data Mining
  • KDD'04: Knowledge Discovery and Data Mining
  • SDM'03: SIAM International Conference on Data Mining
  • ICDM'02: IEEE International Conference on Data Mining
  • SDM'02: SIAM International Conference on Data Mining
  • KDD'01: Knowledge Discovery and Data Mining
  • IJCAI'01: International Joint Conference on Artificial Intelligence
  • ICML'00: International Conference on Machine Learning

NSF Panelist: 2002, 2004, 2007.

Advisory Board,ICML'04

Teaching

Groups

Contact Info

  • Microsoft Research
    Silicon Valley Campus
    1065 La Avenida
    Mountain View, CA 94043