Microsoft Research Knowledge Tools group: we create productivity tools by applying machine learning to tough problems in technical computing, computer systems, and security.
Overview
The Knowledge Tools group started in July 2004. Our mission is to improve the data/human interface. People at work have to cope with large, confusing data sets in order to get their jobs done and make decisions. We want to build tools to help these people cope with data complexity. To build these tools, we must make advances in three areas:
- Programming languages/tools
- Machine learning algorithms that scale to large data sets
- User interfaces and visualization for interacting with data
Areas
We believe that several different types of knowledge workers can benefit from our tools:
- Security analysts must monitor large event logs and gigantic flows of network traffic
- System administrators need to understand the complex activity of large server farms
- Engineers and scientists want to explore data and collaborate
To make research progress, we build prototype tools and get them into the hands of these types of users. We build many of our prototype tools on top of IronPython, a version of Python for .NET.
Primary contact: John Platt
Other colleagues
Publications
We have analyzed system and network behavior in order to build more secure and efficient networks and computers:
- Finding Similar Failures using Callstack Similarity by K. Bartz, J.W. Stokes, J.C. Platt, R. Kivett, D. Grant, S. Calinoiu, G. Loihle, Proc. SysML, (2008).
- Fast Variational Inference for Large-Scale Internet Diagnosis by J.C. Platt, E. Kıcıman, D.A. Maltz, Advances in Neural Information Processing Systems 20, (2008).
- Why did my PC suddenly slow down? by S. Basu, J. Dunagan, G. Smith, Proc. SysML, (2007).
- Analyzing and Improving a BitTorrent Network's Performance Mechanisms by A. Bharambe, C. Herley, V.N. Padmanabhan, Proc. InfoCom, (2006).
- Mining Web Logs to Debug Distant Connectivity Problems by E. Kıcıman, D.A. Maltz, M. Goldszmidt, J.C. Platt, SIGCOMM Workshop on Mining Network Data, (2006).
- Automatically Extracting Fields from Unknown Network Protocols by K. Gopalratnam, S. Basu, J. Dunagan, H. Wang, Proc. Systems and Machine Learning Workshop, (2006).
- Some Observations on BitTorrent by A. Bharambe, C. Herley, V.N. Padmanabhan, ACM Sigmetrics, (2005).
- Automatic Misconfiguration Troubleshooting with PeerPressure by H.J. Wang, J. Platt, Y. Chen, R. Zhang, Y.-M. Wang, Proc. 6th Symposium on Operating System Design and Implementation, (2004).
We have published papers on security at scale: exploiting data statistics to make systems and networks more secure. Publications in this area include:
- Nobody Sells Gold for the Price of Silver: Dishonesty, Uncertainty, and the Underground Economy by C. Herley and D. Florêncio, Proc. WEIS, (2009).
- Passwords: If We're So Smart, Why Are We Still Using Them? by C. Herley, P.C. van Oorschot, and A.S. Patrick, Proc. Financial Crypto, (2009).
- A Profitless Endeavor: Phishing as Tragedy of the Commons by C. Herley and D. Florêncio, Proc. NSPW, (2008).
- One-time Password Access to Any Server Without Changing the Server by D. Florêncio and C. Herley, Proc. ISC, (2008).
- Can "Something You Know" be Saved? by B. Coskun, C. Herley, Proc. ISC, (2008).
- Protecting Financial Institutions from Brute-Force Attacks by C. Herley and D. Florêncio, Proc. SEC, (2008).
- A Large Scale Study of Automated Web Search Traffic by G. Buehrer, J. Stokes, K. Chellapilla, Proc. Int'l Workshop on Adversarial Information Retrieval on the Web, (2008).
- Do Strong Web Passwords Accomplish Anything? by D. Florêncio, C. Herley, and B. Coskun, Proc. USENIX HotSEC, (2007).
- Evaluating a Trial Deployment of Password Re-Use for Phishing Prevention by D. Florêncio, C. Herley, Proc. APWG eCrime, (2007).
- A Large Scale Study of Web Password Habits by D. Florêncio, C. Herley, Proc. WWW, (2007).
- KLASSP: Entering Passwords on a Spyware Infected Machine using a Shared-Secret Proxy by D. Florêncio, C. Herley, Proc. ACSAC, (2006).
- How to Login from an Internet Cafe without Worrying about Keyloggers by D. Florêncio, C. Herley, Symp. on Usable Privacy and Security, (2006).
- Password Rescue: A New Approach to Phishing Prevention by D. Florêncio, C. Herley, 1st USENIX Workshop on Hot Topics in Security, pp. 7-11, (2006).
- Analyzing and Improving Anti-Phishing Schemes by D. Florêncio, C. Herley, Proc. SEC, (2006).
We have studied how to help knowledge workers find and maintain awareness of important information:
- Learning to Generalize for Complex Selection Tasks by A. Ritter, S. Basu, Proc. IUI, (2009).
- Learning from Multi-topic Web Documents for Contextual Advertisement by Y. Zhang, A.C. Surendran, J.C. Platt, M. Narasimhan, Proc. KDD, pp. 1051-1059, (2008).
- BLEWS: Using Blogs to Provide Context for News Articles by M. Gamon, S. Basu, D. Belenko, D. Fisher, M. Hurst, A.C. König, Proc. Int'l Conf. on Weblogs and Social Media, (2008).
- Adaptive Layout for Dynamically Aggregated Documents by E. Schrier, M. Dontcheva, C. Jacobs, G. Wade, D. Salesin, Proc. Intelligent User Interfaces, pp. 99-108, (2008).
- Scalable Summaries of Spoken Conversations by S. Basu, S. Gupta, M. Mahajan, P. Nguyen, J.C. Platt, Proc. Intelligent User Interfaces, (2008).
- Parsing Ink Annotations on Heterogeneous Documents by X. Wang, M. Shilman, S. Raghupathy, Eurographics Workshop on Sketch-Based Interfaces and Modeling, (2006).
- Incremental Aspect Models for Mining Document Streams by A.C. Surendran, S. Sra, ECML/PKDD, (2006).
- SWISH: Semantic Analysis of Window Titles and Switching History by N. Oliver, G. Smith, C. Thakkar, A.C. Surendran, Int'l Conference on Intelligent User Interfaces, (2006).
- Automatic Discovery of Personal Topics to Organize Email by A.C. Surendran, J.C. Platt, E. Renshaw, 2nd Conference on Email and Anti-Spam, (2005).
- Adaptive Document Layout by C. Jacobs, W. Li, E. Schrier, D. Bargeron, D. Salesin, Comm. ACM, Vol. 47, Issue 8, (2004).
- Modeling Conversational Dynamics as a Mixed-Memory Markov Process by T. Choudhury, S. Basu, Proc. NIPS, Vol. 17, (2004).
- Toward Universal Mobile Interaction for Shared Displays by T. Paek, M. Agrawala, S. Basu, S. Drucker, T. Kristjansson, R. Logan, K. Toyama, A. Wilson, Proc. CSCW, pp. 266-269, (2004).
Finally, we have developed many general-purpose machine learning algorithms that help us build these applications:
- Learning to Classify with Missing and Corrupted Features by O. Dekel, O. Shamir, L. Xiao, Machine Learning Journal, to appear, (2009).
- Fast Low-Rank Semidefinite Programming for Embedding and Clustering by B. Kulis, A.C. Surendran, J.C. Platt, AISTATS, (2007).
- Selective Supervision: Guiding Supervised Learning with Decision-Theoretic Active Learning by A. Kapoor, E. Horvitz, S. Basu, IJCAI, (2007).
- Online Decoding of Markov Models under Latency Constraints by M. Narasimhan, P. Viola, M. Shilman, ICML, (2006).
- Multiple Instance Boosting for Object Detection by P. Viola, J.C. Platt, C. Zhang, NIPS, Vol. 18, pp. 1417-1426, (2006).
- Semi-Supervised Learning with Conditional Harmonic Mixing by C.J.C. Burges, J.C. Platt, in "Semi-Supervised Learning", O. Chapelle, B. Schölkopf, and A. Zien, eds., MIT Press, (2006).
- Redundant Bit Vectors for Quickly Searching High-Dimensional Regions by J. Goldstein, J.C. Platt, C.J.C. Burges, Proc. Sheffield Machine Learning Workshop, Springer Lecture Notes in Computer Science 3635, (2005).
- Extensions of the Informative Vector Machine by N.D. Lawrence, J.C. Platt, M.I. Jordan, Proc. Sheffield Machine Learning Workshop, Springer Lecture Notes in Computer Science 3635, (2005).
- Learning to Rank using Gradient Descent by C.J.C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, G. Hullender, 22nd International Conference on Machine Learning, (2005).
- FastMap, MetricMap, and Landmark MDS are all Nystrom Algorithms by J.C. Platt, Proc. 10th International Workshop on Artificial Intelligence and Statistics, pp. 261-268, (2005).
- Learning to Learn with the Informative Vector Machine by N.D. Lawrence, J.C. Platt, International Conference on Machine Learning, Paper No. 65, (2004).
For our publications on machine learning related to media, please see our Statistical Media Processing page.



