Jin Li, Partner Research Manager
Cloud Computing and Storage
MSR Technologies
Email: jinl@microsoft.com

Dr. Jin Li is a Partner Research Manager of the Cloud Computing and Storage (CCS) group in Microsoft Research - Technologies. He takes an end-to-end approach to research, and believes that the ultimate milestone of cool systems research is a product of significant impact. In addition to pursuing original research and publishing papers in premier venues, he leads the team to go the extra mile to work with product groups and create huge business impact for Microsoft.

Dr. Li's latest passion is Prajna, a distributed computing platform. Prajna was developed in response to the call for Microsoft to be the productivity and platform company for the mobile-first and cloud-first world. It fills the void of real-time big data computing on the .NET platform. Prajna is open sourced at https://github.com/msrccs/Prajna/. It is designed to be a generic distributed computing platform whose core functionality is the execution of an arbitrary closure (C#, F#, native code, etc.) on any remote node, in a public cloud or in a private cluster. It supports interactive big data computing across a cluster with in-memory computation. The programming API is similar to Spark. Prajna also has a managed web service (Prajna Hub), which helps developers quickly prototype and host cloud services and run services for mobile apps. Prajna also supports distributed machine learning (e.g., a distributed neural network trainer using Caffe on each node).

Dr. Li has advocated the use of erasure coding in the cloud since 2006. Throughout the years, he has evangelized erasure coding to dozens of Microsoft product groups and, based on the feedback he received from product group engineers, has fine-tuned both the design of the erasure coded storage system and the erasure code used. In partnership with Azure, he and a number of other MSR researchers participated in the local reconstruction code (LRC) project in Windows Azure Storage. LRC is a new family of erasure codes that provides a significant reduction in storage overhead and cuts down the minimum number of fragments that need to be read to reconstruct a data fragment. It has led to hundreds of millions of dollars of savings for Microsoft, a Best Paper Award at USENIX ATC 2012 and a 2013 Microsoft Technical Community Network Storage Technical Achievement Award. His group has also architected the erasure code used in Storage Spaces in Windows 8.1 and Windows Server 2012 R2, and the erasure code used in Lync, Xbox and RemoteFX.
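The local reconstruction idea can be illustrated with a minimal Python sketch. It is not the code used in Windows Azure Storage: it assumes a hypothetical layout of six data fragments in two local groups, uses a plain XOR parity per group, and omits the global parities that a real LRC also carries. The function names are illustrative only. The point it shows is that a single lost data fragment is rebuilt by reading only its own group, not every data fragment.

```python
from functools import reduce
from typing import List, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def local_parities(fragments: List[bytes], group_size: int) -> List[bytes]:
    """Compute one XOR parity per local group of `group_size` data fragments."""
    groups = [fragments[i:i + group_size]
              for i in range(0, len(fragments), group_size)]
    return [reduce(xor_bytes, group) for group in groups]


def reconstruct(fragments: List[bytes], parities: List[bytes],
                lost: int, group_size: int) -> Tuple[bytes, int]:
    """Rebuild fragments[lost] from its local group only.

    Returns the rebuilt fragment and the number of fragments read.
    """
    group = lost // group_size
    start = group * group_size
    end = min(start + group_size, len(fragments))
    # Read the surviving peers in the same group plus that group's parity.
    reads = [fragments[i] for i in range(start, end) if i != lost]
    reads.append(parities[group])
    return reduce(xor_bytes, reads), len(reads)


# Hypothetical example: 6 data fragments, 2 local groups of 3.
data = [bytes([i] * 4) for i in range(1, 7)]
parities = local_parities(data, group_size=3)
rebuilt, reads = reconstruct(data, parities, lost=4, group_size=3)
```

In this toy layout the repair reads three fragments (two peers and one local parity), whereas a code with no local groups would need to read all six data fragments' worth of information.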

Picking up on the rising interest in deduplication from the Microsoft Technical Community Network, he partnered with the Windows File Server group to architect and implement the Primary Data Deduplication feature in Windows Server 2012 [Paper] and End-to-End Deduplication for Storage Virtualization in Windows Server 2012 R2. Key contributions include a new data chunking algorithm, a low RAM footprint indexing data structure to detect duplicate data (based on ChunkStash), and a data partitioning and reconciliation technique, the latter two for scaling index resource usage with data size. It delivers major storage savings to customers (20-82%), and is among the top three features introduced for Windows File Server in Windows Server 2012. The feature has received rave reviews (The Register, IT Pro, Ars Technica, IT World, Tech Republic), and there is evidence that some customers upgraded to Windows Server 2012 solely for the primary data deduplication feature.

While evangelizing erasure coded storage, he noticed that storage engineers care deeply about disk I/O performance, and that Solid State Drives (SSDs) were disrupting Hard Disk Drives (HDDs) in terms of I/O performance. He conducted a series of research projects to exploit the benefits of SSDs for storage applications. FlashStore is an SSD-optimized, low RAM footprint key-value store that organizes storage on flash in a log-structured manner. It was transferred to Pegasus SSD in the Microsoft backend. SkimpyStash is an ultra-low RAM footprint key-value store. The storage layer design of SkimpyStash has been incorporated into BW-Tree, a joint project among CCS, the MSR Database group, and the Azure DocumentDB team, and is shipping in SQL Server 2014 (Hekaton) and Azure DocumentDB.
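As a rough illustration of what "log-structured" means for a key-value store, the following Python sketch appends every write to the tail of a log and keeps only a key-to-offset map in memory. It is a toy under stated assumptions, not the FlashStore or SkimpyStash design: the class name is hypothetical, an in-memory `io.BytesIO` buffer stands in for the flash device, and a plain dictionary stands in for the much more compact RAM index those systems use.

```python
import io
import struct


class LogStructuredStore:
    """Toy append-only key-value store: values live in the log, RAM holds offsets."""

    _HEADER = struct.Struct("<II")  # key length, value length (little-endian)

    def __init__(self) -> None:
        self._log = io.BytesIO()  # stand-in for the flash device (assumption)
        self._index = {}          # key -> offset of the latest record for that key

    def put(self, key: bytes, value: bytes) -> None:
        # Writes never overwrite in place; they always go to the log tail.
        offset = self._log.seek(0, io.SEEK_END)
        self._log.write(self._HEADER.pack(len(key), len(value)) + key + value)
        self._index[key] = offset

    def get(self, key: bytes):
        offset = self._index.get(key)
        if offset is None:
            return None
        self._log.seek(offset)
        key_len, value_len = self._HEADER.unpack(self._log.read(self._HEADER.size))
        self._log.seek(key_len, io.SEEK_CUR)  # skip over the stored key
        return self._log.read(value_len)

    def log_size(self) -> int:
        return len(self._log.getvalue())


store = LogStructuredStore()
store.put(b"k", b"v1")
store.put(b"k", b"v2")    # the update is appended; the old record becomes garbage
latest = store.get(b"k")
```

Sequential appends suit flash, which handles in-place random writes poorly; the cost is that stale records accumulate and must eventually be reclaimed, which this sketch does not attempt.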

Dr. Li assisted in the technical evaluation for Microsoft's acquisition of Calista Technologies. After the acquisition closed, he partnered with the Remote Desktop Virtualization (RDV) team and helped architect and implement the RemoteFX for WAN feature in Windows 8 and Windows Server 2012, which provides a fast and fluid user experience in a remote session running over any WAN or wireless network [Paper].

Dr. Li received his Ph.D. (with honors) from Tsinghua University (Beijing, China) in 1994. He joined Microsoft in 1999 as one of the founding members of Microsoft Research Asia (Beijing, China) (he won a Microsoft Gold Star Service Award in 1999 for his contribution). Since 2000, Dr. Li has also served as an Affiliated Professor at Tsinghua University. He has been awarded the prestigious Microsoft Gold Star Service Award four times, in 1999, 2001, 2006 and 2010.
Dr. Li was the recipient of the Young Investigator Award from Visual Communication and Image Processing (VCIP) in 1998, the ICME 2009 Best Paper Award, and the USENIX ATC 2012 Best Paper Award. He is or has been an Associate Editor or Guest Editor of IEEE Trans. on Multimedia, IEEE Journal on Selected Areas in Communications, Journal of Visual Communication and Image Representation, Peer-to-Peer Networking and Applications, and Journal of Communications. He is the current ICME steering committee chair. He has served on the TPCs and organizing committees of many conferences, e.g., as the General Chair of PV2009, the lead Program Chair of ICME 2011, the TPC Chair of CCNC 2013 and the TPC Chair of ACM Multimedia 2016. He is an IEEE Fellow.