Jin Li, Principal Researcher & Research Manager
Cloud Computing and Storage
MSR Technologies
Email: jinl@microsoft.com

Cloud Storage

  1. Erasure coding
  2. Dr. Li has advocated the use of erasure coding in the cloud since 2006. Over the years, he has evangelized erasure coding to dozens of Microsoft product groups and, based on feedback from product group engineers, has fine-tuned both the design of erasure coded storage systems and the erasure codes used. In partnership with Azure, he and a number of other MSR researchers participated in the local reconstruction code (LRC) project in Windows Azure Storage. LRC is a new family of erasure codes that significantly reduces storage overhead and cuts down the minimum number of fragments that need to be read to reconstruct a data fragment. The work has led to hundreds of millions of dollars of savings for Microsoft, a Best Paper Award at USENIX ATC 2012, and a 2013 Microsoft Technical Community Network Storage Technical Achievement Award. His group has also architected the erasure code used in Storage Spaces in Windows 8.1 and Windows Server 2012 R2.

  3. Primary data deduplication and end-to-end deduplication
  4. Picking up on the rising interest in deduplication within the Microsoft Technical Community Network, he partnered with the Windows File Server group to architect and implement the Primary Data Deduplication feature in Windows Server 2012 [Paper] and End-to-End Deduplication for Storage Virtualization in Windows Server 2012 R2. Key contributions include a new data chunking algorithm, a low RAM footprint indexing data structure for detecting duplicate data (based on ChunkStash), and a data partitioning and reconciliation technique; the latter two allow index resource usage to scale with data size. The feature delivers major savings to customers (20-82%) and is among the top three features introduced for Windows File Server in Windows Server 2012. It has received rave reviews (The Register, IT Pro, Ars Technica, IT World, Tech Republic), and there is evidence that some customers are upgrading to Windows Server 2012 solely for the primary data deduplication feature.

  5. SSD (Flash) based storage
  6. While evangelizing erasure coded storage, he noticed that storage engineers care deeply about disk I/O performance, and that solid-state drives (SSDs) are disrupting hard disk drives (HDDs) in terms of I/O performance. He conducted a series of research projects to exploit the benefits of SSDs for storage applications. FlashStore implements an SSD-optimized, low RAM footprint key-value store that organizes storage on flash in a log-structured manner. Its technology was transferred to Pegasus SSD in the Microsoft backend. SkimpyStash implements an ultra-low RAM footprint key-value store. The storage layer design of SkimpyStash has been incorporated into BW-Tree, a joint project among CCS, the MSR Database group, and the Azure DocumentDB team, and is shipping in SQL Server 2014 (Hekaton) and Azure DocumentDB.
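
To illustrate what organizing a key-value store "in a log-structured manner" can mean, the sketch below shows a toy store that only ever appends records to a file and keeps a small in-RAM index from each key to the offset of its latest record. This is a minimal illustration under stated assumptions, not the FlashStore or SkimpyStash design: the class name `LogStructuredStore`, the record layout, and the use of an ordinary file in place of a flash device are all hypothetical, and it omits garbage collection, crash recovery, and the RAM-saving index structures those systems are known for.

```python
import os
import struct
import tempfile


class LogStructuredStore:
    """Toy append-only key-value store (hypothetical; not FlashStore itself).

    Each record is: 4-byte key length, 4-byte value length, key, value.
    An in-RAM dict maps every key to the offset of its most recent record.
    """

    _HEADER = struct.Struct("<II")  # little-endian key length, value length

    def __init__(self, path):
        # "a+b" creates the file if needed and forces all writes to the end.
        self._file = open(path, "a+b")
        self._index = {}
        self._rebuild_index()

    def _rebuild_index(self):
        """Scan the log from the start; later records override earlier ones."""
        self._file.seek(0)
        offset = 0
        while True:
            header = self._file.read(self._HEADER.size)
            if len(header) < self._HEADER.size:
                break
            key_len, value_len = self._HEADER.unpack(header)
            key = self._file.read(key_len)
            self._file.seek(value_len, os.SEEK_CUR)  # skip the value bytes
            self._index[key] = offset
            offset += self._HEADER.size + key_len + value_len

    def put(self, key, value):
        """Append a new record; overwrites never modify old data in place."""
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(self._HEADER.pack(len(key), len(value)) + key + value)
        self._file.flush()
        self._index[key] = offset

    def get(self, key):
        """Return the latest value for key, or None if the key is unknown."""
        offset = self._index.get(key)
        if offset is None:
            return None
        self._file.seek(offset)
        header = self._file.read(self._HEADER.size)
        key_len, value_len = self._HEADER.unpack(header)
        self._file.seek(key_len, os.SEEK_CUR)  # skip the stored key
        return self._file.read(value_len)

    def close(self):
        self._file.close()


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "store.log")
    store = LogStructuredStore(log_path)
    store.put(b"user:1", b"alice")
    store.put(b"user:1", b"alice-v2")  # appended; the old record stays in the log
    print(store.get(b"user:1"))
    store.close()
```

Because every write is a sequential append, a design like this avoids random in-place writes, which is the property that makes log structuring attractive on flash. The cost is that stale records accumulate in the log and the index must fit in RAM, which is the kind of pressure that motivates low RAM footprint designs.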