Girish Subramanian and Yogesh Simmhan
16 October 2009
With the increasing throughput of next-generation DNA sequencing machines, it has become important to develop efficient ways of processing sequence data and producing assembled whole human genome sequences for research and diagnostic purposes.
In this paper, we describe our efforts to scale HapCUT, a haplotype phasing algorithm from UCSD, using a parallel implementation that runs on the Windows Azure cloud infrastructure. One of our novel contributions is a tool that reduces the effort required to port, deploy, and execute existing methods, packaged as a .NET library or a Windows executable, within the cloud; we use this tool to run the existing phasing libraries within Azure. We are currently conducting experiments to study the performance implications and advantages of running haplotype phasing on the cloud compared with a local Windows HPC cluster.
In Microsoft Research eScience Workshop
Publisher: Microsoft Research
© 2009 Microsoft Corporation. All rights reserved.