Supporting NGS Pipelines in the cloud

  • Ignacio Blanquer Blanquer,
  • Goetz Brasche,
  • Jacek Cala,
  • Fabrizio Gagliardi,
  • Dennis Gannon,
  • Hugo Hiden,
  • Hakan Soncu,
  • ,
  • Andrés Tomás,
  • Simon Woodman

EMBnet.journal

The availability of workflow management systems and public cloud computing infrastructures has been a major breakthrough in how scientists use computing resources. However, combining the two approaches exposes shortcomings: the administration effort placed on users must be reduced, simple programming models are needed to ease the transition from more conventional computing approaches, and legacy software must be supported. With this in mind, Microsoft Research has started several initiatives to improve the use of clouds in science. The “cloud4science” initiative (see http://www.cloud4science.eu/) considers next generation sequencing (NGS) an excellent reference use case. This initiative builds on the results of the VENUS-C and e-Science Central projects, in which two different scientific workflow engines, namely the Generic Worker and e-Science Central, were applied to solve specific compute-intensive bioinformatics problems. We propose an integration and enhancement of these two workflow engines with a set of selected bioinformatics tools to provide an easy-to-use framework for a cloud-enabled NGS pipeline for mutation analysis. The resulting framework and components will simplify the deployment of processing services, access to data, and the sharing of results.