Towards Resource-Elastic Machine Learning

  • Dhruv Mahajan
  • Sundararajan Sellamanickam
  • Keerthi Selvaraj

In this article, we argue that resource elasticity is a key requirement for distributed machine learning. Not only can computational resources disappear without warning (e.g., due to machine failure), but modern resource managers also renegotiate the available resources while a job is running: additional machines may become available, or machines that were already reserved may be reassigned to other jobs. We show how to formalize this problem and present an initial approach for linear learners.