Shravan Narayanamurthy, Markus Weimer, Dhruv Mahajan, Tyson Condie, Sundararajan Sellamanickam, and S. Sathiya Keerthi
In this article, we argue that resource elasticity is a key requirement for distributed machine learning. Not only can computational resources disappear without warning (e.g., due to machine failure), but modern resource managers also renegotiate the available resources while a job is running: additional machines may become available, or machines that were already reserved may be reassigned to other jobs. We show how to formalize this problem and present an initial approach for linear learners.
Publisher: NIPS 2013 BigLearn Workshop