Lin Xiao
May 2009
We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term. Traditional online algorithms, such as stochastic gradient descent, has limited capability of exploiting the problem structure, and their inherent low accuracy often makes it hard to obtain the desired regularization effects, e.g., sparsity under $\ell_1$-regularization. In this paper, we develop extensions of Nesterov's dual averaging method, that can explicitly exploit the regularization structure in an online setting. For general convex regularizations, we show that the regularized dual averaging method achieves the optimal convergence rate $O(1/\sqrt{t})$, where $t$ is the number of iterations or samples in an online algorithm. For strongly convex regularizations, we develop a variant that has a faster convergence rate $O(\ln t/t)$. Computational experiments are presented for the special case of $\ell_1$-regularized online learning.
![]() PDF file |
| Type: | TechReport |
| Number: | MSR-TR-2009-100 |