David Wipf, Bhaskar Rao, and Srikantan Nagarajan
Many practical methods for finding maximally sparse coefficient expansions involve solving a regression problem using a particular class of concave penalty functions. From a Bayesian perspective, this process is equivalent to maximum a posteriori (MAP) estimation using a sparsity-inducing prior distribution (Type I estimation). Using variational techniques, this distribution can always be conveniently expressed as a maximization over scaled Gaussian distributions modulated by a set of latent variables. Alternative Bayesian algorithms, which operate in latent variable space leveraging this variational representation, lead to sparse estimators reflecting posterior information beyond the mode (Type II estimation). Currently, it is unclear how the underlying cost functions of Type I and Type II relate, nor what relevant theoretical properties exist, especially with regard to Type II. Herein a common set of auxiliary functions is used to conveniently express both Type I and Type II cost functions in either coefficient or latent variable space facilitating direct comparisons. In coefficient space, the analysis reveals that Type II is exactly equivalent to performing standard MAP estimation using a particular class of dictionary- and noise-dependent, non-factorial coefficient priors. One prior (at least) from this class maintains several desirable advantages over all possible Type I methods and utilizes a novel, non-convex approximation to the L0 norm with most, and in certain quantifiable conditions all, local minima smoothed away. Importantly, the global minimum is always left unaltered unlike standard L1-norm relaxations. This ensures that any appropriate descent method is guaranteed to locate the maximally sparse solution.
|Published in||IEEE Transactions on Information Theory|