Understanding and Evaluating Sparse Linear Discriminant Analysis

  • Yi Wu
  • David Wipf
  • Jeong-Min Yun

Published at Artificial Intelligence and Statistics (AISTATS)

Linear discriminant analysis (LDA)
represents a simple yet powerful technique for assigning a p-dimensional
feature vector to one of K classes based on a linear projection learned from
N labeled observations. However, it is well-established
that in the high-dimensional setting (p > N) the underlying projection
estimator degenerates. Moreover, any linear discriminant function involving a
large number of features may be difficult to interpret. To ameliorate these
issues, two general categories of sparse LDA modifications have been proposed,
both to reduce the number of active features and to stabilize the resulting
projections. The first, based on optimal scoring, is more straightforward to
implement and analyze but has been heavily criticized for its ambiguous
connection with the original LDA formulation. In contrast, a second strategy
applies sparse penalty functions directly to the original LDA objective but
introduces additional heuristic trade-off parameters, has unknown global and
local minima properties, and relies on a greedy sequential optimization
procedure. In all cases the choice of
sparse regularizer can be important, but no rigorous guidelines have been
provided regarding which penalty might be preferable. Against this backdrop, we
winnow down the broad space of candidate sparse LDA algorithms and promote a
specific selection based on optimal scoring coupled with a particular, complementary
sparse regularizer. This overall process ultimately advances our
understanding of sparse LDA in general, while leading to targeted modifications
of existing algorithms that produce superior results in practice on three high-dimensional
gene data sets.
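To make the two central points of the abstract concrete, here is a minimal, hypothetical sketch (not the paper's code): it builds a p > N problem, verifies that the within-class scatter matrix is rank-deficient (so the classical LDA projection is ill-posed), and then follows the optimal-scoring route, regressing binary class scores onto the features with an l1 (lasso) penalty standing in for the paper's preferred regularizer. The penalty weight alpha is an arbitrary illustrative value.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic p > N problem: only the first few features separate the classes.
N, p, k_inf = 60, 500, 10
y = rng.integers(0, 2, size=N)
X = rng.normal(size=(N, p))
X[:, :k_inf] += 1.5 * y[:, None]

# Classical LDA needs the inverse of the within-class scatter matrix S_w,
# but with p > N its rank is at most N - K, so the estimator degenerates.
Xc = [X[y == c] - X[y == c].mean(0) for c in (0, 1)]
S_w = sum(Z.T @ Z for Z in Xc)
print("rank(S_w) =", np.linalg.matrix_rank(S_w), "< p =", p)

# Optimal scoring for K = 2: encode the labels as zero-mean, unit-variance
# scores and regress them on the centered features with an l1 penalty. The
# lasso coefficients then act as a sparse discriminant direction.
n0, n1 = np.sum(y == 0), np.sum(y == 1)
scores = np.where(y == 1, np.sqrt(n0 / n1), -np.sqrt(n1 / n0))
beta = Lasso(alpha=0.05).fit(X - X.mean(0), scores).coef_
print("active features:", np.flatnonzero(beta))

# Classify by projecting onto beta and splitting at the class-mean midpoint.
proj = (X - X.mean(0)) @ beta
thresh = 0.5 * (proj[y == 0].mean() + proj[y == 1].mean())
print("training accuracy:", np.mean((proj > thresh).astype(int) == y))
```

In practice alpha would be chosen by cross-validation, and the abstract's argument is precisely that the choice of sparse penalty (not necessarily l1) matters; the sketch only illustrates the mechanics of the optimal-scoring formulation.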