The benefits of selecting phenotype-specific variants for applications of mixed models in genomics

  • Christoph Lippert
  • Gerald Quon
  • Eun Yong Kang
  • Carl Kadie
  • Jennifer Listgarten

MSR-TR-2013-138

Applications of linear mixed models (LMMs) to problems in genomics include phenotype prediction, correction for confounding in genome-wide association studies, estimation of narrow-sense heritability, and testing sets of variants (e.g., rare variants) for association. In each of these applications, the LMM uses a genetic similarity matrix, which encodes the similarity between every pair of individuals in a cohort. Ideally, these similarities would be estimated using only the variants relevant to the given phenotype, but the identity of such variants is typically unknown. Consequently, relevant variants are excluded and irrelevant variants are included, and both have deleterious effects. For each application of the LMM, we review known effects, describe new ones, and show how variable selection can be used to mitigate them.
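To make the role of the genetic similarity matrix concrete, the following is a minimal sketch of how such a matrix can be built from a chosen subset of variants. It assumes genotypes coded as 0/1/2 allele counts and uses one common convention: standardize each selected SNP, then average the outer products, K = Z_S Z_S^T / |S|. The function names (`standardize_column`, `genetic_similarity`) and this particular normalization are illustrative assumptions, not necessarily the construction used in the report.

```python
from math import sqrt
from typing import List, Sequence


def standardize_column(values: Sequence[float]) -> List[float]:
    """Mean-center one SNP column and scale it to unit variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        # A monomorphic SNP carries no information about similarity.
        return [0.0] * n
    sd = sqrt(var)
    return [(v - mean) / sd for v in values]


def genetic_similarity(
    genotypes: Sequence[Sequence[int]], selected: Sequence[int]
) -> List[List[float]]:
    """Return the n x n matrix K = Z_S Z_S^T / |S| (assumed convention).

    genotypes: one row per individual, one 0/1/2 count per SNP.
    selected:  indices of the SNPs used to build K (hypothetical
               stand-in for a phenotype-specific variable selection step).
    """
    if len(selected) == 0:
        raise ValueError("at least one SNP must be selected")
    n = len(genotypes)
    # Standardize only the selected SNP columns.
    columns = [standardize_column([row[s] for row in genotypes]) for s in selected]
    k = len(columns)
    K = [[0.0] * n for _ in range(n)]
    # Accumulate the averaged outer product of each standardized SNP.
    for z in columns:
        for i in range(n):
            for j in range(n):
                K[i][j] += z[i] * z[j] / k
    return K


# Usage: compare a matrix built from all SNPs with one built from a subset.
genotypes = [
    [0, 1, 2, 0, 1],
    [1, 1, 0, 2, 2],
    [2, 0, 1, 1, 0],
    [0, 2, 1, 1, 1],
]
K_all = genetic_similarity(genotypes, range(5))
K_sub = genetic_similarity(genotypes, [0, 2])
```

Changing which indices go into `selected` changes every entry of K, which is why the choice of variants propagates into prediction, confounding correction, heritability estimates, and set tests alike.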