BLR-D: Applying Bilinear Logistic Regression to Factored Diagnosis Problems

  • ,
  • John Dunagan
  • Kevin Duh
  • Kiran-Kumar Muniswamy-Reddy

Operating Systems Review, December 2011. Also presented at the SLAML workshop at SOSP 2011.


In this paper, we address a class of diagnosis problems in which each of J entities produces the same K features, yet only overall faults of the ensemble are observed. Furthermore, we suspect that only certain entities and certain features are responsible for the problem. The task, then, is to reliably identify which entities and which features are at fault. Such problems are particularly prevalent in computer systems: for example, a datacenter with hundreds of machines, each reporting the same performance counters, occasionally produces overall faults. We present a means of using a constrained form of bilinear logistic regression for diagnosis in such problems. The bilinear treatment allows us to represent these scenarios with J+K parameters instead of JK, yielding more easily interpretable results and far fewer false positives than treating the parameters independently. We develop statistical tests to determine which features and entities, if any, may be responsible for the labeled faults, and use false discovery rate (FDR) analysis to ensure that our values are meaningful. We compare against ordinary logistic regression (with L1 regularization) in two scenarios: a synthetic dataset based on a model of faults in a datacenter, and a real problem of finding problematic processes/features based on user-reported hangs.
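
To make the J+K parameterization concrete, the following is a minimal sketch of a bilinear logistic model, not the paper's implementation. It assumes each observation is a J×K matrix X and models the fault probability as sigmoid(aᵀXb + c), with one weight per entity (a) and one per feature (b). The plain gradient-descent fit, the synthetic data, the bias term c, and all names are illustrative assumptions; the constraints, statistical tests, and FDR analysis described above are not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, a, b, c):
    # X: (N, J, K) observations; a: (J,) entity weights; b: (K,) feature weights.
    # The score for observation n is a^T X_n b + c.
    return sigmoid(np.einsum("j,njk,k->n", a, X, b) + c)

def log_loss(X, y, a, b, c):
    p = np.clip(predict_proba(X, a, b, c), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit(X, y, steps=1000, lr=0.2):
    # Unconstrained full-batch gradient descent (an assumption for illustration).
    N, J, K = X.shape
    # Start away from zero: a = b = 0 is a stationary point of the bilinear score.
    a, b, c = np.full(J, 0.1), np.full(K, 0.1), 0.0
    for _ in range(steps):
        err = predict_proba(X, a, b, c) - y              # (N,)
        grad_a = np.einsum("n,njk,k->j", err, X, b) / N  # d loss / d a
        grad_b = np.einsum("n,njk,j->k", err, X, a) / N  # d loss / d b
        grad_c = err.mean()
        a, b, c = a - lr * grad_a, b - lr * grad_b, c - lr * grad_c
    return a, b, c

# Hypothetical synthetic data: only entity 2, feature 1 drives the faults.
rng = np.random.default_rng(0)
N, J, K = 400, 5, 4
X = rng.normal(size=(N, J, K))
y = (rng.random(N) < sigmoid(3.0 * X[:, 2, 1] - 1.0)).astype(float)

loss_before = log_loss(X, y, np.full(J, 0.1), np.full(K, 0.1), 0.0)
a, b, c = fit(X, y)
loss_after = log_loss(X, y, a, b, c)
n_bilinear_params = a.size + b.size  # J + K weights
n_full_params = J * K                # JK weights for an unfactored model
```

Note that a and b are identified only up to a reciprocal rescaling (multiplying a by s and dividing b by s leaves the score unchanged), so in this sketch only their relative magnitudes are interpretable.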