Predicting Lipophilicity of Drug Discovery Molecules using Gaussian Process Models

  • Timon Schroeter ,
  • ,
  • Sebastian Mika ,
  • Antonius ter Laak ,
  • Detlev Sülzle ,
  • Ursula Ganzer ,
  • Nikolaus Heinrich ,
  • Klaus-Robert Müller

ChemMedChem, Vol 2: pp. 1265-1267

Many drug failures are due to an unfavorable ADMET profile (Absorption, Distribution, Metabolism, Excretion & Toxicity). Lipophilicity is intimately connected with ADMET, and in today’s drug discovery process the octanol-water partition coefficient log P and its pH-dependent counterpart log D have to be taken into account early in lead discovery. Commercial tools for in silico prediction of ADMET or lipophilicity parameters have usually been trained on relatively small and mostly neutral molecules; their accuracy on industrial in-house data therefore leaves considerable room for improvement (see Bruneau et al. and references therein).[1] Using modern kernel-based machine learning algorithms, so-called Gaussian Processes (GP),[2] this study constructs different log P and log D7 models whose predictions compare favorably with state-of-the-art tools on both benchmark and in-house data sets.
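The abstract does not describe the models in detail, so the following is only a minimal sketch of how Gaussian Process regression produces a prediction together with an uncertainty estimate. Everything specific in it is an assumption made for illustration: the squared-exponential (RBF) kernel, the fixed hyperparameters, the two hypothetical descriptors per molecule, and the synthetic target values, which are generated from a simple linear rule and are not measured log P data.

```python
import math

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two descriptor vectors (assumed kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def solve_lower(lower, rhs):
    """Forward substitution: solve L z = rhs."""
    z = []
    for i, row in enumerate(lower):
        z.append((rhs[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

def solve_lower_transposed(lower, rhs):
    """Back substitution: solve L^T x = rhs."""
    n = len(lower)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (rhs[i] - s) / lower[i][i]
    return x

def gp_fit(train_x, train_y, noise_var=0.01):
    """Precompute the quantities needed for prediction (constant-mean GP)."""
    y_mean = sum(train_y) / len(train_y)
    centered = [y - y_mean for y in train_y]
    n = len(train_x)
    # Kernel matrix with observation noise on the diagonal.
    k_matrix = [[rbf_kernel(train_x[i], train_x[j]) + (noise_var if i == j else 0.0)
                 for j in range(n)] for i in range(n)]
    lower = cholesky(k_matrix)
    alpha = solve_lower_transposed(lower, solve_lower(lower, centered))
    return {"x": train_x, "lower": lower, "alpha": alpha, "y_mean": y_mean}

def gp_predict(model, query):
    """Return the predictive mean and the variance of the latent function at `query`."""
    k_star = [rbf_kernel(x, query) for x in model["x"]]
    mean = model["y_mean"] + sum(k * a for k, a in zip(k_star, model["alpha"]))
    v = solve_lower(model["lower"], k_star)
    variance = rbf_kernel(query, query) - sum(t * t for t in v)
    return mean, max(variance, 0.0)

# Synthetic toy data: two hypothetical descriptors per "molecule";
# targets follow an arbitrary linear rule, NOT real lipophilicity measurements.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
train_y = [0.5 * x1 - 0.3 * x2 for x1, x2 in train_x]

model = gp_fit(train_x, train_y)
mean_near, var_near = gp_predict(model, (1.0, 1.0))    # inside the training data
mean_far, var_far = gp_predict(model, (10.0, 10.0))    # far from all training data
```

The predictive variance grows as a query moves away from the training data. That is one way a GP model can flag a compound as lying outside the region where its prediction should be trusted.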