Video-based paired-comparison ranking: a validation tool for fine-grained measurements of motor dysfunction in multiple sclerosis

  • J. Burggraaff ,
  • J. Dorn ,
  • M. D'Souza ,
  • C. P. Kamm ,
  • P. Tewarie ,
  • P. Kontschieder ,
  • ,
  • A. Criminisi ,
  • F. Dahlke ,
  • L. Kappos ,
  • B. M. J. Uitdehaag ,

Multiple Sclerosis Journal, Vol 21: pp. 127-128

Background: Measurement instruments used in the neurological examination, including the Expanded Disability Status Scale (EDSS), cannot detect small changes in patient performance, although such changes may be needed to evaluate the clinical course of neurological diseases and the efficacy of treatment. For a more reliable and sensitive assessment of disability in multiple sclerosis (MS), we are developing the ASSESS-MS system, which relies on automated image analysis of recorded, defined movements. Validating a more sensitive tool requires a method that captures clinical judgement in a more discriminatory manner than ordinal scales. Therefore, we investigate the ability of experienced neurologists to evaluate motor dysfunction in MS patients via video-based paired comparisons.

Objectives: To assess whether video-based paired comparisons of patient movements allow a reliable and finer-grained capture of clinical judgement of motor dysfunction in MS than commonly used measurement instruments.

Methods: Thirteen experienced neurologists were presented with pairs of video recordings of MS patients (EDSS 1-7) performing standardized movements, such as the finger nose test (FNT), and movements based on activities of daily living (ADL), such as drinking from a cup. The clinicians were asked to compare the performance of the two patients in each pair, rating one as worse, both as equal, or ‘judgement not possible because of confounding factors’. The pairwise comparisons were mapped onto a continuous scale using the Bradley-Terry-Luce (BTL) model, a commonly applied model in the analysis of paired comparison data. The results were analysed for intra- and interrater reliability and were correlated with the matching EDSS subscores.
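As an illustration of how a BTL model turns pairwise judgements into a continuous scale, the sketch below fits severity scores with a standard iterative (minorization-maximization) update. In the basic BTL model, the probability that patient i is judged worse than patient j is p_i / (p_i + p_j). This is not the implementation used in the study: the function name `fit_btl`, the outcome labels, the treatment of 'equal' as half a vote for each patient, and the exclusion of 'judgement not possible' ratings are all assumptions made for the example.

```python
def fit_btl(n_patients, comparisons, n_iter=200, floor=1e-12):
    """Fit Bradley-Terry-Luce severity scores (hypothetical helper).

    comparisons: list of (i, j, outcome) with outcome in
        "i_worse", "j_worse", "equal"; any other label is skipped.
    Returns a list of scores summing to 1; a higher score means the
    patient was judged worse more often.
    """
    # votes[a][b] = number of times a was judged worse than b
    votes = [[0.0] * n_patients for _ in range(n_patients)]
    for i, j, outcome in comparisons:
        if outcome == "i_worse":
            votes[i][j] += 1.0
        elif outcome == "j_worse":
            votes[j][i] += 1.0
        elif outcome == "equal":
            # Assumption: a tie counts as half a vote for each patient.
            votes[i][j] += 0.5
            votes[j][i] += 0.5
        # Assumption: "judgement not possible" ratings are ignored.

    p = [1.0 / n_patients] * n_patients
    for _ in range(n_iter):
        updated = []
        for a in range(n_patients):
            total_votes = sum(votes[a])
            denom = sum(
                (votes[a][b] + votes[b][a]) / (p[a] + p[b])
                for b in range(n_patients)
                if b != a
            )
            value = total_votes / denom if denom > 0 else p[a]
            # Floor keeps scores positive for patients never judged worse.
            updated.append(max(value, floor))
        norm = sum(updated)
        p = [value / norm for value in updated]
    return p


# Toy data: patient 2 is usually judged worse than 1, and 1 worse than 0.
toy = (
    [(0, 1, "j_worse")] * 3 + [(0, 1, "i_worse")]
    + [(1, 2, "j_worse")] * 3 + [(1, 2, "i_worse")]
    + [(0, 2, "j_worse")] * 3
    + [(0, 1, "not_possible")]
)
scores = fit_btl(3, toy)
```

The resulting scores are continuous, so two patients with the same ordinal subscore can still receive different values if raters consistently judge one of them as worse.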

Results: The paired gradings were significantly correlated with the matching subscores (e.g. Kendall’s tau for FNT was 0.5 with p < 0.01) and allowed discrimination between patients, even among those with the same EDSS subscore. Internal reliability, which characterizes rater consistency, was above 90%, and short-term test-retest reliability was ~90%. Further results will be presented.
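To make the correlation measure concrete, the following sketch computes Kendall's tau between ordinal subscores and continuous gradings. The abstract does not state which variant of tau was used; the tau-b variant shown here, which corrects for ties, is an assumption chosen because ordinal subscores typically contain many tied values. The function name and the toy numbers are hypothetical and are not study data.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences (hypothetical helper)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both terms
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    # Pairs untied in x, and pairs untied in y, respectively.
    untied_x = concordant + discordant + tied_y_only
    untied_y = concordant + discordant + tied_x_only
    denom = math.sqrt(untied_x * untied_y)
    return (concordant - discordant) / denom if denom > 0 else float("nan")


# Toy example: two patients share subscore 0 but differ in paired grading.
subscores = [0, 0, 1, 2]
gradings = [0.1, 0.2, 0.3, 0.4]
tau = kendall_tau_b(subscores, gradings)
```

In the toy data the first two patients have the same subscore but different gradings, which lowers tau-b below 1 even though the two orderings never contradict each other.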

Conclusions: Paired comparisons of video-captured defined movements of MS patients appear to reliably capture neurological judgment of motor dysfunction. By providing finer-grained differentiation, gradings based on paired comparisons may serve as an improved external validation in the development of more sensitive (automated) outcome measures.