James Bornholt, Todd Mytkowicz, and Kathryn S. McKinley
Emerging applications increasingly use estimates such as sensor data (GPS), probabilistic models, machine learning, big data, and human data. Unfortunately, representing this uncertain data with discrete types (floats, integers, and booleans) encourages developers to pretend it is not probabilistic, which causes three types of uncertainty bugs. (1) Using estimates as facts ignores random error in estimates. (2) Computation compounds that error. (3) Boolean questions on probabilistic data induce false positives and negatives. This paper introduces Uncertain<T>, a new programming language abstraction for uncertain data. We implement a Bayesian network semantics for computation and conditionals that improves program correctness. The runtime uses sampling and hypothesis tests to evaluate computation and conditionals lazily and efficiently. We illustrate with sensor and machine learning applications that Uncertain<T> improves expressiveness and accuracy. Whereas previous probabilistic programming languages focus on experts, Uncertain<T> serves a wide range of developers. Experts still identify error distributions. However, both experts and application writers compute with distributions, improve estimates with domain knowledge, and ask questions with conditionals. The Uncertain<T> type system and operators encourage developers to expose and reason about uncertainty explicitly, controlling false positives and false negatives. These benefits make Uncertain<T> a compelling programming model for modern applications facing the challenge of uncertainty.
|Publisher||Architectural Support for Programming Languages and Operating Systems (ASPLOS)|
2014 ACM 978-1-4503-2305-5/14/03
James Bornholt, Todd Mytkowicz, and Kathryn S. McKinley. Uncertain<T>: A First-Order Type for Uncertain Data, Microsoft Technical Report, April 2013.