James Bornholt, Todd Mytkowicz, and Kathryn S. McKinley
Emerging applications increasingly use estimates such as sensor data (GPS), probabilistic models, machine learning, big data, and human data. Unfortunately, representing this uncertain data with discrete types (floats, integers, and booleans) encourages developers to pretend it is not probabilistic, which causes three types of uncertainty bugs. (1) Using estimates as facts ignores random error in estimates. (2) Computation compounds that error. (3) Boolean questions on probabilistic data induce false positives and negatives. This paper introduces Uncertain<T>, a new programming language abstraction for uncertain data. We implement a Bayesian network semantics for computation and conditionals that improves program correctness. The runtime uses sampling and hypothesis tests to evaluate computation and conditionals lazily and efficiently. We illustrate with sensor and machine learning applications that Uncertain<T> improves expressiveness and accuracy. Whereas previous probabilistic programming languages focus on experts, Uncertain<T> serves a wide range of developers. Experts still identify error distributions. However, both experts and application writers compute with distributions, improve estimates with domain knowledge, and ask questions with conditionals. The Uncertain<T> type system and operators encourage developers to expose and reason about uncertainty explicitly, controlling false positives and false negatives. These benefits make Uncertain<T> a compelling programming model for modern applications facing the challenge of uncertainty.
|Published in||ACM International Conference on Architectural Support for Programming Languages and Systems (ASPLOS), pp. 51-66, Selected for <b>ACM SIGPLAN Research Highlight 2014</b>, and <b>IEEE MICRO Top Picks 2015</b>|
|Publisher||ACM – Association for Computing Machinery|
© ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version can be found at http://dl.acm.org.
James Bornholt, Todd Mytkowicz, and Kathryn S. McKinley. Uncertain<T>: A First-Order Type for Uncertain Data, Microsoft Technical Report, April 2013.