BEK: Modeling Imperative String Operations with Symbolic Transducers

MSR-TR-2010-154 |

Web applications must use special string-manipulating sanitization functions on untrusted user data, but writing these functions correctly is error prone and time consuming. We present a domain-specific imperative language, BEK, that is expressive enough to capture real web sanitizers used in the Internet Explorer XSS Filter and the Google AutoEscape framework. We exhibit a translation from the BEK language to symbolic finite state transducers (SFTs), a novel representation for transducers that annotates transitions with logical formulae. Symbolic finite state transducers give us a new way to marry the classic theory of finite state transducers with the recent progress in satisfiability modulo theories(SMT) solvers. We exhibit algorithms for checking equivalence of images of regular languages, idempotency, commutativity, and other properties of SFTs that scale quadratically in the number of states, and we show that BEK’s implementation of these algorithms scales near-linearly in practice. We then show how BEK can be applied to checking key security properties of web sanitizers, and how programs written in the BEK language can be compiled to traditional languages such as JavaScript. BEK makes it possible for web developers to write sanitizers supported by deep analysis, yet deploy the analyzed code directly to real applications.