Modeling Imperative String Operations with Transducers

MSR-TR-2010-96 |

We present a domain-specific imperative language, Bek, that directly models low-level string manipulation code featuring boolean state, search operations, and substring substitutions. We show constructively that Bek is reversible through a semantics-preserving translation to symbolic finite state transducers, a novel representation for transducers that annotates transitions with logical formulae. Symbolic finite state transducers give us a new way to marry the classic theory of finite state transducers with the recent progress in satisfiability modulo theories (SMT) solvers. We exhibit an efficient well-founded encoding from symbolic finite state transducers into the higher-order theory of algebraic datatypes. We evaluate the practical utility of Bek as a constraint language in the domain of web application sanitization code. We demonstrate that our approach can address real-world queries regarding, for example, the idempotence and relative strictness of popular sanitization functions.