Decision Procedures for Composition and Equivalence of Symbolic Finite State Transducers

MSR-TR-2011-32 |

Finite automata model a wide array of applications in software engineering, from regular expressions to specification languages. Finite transducers are an extension of finite automata to model functions on lists of elements, which in turn have uses in fields as diverse as computational linguistics and model-based testing. Symbolic finite transducers are a further generalization of finite transducers where transitions are labeled with formulas in a given background theory. Compared to classical finite transducers, symbolic transducers are far more succinct in the case of finite alphabets, because they have no need to enumerate all cases of a transition; symbolic transducers can also use theories, such as the theory of linear arithmetic over integers or reals, with infinite alphabets.
Given a decision procedure for the background theory, we show novel algorithms for composition and equivalence checking for a large class of symbolic finite transducers, namely the class of single-valued transducers. Our algorithms give rise to a complete decidable algebra of symbolic transducers. Unlike previous work, we do not need any syntactic restriction of the formulas on the transitions, only a decision procedure; in practice we leverage recent advances in satisfiability modulo theory (SMT) solvers. We show how to decide single-valuedness, which means that symbolic finite transducers arising from practice can be checked to see if our algorithms apply. Our base algorithms are unusual in that they are nonconstructive, so we exhibit a separate model generation algorithm that can quickly find counterexamples in the case two symbolic finite transducers are not equivalent.
As a key example, string manipulation is a particularly important application of our theoretical results with immediate benefits to bug finding. Our work makes symbolic finite transducers a practical approach for software engineering applications, such as the analysis of security-critical sanitization functions in web pages and model based testing.