Static Analysis of String Encoders and Decoders

There has been significant interest in static analysis of programs that manipulate strings, in particular in the context of web security. Many types of security vulnerabilities are exposed through flaws in programs such as string encoders, decoders, and sanitizers. Recent work has focused on combining automata and satisfiability modulo theories techniques to address security issues in those programs. These techniques scale to larger alphabets such as Unicode, that is a de facto character encoding standard used in web software.

One approach has been to use character predicates to generalize finite state transducers. This technique has made it possible to perform precise analysis of a large class of typical sanitization routines. However, it has not been able to cope well with decoders, that often require to read more than one character at a time. In order to overcome this limitation we introduce a conservative generalization of Symbolic Finite Transducers (SFTs) called Extended Symbolic Finite Transducers (ESFTs) that incorporates the notion of a bounded lookahead. We demonstrate the advantage ESFTs on analyzing programs for which previous approaches did not scale.

In our evaluation we use a UTF-16 to UTF-8 translator (utf8encoder) and a UTF-8 to UTF-16 translator (utf8decoder). We show, among other properties, that utf8encoder and utf8decoder are functionally correct.

VMCAI13.pdf
PDF file

In  VMCAI 2013

Publisher  Springer Verlag

Details

TypeInproceedings
Pages209-228
Volume7737
SeriesLNCS
> Publications > Static Analysis of String Encoders and Decoders