BEK is a domain specific language for writing common string functions, combined with state of the art analysis. With BEK, you can answer questions like Do these two programs output the same string? Can this program ever output a target string? What happens if I compose these two programs? Does the order matter? More general than regular expressions, BEK has been specifically tailored to capture common idioms in string manipulating functions.

## Bek architecture

## Try out BEK online!:

## Property checking

The following BEK program represents a basic sanitizer that escapes (unescaped) single and double quotes. It iterates over an input string *t* using a character variable *c* and a single boolean state variable *b* (initially *false*) to track whether the previous character was an unescaped slash.

programsanitize(t);

string s;

s :=iter(cint){b :=false;}{

case(!(b) && ((c == '\'') || (c == '\"'))) :

b :=false;

yield('\\');

yield(c);

case(c == '\\') :

b := !(b);

yield(c);

case(true) :

b :=false;

yield(c);

};

returns;

For example, if the input is \\" the double quote is not considered escaped, and the sanitized output is \\\". If we apply the sanitizer to \\\" again, the output is the same. A fundamental question is whether this holds for *any* output string. In other words, is the sanitizer idempotent?

We use **automata-theoretic techniques** to answer such questions. In particular, for checking idempotency, three steps are performed:

- The sanitizer is translated into a
*symbolic finite transducer**A*:

that is a automaton with symbolic transitions (input guard)/[output sequence]. For example in state !*b*(*b*=*false*), if the character is a single or double quote (*c*=''')|(*c*='\"') the output sequence is '\' followed by*c*and the state does not change. *A*is composed with itself to form the composed transducer*A*•*A*. It turns out (in this case) that the composed transducer*A*•*A*looks exactly the same as*A*.*A*and*A*•*A*are checked for: check if for all t,*equivalence**A*(t) =*A*•*A*(t) by using an equivalence-checking algorithm for transducers.

## Code generation

Once the sanitizer has been checked for various desired properties efficient code can be automatically generated. The following * functionally equivalent* JavaScript function is generated for the above sanitizer:

functionsanitize(t)

{

vars =

function($){

varresult =newArray();

for(i=0; i<$.length; i++){

varc =$[i];

if((~(b) && ((c == String.fromCharCode(39))

|| (c == String.fromCharCode(34)))))

{

b :=ff;

result.push(String.fromCharCode(92));

result.push(c);

}

if((c == String.fromCharCode(92)))

{

b := ~(b);

result.push(c);

}

if(tt)

{

b :=ff;

result.push(c);

}

};

returnresult.join('');

}(str);

returns;

}

## Visual validation

The ability to visualize BEK programs as transducers provides also an effective way to visually inspect and validate the intended behavior.

Consider the BEK program above but where the yields of the first case statement are swapped by mistake:

programsanitizeB(t);

string s;

s :=iter(cint){b :=false;}{

case(!(b) && ((c == '\'') || (c == '\"'))) :

b :=false;

yield(c);yield('\\');case(c == '\\') :

b := !(b);

yield(c);case(true) :b :=false;yield(c);};

returns;

In this case the corresponding transducer, say *B*, looks similar to *A* above:However, the composition *B*•*B* visually shows that idempotency fails:

- Margus Veanes, Todd Mytkowicz, David Molnar, and Benjamin Livshits, Data-Parallel String-Manipulating Programs, in
*POPL 2015: 42nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages*, ACM – Association for Computing Machinery, January 2015 - Loris D'Antoni and Margus Veanes, Minimization of Symbolic Automata, in
*POPL'14*, ACM, January 2014 - Loris D'Antoni and Margus Veanes, Equivalence of Extended Symbolic Finite Transducers, in
*25th International Conference on Computer Aided Verification (CAV'13)*, Springer, July 2013 - Margus Veanes, Applications of Symbolic Finite Automata, in
*CIAA'13*, Springer, July 2013 - Loris D'Antoni and Margus Veanes, Static Analysis of String Encoders and Decoders, in
*VMCAI 2013*, Springer Verlag, 2013 - Loris D'Antoni and Margus Veanes, Equivalence of Extended Symbolic Finite Transducers, no. MSR-TR-2013-4, January 2013
- Margus Veanes and Nikolaj Bjorner, Symbolic Automata: The Toolkit, in
*TACAS'12*, Springer Verlag, March 2012 - Margus Veanes, Pieter Hooimeijer, Benjamin Livshits, David Molnar, and Nikolaj Bjorner, Symbolic Finite State Transducers: Algorithms and Applications, in
*POPL’12*, ACM SIGPLAN, January 2012 - Margus Veanes, David Molnar, Benjamin Livshits, and Lubomir Litchev, Generating Fast String Manipulating Code Through Transducer Exploration and SIMD Integration, no. MSR-TR-2011-124, November 2011
- Pieter Hooimeijer, Benjamin Livshits, David Molnar, Prateek Saxena, and Margus Veanes, Fast and Precise Sanitizer Analysis with BEK, in
*USENIX Security'11*, August 2011