Share on Facebook Tweet on Twitter Share on LinkedIn Share by email
BEK

BEK is a domain specific language for writing common string functions, combined with state of the art analysis. With BEK, you can answer questions like Do these two programs output the same string? Can this program ever output a target string? What happens if I compose these two programs? Does the order matter? More general than regular expressions, BEK has been specifically tailored to capture common idioms in string manipulating functions.

Bek architecture

Property checking

The following BEK program represents a basic sanitizer that escapes (unescaped) single and double quotes. It iterates over an input string t using a character variable c and a single boolean state variable b (initially false) to track whether the previous character was an unescaped slash.

program sanitize(t);
    string s;
    s := iter(c in t){b := false;}{
            case (!(b) && ((c == '\'') || (c == '\"'))) :
                b := false;
                yield ('\\');
                yield (c);
            case (c == '\\') :
                b := !(b);
                yield (c);
            case (true) :
                b := false;
                yield (c);
            };
    return s;


For example, if the input is \\" the double quote is not considered escaped, and the sanitized output is \\\". If we apply the sanitizer to \\\" again, the output is the same. A fundamental question is whether this holds for any output string. In other words, is the sanitizer idempotent?

We use automata-theoretic techniques to answer such questions. In particular, for checking idempotency, three steps are performed:

  1. The sanitizer is translated into a symbolic finite transducer A:
    that is a automaton with symbolic transitions (input guard)/[output sequence]. For example in state !b (b = false), if the character is a single or double quote (c=''')|(c='\"') the output sequence is '\' followed by c and the state does not change.
  2. A is composed with itself to form the composed transducer AA. It turns out (in this case) that the composed transducer AA looks exactly the same as A.
  3. A and AA are checked for equivalence: check if for all t, A(t) = AA(t) by using an equivalence-checking algorithm for transducers.

Code generation

Once the sanitizer has been checked for various desired properties efficient code can be automatically generated. The following functionally equivalent JavaScript function is generated for the above sanitizer:

function sanitize(t)
{
    var s =
    function ($){
    var result = new Array();
    for(i=0; i<$.length; i++){
        var c =$[i];
        if ((~(b) && ((c == String.fromCharCode(39))
            || (c == String.fromCharCode(34)))))
        {
            b := ff;
            result.push(String.fromCharCode(92));
            result.push(c);
        }
        if ((c == String.fromCharCode(92)))
        {
            b := ~(b);
            result.push(c);
        }
        if (tt)
        {
            b := ff;
            result.push(c);
        }
    };
    return result.join('');
    }(str);
    return s;
}
 

Visual validation

The ability to visualize BEK programs as transducers provides also an effective way to visually inspect and validate the intended behavior.

Consider the BEK program above but where the yields of the first case statement are swapped by mistake:

program sanitizeB(t);
string s;
s := iter(c in t){b := false;}{
case (!(b) && ((c == '\'') || (c == '\"'))) :
b := false;
yield (c);
                yield ('\\');
            case (c == '\\') :
b := !(b);
yield (c);
            case (true) :
                b := false;
                yield (c);
            };
return s;
 

In this case the corresponding transducer, say B, looks similar to A above:However, the composition BB visually shows that idempotency fails:

 

 
 
Publications