1st Microsoft Research and IMDEA Software Institute Collaboration Workshop

2–4 April 2014 | Madrid, Spain

- Home
- Abstracts

## Keynote Abstracts

#### Finding Malware on a Web Scale

#### Ben Livshits - Microsoft Research

Over the last several years, JavaScript malware has emerged as one of the most popular ways to deliver drive-by attacks to unsuspecting users through the browser. This talk covers recent Microsoft Research experiences with finding malware on the web. It highlights two tools: Nozzle and Zozzle. Nozzle is a runtime malware detector that focuses on finding heap spraying attacks. Zozzle is a mostly static detector that finds heap sprays and other types of JavaScript malware. Both are extremely precise: Nozzle false positive rate is close to one in a billion; Zozzle's is about one in a million.

Both are deployed by Bing and are used daily to find thousands of malicious web sites. This effort constitutes one of the largest deployment of static and runtime analysis techniques worldwide. This talk will focus on interesting interplay between static and runtime analysis and cover what it takes to migrate research ideas into real-world products.

#### A Timely Dataflow Model and the Naiad System

#### Martin Abadi - Microsoft Research

Many important data processing applications require high-throughput data ingestion, iterative computation, and low-latency responsiveness. Accordingly, a variety of recent systems aim to address these needs, relying on diverse models of computation.

Naiad is a distributed platform for the execution of data-parallel, cyclic dataflow programs. Naiad combines the high throughput of batch processors, the low latency of stream processors, and the ability to perform iterative and incremental computations. A new computational model, timely dataflow, underlies Naiad and captures opportunities for parallelism across a wide class of algorithms. It subsumes simpler models that support either batch processing or stream processing by enriching dataflow computation with pointstamps, which represent logical points in a computation and help track its progress.

This talk is an introduction to Naiad and to timely dataflow. It emphasizes basic concepts in the computational model and a key element in its realization, namely a distributed algorithm for tracking progress. It is a report on work in progress, joint with Derek Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Tom Rodeheffer at Microsoft Research.

#### Reasoning about Eventual Consistency and Replicated Data Types

#### Alexey Gotsman - IMDEA Software Institute

Modern databases underlying large-scale Internet services guarantee immediate availability and tolerate network partitions at the expense of providing only weak forms of consistency, commonly dubbed eventual consistency. Even though the folklore notion of eventual consistency is very weak, in reality, the semantics and programming interfaces of systems implementing it can be very subtle. In particular, such systems can resolve conflicting updates using complex protocols, called replicated data types (aka CRDTs), and can allow varying kinds of anomalous behaviour.

So far the semantics provided by eventually consistent systems have been poorly understood. I will present a novel framework for their formal specification and discuss how recent nontrivial replicated data type implementations fit into it. I will then illustrate the utility of the framework by presenting techniques for proving the correctness of replicated data type implementations.

This is joint work with Sebastian Burckhardt (Microsoft Research), Hongseok Yang (University of Oxford) and Marek Zawirski (INRIA & UPMC-LIP6).

## Security Session - Wednesday

#### Cost-effective protection against side-channel attacks

#### Boris Koepf, IMDEA Software Institute

Side-channel attacks break the security of systems by exploiting signals that are unwittingly emitted by their implementation. Unfortunately, extremely defensive implementations that completely avoid side-channel leakage are often not practical because they come with dramatic penalties in terms of power, memory, or time consumption.

In this talk, I will outline how static analysis and game theory can be leveraged for identifying cost-effective countermeasures against side-channel attacks, that is, countermeasures that offer a sufficient degree of protection and high performance at the same time.

#### Delegation of Computation on Outsourced Data

#### Dario Fiore, IMDEA Software Institute

We consider the problem in which a client stores a large amount of data with an untrusted server in such a way that, at any moment, the client can ask the server to compute a function on some portion of its outsourced data. In this scenario, the client must be able to efficiently verify the correctness of the result despite no longer knowing the inputs of the delegated computation, it must be able to keep adding elements to its remote storage, and it does not have to fix in advance (i.e., at data outsourcing time) the functions that it will delegate. Even more ambitiously, clients should be able to verify in time independent of the input-size -- a very appealing property for computations over huge amounts of data.

In this talk I will present novel cryptographic techniques that solve the above problem for the class of computations which can be expressed by arithmetic circuits of bounded degree. In particular I will discuss an efficient solution for the case of quadratic polynomials over a large number of variables. This class covers a wide range of significant arithmetic computations -- notably, many important statistics. To confirm the efficiency of our solution, we show encouraging performance results, e.g., correctness proofs have size below 1 kB and are verifiable by clients in less than 10 milliseconds. I will conclude the talk by discussing open challenges and ongoing research in this area.

#### Automating Proofs of Relational Properties of Probabilistic Programs

#### Andrey Rybalchenko, Microsoft Research

Some security properties go beyond what is expressible in terms of an individual execution of a single program. In particular, many security policies in cryptography can be naturally phrased as relational properties of two open probabilistic programs. Writing and verifying proofs of such properties is an error-prone task that calls for automation and tool support. One of the main difficulties in automating these proofs lies in finding adequate relational invariants for loops and specifications for program holes. In this paper we show how to automate proofs of relational properties of open probabilistic programs by adapting techniques for the automatic generation of universally quantified invariants in a non-relational setting.

Joint work with Klaus v. Gleissenthall and Santiago Zanella-Beguelin

#### Computer-Aided Cryptographic Design and Analysis

#### Benedikt Schmidt, IMDEA Software Institute

Computer-aided verification provides effective means of analyzing the security of cryptographic primitives. However, it has remained a challenge to achieve fully automated analyses yielding guarantees that hold against computational (rather than symbolic) attacks. This paper meets this challenge for public-key encryption schemes built from trapdoor permutations and hash functions. Using a novel combination of techniques from computational and symbolic cryptography, we present proof systems for analyzing the chosen-plaintext and chosen-ciphertext security of such schemes in the random oracle model. Building on these proof systems, we develop a toolset that bundles together fully automated proof and attack finding algorithms. We use this toolset to build a comprehensive database of encryption schemes that records attacks against insecure schemes, and proofs with concrete bounds for secure ones. We will also report on recent results about the fully automated analysis of cryptographic assumptions in generic group models.

#### CyberProbe: Towards Internet-Scale Active Detection of Malicious Servers

#### Juan Caballero, IMDEA Software Institute

Cybercriminals use different types of geographically distributed servers to run their operations such as C&C servers for managing their malware, exploit servers to distribute the malware, payment servers for monetization, and redirectors for anonymity. In this paper, we propose a novel active probing approach for detecting malicious servers and compromised hosts that listen for (and react to) incoming network requests. Our approach sends probes to remote hosts and examines their responses, determining whether the remote hosts are malicious or not. It identifies different malicious server types as well as malware that listens for incoming traffic such as P2P bots. Compared with existing defenses, our active probing approach is fast, cheap, easy to deploy, and achieves Internet scale. We have implemented our active probing approach in a tool called CyberProbe. We have used CyberProbe for Internet-scale probing, identifying 151 malicious servers (75% of them unknown to publicly available databases of malicious servers) and 7,881 P2P bots. Our results reveal an important provider locality property: operations hosts an average of 3.2 servers on the same hosting provider to amortize the cost of setting up a relationship with the provider.

Joint work with Antonio Nappa, Zhaoyan Xu, M. Zubair Rafique, and Guofei Gu

## Languages Session - Thursday

#### Communicating State Transition Systems for Fine-Grained Concurrent Resources

#### Ilya Sergey – IMDEA Software Institute

This is a join work with Aleksandar Nanevski, Ruy Ley-Wild, Ilya Sergey and Germán Andrés Delbianco In this talk, we will present a novel model of concurrent computations with shared memory and provide a simple, yet powerful, logical framework for uniform Hoare-style reasoning about partial correctness of coarse- and fine-grained concurrent programs. The key idea is to specify arbitrary resource protocols as communicating state transition systems (STS) that describe valid states of a resource and the transitions the resource is allowed to make, including transfer of heap ownership. We demonstrate how reasoning in terms of communicating STS makes it easy to crystallize behavioral invariants of a resource. We also describe entanglement operators to build large systems from an arbitrary number of STS components, by interconnecting their lines of communication. Furthermore, we show how the classical rules from the Concurrent Separation Logic (CSL), such as scoped resource allocation, can be generalized to fine-grained resource management. This allows us to give specifications as powerful as Rely-Guarantee, in a concise, scoped way, and yet regain the compositionality of CSL-style resource management. We proved the soundness of our logic with respect to the denotational semantics of action trees (variation on Brookes’ action traces). We formalized the logic as a shallow embedding in Coq and implemented a number of examples, including a construction of coarse-grained CSL resources as a modular composition of various logical and semantic components.

#### Cipherbase – Secure query processing on encrypted databases

#### Kapil Vaswani – Microsoft Research

Abstract: Data confidentiality is a key challenge for applications hosted on public cloud platforms, where applications are open to attacks from rogue cloud administrators and co-located applications. We consider a hybrid architecture for ensuring data confidentiality based on the notion of a trusted client. In this architecture, data is stored on the untrusted server, sensitive columns are encrypted (using partially homomorphic encryption schemes), and the encryption keys are stored with a trusted client. We describe a secure query compiler for SQL, which automatically migrates existing database applications to this architecture while ensuring confidentiality. Experiments with industry standard transactional benchmarks suggest that real-world applications can be migrated to this architecture with reasonable performance overheads and no developer effort. We also describe an architecture that uses tamper proof hardware (e.g. FPGAs) to serve as a trusted client and improve performance even further.

#### Programming Language Techniques for Low-Level Cryptographic Security

#### François Dupressoir – IMDEA Software Institute

Recent events have made clear the need for strong security guarantees on cryptographic software, with prominent researchers calling for the formalization of both the mathematics and the software itself. In this talk, I will present two recent applications of EasyCrypt, an SMT-based tool for formally proving relational properties of probabilistic programs, to obtaining such strong security guarantees on cryptographic implementations, relying only on standard cryptographic assumptions. The first application illustrates the power and flexibility of EasyCrypt's new module system by abstractly formalizing a security proof for a Secure Function Evaluation (SFE) protocol assuming secure Oblivious Transfer (OT) and Garbling Schemes (GS), and then applying it to concrete primitives to obtain a security result on an SFE protocol. From the provably secure specification, through notions of simulation, we then extract a reasonably efficient ML implementation. 1 The second application shows how the EasyCrypt tool can be combined with a certified compiler (CompCert in our case) to obtain security guarantees at the level of ASM code. We extend CompCert with an ad-hoc treatment of separate compilation. We consider very strong adversaries, proving that any such adversary that breaks the ASM implementation can only do so by breaking the underlying cryptographic primitive.

#### Gradual Typing Embedded Securely in JavaScript

#### Pierre-Yves Strub, IMDEA Software Institute

JavaScript's flexible semantics makes writing correct code hard and writing secure code extremely difficult. To address the former problem, various forms of gradual typing have been proposed, such as Closure and TypeScript. However, supporting all common programming idioms is not easy; for example, TypeScript deliberately gives up type soundness for programming convenience. In this talk, I will present TS*, a gradual type system that provides important safety and security guarantees.

In contrast to prior gradual type systems, TS* features full runtime reflection over three kinds of types: (1) simple types for higher-order functions, recursive datatypes and dictionary-based extensible records; (2) the type [any], for dynamically type-safe TS* expressions; and (3) the type [un], for untrusted, potentially malicious JavaScript contexts in which TS* is embedded. After type-checking, the compiler instruments the program with various checks to ensure the type safety of TS* despite its interactions with arbitrary JavaScript contexts, which are free to use `eval', stack walks, prototype customizations, and other offensive features.

Our examples illustrate how web security patterns that developers currently program in JavaScript (with much difficulty and still with dubious results) can instead be programmed naturally in TS*, retaining a flavor of idiomatic JavaScript, while providing strong safety guarantees by virtue of typing.

This is a joint work with N. Swamy, C. Fournet, A. Rastogi, K. Bhargavan, J. Chen & G. Bierman.

#### Generating provably correct x86 using a proof assistant

#### Andrew Kennedy – Microsoft Research

We have built a model of sequential x86 execution in the Coq proof assistant, including instruction decoding. On top of this model we have built an assembler, a novel separation logic used to give specifications to machine code programs, and tactics to help prove that programs meet their specifications. We use notation to support conventional assembly code syntax inside Coq, including lexically-scoped labels and user-defined macros. Assembly code can be assembled within Coq, producing a sequence of hex bytes, or a full Windows executable. We intend to use the framework as the basis for constructing provably correct systems code and compilers. 2 This is joint work with Nick Benton.

## Verification Session

#### Solving Horn Clauses with Cardinality Constraints

#### Andrey Rybalchenco, Microsoft Research

Verification of quantitative program properties, such as bounds on resource usage or information leakage, can often be formalized using cardinality based arguments. In this paper we propose an extension of Horn constraints with cardinality operators, together with a corresponding solving method that can be used for the automatic verification of such properties. Our method relies on Barvinok's theory to find solutions to recursion-free Horn constraints which are then combined to solutions to recursive Horn constraints in a counterexample guided abstraction refinement loop. Finally, we present an experimental evaluation of our method.

Joint work with Klaus v. Gleissenthall and Boris Koepf

#### Verification of Observational Refinement by Counting

#### Michael Emmi, IMDEA Software Institute

Observational refinement between software library implementations is key to modular reasoning: knowing that any program behavior using some sophisticated library implementation L is also possible using a simplistic implementation S, allows replacing S for L, thus simplifying reasoning about any module using L. High-performance data structure implementations and their abstract specifications are canonical examples where suchrefinement is beneficial.

While automating refinement checking between deterministic sequential implementations is relatively straightforward, by ensuring each sequence of operations executed by L is also executable by S, checking refinement between a concurrent implementation L and its (sequential, usually) specification S presents a severe complication: to which linear order σ should concurrently-executed operations of L be resolved, in order to check whether σ is also executable by S? Existing automated approaches either require manual effort in fixing the linearization point of each operation, thus uniquely determining σ, or suffer from exponential explosion in considering all possible linearizations σ.In this work we develop a fundamentally different approach to automated refinementchecking, called operation counting. Rather than reasoning over sequences of executed operations, we reason over counts of executed and executing library operations; we say implementation L refines S when any valuation to the counters which is reachable with L is also reachable with S. We demonstrate that this notion of refinement is not only equivalent toobservational refinement, but leads to effective means for automatic checking.

#### Algorithmic Verification of Stability of Hybrid Systems

#### Pavithra Prabhakar, IMDEA Software Institute

Hybrid systems refer to systems exhibiting mixed discrete-continuous behaviors and arise as a natural byproduct of the interaction of a network of embedded processors with physical systems. Hybrid systems manifest in safety critical application domains including aeronautics, automotive, robotics and power systems. Reliability is of utmost importance and a grand challenge in the area is the development of techniques and tools to aid the development of high-confidence hybrid systems.

In this talk, we focus on the verification of stability of hybrid systems. Stability is a fundamental property in control system design and captures the notion that small perturbations to the initial state or input to the system result in only small variations in the eventual behavior of the system. We present foundations and concrete techniques for abstraction based stability analysis. In contrast to the well-known methods for automated verification of stability based on Lyapunov functions, which are deductive, we present algorithmic techniques for stability analysis.

#### Energy and Resource Usage Analysis and Verification in the CiaoPP System

#### Pedro Lopez-Garcia, IMDEA Software Institute

We present a general resource usage analysis framework which is parametric with respect to resources and type of approximation (lower-and upper-bounds) that infers bounds on the resource usage of all the procedures in a program, providing such bounds as functions of input data sizes. We also present instantiations of this general analysis framework for inferring energy consumption at different program representation layers: assembly, internal compiler representations and source code. Such instantiations use an energy model of the XS1-L instruction set architecture, developed together with XMOS Ltd. Based on the general analysis, we also present a framework for (static) verification of general resource usage program properties. The outcome of the static checking of assertions can express intervals for the input data sizes such that a given specification can be proved for some intervals but disproved for others.

Joint work with: K. Eder, K. Georgiou, N. Grech, M.V. Hermenegildo, S. Kerrison, U. Liqat and A. Serrano.

#### Verification Modulo Versions

#### Francesco Logozzo, Microsoft Research

I will introduce Verification Modulo Versions (VMV), a new automatic verification technique that dramatically reduces the number of alarms while providing sound semantic guarantees. VMV extracts the environmental conditions from a previous version of the program and it injects them in the version under analysis. VMV can be instantiated to automatically find regressions or to provide a relative proof of correctness. We implemented VMV on the top of the abstract-interpretation based tool cccheck, and applied it to huge code bases, showing how it is effective in reducing the number of alarms by more than 72%.

Joint work with Shuvendu Lahiri, Manuel Fahndrich, and Sam Blackshear. To appear at PLDI’14

#### Groups

#### Organization

**Steering Committee**

- Gilles Barthe, IMDEA Software Institute
- Judith BIshop, Microsoft Research
- Georges Gonthier, Microsoft Research
- Manuel Hermenegildo, IMDEA Software Institute

**Convenors**

- Languages: Pierre-Yves Strub and Georges Gonthier
- Security: Juan Caballero and Ben Livshits
- Verification: Alexey Gotsman and Francesco Logozzo

#### Logistics

Paola Huerta and Andrea Iannetta, IMDEA Software Institute

Contact: msr-imdeasw-opening@software.imdea.org

**Registration**

The Workshop is by invitation only. The registration link will be sent to all invited participants.

**Hotels**

These hotels are recommended. Make your own reservations. We shall try to provide transport:

- Hotel Meninas euro 85
- Hotel Opera euro 95
- Mercure Madrid Santo Domingo euro 92

**Venue**

Instituto IMDEA Software

Campus Montegancedo s/n

28223 Pozuelo de Alarcon, Madrid

SPAIN

Telephone: +34-91-101-2202