Photo: Sylvain Crouzet
About
I am a postdoctoral researcher in the Theory Group
at Microsoft Research in Redmond, WA.
I graduated from the University of California,
Berkeley under the supervision of
Elchanan Mossel.
I work at the intersection of probability, statistics and
theoretical computer science, with an emphasis on biological applications.
More details on my research interests and
publications can be found
here.
I am on the PC of SODA 2009.
The submission deadline is July 3, 2008.
Recent Preprints (for a full list,
see here)
Shrinkage Effect in Ancestral Maximum Likelihood
Submitted, 2008.
With E. Mossel, M. Steel.
Ancestral maximum likelihood (AML) is a method that simultaneously
reconstructs a phylogenetic tree and ancestral sequences from extant
data (sequences at the leaves). The tree and ancestral sequences
maximize the probability of observing the given data under a Markov
model of sequence evolution, in which branch lengths are also optimized
but constrained to take the same value on any edge across all sequence
sites. AML differs from the more usual form of maximum likelihood (ML)
in phylogenetics because ML averages over all possible ancestral
sequences. ML has long been known to be statistically consistent --
that is, it converges on the correct tree with probability approaching
1 as the sequence length grows. However, the statistical consistency of
AML has not been formally determined, despite informal remarks in a
literature that dates back 20 years. In this short note we prove a general
result that implies that AML is statistically inconsistent. In particular
we show that AML can `shrink' short edges in a tree, resulting in a tree
that has no internal resolution as the sequence length grows. Our
results apply to any number of taxa.
Phylogenies without Branch Bounds: Contracting the Short, Pruning the Deep
Submitted, 2008.
With C. Daskalakis, E. Mossel.
We introduce a new phylogenetic reconstruction algorithm which,
unlike most previous rigorous inference techniques, does not rely
on assumptions regarding the branch lengths or the depth of the tree.
The algorithm returns a forest which is guaranteed to contain all edges
that are: 1) sufficiently long and 2) sufficiently close to the leaves.
How much of the true tree is recovered depends on the sequence length provided.
The algorithm is distance-based and runs in polynomial time.
Incomplete Lineage Sorting: Consistent Phylogeny Estimation from Multiple Loci
Submitted, 2007.
With E. Mossel.
We introduce a simple algorithm for reconstructing phylogenies
from multiple gene trees in the presence of incomplete lineage
sorting, that is, when the topology of the gene trees may differ
from that of the species tree. We show that our technique is
statistically consistent under standard stochastic assumptions,
that is, it returns the correct tree given sufficiently many
unlinked loci. We also show that it can tolerate moderate estimation errors.
Network Delay Inference from Additive Metrics
Submitted, 2007.
With S. Bhamidi and R. Rajagopal.
We demonstrate the use of computational phylogenetic techniques
to solve a central problem in inferential
network monitoring. More precisely, we design a novel algorithm
for multicast-based delay inference, i.e.
the problem of reconstructing the topology and delay characteristics
of a network from end-to-end delay
measurements on network paths. Our inference algorithm is based on
additive metric techniques widely used in phylogenetics. It runs in
polynomial time and requires a sample of size only $\poly(\log n)$.
Contact
If you would like to reach me, here is my contact information.
Office: 99/2955
Phone: 425-707-0361
Fax: 425-936-7329
Sebastien.Roch[at]microsoft[dot]com
Theory Group
Microsoft Research
One Microsoft Way
Redmond, WA 98052