Machine Learning Methods for Discovery of Regulatory Elements in Bacteria

I will present novel machine learning methods for the discovery of important DNA sequence elements encoded in bacterial genomes. Knowledge of these elements provides insight into central problems in computational biology, such as uncovering gene functions, gene-regulatory networks, and evolutionary relationships among genes and organisms. This talk will focus on my contributions to the design and learning of graphical probability models of these elements. In particular, I will present methods for (i) refining the structure of stochastic context-free grammars, (ii) training sequence models with “weakly” labeled data, (iii) designing models that incorporate multiple, diverse evidence sources, and (iv) modeling and predicting arbitrarily overlapping elements in sequence data. The results of cross-validation experiments on the heavily studied bacterium E. coli show that the accuracy of our predictions exceeds that of the previous state of the art.

Speaker Details

Joseph Bockhorst is a graduate student in the Department of Computer Sciences at the University of Wisconsin-Madison, where he received his MS in 2000 and expects to complete his PhD in 2005. His graduate research focuses on machine learning and bioinformatics and has been funded in part by an informatics training award from the US National Library of Medicine. Before graduate school, he worked as a software engineer at Microsoft Corporation in Redmond, WA, from 1997 to 1998, and he earned his BS in electrical engineering from Wisconsin in 1996.

Date:
Speakers:
Joseph Bockhorst
Affiliation:
University of Wisconsin-Madison