Analyzing Gene Function and Expression Simultaneously
Doug Brutlag, Biochemistry
Daphne Koller, Computer Science
Many cellular functions are carried out by proteins and the interactions between them. The large-scale genomic data sets produced over the last few years give us an opportunity to obtain a genome-wide view of cellular activity. The focus of this project is to apply statistical machine learning methods to these large but noisy data sets in order to analyze protein function and interactions.
In particular, the project has pursued two complementary directions. First, it uses motifs (fine-grained functional elements of a protein sequence) to predict a protein's structure, its function, and its interactions with other proteins. Second, it integrates heterogeneous data sources, such as protein sequence characteristics (including motifs), protein fold, mRNA expression levels, and protein-protein interaction data, to obtain more robust predictions and a more global understanding of protein activity.