Topic: Statistical methods for inferring gene regulatory modules and networks
Speaker: Prof. Jun Xie (Department of Statistics, Purdue University)
Time: 2007-06-04, 4:00 PM
Venue: Science Building No. 1, Room 1303
This talk is about probabilistic and statistical methods for the analysis
of genomic data. Our focus is the specific problem of inferring gene
regulatory modules, where a module is defined as a set of coexpressed genes
that are regulated by a common set of transcription factors (proteins).
We propose a series of statistical methods that combine information from
multiple types of genomic data, including DNA sequences, genome-wide
location analysis (ChIP-chip experiments), and mRNA gene expression
microarrays. More specifically, we have developed a hidden Markov model
that describes combinations of transcription factor binding sites in DNA
sequences (strings of nucleotides A, C, G, T). The predictions are
refined by regression analysis on mRNA gene expression microarray data
and/or ChIP-chip binding data. In regression analysis, we formulate a
variable selection problem and show that all available methods, including
standard stepwise selection and LASSO/LARS, fail to select the right set
of covariates, because of the complicated interdependence among genes. This
biological application poses a challenge for probability and statistics.
In addition to our own attempts, other new methodologies would be of great
interest.