Date of Award

Summer 8-2009

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computing

Committee Chair

Dr. Chaoyang Zhang

Committee Chair Department

Computing

Committee Member 2

Dr. Ray Seyfarth

Committee Member 3

Dr. Jonathan Sun

Committee Member 4

Dr. Ping Gong

Committee Member 5

Dr. Andrew Strelzoff

Committee Member 6

Dr. Youping Deng

Abstract

The innovations and improvements in high-throughput genomic technologies, such as DNA microarray, make it possible for biologists to simultaneously measure dependencies and regulations among genes on a genome-wide scale and provide us genetic information. An important objective of the functional genomics is to understand the controlling mechanism of the expression of these genes and encode the knowledge into gene regulatory network (GRN). To achieve this, computational and statistical algorithms are especially needed.

Inference of GRN is a very challenging task for computational biologists because the degree of freedom of the parameters is redundant. Various computational approaches have been proposed for modeling gene regulatory networks, such as Boolean network, differential equations and Bayesian network. There is no so called "golden method" which can generally give us the best performance for any data set. The research goal is to improve inference accuracy and reduce computational complexity.

One of the problems in reconstructing GRN is how to deal with the high dimensionality and short time course gene expression data. In this work, some existing inference algorithms are compared and the limitations lie in that they either suffer from low inference accuracy or computational complexity. To overcome such difficulties, a new approach based on state space model and Expectation-Maximization (EM) algorithms is proposed to model the dynamic system of gene regulation and infer gene regulatory networks. In our model, GRN is represented by a state space model that incorporates noises and has the ability to capture more various biological aspects, such as hidden or missing variables. An EM algorithm is used to estimate the parameters based on the given state space functions and the gene interaction matrix is derived by decomposing the observation matrix using singular value decomposition, and then it is used to infer GRN. The new model is validated using synthetic data sets before applying it to real biological data sets. The results reveal that the developed model can infer the gene regulatory networks from large scale gene expression data and significantly reduce the computational time complexity without losing much inference accuracy compared to dynamic Bayesian network.

Share

COinS