Computing Sciences and Computer Engineering
Background: A number of models and algorithms have been proposed in the past for gene regulatory network (GRN) inference; however, none of them address the effects of the size of time-series microarray expression data in terms of the number of time-points. In this paper, we study this problem by analyzing the behaviour of three algorithms based on information theory and dynamic Bayesian network (DBN) models. These algorithms were implemented on different sizes of data generated by synthetic networks. Experiments show that the inference accuracy of these algorithms reaches a saturation point after a specific data size brought about by a saturation in the pair-wise mutual information (MI) metric; hence there is a theoretical limit on the inference accuracy of information theory based schemes that depends on the number of time points of micro-array data used to infer GRNs. This illustrates the fact that MI might not be the best metric to use for GRN inference algorithms. To circumvent the limitations of the MI metric, we introduce a new method of computing time lags between any pair of genes and present the pair-wise time lagged Mutual Information (TLMI) and time lagged Conditional Mutual Information (TLCMI) metrics. Next we use these new metrics to propose novel GRN inference schemes which provides higher inference accuracy based on the precision and recall parameters.
Results: It was observed that beyond a certain number of time-points (i.e., a specific size) of micro-array data, the performance of the algorithms measured in terms of the recall-to-precision ratio saturated due to the saturation in the calculated pair-wise MI metric with increasing data size. The proposed algorithms were compared to existing approaches on four different biological networks. The resulting networks were evaluated based on the benchmark precision and recall metrics and the results favour our approach.
Conclusions: To alleviate the effects of data size on information theory based GRN inference algorithms, novel time lag based information theoretic approaches to infer gene regulatory networks have been proposed. The results show that the time lags of regulatory effects between any pair of genes play an important role in GRN inference schemes.
Perkins, E. J.,
(2010). Time Lagged Information Theoretic Approaches to the Reverse Engineering of Gene Regulatory Networks. BMC Bioinformatics, 11.
Available at: https://aquila.usm.edu/fac_pubs/931