Expression Sequence Tags Analysis, Annotation, Toxicogenomics, and Learning Approach

Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)


Biological Sciences

First Advisor

Youping Deng

Advisor Department

Biological Sciences


The genome sequences of many organisms are still unknown. The earthworm Eisenia fetida, commonly known as the compost worm, was described by Aristotle as "the intestine of the earth". Little is known about its genome sequence, although it has been used extensively as a test organism in terrestrial ecotoxicology. To understand its gene expression in response to environmental contaminants, we cloned 4,032 cDNAs, or expressed sequence tags (ESTs), from two E. fetida libraries. Clustering analysis yielded 2,231 unique sequences, comprising 448 contigs (assembled from 1,361 ESTs) and 1,783 singletons. We stored this information, along with Gene Ontology and pathway annotations, in a high-performance relational database called the EST Model Database (ESTMD), an integrated Web-based database model. To understand the molecular mechanisms behind the chronic, sublethal effects of 2,4,6-trinitrotoluene (TNT), a widely used ordnance compound of public concern, we constructed a microarray consisting of the 4,032 cDNAs isolated from E. fetida. Based on the reproductive response to TNT, four treatments (control, 7, 35, and 139 ppm) were selected for gene expression studies. We performed a microarray experiment with an interwoven loop design. Statistical analysis identified 109 transcripts whose expression changed significantly. We report down-regulation of chitinase genes and evidence of blood disorders, weakened immunity, and decreased digestion in E. fetida. We also implemented a Java application that allows easy evaluation of errors and of the role of hybrid normalization methods in removing systematic errors from the experimental data. Another important aspect of microarray analysis is data classification and pattern recognition. Several classification methods have been studied for identifying differentially expressed genes in microarray data.
However, these methods have rarely been compared with one another to find a better framework for the classification, clustering, and analysis of microarray gene expression results. We compared the efficiency of the classification methods and found that the choice of feature selection and classification methods substantially influences classification success. We also developed a Java GUI application, called SVM Classifier, that allows users to perform SVM training, classification, and prediction.
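The contig and singleton counts reported for the EST clustering can be tallied from any table that assigns each EST to a cluster. The following is a minimal Python sketch under stated assumptions, not the pipeline used in this work: the function name `summarize_clusters`, the input layout (a mapping from EST identifier to cluster identifier), and the toy identifiers are all hypothetical. A cluster with two or more ESTs is counted as a contig, and a cluster with exactly one EST as a singleton.

```python
from collections import Counter


def summarize_clusters(assignments):
    """Tally contigs and singletons from an EST -> cluster mapping.

    `assignments` is a hypothetical input format: a dict whose keys are
    EST identifiers and whose values are cluster identifiers.
    """
    # Number of ESTs that fall into each cluster.
    sizes = Counter(assignments.values())
    # Clusters with two or more member ESTs are treated as contigs.
    contig_sizes = [n for n in sizes.values() if n >= 2]
    # Clusters with exactly one member EST are treated as singletons.
    singleton_count = sum(1 for n in sizes.values() if n == 1)
    return {
        "unique_sequences": len(sizes),
        "contigs": len(contig_sizes),
        "ests_in_contigs": sum(contig_sizes),
        "singletons": singleton_count,
    }


# Toy data (invented for illustration only).
toy = {
    "est1": "c1", "est2": "c1", "est3": "c1",
    "est4": "c2",
    "est5": "c3", "est6": "c3",
    "est7": "c4",
}
summary = summarize_clusters(toy)

# Consistency check on the figures quoted in the abstract:
# contigs plus singletons should equal the number of unique sequences.
reported_total = 448 + 1783
```

On the toy data, the sketch finds four unique sequences: two contigs built from five ESTs, and two singletons. The same identity holds for the reported figures, where 448 contigs and 1,783 singletons sum to 2,231 unique sequences.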