Date of Award
Spring 5-2016
Degree Type
Masters Thesis
Degree Name
Master of Science (MS)
Department
Computing
Committee Chair
Zheng Wang
Committee Chair Department
Computing
Committee Member 2
Zheng Sun
Committee Member 2 Department
Computing
Committee Member 3
Nan Wang
Committee Member 3 Department
Computing
Abstract
Hypomethylation and hypermethylation of the human genome are among the epigenetic features of leukemia. However, experimental approaches have determined the methylation state of only a small portion of the human genome. We developed deep-learning-based software named “DeepMethyl,” built on stacked denoising autoencoders (SdA), to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used experimental data from the immortalized myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We tested various SdA architectures with different hidden-layer configurations and amounts of pre-training data, and compared the performance of the deep networks with that of support vector machines (SVM). Using the methylation states of sequentially neighboring regions as one of the learning features, SdA achieved blind-test accuracies of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of the genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.
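The abstract names stacked denoising autoencoders (SdA) as the learning model. As a rough illustration of that technique only, the sketch below shows the common formulation of greedy layer-wise SdA pre-training: each layer corrupts its input with masking noise, encodes it with a sigmoid layer, reconstructs the clean input with tied weights, and minimizes cross-entropy reconstruction loss; the clean encoding of one layer becomes the input of the next. This is not DeepMethyl's code. The class and function names (`DenoisingAutoencoder`, `pretrain_stack`), the corruption level, learning rate, and layer sizes are hypothetical, and the supervised fine-tuning stage that would follow pre-training is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class DenoisingAutoencoder:
    """One denoising autoencoder layer with tied weights (hypothetical sketch)."""

    def __init__(self, n_visible, n_hidden, corruption=0.3, seed=0):
        self.rng = np.random.default_rng(seed)
        bound = 4.0 * np.sqrt(6.0 / (n_visible + n_hidden))
        self.W = self.rng.uniform(-bound, bound, size=(n_visible, n_hidden))
        self.b_h = np.zeros(n_hidden)   # hidden (encoder) bias
        self.b_v = np.zeros(n_visible)  # visible (decoder) bias
        self.corruption = corruption

    def corrupt(self, x):
        # Masking noise: each input is zeroed with probability `corruption`.
        keep = self.rng.random(x.shape) >= self.corruption
        return x * keep

    def encode(self, x):
        return sigmoid(x @ self.W + self.b_h)

    def decode(self, h):
        # Tied weights: the decoder reuses W transposed.
        return sigmoid(h @ self.W.T + self.b_v)

    def train_step(self, x, lr=0.1):
        """One full-batch gradient step; returns the mean cross-entropy loss."""
        x_tilde = self.corrupt(x)
        h = self.encode(x_tilde)
        z = self.decode(h)
        eps = 1e-12
        loss = -np.mean(np.sum(x * np.log(z + eps)
                               + (1 - x) * np.log(1 - z + eps), axis=1))
        n = x.shape[0]
        d_v = (z - x) / n                      # grad wrt decoder pre-activation
        d_h = (d_v @ self.W) * h * (1 - h)     # grad wrt encoder pre-activation
        grad_W = x_tilde.T @ d_h + d_v.T @ h   # encoder + decoder contributions
        self.W -= lr * grad_W
        self.b_h -= lr * d_h.sum(axis=0)
        self.b_v -= lr * d_v.sum(axis=0)
        return loss


def pretrain_stack(X, layer_sizes, epochs=50, lr=0.1):
    """Greedy layer-wise pre-training; returns the layers and top-level codes."""
    layers, inp = [], X
    for i, n_hidden in enumerate(layer_sizes):
        dae = DenoisingAutoencoder(inp.shape[1], n_hidden, seed=i)
        for _ in range(epochs):
            dae.train_step(inp, lr)
        layers.append(dae)
        inp = dae.encode(inp)  # clean (uncorrupted) codes feed the next layer
    return layers, inp
```

In a full SdA classifier, the pre-trained layers would typically be topped with a logistic output layer (here, methylated versus unmethylated) and the whole network fine-tuned with labeled examples; how DeepMethyl configures that stage is described in the thesis itself, not in this sketch.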
Copyright
2016, Yiheng Wang
Recommended Citation
Wang, Yiheng, "Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks" (2016). Master's Theses. 183.
https://aquila.usm.edu/masters_theses/183