Date of Award
Spring 5-2015
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computing
School
Computing Sciences and Computer Engineering
Committee Chair
Nan Wang
Committee Chair Department
Computing
Committee Member 2
Chaoyang Zhang
Committee Member 2 Department
Computing
Committee Member 3
Chenhua Zhang
Committee Member 3 Department
Computing
Committee Member 4
Zheng Wang
Committee Member 4 Department
Computing
Committee Member 5
Ping Gong
Committee Member 5 Department
Computing
Abstract
DNA reconstruction is widely regarded as central to molecular biology because it reveals how genotype affects phenotype. DNA sequencing technology emerged to meet this need and has greatly advanced the field. The traditional Sanger method is effective but extremely expensive on a cost-per-base basis, a shortcoming that spurred the rapid development of next-generation sequencing (NGS) technologies. NGS technologies are widely used because they are low-cost, high-throughput, and fast. However, they still face major drawbacks, such as huge volumes of data and relatively short read lengths compared with traditional methods. This research focuses on quick preliminary analysis of NGS data, identification of genome-wide structural variations (SVs), and microRNA (miRNA) prediction. For preliminary NGS data analysis, the author developed a toolkit named "SeqAssist" to evaluate genomic library coverage and estimate the redundancy between different sequencing runs. For genome-wide SV detection, a one-stop pipeline was proposed that integrates preprocessing, alignment, SV detection, breakpoint revision, and annotation. This pipeline not only detects SVs at the individual sample level but also identifies consensus SVs at the population and cross-population levels. Finally, miRDisc, a pipeline for microRNA discovery, was developed to identify three categories of miRNAs: known, conserved, and novel.
Copyright
2015, Yan Peng
Recommended Citation
Peng, Yan, "Novel Bioinformatic Approaches for Analyzing Next-Generation Sequencing Data" (2015). Dissertations. 88.
https://aquila.usm.edu/dissertations/88