Date of Award

Spring 5-1-2015

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computing

School

Computing Sciences and Computer Engineering

Committee Chair

Nan Wang

Committee Chair Department

Computing

Committee Member 2

Chaoyang Zhang

Committee Member 2 Department

Computing

Committee Member 3

Chenhua Zhang

Committee Member 3 Department

Computing

Committee Member 4

Zheng Wang

Committee Member 4 Department

Computing

Committee Member 5

Ping Gong

Committee Member 5 Department

Computing

Abstract

DNA reconstruction is widely regarded as central to molecular biology because it reveals how genotype affects phenotype. DNA sequencing technology emerged to meet this need and has greatly advanced the field. The traditional Sanger method is effective but extremely expensive on a cost-per-base basis, a shortcoming that drove the rapid development of next-generation sequencing (NGS) technologies. NGS technologies are widely used because of their low cost, high throughput, and speed. However, they still have major drawbacks, such as the huge volumes of data they produce and read lengths that are relatively short compared with traditional methods. This research focuses on quick preliminary analysis of NGS data, identification of genome-wide structural variations (SVs), and microRNA (miRNA) prediction. For preliminary NGS data analysis, the author developed a toolkit named "SeqAssist" to evaluate genomic library coverage and estimate the redundancy between different sequencing runs. For genome-wide SV detection, a one-stop pipeline was proposed that integrates preprocessing, alignment, SV detection, breakpoint revision, and annotation. This pipeline not only detects SVs at the individual sample level but also identifies consensus SVs at the population and cross-population levels. Finally, miRDisc, a pipeline for microRNA discovery, was developed to identify three categories of miRNAs: known, conserved, and novel.
