Faculty Publications

Factorial Analysis of Error Correction Performance Using Simulated Next-Generation Sequencing Data

Isaac Akogwu, University of Southern Mississippi, School of Computing
Nan Wang, University of Southern Mississippi, School of Computing
Chaoyang Zhang, University of Southern Mississippi, School of Computing
Hwanseok Choi, University of Southern Mississippi
Huixiao Hong, National Center for Toxicological Research
Ping Gong, Environmental Laboratory

Document Type

Conference Proceeding

Publication Date

1-17-2017

School

Computing Sciences and Computer Engineering

Abstract

© 2016 IEEE. Error correction is a critical initial step in next-generation sequencing (NGS) data analysis. Although more than 60 tools have been developed, there is no systematic evidence-based comparison of their strengths and weaknesses, especially in terms of correction accuracy. Here we report a full factorial simulation study that examines how NGS dataset characteristics (genome size, coverage depth and read length in particular) affect error correction performance (precision and F-score), and that compares the sensitivity or resistance of six k-mer spectrum-based methods to variations in those characteristics. Multi-way ANOVA tests indicated that both the choice of correction method and the dataset characteristics had significant effects on the performance metrics. Overall, BFC, Bless, Bloocoo and Musket performed better than Lighter and Trowel on 27 synthetic datasets. For each chosen method, read length and coverage depth had a more pronounced impact on performance than genome size. This study sheds light on how error correction methods behave in response to the common variables one would encounter in real-world NGS datasets. It also warrants further studies of wet lab-generated experimental NGS data to validate the findings obtained from this simulation study.
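
The design and metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The factor levels below are hypothetical placeholders, and three levels per factor is an assumption inferred from the 27 datasets and three named characteristics. The definitions of true positives (erroneous bases fixed), false positives (correct bases wrongly altered) and false negatives (erroneous bases left unfixed) follow a common convention for error correction benchmarks and are likewise assumed rather than taken from the paper.

```python
from itertools import product

# Hypothetical factor levels: the abstract does not state the actual values.
GENOME_SIZES = ["small", "medium", "large"]
COVERAGE_DEPTHS = ["low", "medium", "high"]
READ_LENGTHS = ["short", "medium", "long"]


def full_factorial(*factors):
    """Return every combination of the given factor levels."""
    return list(product(*factors))


def precision(tp, fp):
    """Fraction of the corrections made that were right."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of the true errors that were corrected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Three factors at three levels each give 3 * 3 * 3 = 27 dataset configurations.
designs = full_factorial(GENOME_SIZES, COVERAGE_DEPTHS, READ_LENGTHS)

if __name__ == "__main__":
    print(len(designs))
    # Made-up counts for one method on one dataset, for illustration only.
    print(precision(90, 10), f_score(90, 10, 30))
```

In a study of this shape, each of the six correction methods would be scored on every configuration, and the resulting table of precision and F-score values is what a multi-way ANOVA would take as input.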

Publication Title

Proceedings - 2016 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2016

First Page

1164

Last Page

1169

Recommended Citation

Akogwu, I., Wang, N., Zhang, C., Choi, H., Hong, H., Gong, P. (2017). Factorial Analysis of Error Correction Performance Using Simulated Next-Generation Sequencing Data. Proceedings - 2016 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2016, 1164-1169.
Available at: https://aquila.usm.edu/fac_pubs/17930
