Document Type
Article
Publication Date
1-1-2009
Department
Biological Sciences
School
Biological, Environmental, and Earth Sciences
Abstract
Introduction
Affymetrix GeneChip® high-density oligonucleotide arrays are widely used in biological and medical research because their production is reproducible, which facilitates the comparison of results between experimental runs. To obtain trustworthy high-level classification and cluster analyses, it is important to perform various preprocessing steps on the probe-level data to control for variability in sample processing and array hybridization. Many proposed preprocessing methods are parametric, in that they assume that the background noise in microarray data is a random sample from a statistical distribution, typically a normal distribution. The quality of the final results depends on the validity of such assumptions.
Results
We propose a Distribution Free Convolution Model (DFCM) to circumvent observed deficiencies in meeting and validating the distribution assumptions of parametric methods. Knowledge of array structure and the biological function of the probes indicates that the intensities of mismatched (MM) probes that correspond to the smallest perfect match (PM) intensities can be used to estimate the background noise. Specifically, we obtain the smallest q2 percent of the MM intensities that are associated with the lowest q1 percent of the PM intensities, and use these intensities to estimate the background.
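The two-stage selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the way percentages are rounded up to whole probe counts, and the use of the median as the background summary are all hypothetical choices made here, since the abstract does not specify the estimator applied to the selected intensities.

```python
import math
from statistics import median


def select_background_mm(pm, mm, q1, q2):
    """Return the MM intensities selected for background estimation.

    pm, mm: equal-length sequences of paired PM/MM probe intensities.
    q1, q2: percentages in (0, 100].
    Hypothetical helper; the name and rounding rule are assumptions.
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must be paired, equal-length sequences")
    if not pm:
        raise ValueError("at least one probe pair is required")
    if not (0 < q1 <= 100 and 0 < q2 <= 100):
        raise ValueError("q1 and q2 must be percentages in (0, 100]")

    # Stage 1: keep the probe pairs whose PM intensity is in the lowest q1 percent.
    n_pm = max(1, math.ceil(len(pm) * q1 / 100))
    lowest_pm_pairs = sorted(zip(pm, mm), key=lambda pair: pair[0])[:n_pm]

    # Stage 2: among the MM partners of those pairs, keep the smallest q2 percent.
    mm_candidates = sorted(m for _, m in lowest_pm_pairs)
    n_mm = max(1, math.ceil(len(mm_candidates) * q2 / 100))
    return mm_candidates[:n_mm]


def estimate_background(pm, mm, q1, q2):
    # ASSUMPTION: the median is used purely for illustration; the source
    # does not state which summary of the selected intensities is used.
    return median(select_background_mm(pm, mm, q1, q2))


# Toy data with illustrative (not recommended) values of q1 and q2.
pm = [100, 50, 200, 75, 300, 60, 400, 90, 500, 80]
mm = [40, 20, 150, 35, 250, 30, 310, 45, 420, 25]
print(select_background_mm(pm, mm, q1=50, q2=60))
print(estimate_background(pm, mm, q1=50, q2=60))
```

Sorting the pairs by PM first keeps each MM value attached to its PM partner, which is what "associated with" requires; sorting the two arrays independently would break that pairing.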
Conclusion
Using the Affymetrix Latin Square spike-in experiments, we show that the background noise generated by microarray experiments typically is not well modeled by a single overall normal distribution. We further show that the signal is not exponentially distributed, as is also commonly assumed. Therefore, DFCM has better sensitivity and specificity, as measured by ROC curves and area under the curve (AUC), than MAS 5.0, RMA, RMA with no background correction (RMA-noBG), GCRMA, PLIER, and dChip (MBEI) for preprocessing of Affymetrix microarray data. These results hold for the two spike-in data sets and one real data set that were analyzed, showing that our nonparametric methods are a superior alternative for background correction of Affymetrix data.
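For readers unfamiliar with the AUC metric used in this comparison, the following minimal Python sketch computes it from its rank interpretation: the probability that a randomly chosen true positive (for example, a spiked-in probe set) receives a higher score than a randomly chosen true negative, with ties counted as one half. The function name, the toy scores, and the labels are hypothetical; the abstract does not say how the scores fed into the ROC analysis were derived.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: numeric evidence for each item (higher means more likely positive).
    labels: truthy for true positives, falsy for true negatives.
    Hypothetical helper written for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")

    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three positives and three negatives with made-up scores.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))
```

The pairwise form is quadratic in the number of items, which is fine for a sketch; a rank-based computation would be preferable for full arrays.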
Publication Title
BMC Genomics
Volume
10
Issue
S1
First Page
1
Last Page
13
Recommended Citation
Chen, Z., McGee, M., Liu, Q., Kong, M., Deng, Y., Scheuermann, R. H. (2009). A Distribution-Free Convolution Model for Background Correction of Oligonucleotide Microarray Data. BMC Genomics, 10(S1), 1-13.
Available at: https://aquila.usm.edu/fac_pubs/8414
Comments
Creative Commons Attribution License
Publisher's Version