Nucleosome structure incorporated histone acetylation site prediction in arabidopsis thaliana
Background: Acetylation is a crucial post-translational modification for histones, and plays a key role in gene expression regulation. Due to limited data and lack of a clear acetylation consensus sequence, a few researches have focused on prediction of lysine acetylation sites. Several systematic prediction studies have been conducted for human and yeast, but less for Arabidopsis thaliana. Results: Concerning the insufficient observation on acetylation site, we analyzed contributions of the peptide-alignment-based distance definition and 3D structure factors in acetylation prediction. We found that traditional structure contributes little to acetylation site prediction. Identified acetylation sites of histones in Arabidopsis thaliana are conserved and cross predictable with that of human by peptide based methods. However, the predicted specificity is overestimated, because of the existence of non-observed acetylable site. Here, by performing a complete exploration on the factors that affect the acetylability of lysines in histones, we focused on the relative position of lysine at nucleosome level, and defined a new structure feature to promote the performance in predicting the acetylability of all the histone lysines in A. thaliana. Conclusion: We found a new spacial correlated acetylation factor, and defined a epsilon-N spacial location based feature, which contains five core spacial ellipsoid wired areas. By incorporating the new feature, the performance of predicting the acetylability of all the histone lysines in A. Thaliana was promoted, in which the previous mispredicted acetylable lysines were corrected by comparing to the peptide-based prediction.