A Review of Feature Reduction Methods for QSAR-Based Toxicity Prediction
Computing Sciences and Computer Engineering
Thousands of molecular descriptors (1D to 4D) can be generated and used as features to model quantitative structure–activity or structure–toxicity relationships (QSAR or QSTR) for chemical toxicity prediction. The resulting models often suffer from the “curse of dimensionality”, a problem that arises in machine learning when too many features are used to train a model. Here we discuss methods for eliminating redundant and irrelevant features in order to enhance prediction performance, increase interpretability, and reduce computational complexity. Several feature selection and feature extraction methods are summarized along with their strengths and shortcomings. We also highlight commonly overlooked challenges, such as algorithm instability and selection bias, and offer possible solutions.
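The selection bias mentioned above can be illustrated with a small sketch. It is not taken from the chapter: it assumes scikit-learn, synthetic random "descriptors" with labels that carry no signal, a univariate filter (`SelectKBest` with `f_classif`), and arbitrary sizes (`n_samples`, `n_descriptors`, `k`). Selecting features on the full data set before cross-validation tends to produce an optimistic score, whereas refitting the selector inside each training fold gives an estimate near chance.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a descriptor matrix: many features, few compounds.
# Sizes are illustrative assumptions, not values from the chapter.
rng = np.random.default_rng(0)
n_samples, n_descriptors, k = 60, 2000, 20
X = rng.normal(size=(n_samples, n_descriptors))
y = np.array([0, 1] * (n_samples // 2))  # balanced labels unrelated to X

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Biased protocol: the filter sees every label, including those of the
# compounds later used as test folds.
X_preselected = SelectKBest(f_classif, k=k).fit_transform(X, y)
biased = cross_val_score(
    LogisticRegression(max_iter=1000), X_preselected, y, cv=cv
).mean()

# Honest protocol: the filter is part of the pipeline, so it is refit on
# the training portion of each fold only.
pipeline = make_pipeline(
    SelectKBest(f_classif, k=k), LogisticRegression(max_iter=1000)
)
honest = cross_val_score(pipeline, X, y, cv=cv).mean()

print(f"selection outside CV: {biased:.2f}")
print(f"selection inside CV:  {honest:.2f}")
```

Because the labels here are pure noise, any accuracy well above 0.5 in the first protocol is an artifact of the leaked selection step rather than evidence of predictive descriptors.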
Challenges and Advances in Computational Chemistry and Physics
(2019). A Review of Feature Reduction Methods for QSAR-Based Toxicity Prediction. Challenges and Advances in Computational Chemistry and Physics, 30, 119-139.
Available at: https://aquila.usm.edu/fac_pubs/16447