A Review of Feature Reduction Methods for QSAR-Based Toxicity Prediction

Document Type

Book Chapter

Publication Date

6-3-2019

School

Computing Sciences and Computer Engineering

Abstract

Thousands of molecular descriptors (1D to 4D) can be generated and used as features to model quantitative structure–activity or structure–toxicity relationships (QSAR or QSTR) for chemical toxicity prediction. This often results in models that suffer from the “curse of dimensionality”, a problem that can occur in machine learning practice when too many features are employed to train a model. Here we discuss different methods of eliminating redundant and irrelevant features to enhance prediction performance, increase interpretability, and reduce computational complexity. Several feature selection and extraction methods are summarized along with their strengths and shortcomings. We also highlight some commonly overlooked challenges, such as algorithm instability and selection bias, and offer possible solutions.

Publication Title

Advances In Computational Toxicology

First Page

119

Last Page

139
