The Critical Feature Dimension and Critical Sampling Problems
Computing Sciences and Computer Engineering
Efficacious data mining methods are critical for knowledge discovery in various applications in the era of big data. Two issues of immediate concern in big data analytic tasks are how to select a critical subset of features and how to select a critical subset of data points for sampling. This position paper presents ongoing research by the authors that suggests: 1. the critical feature dimension problem is theoretically intractable, but simple heuristic methods may well be sufficient for practical purposes; 2. there are big data analytic problems where the success of data mining depends more on the critical feature dimension than on the specific features selected, so a random selection of features based on the dataset's critical feature dimension will prove sufficient; and 3. the problem of critical sampling has the same intractable complexity as the critical feature dimension problem, but again simple heuristic methods may well be practicable in most applications.
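The second claim in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the critical feature dimension can be read as the smallest number of features k at which randomly chosen feature subsets reach a target performance, and it uses a plain linear scan over k as the "simple heuristic". The names `estimate_critical_dimension`, `evaluate`, `threshold`, and `trials` are hypothetical and introduced only for illustration.

```python
import random
from statistics import mean
from typing import Callable, Optional, Sequence


def estimate_critical_dimension(
    n_features: int,
    evaluate: Callable[[Sequence[int]], float],
    threshold: float,
    trials: int = 5,
    seed: int = 0,
) -> Optional[int]:
    """Return the smallest k whose random k-feature subsets meet the threshold.

    Assumption: `evaluate` is a caller-supplied function that trains and
    scores a model on the given feature indices and returns a number where
    higher is better. This is a heuristic scan, not an exact solver.
    """
    rng = random.Random(seed)
    all_features = range(n_features)
    for k in range(1, n_features + 1):
        # Average over several random subsets of size k, since the claim is
        # that the dimension matters more than which features are chosen.
        scores = [evaluate(rng.sample(all_features, k)) for _ in range(trials)]
        if mean(scores) >= threshold:
            return k
    return None  # no dimension up to n_features reached the threshold


def toy_evaluate(subset: Sequence[int]) -> float:
    # Stand-in scorer (an assumption): performance depends only on how many
    # features are used, out of 10 in total.
    return len(subset) / 10


if __name__ == "__main__":
    print(estimate_critical_dimension(10, toy_evaluate, threshold=0.45))
```

With a real dataset, `evaluate` would wrap model training and validation. The linear scan implicitly assumes that performance does not degrade as k grows; where that does not hold, the first k found may not be meaningful.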
ICPRAM 2015 - 4th International Conference on Pattern Recognition Applications and Methods, Proceedings
(2015). The Critical Feature Dimension and Critical Sampling Problems. ICPRAM 2015 - 4th International Conference on Pattern Recognition Applications and Methods, Proceedings, 1, 360-366.
Available at: https://aquila.usm.edu/fac_pubs/18794