The Critical Feature Dimension and Critical Sampling Problems

Document Type

Conference Proceeding

Publication Date

1-1-2015

School

Computing Sciences and Computer Engineering

Abstract

Efficacious data mining methods are critical for knowledge discovery in various applications in the era of big data. Two issues of immediate concern in big data analytics are how to select a critical subset of features and how to select a critical subset of data points for sampling. This position paper presents ongoing research by the authors that suggests: (1) the critical feature dimension problem is theoretically intractable, but simple heuristic methods may well be sufficient for practical purposes; (2) there are big data analytic problems where the success of data mining depends more on the critical feature dimension than on the specific features selected, so a random selection of features based on the dataset's critical feature dimension will prove sufficient; and (3) the problem of critical sampling has the same intractable complexity as the critical feature dimension problem, but again simple heuristic methods may well be practicable in most applications.
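
The abstract does not say which heuristic the authors use, so the following is only an illustrative sketch of one possible approach, not the paper's method. It assumes the critical feature dimension can be read as the smallest number k of randomly chosen features at which a learner's average score reaches a chosen threshold, and that the score does not decrease as k grows, which allows a binary search over k. The names `mean_score_at_dimension`, `estimate_critical_dimension`, `score`, and `threshold` are hypothetical.

```python
import random
from typing import Callable, Optional, Sequence


def mean_score_at_dimension(
    score: Callable[[Sequence[int]], float],
    n_features: int,
    k: int,
    trials: int,
    rng: random.Random,
) -> float:
    """Average the score over `trials` random feature subsets of size k."""
    total = 0.0
    for _ in range(trials):
        # Random selection of k feature indices, without replacement.
        subset = rng.sample(range(n_features), k)
        total += score(subset)
    return total / trials


def estimate_critical_dimension(
    score: Callable[[Sequence[int]], float],
    n_features: int,
    threshold: float,
    trials: int = 5,
    seed: int = 0,
) -> Optional[int]:
    """Binary-search the smallest k whose mean score reaches the threshold.

    Assumption (not from the source): the mean score is non-decreasing in k.
    Returns None if even the full feature set misses the threshold.
    """
    rng = random.Random(seed)
    full = mean_score_at_dimension(score, n_features, n_features, trials, rng)
    if full < threshold:
        return None
    lo, hi = 1, n_features
    while lo < hi:
        mid = (lo + hi) // 2
        if mean_score_at_dimension(score, n_features, mid, trials, rng) >= threshold:
            hi = mid          # mid is sufficient; look for something smaller
        else:
            lo = mid + 1      # mid is insufficient; need more features
    return lo
```

In practice `score` would train and validate a model on the given feature indices; that part is left to the caller here because the source names no specific learner or dataset. The binary search needs only about log2(n) evaluation rounds, which is the kind of cheap heuristic the abstract argues may be adequate despite the worst-case intractability.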

Publication Title

ICPRAM 2015 - 4th International Conference on Pattern Recognition Applications and Methods, Proceedings

Volume

1

First Page

360

Last Page

366
