Faculty Publications

Critical Feature Selection and Critical Sampling for Data Mining

Bernardete Ribeiro, Universidade de Coimbra
José Silva, Universidade de Coimbra
Andrew H. Sung, University of Southern Mississippi, School of Computing
Divya Suryakumar, ConstructConnect

Document Type

Conference Proceeding

Publication Date

1-1-2018

School

Computing Sciences and Computer Engineering

Abstract

© 2018, Springer Nature Singapore Pte Ltd. The rapid growth of big data generated by connected sensors, devices, the web, and social network platforms has stimulated the advancement of data science, which holds tremendous potential for problem solving in various domains. How to properly utilize the data in model building to obtain accurate analytics and knowledge discovery is a topic of great importance in data mining, and it raises two issues: how to select a critical subset of features and how to select a critical subset of data points for sampling. This paper presents ongoing research that suggests: 1. the critical feature dimension problem is theoretically intractable, but simple heuristic methods may well be sufficient for practical purposes; 2. there are big data analytic problems where evidence suggests that the success of data mining depends more on the critical feature dimension than on the specific features selected, so a random selection of features based on the dataset's critical feature dimension will prove sufficient; and 3. the problem of critical sampling has the same intractable complexity as the critical feature dimension problem, but again simple heuristic methods may well be practicable in most applications; experimental results with several versions of the heuristic method are presented and discussed. Finally, a set of metrics for data quality is proposed based on the concepts of critical features and critical sampling.

Publication Title

Communications in Computer and Information Science

Volume

844

First Page

13

Last Page

24

Recommended Citation

Ribeiro, B., Silva, J., Sung, A., Suryakumar, D. (2018). Critical Feature Selection and Critical Sampling for Data Mining. Communications in Computer and Information Science, 844, 13-24.
Available at: https://aquila.usm.edu/fac_pubs/18176

