Date of Award

Spring 5-2023

Degree Type

Masters Thesis

Degree Name

Master of Science (MS)

School

Computing Sciences and Computer Engineering

Committee Chair

Dr. Chaoyang Zhang

Committee Chair School

Computing Sciences and Computer Engineering

Committee Member 2

Dr. Sarah Lee

Committee Member 2 School

Computing Sciences and Computer Engineering

Committee Member 3

Dr. Ahmed Sherif

Abstract

Suicide is the second leading cause of death among youths in the USA. Although machine learning approaches have shown great potential for predicting suicide risk from survey data, prediction accuracy may not meet the needs of clinical diagnosis because of the intrinsic characteristics of the datasets. In this study, I compare six classification algorithms, namely naïve Bayes (NB), logistic regression (LR), multilayer perceptron (MLP), AdaBoost (Ada), random forest (RF), and bagging, on the Youth Risk Behavior Surveillance System (YRBSS) dataset, and I investigate the effectiveness of several data handling techniques for improving the overall performance of suicide risk prediction.

The dataset consists of 76 health risk-related questions with 13,437 responses collected from students at 136 high schools in the USA. Several preprocessing techniques were applied to the dataset: missing value imputation, feature selection, and sampling techniques for handling the imbalanced class label. The data were partitioned into a training dataset (70%) and a test dataset (30%) using a stratified partitioning method. The performance of the classifiers was evaluated using five metrics: accuracy, precision, recall, F2 score, and area under the receiver operating characteristic curve (AUROC). The results showed that the RF classifier with undersampling achieved the highest recall (0.84), F2 score (0.72), and AUROC (0.85), followed by the LR and Ada classifiers.
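The abstract does not say how these steps were implemented, so the following is only a minimal, library-free sketch of three of them: a stratified 70/30 split, random undersampling of the majority class in the training part, and the F2 score. The function names, the random seed, and the synthetic labels (200 positives among 1,000 rows) are assumptions made for illustration and are not taken from the thesis or from the YRBSS data.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split row indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def undersample(indices, labels, seed=0):
    """Randomly keep only as many rows of each class as the smallest class has."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    kept = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, n_min))
    return sorted(kept)


def f_beta(y_true, y_pred, beta=2.0, positive=1):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Synthetic, imbalanced labels (assumption): 1 = at risk, 0 = not at risk.
labels = [1] * 200 + [0] * 800

train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
balanced_train_idx = undersample(train_idx, labels)

# Toy predictions (assumption) to exercise the metric: 2 TP, 1 FP, 1 FN.
example_f2 = f_beta([1, 1, 1, 0, 0], [1, 1, 0, 1, 0], beta=2.0)
```

Undersampling is applied only to the training indices here, so the test part keeps the original class ratio. In practice, scikit-learn's `train_test_split` with the `stratify` argument and `fbeta_score` with `beta=2` cover the split and the metric.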

Therefore, I conclude that RF, LR, and AdaBoost are powerful tools for predicting suicidal tendencies in youth. Feature selection and undersampling are crucial preprocessing steps for identifying adolescents who are at high suicide risk.
