Date of Award
Spring 5-2023
Degree Type
Masters Thesis
Degree Name
Master of Science (MS)
School
Computing Sciences and Computer Engineering
Committee Chair
Dr. Chaoyang Zhang
Committee Chair School
Computing Sciences and Computer Engineering
Committee Member 2
Dr. Sarah Lee
Committee Member 2 School
Computing Sciences and Computer Engineering
Committee Member 3
Dr. Ahmed Sherif
Abstract
Suicide is the second leading cause of death among youths in the USA. Although machine learning approaches have shown great potential for predicting suicide risk from survey data, their prediction accuracy may not meet the needs of clinical diagnosis because of the intrinsic characteristics of the datasets. In this study, I compare six classification algorithms, namely naïve Bayes (NB), logistic regression (LR), multilayer perceptron (MLP), AdaBoost (Ada), random forest (RF), and bagging, on the Youth Risk Behavior Surveillance System (YRBSS) dataset. I also investigate how effectively several data-handling techniques improve the overall performance of suicide risk prediction.
The dataset consists of 76 health-risk-related questions with 13,437 responses collected from high school students at 136 schools in the USA. Several preprocessing techniques were applied to the dataset, including missing value imputation, feature selection, and sampling techniques for handling class imbalance. The data were partitioned into a training set (70%) and a test set (30%) using stratified partitioning. Classifier performance was evaluated with five metrics: accuracy, precision, recall, F2 score, and area under the receiver operating characteristic curve (AUROC). The results showed that the RF classifier with undersampling achieved the highest recall (0.84), F2 score (0.72), and AUROC (0.85), followed by the LR and Ada classifiers.
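The abstract names the evaluation pipeline but gives no implementation details. The following is a minimal pure-Python sketch, under stated assumptions, of four steps it mentions: a stratified 70/30 split, random undersampling, the F2 score, and AUROC. The function names are hypothetical. "Undersampling" is assumed here to mean random undersampling of the majority class, applied to the training set only so that the test set keeps the original class ratio. The thesis may have used a different variant or a library such as scikit-learn or imbalanced-learn. The F2 score is the F-beta measure F_β = (1 + β²)·P·R / (β²·P + R) with β = 2, which weights recall more heavily than precision.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.3, seed=0):
    """Split row indices so each class keeps (approximately) its proportion in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # per-class test count
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def random_undersample(indices, labels, seed=0):
    """Randomly drop rows of larger classes until every class matches the smallest one.

    Assumption: applied only to the training indices, never to the test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    n_min = min(len(v) for v in by_class.values())
    kept = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, n_min))
    return sorted(kept)


def f_beta(y_true, y_pred, beta=2.0):
    """F-beta score for the positive class (label 1); beta=2 gives the F2 score."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def auroc(y_true, scores):
    """AUROC as the probability that a random positive scores above a random negative.

    Ties count as one half. This is O(n_pos * n_neg), which is fine for a sketch only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Consider a hypothetical label vector with 10 positives and 90 negatives. The stratified split puts 3 positives and 27 negatives in the 30-row test set. Undersampling the remaining 70 training rows leaves 7 positives and 7 negatives. A classifier would then be fit on the balanced rows and scored on the untouched test rows with `f_beta` and `auroc`.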
I therefore conclude that RF, LR, and AdaBoost are powerful tools for predicting suicidal tendencies in youth. Feature selection and undersampling are crucial preprocessing steps for identifying adolescents at high risk of suicide.
Copyright
Saswati Bhattacharjee
Recommended Citation
Bhattacharjee, Saswati, "Predicting Suicide Risk Among Youths Using Machine Learning Methods" (2023). Master's Theses. 973.
https://aquila.usm.edu/masters_theses/973