A Comprehensive Causal And Machine Learning Framework For Autism Spectrum Disorder Risk Prediction And High-Risk Subgroup Identification From Children With Chronic Health And Genetic Disorders In The United States
Document Type
Article
Publication Date
3-1-2026
School
Health Professions
Abstract
Purpose: Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition often accompanied by diverse comorbidities. This study aimed to develop and evaluate a machine learning framework to identify key predictors of ASD and characterize disease–disease relationships. Methods: We analyzed 2022–2023 National Survey of Children's Health (NSCH) data. Propensity score matching (PSM) was applied to reduce confounding and approximate a randomized controlled design. Multiple machine learning algorithms were implemented to identify important predictors and assess disease–disease interactions. Model performance was evaluated using receiver operating characteristic (ROC) curves, calibration plots, and cross-validation. Results: In matched cohorts, epilepsy showed the strongest association with ASD (20.0% vs. 6.1%; OR = 3.85, 95% CI: 2.88–5.20). Down syndrome (DS) (14.5% vs. 4.7%; OR = 3.45, 95% CI: 1.75–7.30) and congenital heart disease (CHD) (7.5% vs. 5.4%; OR = 1.42, 95% CI: 1.14–1.77) were also significantly associated with ASD, whereas diabetes, cystic fibrosis, and allergies demonstrated no consistent associations. Across machine learning analyses, epilepsy consistently emerged as the top predictor, with DS and CHD also ranked highly. Children with both epilepsy and CHD represented the highest-risk subgroup (>32%), while those without epilepsy but with both allergies and CHD had moderate risk (>10%). Model discrimination was good (AUC = 0.771) with satisfactory calibration. Conclusion: Epilepsy is the strongest predictor of ASD, with epilepsy–CHD comorbidity defining the highest-risk subgroup, whereas children without epilepsy but with both allergies and CHD formed a moderate-risk subgroup. This framework highlights the value of advanced analytics for early identification and targeted interventions in vulnerable populations.
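The odds ratios in the Results can be roughly checked against the percentages printed beside them. The sketch below is only that arithmetic, not the authors' analysis: the function name `odds_ratio` and the `reported` dictionary are illustrative, and it assumes each pair of percentages is a pair of proportions from the two matched groups. The crude odds ratio is the same whichever group is treated as the exposure, and small differences from the published values are expected because the inputs are rounded.

```python
def odds_ratio(p_a: float, p_b: float) -> float:
    """Crude odds ratio from two proportions, each strictly between 0 and 1."""
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("proportions must lie strictly between 0 and 1")
    # Odds = p / (1 - p); the odds ratio is the ratio of the two odds.
    return (p_a / (1.0 - p_a)) / (p_b / (1.0 - p_b))


# (proportion in group A, proportion in group B, odds ratio stated in the abstract)
reported = {
    "epilepsy": (0.200, 0.061, 3.85),
    "Down syndrome": (0.145, 0.047, 3.45),
    "congenital heart disease": (0.075, 0.054, 1.42),
}

for condition, (p_a, p_b, stated) in reported.items():
    crude = odds_ratio(p_a, p_b)
    print(f"{condition}: crude OR = {crude:.2f} (abstract states {stated})")
```

A check like this only confirms that the percentages and point estimates are mutually consistent; it says nothing about the confidence intervals, which depend on the matched sample sizes that the abstract does not give.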
Publication Title
Research in Autism
Volume
131
Recommended Citation
Ahmmad, M., Pantazopoulos, H., Kothiya, S. (2026). A Comprehensive Causal And Machine Learning Framework For Autism Spectrum Disorder Risk Prediction And High-Risk Subgroup Identification From Children With Chronic Health And Genetic Disorders In The United States. Research in Autism, 131.
Available at: https://aquila.usm.edu/fac_pubs/22039