Date of Award
5-2025
Degree Type
Honors College Thesis
Academic Program
Computer Science BS
Department
Computing
First Advisor
Ahmed Sherif, Ph.D.
Advisor Department
Computing
Abstract
Federated Learning (FL) is a Machine Learning (ML) approach that decentralizes training across distributed devices, eliminating the need to centralize data. Unlike traditional ML, where models are trained on aggregated data, FL sends a global model to multiple nodes for local training, with updated parameters transmitted back to the server for aggregation. This process preserves data privacy, making FL ideal for sensitive applications like cybersecurity. However, FL introduces challenges such as data heterogeneity, communication overhead, and difficulties in achieving model convergence, which can impact performance.
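The parameter-aggregation step described above is commonly realized with Federated Averaging (FedAvg). The following minimal sketch, using NumPy and hypothetical client weight arrays, illustrates how a server might combine locally trained parameters into a new global model; the thesis's actual FL framework and aggregation settings are not reproduced here.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Combine locally trained model weights into a new global model.

    client_weights: one list of layer arrays per client, all layers
        having identical shapes across clients.
    client_sizes: number of local training samples per client, used to
        weight each client's contribution.
    """
    total = float(sum(client_sizes))
    global_weights = []
    # Average each layer across clients, weighted by local dataset size.
    for layer_idx in range(len(client_weights[0])):
        layer = sum(
            (n / total) * client_weights[c][layer_idx]
            for c, n in enumerate(client_sizes)
        )
        global_weights.append(layer)
    return global_weights

# Hypothetical example: three clients, each holding one weight matrix.
clients = [[np.random.randn(4, 2)] for _ in range(3)]
sizes = [100, 250, 150]
new_global = federated_average(clients, sizes)
```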
This study investigates a fundamental assumption in ML and FL research: that a model's superior performance in a centralized setting will extend to an FL setup. Specifically, it explores whether models that excel in centralized ML, such as Neural Networks (NNs) known for handling unstructured data, perform equally well in FL environments where data is split across nodes, each holding only a limited dataset. This constraint can affect a model's ability to capture intricate patterns, potentially leading to suboptimal performance when parameters are aggregated. Given the critical role of cybersecurity, particularly with the rise of autonomous and connected vehicles, it is essential to understand how FL can preserve privacy without degrading model performance.
The Car Hacking: Attack & Defense Challenge 2020 Dataset, featuring CAN bus traffic data from a Hyundai Avante CN7 with normal and attack messages, was used to simulate a real-world cybersecurity environment. This study focuses on binary classification to evaluate FL models' effectiveness in detecting attacks, highlighting challenges in distributed data environments. Preprocessing involved label encoding and limiting the task to binary detection.
The results reveal that strong centralized ML performance does not always translate to FL. While Naive Bayes excelled in the centralized setting, XGBoost performed better in FL, highlighting the need for tailored model selection in distributed environments where each node trains on limited local data.
These findings underscore the need to tailor FL models to the constraints of distributed systems. Future research should examine the effects of larger numbers of clients and training rounds to better characterize the relationship between centralized ML and FL performance. This study offers insights into developing FL-based Intrusion Detection Systems (IDS), enhancing privacy and adaptability in cybersecurity for connected and autonomous vehicles.
Copyright
Copyright for this thesis is owned by the author. It may be freely accessed by all users. However, any reuse or reproduction not covered by the exceptions of the Fair Use or Educational Use clauses of U.S. Copyright Law or without permission of the copyright holder may be a violation of federal law. Contact the administrator if you have additional questions.
Recommended Citation
Leonhardt, Tim, "A Comprehensive Performance Comparison of Machine Learning and Federated Learning for Intrusion Detection in Vehicular Ad-Hoc Networks Using CAN-Bus Data" (2025). Honors Theses. 994.
https://aquila.usm.edu/honors_theses/994