
Driver behavior is a critical determinant of road safety, influencing both the likelihood and the severity of traffic crashes. This study evaluates three machine learning models, (i) RNN-AdaBoost, (ii) Generative Adversarial Networks (GANs), and (iii) XGBoost-RF, for classifying hazardous driving behaviors across multiple safety levels. A key focus is assessing how Conditional GANs (cGANs) can generate synthetic data that addresses class imbalance and improves classification accuracy. The dataset comes from a naturalistic driving study conducted in Belgium and the UK and captures a diverse range of real-world driving behaviors. Driving behaviors were initially categorized into three safety levels: Normal, Dangerous, and Avoidable Accident. Following data augmentation with cGANs, a revised schema consolidated these levels into two classes, Normal and Avoidable Accident, to streamline training while preserving interpretability. Each model offers distinct advantages. RNN-AdaBoost captures temporal dependencies in the driving data and uses boosting to refine classification accuracy. GANs are evaluated both before and after data augmentation to measure how much augmentation improves generalization. XGBoost-RF, a hybrid ensemble built around the XGBoost algorithm, provides robust and scalable risk classification. SHAP (SHapley Additive exPlanations) analysis is used to interpret model predictions and identifies key factors, such as harsh acceleration and harsh braking, as dominant risk indicators. The results show that the cGAN-augmented dataset substantially improves GAN model performance, raising accuracy from 76% to 90% in Belgium and from 79% to 91% in the UK, but it also introduces overfitting risk in the hybrid models.
The XGBoost-RF and RNN-AdaBoost models achieve near-perfect accuracy on augmented data but struggle to generalize effectively to real-world scenarios. This study underscores both the potential and the challenges of cGAN-generated synthetic data in driving behavior classification, highlighting the need for careful validation and adaptive augmentation strategies.
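The gap between near-perfect accuracy on augmented data and weaker real-world generalization is the kind of problem a validation protocol can expose. A minimal sketch of one common pattern, not the study's own pipeline: split the real data first, generate synthetic rows only for the training split, and score on untouched real data. Everything here is an assumption for illustration: `jitter_generator` is a hypothetical noise-based stand-in for a class-conditional generator (it is not a cGAN), the nearest-centroid classifier is a toy substitute for RNN-AdaBoost, GAN, or XGBoost-RF, and the two-feature toy rows are invented.

```python
import random
from collections import Counter


def stratified_split(X, y, test_frac=0.25, seed=0):
    """Split REAL samples per class so the test set keeps the original class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    train, test = [], []
    for label, rows in by_class.items():
        rows = rows[:]  # shuffle a copy, not the caller's list
        rng.shuffle(rows)
        n_test = max(1, round(len(rows) * test_frac))
        test += [(r, label) for r in rows[:n_test]]
        train += [(r, label) for r in rows[n_test:]]
    return train, test


def jitter_generator(rows, n, rng, scale=0.05):
    """Hypothetical stand-in for a class-conditional generator (NOT a cGAN):
    adds Gaussian noise to randomly chosen real rows of a single class."""
    return [[v + rng.gauss(0.0, scale) for v in rng.choice(rows)] for _ in range(n)]


def augment_training_only(train, seed=0):
    """Balance classes by adding synthetic rows to minority classes.
    Only the training split is augmented; the real test split is never touched."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in train)
    target = max(counts.values())
    augmented = list(train)
    for label, count in counts.items():
        rows = [r for r, l in train if l == label]
        for synth in jitter_generator(rows, target - count, rng):
            augmented.append((synth, label))
    return augmented


def nearest_centroid_fit(samples):
    """Toy classifier (assumption): store the mean feature vector of each class."""
    sums, counts = {}, Counter()
    for row, label in samples:
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def accuracy(centroids, samples):
    """Fraction of samples whose nearest centroid (squared Euclidean) matches the label."""
    def predict(row):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
    return sum(predict(r) == l for r, l in samples) / len(samples)


if __name__ == "__main__":
    # Invented two-feature rows, loosely standing in for harsh-acceleration and harsh-braking rates.
    X = [[0.1 * i, 0.05 * i] for i in range(40)] + [[1.0 + 0.1 * i, 0.9] for i in range(8)]
    y = ["Normal"] * 40 + ["Avoidable Accident"] * 8
    train, test = stratified_split(X, y, test_frac=0.25, seed=1)
    train_aug = augment_training_only(train, seed=2)
    model = nearest_centroid_fit(train_aug)
    print("accuracy on augmented training data:", accuracy(model, train_aug))
    print("accuracy on held-out real data:     ", accuracy(model, test))
```

The design choice that matters is the ordering: because synthetic rows are derived only from training rows, a large gap between the two printed accuracies points to overfitting to the synthetic distribution rather than to leakage of test rows into training.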
ID | pc616 |