Identifying the Roadway Infrastructure Factors Affecting Road Accidents Using Interpretable Machine Learning and Data Augmentation
In modern society, vehicle accidents have been a factor that has adversely affected national development for a long time. Many countries have tried to solve this issue, and various solutions have been studied. This study aims to design a process for analyzing vehicle accidents to support safety inte...
Main Authors: | Jonghak Lee, Sangyoup Kim, Tae-Young Heo, Dongwoo Lee |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Applied Sciences |
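Subjects: | vehicle accident classification; data augmentation; machine learning; interpretability of machine learning; model-agnostic interpretation; SHAP (Shapley Additive Explanations) |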
Online Access: | https://www.mdpi.com/2076-3417/15/2/501 |
_version_ | 1832589302105636864 |
---|---|
author | Jonghak Lee; Sangyoup Kim; Tae-Young Heo; Dongwoo Lee |
author_sort | Jonghak Lee |
collection | DOAJ |
description | Vehicle accidents have long hindered national development. Many countries have tried to address the problem, and a range of solutions has been studied. This study designs a process for analyzing vehicle accidents to support safety interventions. In the data preprocessing stage, a resampling technique was used to address the data imbalance problem. Five machine learning classification models were then trained with hyperparameter optimization. After classification, model-agnostic interpretation techniques were used to interpret the results of the models. The resulting process analyzes vehicle accident data and derives the factors that affect accidents. The classification model that combines XGBoost with ENN (Edited Nearest Neighbor) achieves an accuracy of almost 84.3%. For “Length” and “Volume”, certain values (Length: 200 m; Volume: 29,233 veh/day) were associated with a higher likelihood of an accident. Moreover, for variables such as volume and heavy-vehicle volume, the probability of an accident increases as the value increases, whereas for “Lane width” and “Shoulder width” the probability decreases as the value increases. These interpretations can inform policy recommendations for reducing traffic accidents and help establish effective traffic accident countermeasures. |
format | Article |
id | doaj-art-fe9dacee519d4ed7b94ea2171c2c6f2e |
institution | Kabale University |
issn | 2076-3417 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj-art-fe9dacee519d4ed7b94ea2171c2c6f2e; 2025-01-24T13:19:35Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2025-01-01; vol. 15, no. 2, art. 501; DOI 10.3390/app15020501; Identifying the Roadway Infrastructure Factors Affecting Road Accidents Using Interpretable Machine Learning and Data Augmentation; Jonghak Lee (Transportation Pollution Research Center, National Institute of Environmental Research, Seo-gu, Incheon 22689, Republic of Korea); Sangyoup Kim (Department of Regional Development Research, Jeonbuk State Institute, 1696, Kongjwipatjwi-ro, Wansan-gu, Jeonju 55068, Jeonbuk State, Republic of Korea); Tae-Young Heo (Department of Information & Statistics, Chungbuk National University, Seowon-gu, Cheongju 28644, Chungbuk, Republic of Korea); Dongwoo Lee (Department of Smart Cities, University of Seoul, 163, Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea); https://www.mdpi.com/2076-3417/15/2/501; vehicle accident classification; data augmentation; machine learning; interpretability of machine learning; model-agnostic interpretation; SHAP (Shapley Additive Explanations) |
title | Identifying the Roadway Infrastructure Factors Affecting Road Accidents Using Interpretable Machine Learning and Data Augmentation |
topic | vehicle accident classification; data augmentation; machine learning; interpretability of machine learning; model-agnostic interpretation; SHAP (Shapley Additive Explanations) |
url | https://www.mdpi.com/2076-3417/15/2/501 |
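
The description above mentions ENN (Edited Nearest Neighbor) resampling as the step that addresses class imbalance before classification. The following is only a minimal pure-Python sketch of the general idea, not the authors' implementation: the function name `edited_nearest_neighbours`, the majority-vote rule, the choice `k=3`, and the toy data are all assumptions made for illustration. A majority-class sample is dropped when most of its `k` nearest neighbours carry a different label.

```python
from collections import Counter
from math import dist


def edited_nearest_neighbours(X, y, majority_label, k=3):
    """Hypothetical ENN sketch: drop majority-class samples whose k nearest
    neighbours mostly carry a different label (majority-vote variant)."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            # Minority-class samples are never removed.
            keep.append(i)
            continue
        # Indices of the k nearest other samples by Euclidean distance.
        others = [j for j in range(len(X)) if j != i]
        neighbours = sorted(others, key=lambda j: dist(xi, X[j]))[:k]
        votes = Counter(y[j] for j in neighbours)
        if votes.most_common(1)[0][0] == yi:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (illustrative only): label 0 is the majority class, and the last
# majority sample sits inside the minority cluster.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.05, 1.05)]
y = [0, 0, 0, 0, 1, 1, 1, 0]

X_res, y_res = edited_nearest_neighbours(X, y, majority_label=0, k=3)
```

In this toy case the majority sample at `(1.05, 1.05)` is surrounded by minority samples, so it is the one the rule removes. Library implementations may use a stricter rule, such as requiring all neighbours to agree, so results can differ from this sketch.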