Identifying the Roadway Infrastructure Factors Affecting Road Accidents Using Interpretable Machine Learning and Data Augmentation

Vehicle accidents have long impeded national development. Many countries have tried to address the problem, and various solutions have been studied. This study aims to design a process for analyzing vehicle accidents to support safety inte...

Bibliographic Details
Main Authors: Jonghak Lee, Sangyoup Kim, Tae-Young Heo, Dongwoo Lee
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/15/2/501
_version_ 1832589302105636864
author Jonghak Lee
Sangyoup Kim
Tae-Young Heo
Dongwoo Lee
collection DOAJ
description Vehicle accidents have long impeded national development. Many countries have tried to address the problem, and various solutions have been studied. This study aims to design a process for analyzing vehicle accidents to support safety interventions. In data preprocessing, a resampling technique was used to address the data imbalance problem. We then applied five machine learning models for classification, each with hyperparameter optimization. After classification, model-agnostic interpretation techniques were used to interpret the results of the models. Together, these steps form a process that analyzes vehicle accident data and derives the factors that affect accidents. The classification model that combines XGBoost with ENN (Edited Nearest Neighbor) achieves approximately 84.3% accuracy. For “Length” and “Volume”, we found that certain values (Length: 200 m; Volume: 29,233 veh/day) were associated with a higher likelihood of an accident. Moreover, for variables such as volume and heavy-vehicle volume, the probability of an accident increases as the value increases, whereas for “Lane width” and “Shoulder width” the probability decreases as the value increases. These interpretations provide information that can inform policy recommendations for reducing traffic accidents and help establish effective traffic accident countermeasures.
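The description names ENN (Edited Nearest Neighbor) as the resampling step, but this record does not include the authors' code. As a rough illustration only, the following pure-Python sketch shows the general ENN idea: a sample is dropped when its label disagrees with the majority label of its k nearest neighbours. The function name `edited_nearest_neighbours`, the `target_class` parameter, the choice k = 3, and the toy points are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3, target_class=None):
    """Return the indices of the samples that ENN keeps.

    A sample is dropped when the majority label among its k nearest
    neighbours (Euclidean distance) differs from its own label.
    If target_class is given, only samples of that class can be dropped.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if target_class is not None and yi != target_class:
            keep.append(i)  # classes other than the target are never edited
            continue
        # Distances from sample i to every other sample, nearest first.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority_label, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority_label == yi:
            keep.append(i)
    return keep


# Hypothetical toy data: class 0 = "no accident" (majority),
# class 1 = "accident" (minority). Point (5, 5) is a class-0 sample
# sitting inside the class-1 cluster, so ENN should remove it.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = [0, 0, 0, 0, 0, 1, 1, 1]

keep = edited_nearest_neighbours(X, y, k=3, target_class=0)
X_resampled = [X[i] for i in keep]
y_resampled = [y[i] for i in keep]
print(keep)
```

In this sketch only the majority class is edited, which is one common way to use ENN for imbalanced data; whether the study edited one class or all classes is not stated in the description. The cleaned `X_resampled` and `y_resampled` would then be passed to a classifier.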
format Article
id doaj-art-fe9dacee519d4ed7b94ea2171c2c6f2e
institution Kabale University
issn 2076-3417
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-fe9dacee519d4ed7b94ea2171c2c6f2e
2025-01-24T13:19:35Z
eng
MDPI AG
Applied Sciences
2076-3417
2025-01-01
volume 15, issue 2, article 501
10.3390/app15020501
Identifying the Roadway Infrastructure Factors Affecting Road Accidents Using Interpretable Machine Learning and Data Augmentation
Jonghak Lee (Transportation Pollution Research Center, National Institute of Environmental Research, Seo-gu, Incheon 22689, Republic of Korea)
Sangyoup Kim (Department of Regional Development Research, Jeonbuk State Institute, 1696, Kongjwipatjwi-ro, Wansan-gu, Jeonju 55068, Jeonbuk State, Republic of Korea)
Tae-Young Heo (Department of Information & Statistics, Chungbuk National University, Seowon-gu, Cheongju 28644, Chungbuk, Republic of Korea)
Dongwoo Lee (Department of Smart Cities, University of Seoul, 163, Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea)
title Identifying the Roadway Infrastructure Factors Affecting Road Accidents Using Interpretable Machine Learning and Data Augmentation
topic vehicle accident classification
data augmentation
machine learning
interpretability of machine learning
model-agnostic interpretation
SHAP (Shapley Additive Explanations)
url https://www.mdpi.com/2076-3417/15/2/501