Violence Detection From Industrial Surveillance Videos Using Deep Learning

Bibliographic Details
Main Authors: Hamza Khan, Xiaohong Yuan, Letu Qingge, Kaushik Roy
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Activity detection; industrial surveillance; violence detection; computer vision; deep learning
Online Access: https://ieeexplore.ieee.org/document/10844266/
_version_ 1832584007720632320
author Hamza Khan
Xiaohong Yuan
Letu Qingge
Kaushik Roy
collection DOAJ
description The integration of Internet of Things (IoT) technology in industrial surveillance and the proliferation of surveillance cameras in smart cities have enabled the development of real-time activity recognition and violence detection systems, respectively. These systems are crucial for enhancing safety measures, improving operational efficiency, reducing accident risks, and providing automatic monitoring in dynamic environments. In this paper, we propose a three-stage, deep learning-based, end-to-end framework for violence detection. First, a lightweight convolutional neural network (CNN) identifies individuals in the video stream to minimize the processing of irrelevant frames. Unlike traditional methods that process all frames indiscriminately, this targeted filtering allows computational resources to be allocated more effectively. Second, a sequence of 50 frames containing identified persons is passed to a 3D-CNN model, which extracts the spatiotemporal features of the sequence and passes them to the classifier. Third, a SoftMax classifier processes the extracted features to categorize each frame sequence as violent or non-violent. The classifier's predictions trigger real-time alerts, enabling rapid intervention. The modularity of this stage supports adaptation to new datasets, as it can leverage transfer learning to generalize across diverse surveillance contexts. Unlike traditional systems constrained by hand-crafted features, this design learns from data, reducing reliance on prior domain knowledge and improving generalizability. We conducted violence detection experiments on four datasets, comparing the performance of our model with other CNN models. A computation time analysis showed that our lightweight model requires significantly less computation time, demonstrating its efficiency. We also conducted cross-dataset experiments to assess the model's capacity to perform consistently across datasets. The experiments show that our proposed model outperforms the methods reported in the existing literature. These experiments also indicate that the model's adaptability and robustness still need to be improved.
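The three stages described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `person_windows`, `softmax`, and `classify_stream` are hypothetical, the person detector and the 3D-CNN are passed in as placeholder callables, and the use of non-overlapping 50-frame windows is an assumption, since the record does not say how sequences are formed.

```python
import math
from typing import Callable, Iterable, Iterator, List, Sequence

SEQ_LEN = 50  # sequence length stated in the abstract


def person_windows(frames: Iterable, has_person: Callable[[object], bool],
                   seq_len: int = SEQ_LEN) -> Iterator[List]:
    """Stage 1: keep only frames in which a person is detected and group
    them into non-overlapping windows of seq_len frames (an assumption)."""
    window: List = []
    for frame in frames:
        if not has_person(frame):
            continue  # skip frames without people
        window.append(frame)
        if len(window) == seq_len:
            yield window
            window = []
    # Leftover frames (fewer than seq_len) are dropped in this sketch.


def softmax(logits: Sequence[float]) -> List[float]:
    """Stage 3: numerically stable softmax over class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_stream(frames: Iterable,
                    has_person: Callable[[object], bool],
                    extract_logits: Callable[[List], Sequence[float]],
                    labels: Sequence[str] = ("non-violent", "violent")) -> List[str]:
    """Run all three stages; extract_logits stands in for the 3D-CNN (stage 2)."""
    results = []
    for window in person_windows(frames, has_person):
        probs = softmax(extract_logits(window))
        results.append(labels[probs.index(max(probs))])
    return results
```

With toy integer "frames", `classify_stream(range(200), lambda f: f % 2 == 0, lambda w: [1.0, 0.0] if w[0] < 100 else [0.0, 1.0])` keeps the 100 even-numbered frames, forms two windows, and labels one per window. In a real system the two callables would be replaced by trained models, and each "violent" label would feed the alerting step.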
format Article
id doaj-art-f8df41443fa649a1b1b3eaf1d9b49b46
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-f8df41443fa649a1b1b3eaf1d9b49b46 2025-01-28T00:01:41Z eng
IEEE Access, ISSN 2169-3536, IEEE, 2025-01-01, vol. 13, pp. 15363-15375
DOI: 10.1109/ACCESS.2025.3531213 (document 10844266)
Hamza Khan https://orcid.org/0000-0003-1857-7468
Xiaohong Yuan https://orcid.org/0000-0002-1295-9812
Letu Qingge https://orcid.org/0000-0002-9745-9584
Kaushik Roy https://orcid.org/0000-0002-9026-5322
Affiliation (all four authors): Department of Computer Science, North Carolina Agricultural and Technical State University, Greensboro, NC, USA
title Violence Detection From Industrial Surveillance Videos Using Deep Learning
topic Activity detection
industrial surveillance
violence detection
computer vision
deep learning
url https://ieeexplore.ieee.org/document/10844266/