Violence Detection From Industrial Surveillance Videos Using Deep Learning

Bibliographic Details
Main Authors: Hamza Khan, Xiaohong Yuan, Letu Qingge, Kaushik Roy
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10844266/
Description
Summary: The integration of Internet of Things (IoT) technology in industrial surveillance and the proliferation of surveillance cameras in smart cities have enabled the development of real-time activity recognition and violence detection systems, respectively. These systems are crucial for enhancing safety measures, improving operational efficiency, reducing accident risks, and providing automatic monitoring in dynamic environments. In this paper, we propose a three-stage, deep learning-based, end-to-end framework for violence detection. First, a lightweight convolutional neural network (CNN) model identifies individuals in the video stream to minimize the processing of irrelevant frames. Unlike traditional methods that process all frames indiscriminately, this targeted filtering mechanism allows computational resources to be allocated more effectively. Subsequently, a sequence of 50 frames containing identified persons is passed to a 3D-CNN model, which extracts the spatiotemporal features of the sequence and passes them to the classifier. Next, a SoftMax classifier processes the extracted features to categorize frame sequences as violent or non-violent. The classifier’s predictions trigger real-time alerts, enabling rapid intervention. The modularity of this stage supports adaptability to new datasets, as it can leverage transfer learning to generalize across diverse surveillance contexts. Unlike traditional systems constrained by hand-crafted features, this design learns directly from data, reducing reliance on prior domain knowledge and improving generalizability. We conducted violence detection experiments on four datasets, comparing the performance of our model with that of other CNN models. A computation time analysis revealed that our lightweight model requires significantly less computation time, demonstrating its efficiency. We also conducted cross-dataset experiments to assess the model’s capacity to perform consistently across various datasets. Experiments show that our proposed model outperforms the methods reported in the existing literature. These experiments also demonstrate that the model’s adaptability and robustness need to be improved.
ISSN: 2169-3536
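
The three stages described in the summary (person filtering, 50-frame sequences, SoftMax classification) can be illustrated with a minimal sketch. This is not the authors' implementation: `has_person` and `clip_logits` are hypothetical stand-ins for the lightweight CNN and the 3D-CNN, and the choice of non-overlapping sequences is an assumption, since the summary does not say how sequences are windowed.

```python
import math
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

CLIP_LENGTH = 50  # frames per sequence, as stated in the summary


def person_clips(
    frames: Iterable,
    has_person: Callable[[object], bool],  # hypothetical stand-in for the lightweight CNN
    clip_length: int = CLIP_LENGTH,
) -> Iterator[List]:
    """Yield clips built only from frames in which a person was detected.

    Assumption: clips do not overlap, and a trailing partial clip is dropped.
    """
    clip: List = []
    for frame in frames:
        if not has_person(frame):
            continue  # stage 1: skip frames without people
        clip.append(frame)
        if len(clip) == clip_length:
            yield clip
            clip = []


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_clip(
    clip: List,
    clip_logits: Callable[[List], Sequence[float]],  # hypothetical stand-in for the 3D-CNN
) -> Tuple[str, float]:
    """Stages 2 and 3: score a clip, then return (label, violent probability).

    Assumption: the model emits two scores ordered (non-violent, violent).
    """
    p_nonviolent, p_violent = softmax(clip_logits(clip))
    label = "violent" if p_violent > p_nonviolent else "non-violent"
    return label, p_violent
```

In a deployment along the lines the summary describes, a "violent" label would be the point at which a real-time alert is raised; the sketch leaves that step out because the record gives no detail about it.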