EPCNet: Implementing an ‘Artificial Fovea’ for More Efficient Monitoring Using the Sensor Fusion of an Event-Based and a Frame-Based Camera

Bibliographic Details
Main Authors: Orla Sealy Phelan, Dara Molloy, Roshan George, Edward Jones, Martin Glavin, Brian Deegan
Format: Article
Language: English
Published: MDPI AG, 2025-07-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/25/15/4540
Description
Summary: Efficient object detection is crucial to real-time monitoring applications such as autonomous driving and security systems. Modern RGB cameras can produce high-resolution images for accurate object detection. However, increased resolution results in increased network latency and power consumption. To minimise this latency, Convolutional Neural Networks (CNNs) often have a resolution limitation, requiring images to be down-sampled before inference and causing significant information loss. Event-based cameras are neuromorphic vision sensors with high temporal resolution, low power consumption, and high dynamic range, making them preferable to regular RGB cameras in many situations. This project proposes the fusion of an event-based camera with an RGB camera to mitigate the trade-off between temporal resolution and accuracy while minimising power consumption. The cameras are calibrated to create a multi-modal stereo vision system in which pixel coordinates can be projected between the event and RGB camera image planes. This calibration is used to project bounding boxes, detected by clustering of events, into the RGB image plane, so that each RGB frame is cropped rather than down-sampled to meet the input requirements of the CNN. Using the Common Objects in Context (COCO) dataset evaluator, average precision (AP) for the bicycle class in RGB scenes improved from 21.08 to 57.38, and AP across all classes increased from 37.93 to 46.89. To reduce system latency, a novel object detection approach is proposed in which the event camera acts as a region proposal network and a classification algorithm is run on the proposed regions, achieving a 78% latency improvement over the baseline.
ISSN: 1424-8220
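
The summary above describes a pipeline with two key steps: clustering events into candidate bounding boxes, and projecting those boxes into the RGB image plane so that each frame can be cropped rather than down-sampled before inference. The Python sketch below illustrates one plausible realisation of these steps; the choice of DBSCAN for clustering, the single homography `H` standing in for the full stereo calibration, and all function names and parameters are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: DBSCAN clustering, the event->RGB homography H,
# and all names here are assumptions, not the paper's actual implementation.
import numpy as np
import cv2
from sklearn.cluster import DBSCAN

def propose_regions(events_xy, eps=5.0, min_samples=20):
    """Cluster event (x, y) coordinates; each dense cluster yields a
    candidate bounding box (x1, y1, x2, y2) in the event-camera plane."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(events_xy)
    boxes = []
    for lbl in set(labels) - {-1}:              # label -1 marks noise events
        pts = events_xy[labels == lbl]
        boxes.append((pts[:, 0].min(), pts[:, 1].min(),
                      pts[:, 0].max(), pts[:, 1].max()))
    return boxes

def project_bbox(bbox, H):
    """Map a box from the event plane to the RGB plane using a planar
    homography H (a simplification of the paper's stereo calibration)."""
    x1, y1, x2, y2 = bbox
    corners = np.float32([[x1, y1], [x2, y1],
                          [x2, y2], [x1, y2]]).reshape(-1, 1, 2)
    warped = cv2.perspectiveTransform(corners, H).reshape(-1, 2)
    # Take the axis-aligned bounds of the projected quadrilateral.
    return (warped[:, 0].min(), warped[:, 1].min(),
            warped[:, 0].max(), warped[:, 1].max())

def crop_for_cnn(rgb_frame, bbox, out_size=(640, 640), margin=0.1):
    """Crop the RGB frame around a projected box (padded by `margin`) and
    resize the crop, not the whole frame, to the CNN input resolution."""
    h, w = rgb_frame.shape[:2]
    x1, y1, x2, y2 = bbox
    pad_x, pad_y = margin * (x2 - x1), margin * (y2 - y1)
    x1, y1 = int(max(0, x1 - pad_x)), int(max(0, y1 - pad_y))
    x2, y2 = int(min(w, x2 + pad_x)), int(min(h, y2 + pad_y))
    return cv2.resize(rgb_frame[y1:y2, x1:x2], out_size)
```

Because each crop is resized from a small region rather than from the full frame, small or distant objects (such as the bicycles whose AP rose from 21.08 to 57.38) retain far more pixels at the CNN's input resolution than they would under whole-frame down-sampling.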