End-to-End Online Video Stitching and Stabilization Method Based on Unsupervised Deep Learning

Bibliographic Details
Main Authors: Pengyuan Wang, Pinle Qin, Rui Chai, Jianchao Zeng, Pengcheng Zhao, Zuojun Chen, Bingjie Han
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/11/5987
Description
Summary: The limited field of view, cumulative inter-frame jitter, and dynamic parallax interference in handheld video stitching often lead to misalignment and distortion. In this paper, we propose an end-to-end, unsupervised deep-learning framework that jointly performs real-time video stabilization and stitching. First, a collaborative optimization architecture allows the stabilization and stitching modules to share parameters and propagate errors through a fully differentiable network, ensuring consistent image alignment. Second, a Markov trajectory smoothing strategy in relative coordinates models inter-frame motion as incremental relationships, effectively reducing cumulative errors. Third, a dynamic attention mask generates spatiotemporal weight maps based on foreground motion prediction, suppressing misalignment caused by dynamic objects. Experimental evaluation on diverse handheld sequences shows that our method achieves higher stitching quality, lower geometric distortion rates, and improved video stability compared to state-of-the-art baselines, while maintaining real-time processing capabilities. Ablation studies validate that relative trajectory modeling substantially mitigates long-term jitter and that the dynamic attention mask enhances stitching accuracy in dynamic scenes. These results demonstrate that the proposed framework provides a robust solution for high-quality, real-time handheld video stitching.
ISSN: 2076-3417
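
Illustrative note: the summary describes trajectory smoothing in relative coordinates, where inter-frame motion is modeled as increments. This record does not give the paper's formulation, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes 2D translation increments and swaps in a plain first-order recursive filter (each smoothed increment depends only on the previous smoothed increment and the current measurement). The names `smooth_relative_motion`, `roughness`, and the parameter `alpha` are hypothetical.

```python
import numpy as np


def smooth_relative_motion(increments, alpha=0.9):
    """Causally smooth per-frame motion increments (hypothetical helper).

    increments: (T, D) array; row t is the motion from frame t-1 to frame t.
    Each output row depends only on the previous smoothed row and the
    current increment, so the filter can run online, frame by frame.
    """
    increments = np.asarray(increments, dtype=float)
    smoothed = np.empty_like(increments)
    state = increments[0]
    for t, delta in enumerate(increments):
        # First-order update: blend the previous state with the new increment.
        state = alpha * state + (1.0 - alpha) * delta
        smoothed[t] = state
    return smoothed


def roughness(path):
    """Mean squared second difference of a camera path (lower is smoother)."""
    return float(np.mean(np.diff(path, n=2, axis=0) ** 2))


# Synthetic example (assumed data): a steady pan plus hand-shake noise.
rng = np.random.default_rng(0)
num_frames = 200
steady_pan = np.tile([1.0, 0.0], (num_frames, 1))
jitter = rng.normal(scale=0.5, size=(num_frames, 2))
raw_increments = steady_pan + jitter

smooth_increments = smooth_relative_motion(raw_increments)

# Absolute paths are recovered by accumulating the increments.
raw_path = np.cumsum(raw_increments, axis=0)
smooth_path = np.cumsum(smooth_increments, axis=0)

print("raw roughness:     ", roughness(raw_path))
print("smoothed roughness:", roughness(smooth_path))
```

Filtering the increments rather than the accumulated path means the filter state never has to carry an absolute position, which is one plausible reading of why a relative formulation helps with cumulative error. A larger `alpha` smooths more strongly but makes the virtual camera lag further behind intentional motion.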