HCT-Det: A High-Accuracy End-to-End Model for Steel Defect Detection Based on Hierarchical CNN–Transformer Features

Bibliographic Details
Main Authors: Xiyin Chen, Xiaohu Zhang, Yonghua Shi, Junjie Pang
Format: Article
Language: English
Published: MDPI AG 2025-02-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/25/5/1333
Description
Summary: Surface defect detection is essential for ensuring the quality and safety of steel products. While Transformer-based methods have achieved state-of-the-art performance, they face several limitations, including high computational costs due to the quadratic complexity of the attention mechanism, inadequate detection accuracy for small-scale defects due to substantial downsampling, inconsistencies between classification scores and localization confidence, and feature resolution loss caused by simple upsampling and downsampling strategies. To address these challenges, we propose the HCT-Det model, which incorporates a window-based self-attention residual (WSA-R) block structure. This structure combines window-based self-attention (WSA) blocks to reduce computational overhead and parallel residual convolutional (Res) blocks to enhance local feature continuity. The model’s backbone generates three cross-scale features as encoder inputs, which undergo Intra-Scale Feature Interaction (ISFI) and Cross-Scale Feature Interaction (CSFI) to improve detection accuracy for targets of various sizes. A Soft IoU-Aware mechanism ensures alignment between classification scores and intersection-over-union (IoU) metrics during training. Additionally, Hybrid Downsampling (HDownsample) and Hybrid Upsampling (HUpsample) modules minimize feature degradation. Our experiments demonstrate that HCT-Det achieved a mean average precision (mAP@0.5) of 0.795 on the NEU-DET dataset and 0.733 on the GC10-DET dataset, outperforming other state-of-the-art approaches. These results highlight the model’s effectiveness in improving computational efficiency and detection accuracy for steel surface defect detection.
ISSN: 1424-8220
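
The summary attributes part of the model's efficiency to window-based self-attention, which avoids the quadratic cost of attending over every token pair. The sketch below is only a rough illustration of that general idea, not the authors' WSA block: it counts query-key pairs for global attention versus attention restricted to non-overlapping square windows. The function names, the 64 x 64 feature map, and the window size of 8 are assumptions chosen for illustration; none of them come from the record.

```python
def global_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when every token attends to every other token."""
    tokens = height * width
    return tokens * tokens


def window_attention_pairs(height: int, width: int, window: int) -> int:
    """Query-key pairs when attention stays inside non-overlapping
    window x window tiles (hypothetical partitioning, for illustration)."""
    if height % window or width % window:
        raise ValueError("feature map must be divisible by the window size")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    # Each window pays a quadratic cost, but only over its own tokens.
    return num_windows * tokens_per_window ** 2


if __name__ == "__main__":
    # Assumed sizes; the record does not state the model's actual resolution.
    h, w, win = 64, 64, 8
    full = global_attention_pairs(h, w)
    local = window_attention_pairs(h, w, win)
    print(f"global: {full} pairs, windowed: {local} pairs, ratio: {full // local}x")
```

Under these assumed sizes the windowed count grows linearly with the number of tokens for a fixed window, whereas the global count grows quadratically; the trade-off is that tokens in different windows no longer interact directly, which is presumably one reason the model pairs WSA blocks with parallel convolutional blocks.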