An effective dual encoder network with a feature attention large kernel for building extraction
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Taylor & Francis Group, 2024-01-01 |
| Series: | Geocarto International |
| Subjects: | |
| Online Access: | https://www.tandfonline.com/doi/10.1080/10106049.2024.2375572 |
| Summary: | Transformer models boost building extraction accuracy by capturing global features from images. However, the potential of convolutional networks for local feature extraction remains underutilized in CNN + Transformer models, limiting performance. To harness convolutional networks for local feature extraction, we propose a feature attention large kernel (ALK) module and a dual encoder network for building extraction from high-resolution imagery. The model integrates an attention-based large kernel encoder, a ResNet50-Transformer encoder, a Channel Transformer (CTrans) module and a decoder. The dual encoder efficiently captures local and global building features from both convolutional and positional perspectives, enhancing performance. Moreover, replacing skip connections with the CTrans module mitigates semantic inconsistency during feature fusion, ensuring better multidimensional feature integration. Experimental results demonstrate superior extraction of local and global features compared to other models, showcasing the potential of enhancing local feature extraction in advancing CNN + Transformer models. |
|---|---|
| ISSN: | 1010-6049 1752-0762 |
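To make the summary's "attention large kernel" idea concrete: a minimal numpy sketch of one common large-kernel-attention decomposition (a depthwise convolution, a dilated depthwise convolution for long-range context, then a 1x1 convolution, with the result gating the input elementwise). This is an illustrative assumption about how such a module can be built, not the authors' exact ALK design; all function names and shapes here are hypothetical.

```python
import numpy as np

def depthwise_conv2d(x, kernel, dilation=1):
    """Per-channel 2D convolution with 'same' zero padding.
    x: (C, H, W); kernel: (C, k, k), one k x k filter per channel."""
    C, H, W = x.shape
    k = kernel.shape[1]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for i in range(k):          # accumulate one shifted, weighted copy per tap
        for j in range(k):
            di, dj = i * dilation, j * dilation
            out += kernel[:, i, j, None, None] * xp[:, di:di + H, dj:dj + W]
    return out

def large_kernel_attention(x, w_dw, w_dil, w_pw, dilation=3):
    """Large receptive field from small kernels:
    depthwise conv -> dilated depthwise conv -> 1x1 conv,
    and the result is used as an attention map gating the input."""
    a = depthwise_conv2d(x, w_dw)                  # local context
    a = depthwise_conv2d(a, w_dil, dilation)       # long-range context
    attn = np.einsum('oc,chw->ohw', w_pw, a)       # channel mixing (1x1 conv)
    return attn * x                                # elementwise feature gating
```

A k x k depthwise kernel followed by a k x k depthwise kernel with dilation d covers an effective receptive field of roughly (k - 1) * d + k, which is why this decomposition captures large-kernel context far more cheaply than a single dense large convolution.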