Improved crop row detection by employing attention-based vision transformers and convolutional neural networks with integrated depth modeling for precise spatial accuracy

Bibliographic Details
Main Authors: Hassan Afzaal, Derek Rude, Aitazaz A. Farooque, Gurjit S. Randhawa, Arnold W. Schumann, Nicholas Krouglicof
Format: Article
Language: English
Published: Elsevier 2025-08-01
Series: Smart Agricultural Technology
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2772375525001674
Description
Summary: Precision agriculture has emerged as a revolutionary technology for tackling global food security issues by optimizing crop yield and resource management. Incorporating artificial intelligence (AI) within agricultural practices has fundamentally transformed the discipline by facilitating sophisticated data analysis, predictive modeling, and automation. This research presents a novel framework that integrates deep learning, precision agriculture, and depth modeling to accurately detect crop rows and recover spatial information. The proposed framework employs the latest attention- and convolution-based encoders, such as ConvFormer, CAFormer, Swin Transformer, and ConvNextV2, to precisely identify crop rows across varied and challenging agricultural environments. The binary segmentation models were trained on a high-resolution soybean crop dataset (733 images) collected at fifteen distinct locations in Canada during different growth phases. LabelMe and Albumentations were used to generate the segmentation dataset, with data augmentation applied to improve generalization and robustness. With training (∼70 %, 513 images), validation (∼15 %, 109 images), and test (∼15 %, 111 images) splits, the models learned to differentiate crop rows from background noise, achieving notable accuracy across multiple metrics, including Precision, Recall, F1 Score, and Dice Score. An essential element of this pipeline is incorporating the Depth Pro model for precise computation of Ground Sampling Distance (GSD) by estimating absolute depth maps and capture heights for the images. The depth maps were analyzed to examine GSD variability across fifteen clusters of field images, revealing a spectrum of GSD values ranging from 0.5 to 2.0 mm/pixel for most clusters.
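As a rough illustration of the GSD computation described in the summary, under a simple pinhole-camera assumption the ground distance covered by one pixel is the absolute depth divided by the focal length expressed in pixels. This is a minimal sketch with hypothetical values (the paper's actual pipeline relies on Depth Pro's metric depth estimates and camera parameters, which are not given in the record):

```python
import numpy as np

def gsd_from_depth(depth_m, focal_px):
    """Ground Sampling Distance (mm/pixel) under a pinhole-camera model.

    depth_m  : per-pixel absolute depth in metres (e.g. from a monocular
               depth model such as Depth Pro)
    focal_px : focal length expressed in pixels (hypothetical value below)
    """
    return depth_m / focal_px * 1000.0  # metres -> millimetres per pixel

# Hypothetical flat depth map: camera ~1.5 m above the crop canopy
depth = np.full((480, 640), 1.5)
gsd = gsd_from_depth(depth, focal_px=1000.0)
print(round(float(gsd.mean()), 2))  # prints 1.5 (mm/pixel)
```

A GSD map computed this way varies per pixel with terrain and canopy height, which is consistent with the 0.5-2.0 mm/pixel spread reported across the field clusters.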
The proposed model demonstrates superior performance in crop row segmentation tasks, achieving an F1 Score of 0.8012, Precision of 0.8512, Recall of 0.7584, and Accuracy of 0.8477 on the validation set. In comparative analysis with state-of-the-art (SOTA) models, ConvFormer outperformed alternatives such as ConvNextV2, CAFormer, and Swin S3 across multiple metrics. Notably, ConvFormer achieves a better balance of precision and recall than ResNet models, which exhibit lower metrics (e.g., F1 Score of 0.7307 and Recall of 0.6551), underscoring its effectiveness in complex agricultural scenarios. Furthermore, classic machine vision methods were tested for extracting line information from the binary segmentation masks, which can be useful for plant analytics, autonomous driving, and various other applications. The proposed workflow offers a robust solution for automating field operations, optimizing resource efficiency, and improving crop productivity.
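The line-extraction step on the binary masks could be sketched, for instance, as a least-squares line fit to the foreground pixels of a crop-row mask. This is a simplified stand-in on synthetic data; the record does not specify which classic machine vision method (e.g., a Hough transform) the authors actually used:

```python
import numpy as np

def fit_row_line(mask):
    """Fit a line col = a*row + b to the foreground pixels of a binary
    crop-row mask, returning the slope a and intercept b."""
    rows, cols = np.nonzero(mask)
    a, b = np.polyfit(rows, cols, 1)
    return a, b

# Synthetic binary mask: a single slanted crop row along col = 0.5*row + 10
mask = np.zeros((100, 120), dtype=np.uint8)
for r in range(100):
    mask[r, int(0.5 * r + 10)] = 1

slope, intercept = fit_row_line(mask)
print(round(slope, 2), round(intercept, 1))  # slope close to 0.5, intercept near 10
```

In practice each connected component of the segmentation mask would be fitted separately, yielding one line per crop row for downstream navigation or plant analytics.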
ISSN: 2772-3755