Improved crop row detection by employing attention-based vision transformers and convolutional neural networks with integrated depth modeling for precise spatial accuracy

Precision agriculture has emerged as a revolutionary technology for tackling global food security issues by optimizing crop yield and resource management. Incorporating artificial intelligence (AI) within agricultural practices has fundamentally transformed the discipline by facilitating sophisticated data analysis, predictive modeling, and automation.


Bibliographic Details
Main Authors: Hassan Afzaal, Derek Rude, Aitazaz A. Farooque, Gurjit S. Randhawa, Arnold W. Schumann, Nicholas Krouglicof
Format: Article
Language: English
Published: Elsevier 2025-08-01
Series: Smart Agricultural Technology
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2772375525001674
_version_ 1849768946557779968
author Hassan Afzaal
Derek Rude
Aitazaz A. Farooque
Gurjit S. Randhawa
Arnold W. Schumann
Nicholas Krouglicof
author_facet Hassan Afzaal
Derek Rude
Aitazaz A. Farooque
Gurjit S. Randhawa
Arnold W. Schumann
Nicholas Krouglicof
author_sort Hassan Afzaal
collection DOAJ
description Precision agriculture has emerged as a revolutionary technology for tackling global food security issues by optimizing crop yield and resource management. Incorporating artificial intelligence (AI) within agricultural practices has fundamentally transformed the discipline by facilitating sophisticated data analysis, predictive modeling, and automation. This research presents a novel framework that integrates deep learning, precision agriculture, and depth modeling to detect crop rows and extract spatial information accurately. The proposed framework employs the latest attention- and convolution-based encoders, such as ConvFormer, CAFormer, Swin Transformer, and ConvNextV2, to precisely identify crop rows across varied and challenging agricultural environments. The binary segmentation models were trained on a high-resolution soybean crop dataset (733 images) collected from fifteen distinct locations in Canada during different growth phases. The LabelMe and Albumentations tools were used to generate the segmentation dataset, followed by data augmentation to enhance generalization and robustness. With training (∼70 %, 513 images), validation (∼15 %, 109 images), and test (∼15 %, 111 images) splits, the models learned to differentiate crop rows from background noise, achieving notable accuracy across multiple metrics, including Precision, Recall, F1 Score, and Dice Score. An essential element of this pipeline is the incorporation of the Depth Pro model for precise computation of Ground Sampling Distance (GSD) by estimating absolute height and depth maps from the images. The depth maps were analyzed to examine GSD variability across fifteen clusters of field images, revealing GSD values ranging from 0.5 to 2.0 mm/pixel for most clusters.
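The GSD values described above follow from the standard pinhole-camera relation: the ground footprint of one pixel equals the metric depth divided by the focal length in pixels. A minimal sketch of that conversion (the depth and focal-length values here are illustrative assumptions, not the paper's calibration or the Depth Pro API):

```python
import numpy as np

def gsd_mm_per_pixel(depth_m, focal_length_px):
    """Ground Sampling Distance from a metric depth map (pinhole model).

    GSD = depth / focal_length, i.e. the ground footprint of one pixel.
    depth_m: HxW array of absolute depths in metres (e.g. produced by a
    monocular depth model); focal_length_px: focal length in pixels.
    Returns a per-pixel GSD map in mm/pixel.
    """
    return np.asarray(depth_m, dtype=float) / focal_length_px * 1000.0

# Toy example: camera ~1.5 m above the canopy, 1000 px focal length
depth = np.full((4, 4), 1.5)           # metres
gsd = gsd_mm_per_pixel(depth, 1000.0)  # 1.5 mm/pixel everywhere
```

A value of 1.5 mm/pixel sits inside the 0.5–2.0 mm/pixel range reported for most clusters; real depth maps vary per pixel, so the GSD map would too.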
The proposed model demonstrates superior performance in crop row segmentation, achieving an F1 Score of 0.8012, Precision of 0.8512, Recall of 0.7584, and Accuracy of 0.8477 on the validation set. In a comparative analysis with state-of-the-art (SOTA) models, ConvFormer outperformed alternatives such as ConvNextV2, CAFormer, and Swin S3 across multiple metrics. Notably, ConvFormer achieves a better balance of precision and recall than ResNet models, which exhibit lower metrics (e.g., an F1 Score of 0.7307 and Recall of 0.6551), underscoring its effectiveness in complex agricultural scenarios. Furthermore, classic machine vision methods were tested for extracting line information from the binary segmentation masks, which is useful for plant analytics, autonomous driving, and various other applications. The proposed workflow offers a robust solution for automating field operations, optimizing resource efficiency, and improving crop productivity.
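The classic machine-vision step mentioned above (recovering line parameters from a binary crop-row mask) is commonly done with a Hough transform; for a single isolated row, a least-squares fit over the foreground pixels is an even simpler sketch. The following is an illustration under that assumption, not necessarily the authors' exact method:

```python
import numpy as np

def fit_row_line(mask):
    """Fit a straight line col = m * row + b to the foreground pixels
    of a binary crop-row mask via least squares.

    mask: 2-D boolean/0-1 array containing one crop row; with several
    rows, a connected-components pass would separate them first.
    Returns (m, b).
    """
    rows, cols = np.nonzero(mask)
    m, b = np.polyfit(rows, cols, deg=1)  # degree-1 polynomial fit
    return m, b

# Toy mask: a vertical "crop row" occupying column 3 of every row
mask = np.zeros((5, 8), dtype=bool)
mask[:, 3] = True
m, b = fit_row_line(mask)  # slope ~0, intercept ~3
```

The fitted (m, b) pair is the kind of compact line information useful for the guidance and plant-analytics applications the abstract mentions.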
format Article
id doaj-art-18d45a6f17974b6a8d3a9347e5c5fbb0
institution DOAJ
issn 2772-3755
language English
publishDate 2025-08-01
publisher Elsevier
record_format Article
series Smart Agricultural Technology
spelling doaj-art-18d45a6f17974b6a8d3a9347e5c5fbb02025-08-20T03:03:38ZengElsevierSmart Agricultural Technology2772-37552025-08-011110093410.1016/j.atech.2025.100934Improved crop row detection by employing attention-based vision transformers and convolutional neural networks with integrated depth modeling for precise spatial accuracyHassan Afzaal0Derek Rude1Aitazaz A. Farooque2Gurjit S. Randhawa3Arnold W. Schumann4Nicholas Krouglicof5Faculty of Sustainable Design Engineering, University of Prince Edward Island, Charlottetown, PE, CanadaCroptimisitcs Technology Inc., Saskatoon, SK, CanadaFaculty of Sustainable Design Engineering, University of Prince Edward Island, Charlottetown, PE, Canada; Canadian Centre for Climate Change and Adaptation, University of Prince Edward Island, St Peters Bay, PE, Canada; Corresponding author.School of Computer Science, University of Guelph, Guelph, ON, CanadaCitrus Research and Education Center, University of Florida, Gainesville, FL, USAIntempco Canada, Montreal, QC, CanadaPrecision agriculture has emerged as a revolutionary technology for tackling global food security issues by optimizing crop yield and resource management. Incorporating artificial intelligence (AI) within agricultural practices has fundamentally transformed the discipline by facilitating sophisticated data analysis, predictive modeling, and automation. This research presents a novel framework that integrates deep learning, precision agriculture, and depth modeling to detect crop rows and spatial information accurately. The proposed framework employs the latest attention and convolution-based encoders, such as ConvFormer, CAFormer, Swin Transformer, and ConvNextV2, in precisely identifying crop rows across varied and challenging agricultural environments. The binary segmentation models were trained using a high-resolution soybean crop dataset (733 images), which consisted of data from fifteen distinct locations in Canada, collected during different growth phases. 
LabelMe and Albumentations tools were used to generate the segmentation dataset, followed by data augmentation to enhance generalization and robustness. With training (∼70 %, 513 images), validation (∼15 %, 109 images), and test (∼15 %, 111 images) splits, the models learned to differentiate crop rows from background noise, achieving notable accuracy across multiple metrics, including Precision, Recall, F1 Score, and Dice Score. An essential element of this pipeline is the incorporation of the Depth Pro model for precise computation of Ground Sampling Distance (GSD) by estimating absolute height and depth maps from the images. The depth maps were analyzed to examine GSD variability across fifteen clusters of field images, revealing GSD values ranging from 0.5 to 2.0 mm/pixel for most clusters. The proposed model demonstrates superior performance in crop row segmentation, achieving an F1 Score of 0.8012, Precision of 0.8512, Recall of 0.7584, and Accuracy of 0.8477 on the validation set. In a comparative analysis with state-of-the-art (SOTA) models, ConvFormer outperformed alternatives such as ConvNextV2, CAFormer, and Swin S3 across multiple metrics. Notably, ConvFormer achieves a better balance of precision and recall than ResNet models, which exhibit lower metrics (e.g., an F1 Score of 0.7307 and Recall of 0.6551), underscoring its effectiveness in complex agricultural scenarios. Furthermore, classic machine vision methods were tested for extracting line information from the binary segmentation masks, which is useful for plant analytics, autonomous driving, and various other applications. The proposed workflow offers a robust solution for automating field operations, optimizing resource efficiency, and improving crop productivity.http://www.sciencedirect.com/science/article/pii/S2772375525001674Vision TransformersConvolutional Neural NetworksConvFormerCAFormerSWIN TransformersMetaFormers
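For reference, the Dice Score reported in the abstract is, for binary segmentation masks, equivalent to the pixelwise F1 Score (the harmonic mean of Precision and Recall). A minimal sketch of that computation on toy masks:

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice coefficient between two binary masks.

    Dice = 2*|P & T| / (|P| + |T|); for binary masks this equals the
    pixelwise F1 Score. eps guards against empty masks.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Toy masks: 2 of 4 predicted pixels overlap the 4 target pixels
pred = np.array([[1, 1, 0, 0], [1, 1, 0, 0]])
target = np.array([[0, 1, 1, 0], [0, 1, 1, 0]])
score = dice_score(pred, target)  # 2*2 / (4 + 4) = 0.5
```

The actual study averages such scores over the validation and test images; the masks above are purely illustrative.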
spellingShingle Hassan Afzaal
Derek Rude
Aitazaz A. Farooque
Gurjit S. Randhawa
Arnold W. Schumann
Nicholas Krouglicof
Improved crop row detection by employing attention-based vision transformers and convolutional neural networks with integrated depth modeling for precise spatial accuracy
Smart Agricultural Technology
Vision Transformers
Convolutional Neural Networks
ConvFormer
CAFormer
SWIN Transformers
MetaFormers
title Improved crop row detection by employing attention-based vision transformers and convolutional neural networks with integrated depth modeling for precise spatial accuracy
title_full Improved crop row detection by employing attention-based vision transformers and convolutional neural networks with integrated depth modeling for precise spatial accuracy
title_fullStr Improved crop row detection by employing attention-based vision transformers and convolutional neural networks with integrated depth modeling for precise spatial accuracy
title_full_unstemmed Improved crop row detection by employing attention-based vision transformers and convolutional neural networks with integrated depth modeling for precise spatial accuracy
title_short Improved crop row detection by employing attention-based vision transformers and convolutional neural networks with integrated depth modeling for precise spatial accuracy
title_sort improved crop row detection by employing attention based vision transformers and convolutional neural networks with integrated depth modeling for precise spatial accuracy
topic Vision Transformers
Convolutional Neural Networks
ConvFormer
CAFormer
SWIN Transformers
MetaFormers
url http://www.sciencedirect.com/science/article/pii/S2772375525001674
work_keys_str_mv AT hassanafzaal improvedcroprowdetectionbyemployingattentionbasedvisiontransformersandconvolutionalneuralnetworkswithintegrateddepthmodelingforprecisespatialaccuracy
AT derekrude improvedcroprowdetectionbyemployingattentionbasedvisiontransformersandconvolutionalneuralnetworkswithintegrateddepthmodelingforprecisespatialaccuracy
AT aitazazafarooque improvedcroprowdetectionbyemployingattentionbasedvisiontransformersandconvolutionalneuralnetworkswithintegrateddepthmodelingforprecisespatialaccuracy
AT gurjitsrandhawa improvedcroprowdetectionbyemployingattentionbasedvisiontransformersandconvolutionalneuralnetworkswithintegrateddepthmodelingforprecisespatialaccuracy
AT arnoldwschumann improvedcroprowdetectionbyemployingattentionbasedvisiontransformersandconvolutionalneuralnetworkswithintegrateddepthmodelingforprecisespatialaccuracy
AT nicholaskrouglicof improvedcroprowdetectionbyemployingattentionbasedvisiontransformersandconvolutionalneuralnetworkswithintegrateddepthmodelingforprecisespatialaccuracy