Improved crop row detection by employing attention-based vision transformers and convolutional neural networks with integrated depth modeling for precise spatial accuracy
| Main Authors: | Hassan Afzaal, Derek Rude, Aitazaz A. Farooque, Gurjit S. Randhawa, Arnold W. Schumann, Nicholas Krouglicof |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-08-01 |
| Series: | Smart Agricultural Technology |
| Subjects: | Vision Transformers; Convolutional Neural Networks; ConvFormer; CAFormer; SWIN Transformers; MetaFormers |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2772375525001674 |
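The record's abstract reports Precision, Recall, F1 Score, and Dice Score for binary crop-row segmentation masks. As an illustrative sketch (not the authors' code), these metrics can be computed from a predicted mask and a ground-truth mask as follows; note that for binary masks the Dice coefficient coincides with the F1 score:

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, target: np.ndarray) -> dict:
    """Compute Precision, Recall, F1, and Dice for binary (0/1) masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()   # true positives
    fp = np.logical_and(pred, ~target).sum()  # false positives
    fn = np.logical_and(~pred, target).sum()  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    # Dice = 2*TP / (2*TP + FP + FN), identical to F1 for binary masks
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "dice": dice}
```

Because Dice equals F1 in the binary case, the abstract's separate listing of both metrics is redundant information for binary masks but common reporting practice.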
| Field | Value |
|---|---|
| author | Hassan Afzaal; Derek Rude; Aitazaz A. Farooque; Gurjit S. Randhawa; Arnold W. Schumann; Nicholas Krouglicof |
| author_sort | Hassan Afzaal |
| collection | DOAJ |
| description | Precision agriculture has emerged as a revolutionary technology for tackling global food security issues by optimizing crop yield and resource management. Incorporating artificial intelligence (AI) within agricultural practices has fundamentally transformed the discipline by facilitating sophisticated data analysis, predictive modeling, and automation. This research presents a novel framework that integrates deep learning, precision agriculture, and depth modeling to accurately detect crop rows and their spatial information. The proposed framework employs the latest attention- and convolution-based encoders, such as ConvFormer, CAFormer, Swin Transformer, and ConvNextV2, to precisely identify crop rows across varied and challenging agricultural environments. The binary segmentation models were trained on a high-resolution soybean crop dataset (733 images) collected from fifteen distinct locations in Canada during different growth phases. The LabelMe and Albumentations tools were used to generate the segmentation dataset, followed by data augmentation techniques to enhance generalization and robustness. With training (∼70 %, 513 images), validation (∼15 %, 109 images), and test (∼15 %, 111 images) splits, the models learned to differentiate crop rows from background noise, achieving notable accuracy across multiple metrics, including Precision, Recall, F1 Score, and Dice Score. An essential element of this pipeline is the incorporation of the Depth Pro model for precise computation of Ground Sampling Distance (GSD) by estimating the images' absolute height and depth maps. The depth maps were analyzed to examine GSD variability across fifteen clusters of field images, revealing GSD values ranging from 0.5 to 2.0 mm/pixel for most clusters.
The proposed model demonstrates superior performance in crop row segmentation, achieving an F1 Score of 0.8012, Precision of 0.8512, Recall of 0.7584, and Accuracy of 0.8477 on the validation set. In a comparative analysis with state-of-the-art (SOTA) models, ConvFormer outperformed alternatives such as ConvNextV2, CAFormer, and Swin S3 across multiple metrics. Notably, ConvFormer achieves a better balance of precision and recall than ResNet models, which exhibit lower metrics (e.g., an F1 Score of 0.7307 and Recall of 0.6551), underscoring its effectiveness in complex agricultural scenarios. Furthermore, classic machine vision methods were tested for extracting line information from the binary segmentation masks, which can be useful for plant analytics, autonomous driving, and various other applications. The proposed workflow offers a robust solution for automating field operations, optimizing resource efficiency, and improving crop productivity. |
| format | Article |
| id | doaj-art-18d45a6f17974b6a8d3a9347e5c5fbb0 |
| institution | DOAJ |
| issn | 2772-3755 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Smart Agricultural Technology |
| doi | 10.1016/j.atech.2025.100934 (Smart Agricultural Technology, vol. 11, article 100934) |
| affiliations | Hassan Afzaal: Faculty of Sustainable Design Engineering, University of Prince Edward Island, Charlottetown, PE, Canada. Derek Rude: Croptimisitcs Technology Inc., Saskatoon, SK, Canada. Aitazaz A. Farooque (corresponding author): Faculty of Sustainable Design Engineering, University of Prince Edward Island, Charlottetown, PE, Canada, and Canadian Centre for Climate Change and Adaptation, University of Prince Edward Island, St Peters Bay, PE, Canada. Gurjit S. Randhawa: School of Computer Science, University of Guelph, Guelph, ON, Canada. Arnold W. Schumann: Citrus Research and Education Center, University of Florida, Gainesville, FL, USA. Nicholas Krouglicof: Intempco Canada, Montreal, QC, Canada. |
| title | Improved crop row detection by employing attention-based vision transformers and convolutional neural networks with integrated depth modeling for precise spatial accuracy |
| topic | Vision Transformers; Convolutional Neural Networks; ConvFormer; CAFormer; SWIN Transformers; MetaFormers |
| url | http://www.sciencedirect.com/science/article/pii/S2772375525001674 |
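The abstract's GSD values (0.5 to 2.0 mm/pixel) follow from the standard pinhole-camera relation between metric depth and focal length: one pixel subtends 1/f radians, so it covers depth/f units on the ground. A minimal sketch of that geometry, assuming a per-pixel metric depth map and a focal length expressed in pixels (quantities a monocular depth model such as Depth Pro is described as estimating); this illustrates the relation only and is not the paper's implementation:

```python
import numpy as np

def gsd_map(depth_m: np.ndarray, focal_px: float) -> np.ndarray:
    """Per-pixel Ground Sampling Distance in mm/pixel under a pinhole model.

    depth_m:  metric depth map in metres (e.g. from a monocular depth model)
    focal_px: camera focal length in pixels

    Each pixel covers depth/focal_px metres on the ground; the factor of
    1000 converts metres to millimetres.
    """
    return depth_m / focal_px * 1000.0
```

For example, at a depth of 1.0 m with a 1000 px focal length, each pixel covers 1.0 mm of ground, matching the scale of the per-cluster GSD range reported in the abstract.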