A Patch-Level Data Synthesis Pipeline Enhances Species-Level Crop and Weed Segmentation in Natural Agricultural Scenes

Species-level crop and weed semantic segmentation in agricultural field images enables plant identification and enhanced precision weed management. However, the scarcity of labeled data poses significant challenges for model development. Here, we report a patch-level synthetic data generation pipeli...

Bibliographic Details
Main Authors: Tang Li, James Burridge, Pieter M. Blok, Wei Guo
Format: Article
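Language: English
Published: MDPI AG 2025-01-01
Series: Agriculture
Subjects: data synthesis; data augmentation; generative model; semantic segmentation; precision weed management
Online Access: https://www.mdpi.com/2077-0472/15/2/138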
author Tang Li
James Burridge
Pieter M. Blok
Wei Guo
collection DOAJ
description Species-level crop and weed semantic segmentation in agricultural field images enables plant identification and enhanced precision weed management. However, the scarcity of labeled data poses significant challenges for model development. Here, we report a patch-level synthetic data generation pipeline that improves semantic segmentation performance in natural agricultural scenes by creating realistic training samples, achieved by pasting patches of segmented plants onto soil backgrounds. This pipeline effectively preserves foreground context and ensures diverse and accurate samples, thereby enhancing model generalization. The baseline model achieved higher semantic segmentation performance when trained solely on data synthesized by our proposed method than when trained solely on real data, with the mean intersection over union (mIoU) increasing by approximately 1.1% (from 0.626 to 0.633). Building on this, we created hybrid datasets by combining synthetic and real data and investigated the impact of synthetic data volume. By increasing the number of synthetic images in these hybrid datasets from 1× to 20×, we observed a substantial performance improvement, with mIoU increasing by 15% at 15×. However, the gains diminish beyond this point, with the optimal balance between accuracy and efficiency achieved at 10×. These findings highlight synthetic data as a scalable and effective augmentation strategy for addressing the challenges of limited labeled data in agriculture.
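The description above outlines the core idea (pasting segmented plant patches onto soil backgrounds) and reports results as mIoU. The following is a minimal sketch of that idea, not the authors' implementation: the function names `paste_patch` and `mean_iou`, the toy array sizes, and the plain copy-paste without blending or placement rules are all assumptions made for illustration. The last line checks that the reported change from 0.626 to 0.633 is a relative gain of roughly 1.1%.

```python
import numpy as np


def paste_patch(image, label, patch, patch_mask, class_id, top, left):
    """Paste a segmented plant patch onto a background image, in place.

    image: (H, W, 3) uint8 soil background; label: (H, W) integer class map.
    patch: (h, w, 3) uint8 plant crop; patch_mask: (h, w) bool plant pixels.
    Hypothetical helper: assumes the patch fits inside the image bounds.
    """
    h, w = patch_mask.shape
    region = image[top:top + h, left:left + w]   # view into the background
    region[patch_mask] = patch[patch_mask]       # copy only plant pixels
    label[top:top + h, left:left + w][patch_mask] = class_id
    return image, label


def mean_iou(pred, target, num_classes):
    """Mean intersection over union, skipping classes absent from both maps."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union > 0:
            ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))


# Toy data standing in for a soil photo and one segmented plant.
rng = np.random.default_rng(0)
soil = rng.integers(60, 120, size=(64, 64, 3), dtype=np.uint8)
labels = np.zeros((64, 64), dtype=np.int64)      # class 0 = soil
plant = np.full((16, 16, 3), (30, 160, 40), dtype=np.uint8)
plant_mask = np.zeros((16, 16), dtype=bool)
plant_mask[4:12, 4:12] = True                    # 8 x 8 plant pixels

synthetic_image, synthetic_label = paste_patch(
    soil, labels, plant, plant_mask, class_id=1, top=10, left=20
)

# Relative mIoU gain reported in the description (synthetic-only vs real-only).
relative_gain = (0.633 - 0.626) / 0.626
```

Because the label map is written with the same mask as the image, each synthetic image comes with a pixel-accurate annotation at no extra labeling cost, which is what makes scaling the synthetic share of a hybrid dataset inexpensive.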
format Article
id doaj-art-ae896e633ad641f6a84a11027af2d15b
institution Kabale University
issn 2077-0472
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Agriculture
spelling doaj-art-ae896e633ad641f6a84a11027af2d15b
2025-01-24T13:15:51Z
eng; MDPI AG; Agriculture; ISSN 2077-0472; 2025-01-01; vol. 15, no. 2, art. 138; doi:10.3390/agriculture15020138
A Patch-Level Data Synthesis Pipeline Enhances Species-Level Crop and Weed Segmentation in Natural Agricultural Scenes
Tang Li, James Burridge, Pieter M. Blok, Wei Guo (all four: Laboratory of Field Phenomics, Institute for Sustainable Agro-Ecosystem Services, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 188-0002, Japan)
https://www.mdpi.com/2077-0472/15/2/138
data synthesis; data augmentation; generative model; semantic segmentation; precision weed management
title A Patch-Level Data Synthesis Pipeline Enhances Species-Level Crop and Weed Segmentation in Natural Agricultural Scenes
topic data synthesis
data augmentation
generative model
semantic segmentation
precision weed management
url https://www.mdpi.com/2077-0472/15/2/138