cigChannel: a large-scale 3D seismic dataset with labeled paleochannels for advancing deep learning in seismic interpretation

Bibliographic Details
Main Authors: G. Wang, X. Wu, W. Zhang
Format: Article
Language: English
Published: Copernicus Publications 2025-07-01
Series: Earth System Science Data
Online Access: https://essd.copernicus.org/articles/17/3447/2025/essd-17-3447-2025.pdf
Description
Summary: <p>Identifying paleochannels in 3D seismic volumes (seismic paleochannel interpretation) is essential for georesource development and offers insights into paleoclimate conditions. However, it remains a labor-intensive and time-consuming task. Deep learning has shown great promise in automating seismic paleochannel interpretation with high efficiency and accuracy, as demonstrated in similar image segmentation tasks in computer vision (CV). Yet, unlike the CV domain, seismic exploration lacks a comprehensive labeled dataset for paleochannels, which significantly hinders the development, application, and evaluation of deep learning methods in this field. Manual labeling of paleochannels in 3D seismic volumes is tedious and subjective, and the resulting mislabeling can degrade a deep learning model's performance. To address this, we propose a workflow to generate a synthetic seismic dataset, <i>cigChannel</i>, consisting of 1600 seismic volumes with over 10 000 labeled paleochannels. Each volume has a size of <span class="inline-formula">256 × 256 × 256</span> samples. This is the largest dataset to date for seismic paleochannel interpretation, featuring geologically reasonable seismic volumes with accurately labeled meandering channels, tributary channel networks, and submarine canyons.
A convolutional neural network (simplified from U-Net) trained on this dataset achieves <span class="inline-formula"><i>F</i><sub>1</sub></span> scores of 0.52, 0.73, and 0.63 in identifying meandering channels, tributary channel networks, and submarine canyons in three field seismic volumes, respectively. However, the synthetic seismic volumes in the <i>cigChannel</i> dataset still lack the variability and realism of field seismic data, which may limit the deep learning model's generalizability. To facilitate further research, we publicly release the dataset (<span class="cit" id="xref_altparen.1"><a href="#bib1.bibx66">Wang et al.</a>, <a href="#bib1.bibx66">2024</a></span>, <a href="https://doi.org/10.5281/zenodo.10791151">https://doi.org/10.5281/zenodo.10791151</a>), the data generation code, and the trained U-Net model (<span class="cit" id="xref_altparen.2"><a href="#bib1.bibx64">Wang</a>, <a href="#bib1.bibx64">2024</a></span>, <span class="uri">https://github.com/wanggy-1/cigChannel</span>, last access: 5 July 2025), aiming to advance deep learning approaches for seismic paleochannel interpretation.</p>
ISSN: 1866-3508
1866-3516
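
The summary reports F1 scores for channel identification in field seismic volumes. As a rough illustration of that metric, the following sketch computes a voxel-wise F1 score between a predicted binary channel mask and a label mask. It assumes voxel-wise evaluation on already thresholded masks, which the summary does not specify; the function name `f1_score_3d` and the toy 4 × 4 × 4 volumes are hypothetical and are not taken from the released code.

```python
import numpy as np


def f1_score_3d(pred, label):
    """Voxel-wise F1 between two binary masks of identical shape.

    Assumption: both inputs are already binarized (nonzero = channel).
    """
    pred = np.asarray(pred, dtype=bool)
    label = np.asarray(label, dtype=bool)
    if pred.shape != label.shape:
        raise ValueError("pred and label must have the same shape")

    tp = np.count_nonzero(pred & label)    # channel voxels found
    fp = np.count_nonzero(pred & ~label)   # false alarms
    fn = np.count_nonzero(~pred & label)   # missed channel voxels

    denom = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when neither mask has channel voxels.
    return 2 * tp / denom if denom else 0.0


# Toy example on a small volume (the dataset volumes are 256 x 256 x 256).
label = np.zeros((4, 4, 4), dtype=bool)
label[1:3, 1:3, 1:3] = True    # 8 labeled channel voxels
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:4] = True     # 12 predicted voxels, 8 of them correct

print(f1_score_3d(pred, label))  # precision 8/12, recall 8/8
```

F1 is the harmonic mean of precision and recall, so it penalizes both missed channel voxels and false alarms, which makes it more informative than plain accuracy when channels occupy a small fraction of a volume.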