Total Organic Carbon Content Prediction in Lacustrine Shale Using Extreme Gradient Boosting Machine Learning Based on Bayesian Optimization

Bibliographic Details
Main Authors: Xingzhou Liu, Zhi Tian, Chang Chen
Format: Article
Language: English
Published: Wiley 2021-01-01
Series: Geofluids
Online Access: http://dx.doi.org/10.1155/2021/6155663
collection DOAJ
description The total organic carbon (TOC) content is a critical parameter for estimating shale oil resources. However, common TOC prediction methods rely on empirical formulas, and their applicability varies widely from region to region. In this study, a novel data-driven extreme gradient boosting (XGBoost) model tuned by Bayesian optimization was proposed to predict the TOC content from wireline log data. The lacustrine shale in the Damintun Sag, Bohai Bay Basin, China, was used as a case study. First, correlation analysis was used to examine the relationship between the well logs and the core-measured TOC data. Based on the degree of correlation, six logging curves reflecting TOC content were selected to construct a training dataset for machine learning. Then, the performance of the XGBoost model was tested using K-fold cross-validation, and the hyperparameters of the model were determined using a Bayesian optimization method to improve the search efficiency and reduce the uncertainty of rule-of-thumb tuning. Analysis of the prediction errors showed that the coefficient of determination (R²) between the TOC content predicted by the XGBoost model and the core-measured TOC content reached 0.9135. The root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were 0.63, 0.77, and 12.55%, respectively. In addition, five commonly used methods, namely the ΔlogR method, random forest, support vector machine, K-nearest neighbors, and multiple linear regression, were used to predict the TOC content, confirming that the XGBoost model has higher prediction accuracy and better robustness. Finally, the proposed approach was applied to predict the TOC curves of 20 exploration wells in the Damintun Sag, yielding the first quantitative contour maps of the TOC content of this block. The results of this study facilitate the rapid detection of sweet spots in lacustrine shale oil.
id doaj-art-981c3c5066cf47e398631368bab1f7aa
institution Kabale University
issn 1468-8115
1468-8123
spelling Xingzhou Liu, Zhi Tian, Chang Chen. Affiliation (all three authors): Research Institute of Exploration and Development, Liaohe Oilfield Company, Petrochina, Panjin 124010, China. Geofluids, vol. 2021, article 6155663.
url http://dx.doi.org/10.1155/2021/6155663
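
The description reports four error measures (R², RMSE, MAE, and MAPE) for predicted versus core-measured TOC. As a rough illustration of how such measures are conventionally computed, the sketch below uses the standard textbook definitions in plain Python. The function name `regression_metrics` and the sample values are hypothetical and chosen only for illustration; they are not taken from the article, and the authors' own implementation may differ.

```python
import math


def regression_metrics(measured, predicted):
    """Return R^2, RMSE, MAE and MAPE (percent) for paired values.

    Assumes the measured values are non-zero and not all identical,
    otherwise MAPE or R^2 would divide by zero.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(measured)
    residuals = [m - p for m, p in zip(measured, predicted)]
    mean_measured = sum(measured) / n

    # Residual and total sums of squares for the coefficient of determination.
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "mape": 100.0 * sum(abs(r / m) for r, m in zip(residuals, measured)) / n,
    }


# Hypothetical TOC values in wt%, for illustration only.
measured_toc = [2.0, 4.0, 6.0, 8.0]
predicted_toc = [2.5, 3.5, 6.5, 7.5]
metrics = regression_metrics(measured_toc, predicted_toc)
print(metrics)
```

In practice these values would be computed on held-out folds of the K-fold split rather than on the training data, so that they reflect prediction error on unseen samples.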