PHYSICS-DRIVEN FEATURE CREATION TO IMPROVE MACHINE LEARNING MODELS PERFORMANCE FOR OIL PRODUCTION RATE PREDICTION

Bibliographic Details
Main Authors: Eghbal Motaei, Seyed Mehdi Tabatabai, Tarek Ganat, Ahmad Khanifar, Sulaiman Dzaiy, Timur Chis
Format: Article
Language: English
Published: Petroleum-Gas University of Ploiesti 2024-12-01
Series: Romanian Journal of Petroleum & Gas Technology
Online Access: http://jpgt.upg-ploiesti.ro/wp-content/uploads/2024/12/22_RJPGT_no.2-2024-Physics-driven-feature-ML-models-performance-oil-production-prediction.pdf
Description
Summary: This paper aims to develop a machine learning-based model for oil production rate prediction. The significance of feature dimension reduction is addressed by applying a well-established approach, Principal Component Analysis (PCA), and the proposed physics-driven feature creation technique. The physics-driven features, derived from experience or analytical modeling, introduce physical relevance and improve model quality. The study focuses on oil production prediction using a dataset that includes reservoir permeability, wellbore skin, reservoir pressure, net pay thickness, water cut, and well liquid production rate. Several machine learning models, such as SVM, k-NN, Decision Tree, Random Forest, and linear regression, were first built on PCA-reduced features. The models were tuned and validated using k-fold cross-validation. The same models were then built using physics-driven features, and their performance metrics were compared. The results show a significant improvement when applying the proposed physics-driven feature creation compared with PCA. Over 10-fold cross-validation, PCA raised the R² performance metric from 70% to 77% (about a 10% relative gain), while physics-driven features raised it from 70% to 90% on average (a gain of 20 percentage points). The Random Forest and linear regression models outperformed the others, particularly when built on physics-driven features. Additionally, models based on physics-driven features were less sensitive to how the data were split into training and testing sets, proving more reliable and achieving better performance metrics than those using the original features.
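The comparison described in the summary (original features vs. PCA-reduced features vs. physics-driven features, scored by mean R² over 10-fold cross-validation) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The data are synthetic. Only linear regression is fitted, using NumPy least squares to keep the block dependency-light. Keeping 3 principal components is an arbitrary choice. The two physics-driven features, liquid rate × (1 − water cut) and a simplified radial-Darcy term k·h·p / (ln(re/rw) + s) with ln(re/rw) fixed at 7, are hypothetical examples and not the features used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_synthetic_wells(n=300):
    """Hypothetical synthetic dataset; NOT the paper's data."""
    k = rng.lognormal(mean=np.log(50.0), sigma=0.8, size=n)  # permeability, md
    skin = rng.uniform(-2.0, 10.0, size=n)                   # wellbore skin
    p_res = rng.uniform(1500.0, 4000.0, size=n)              # reservoir pressure, psi
    h = rng.uniform(10.0, 100.0, size=n)                     # net pay thickness, ft
    wc = rng.uniform(0.1, 0.9, size=n)                       # water cut, fraction
    # Assumed liquid-rate model: q ~ k*h*p / (ln(re/rw) + s), ln(re/rw) = 7.
    q_liq = 1e-3 * k * h * p_res / (7.0 + skin) * rng.lognormal(0.0, 0.2, size=n)
    q_oil = q_liq * (1.0 - wc) * (1.0 + rng.normal(0.0, 0.05, size=n))
    X = np.column_stack([k, skin, p_res, h, wc, q_liq])
    return X, q_oil


def fit_predict_linear(F_tr, y_tr, F_te):
    """Ordinary least squares with an intercept column."""
    A_tr = np.column_stack([np.ones(len(F_tr)), F_tr])
    A_te = np.column_stack([np.ones(len(F_te)), F_te])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return A_te @ coef


def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def pca_transform(X_tr, X_te, n_comp):
    """Standardize with training statistics, then project on the top PCs."""
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    Z_tr, Z_te = (X_tr - mu) / sd, (X_te - mu) / sd
    # Principal axes are the right singular vectors of the scaled training data.
    _, _, vt = np.linalg.svd(Z_tr, full_matrices=False)
    W = vt[:n_comp].T
    return Z_tr @ W, Z_te @ W


def physics_features(X):
    """Hypothetical physics-driven features (illustrative only)."""
    k, skin, p_res, h, wc, q_liq = X.T
    oil_from_liquid = q_liq * (1.0 - wc)
    darcy_oil = k * h * p_res / (7.0 + skin) * (1.0 - wc)
    return np.column_stack([oil_from_liquid, darcy_oil])


def kfold_r2(X, y, featurize, n_splits=10, seed=1):
    """Mean out-of-fold R^2; features are built per fold from training data."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_splits)
    scores = []
    for i in range(n_splits):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(n_splits) if j != i])
        F_tr, F_te = featurize(X[tr], X[te])
        scores.append(r2(y[te], fit_predict_linear(F_tr, y[tr], F_te)))
    return float(np.mean(scores))


X, y = make_synthetic_wells()
results = {
    "original": kfold_r2(X, y, lambda a, b: (a, b)),
    "pca_3": kfold_r2(X, y, lambda a, b: pca_transform(a, b, 3)),
    "physics": kfold_r2(X, y, lambda a, b: (physics_features(a), physics_features(b))),
}
for name, score in results.items():
    print(f"{name:>8s}: mean 10-fold R^2 = {score:.3f}")
```

Because the synthetic target is generated multiplicatively, a linear model on the raw or PCA-rotated columns cannot represent the liquid-rate × water-cut interaction, whereas the physics-driven feature makes that relationship linear. The printed scores illustrate this mechanism only and are not expected to match the figures reported in the paper. Swapping `fit_predict_linear` for another regressor (for example a Random Forest) would extend the sketch toward the paper's model set.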
ISSN: 2734-5319, 2972-0370