PHYSICS-DRIVEN FEATURE CREATION TO IMPROVE MACHINE LEARNING MODELS PERFORMANCE FOR OIL PRODUCTION RATE PREDICTION
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Petroleum-Gas University of Ploiesti, 2024-12-01 |
Series: | Romanian Journal of Petroleum & Gas Technology |
Subjects: | |
Online Access: | http://jpgt.upg-ploiesti.ro/wp-content/uploads/2024/12/22_RJPGT_no.2-2024-Physics-driven-feature-ML-models-performance-oil-production-prediction.pdf |
Summary: | This paper develops a machine learning model for predicting oil production rate. It addresses feature dimension reduction by comparing a well-established approach, Principal Component Analysis (PCA), with a proposed physics-driven feature creation technique. The physics-driven features, derived from experience or analytical modeling, introduce physical relevance and improve model quality. The dataset includes reservoir permeability, wellbore skin, reservoir pressure, net pay thickness, water cut, and well liquid production rate. Several machine learning models (SVM, k-NN, Decision Tree, Random Forest, and linear regression) were first built on PCA-reduced features, then tuned and validated using k-fold cross-validation. The same models were then rebuilt using physics-driven features, and their performance metrics were compared. The physics-driven features gave a markedly larger improvement than PCA. Over 10-fold cross-validation, PCA raised R² from 70% to 77% (7 percentage points, a 10% relative gain), while physics-driven features raised it from 70% to 90% on average (20 percentage points). The Random Forest and linear regression models outperformed the others, particularly when built on physics-driven features. Models based on physics-driven features were also less sensitive to how the data were split for training and testing, making them more reliable than models built on the original features. |
---|---|
ISSN: | 2734-5319 2972-0370 |
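
The comparison described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: this record does not state which physics-driven features the paper uses, so the feature below is an assumption built on the identity oil rate = liquid rate × (1 − water cut). The data are synthetic, and the names `fit_ols`, `kfold_r2`, `liquid_rate`, and `water_cut` are hypothetical. The sketch fits a linear regression on two of the original features and on the single physics-driven feature, then compares the mean held-out R² over 10 folds.

```python
import random

def fit_ols(X, y):
    """Least squares with intercept: solve the normal equations by Gauss-Jordan."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(p)]
    for c in range(p):
        # Partial pivoting, then normalise the pivot row and clear the column
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]

def predict(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def r2(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def kfold_r2(X, y, k=10):
    """Mean held-out R² over k contiguous folds."""
    n = len(y)
    scores = []
    for fold in range(k):
        test = list(range(fold * n // k, (fold + 1) * n // k))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        w = fit_ols([X[i] for i in train], [y[i] for i in train])
        scores.append(r2([y[i] for i in test], [predict(w, X[i]) for i in test]))
    return sum(scores) / k

# Synthetic wells (assumption: values and noise level are illustrative only)
rng = random.Random(0)
n = 200
liquid_rate = [rng.uniform(100.0, 1000.0) for _ in range(n)]
water_cut = [rng.uniform(0.0, 0.9) for _ in range(n)]
oil_rate = [q * (1.0 - wc) + rng.gauss(0.0, 10.0)
            for q, wc in zip(liquid_rate, water_cut)]

# Original features: liquid rate and water cut enter the linear model separately
X_raw = [[q, wc] for q, wc in zip(liquid_rate, water_cut)]
# Assumed physics-driven feature: one column encoding the product relationship
X_physics = [[q * (1.0 - wc)] for q, wc in zip(liquid_rate, water_cut)]

r2_raw = kfold_r2(X_raw, oil_rate)
r2_physics = kfold_r2(X_physics, oil_rate)
print(f"original features: mean R2 = {r2_raw:.3f}")
print(f"physics feature:   mean R2 = {r2_physics:.3f}")
```

A linear model cannot represent the product of liquid rate and water cut from the separate columns, so encoding that product as one feature both reduces the dimension and lets the simplest model fit the target. This is one plausible reading of why physics-driven features could help linear regression in particular.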