Yeo-Johnson Transformation Usage in Data Preprocessing for Well Production Prediction Using Deep Neural Networks (DNN)

Authors

  • Alringga Rizky Institut Teknologi Sepuluh Nopember
  • Anny Yuniarti Institut Teknologi Sepuluh Nopember

DOI:

https://doi.org/10.59261/jbt.v7i2.607

Keywords:

data preprocessing; deep neural networks (DNN); tree-structured Parzen estimator (TPE); well production prediction; Yeo-Johnson transformation

Abstract

Background: The accurate prediction of infill well production is one of the major bottlenecks for hydrocarbon reservoir development. Traditional reservoir simulation tools are computationally expensive, taking weeks to months per scenario.

Objective: This paper presents the development of a Deep Neural Network (DNN) model for prediction with hyperparameter optimization using the Tree-structured Parzen Estimator (TPE) to predict pay porosity (PORPAYX) in infill wells of the Pertamina Hulu Sanga Sanga field.

Methods: A DNN model was developed to predict oil well production based on subsurface and production features from a comprehensive dataset of Pertamina Hulu Sanga Sanga reservoir characteristics and production data. Details of our method include: training the model on a robust dataset, hyperparameter tuning using the Tree-structured Parzen Estimator (TPE), and K-fold cross-validation for performance validation.
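The preprocessing and validation steps described above can be sketched as follows. This is an illustrative outline only: synthetic data stands in for the proprietary Pertamina Hulu Sanga Sanga dataset, and scikit-learn's `MLPRegressor` stands in for the paper's DNN. In the full workflow, a TPE optimizer (e.g., Optuna's `TPESampler`, not shown here) would wrap the cross-validated score as its objective.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.lognormal(size=(200, 5))                      # skewed, mixed-scale features
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)

# Yeo-Johnson power transform (handles zero and negative values, unlike Box-Cox);
# PowerTransformer also standardizes each feature to zero mean, unit variance.
X_t = PowerTransformer(method="yeo-johnson").fit_transform(X)

# 5-fold cross-validated R^2 of a small feed-forward network on transformed data
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
r2 = cross_val_score(model, X_t, y, cv=cv, scoring="r2").mean()
print(f"mean cross-validated R^2: {r2:.3f}")
```

K-fold cross-validation reports the mean score across folds, so a hyperparameter search optimizes generalization rather than fit to a single split.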

Results: Scaling normalized the data so that every feature contributed comparably during model training, enabling effective learning and accurate prediction. In contrast, the model fitted on unscaled data produced a negative R² (mean R² = −0.08496), meaning it could not explain the variability in the data, together with a higher MSE of 0.009057 and RMSE of 0.095148. This degradation reflects the model's inability to handle features with widely varying scales, which prevented proper learning and prediction.
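The scaled-versus-unscaled comparison can be reproduced in miniature on synthetic data (the scores reported above come from the paper's field dataset, not from this sketch; the feature scales and model here are illustrative assumptions):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(1)
# Features with wildly different scales, as in raw subsurface measurements
X = np.column_stack([rng.normal(0, 1, 400),
                     rng.normal(0, 1000, 400),
                     rng.lognormal(5, 1, 400)])
y = 0.3 * X[:, 0] + 0.0004 * X[:, 1] + rng.normal(0, 0.05, 400)

def fit_and_score(features):
    """Train a small network and return R^2, MSE, RMSE on a held-out split."""
    Xtr, Xte, ytr, yte = train_test_split(features, y, random_state=0)
    pred = MLPRegressor(max_iter=1000, random_state=0).fit(Xtr, ytr).predict(Xte)
    mse = mean_squared_error(yte, pred)
    return r2_score(yte, pred), mse, np.sqrt(mse)

r2_raw, mse_raw, rmse_raw = fit_and_score(X)
r2_scaled, mse_scaled, rmse_scaled = fit_and_score(
    PowerTransformer(method="yeo-johnson").fit_transform(X))
print(f"raw: R2={r2_raw:.3f}  scaled: R2={r2_scaled:.3f}")
```

A negative R² means the model predicts worse than simply outputting the mean of the targets, which is the failure mode the unscaled run exhibits.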

Conclusion: Residual plots confirmed that the model trained with scaled data met the assumptions of linearity and normality.
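One minimal way to run the residual checks the Conclusion refers to, sketched here with a linear model on synthetic data rather than the paper's fitted DNN: test the residuals for normality (Shapiro-Wilk) and for remaining structure against the fitted values (a proxy for linearity).

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.5, -2.0, 0.7]) + rng.normal(scale=0.1, size=300)

pred = LinearRegression().fit(X, y).predict(X)
resid = y - pred

# Normality: a large p-value means normality of residuals is not rejected
_, p_normal = stats.shapiro(resid)
# Linearity proxy: residuals should be uncorrelated with the fitted values
corr = np.corrcoef(pred, resid)[0, 1]
print(f"Shapiro-Wilk p={p_normal:.3f}, residual-vs-fitted corr={corr:.2e}")
```

In practice these diagnostics are read off residual-vs-fitted and Q-Q plots; the statistics above are the quantitative counterparts of those plots.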


Published

2026-04-30