Optimizing initial oil production of horizontal Wolfcamp wells utilizing data analytics


Machine learning techniques have fundamentally altered how oil and gas industry practitioners design fracture operations. In this paper, we perform data analytics utilizing response surface methodology (RSM), a group of statistical techniques that develop a functional relationship between an output variable of interest and several associated input variables, to optimize the output. We apply RSM to optimize horizontal well production based on initial production (IP) of horizontal oil wells for 180 days (IP180 Oil), as a function of five input variables: reservoir type, fracturing fluid (gal/ft), proppants (lbm/ft), cluster spacing, and stage length (ft). The RSM model correlates the initial production of each well to the input variables via a single equation, thus allowing for exploration of the fitted response surface in order to maximize production. Although the choice of the five inputs is made based primarily after consultation with industry professionals, we validate our selection by also applying an assortment of data-analytics-based methods that attempt to rank variable importance and thereby identify completion variables that may be predictive of initial production. The findings rank all five variables above the 50th percentile, thus indicating that the chosen variables have merit. This procedure is applied to a dataset of 201 horizontal wells from the Wolfcamp formations. The model fits reasonably well, with R2 = 61%, a very significant F-statistic p value, and a predicted versus observed values scatterplot indicating a good fit. The RSM analysis suggests that, within the feasible space defined by this dataset, maximum values of IP 180 Oil may be obtained by setting the fracturing fluid in gal/ft at approximately 1972, while simultaneously maximizing the remaining input variables (proppant loading, cluster spacing, stage length). The outcome indicates the possible directions to be taken in seeking a global optimum initial production for the setting of completion variables. Iteration of this scheme may lead to a near-optimum global solution. The real utility of this work may be indicating the way different studies may be designed to optimize production, each with its own selection of inputs, and ultimately be combined in a meta-analysis.


Challenges to oil production in the Wolfcamp range from low productivity and quick decline to high variability in the response of wells due to fracture treatment. Initial production of wells is used as a measure to assess well performance and completion effectiveness, and to compare wells of a given company to those of its competitors, and production performance of one shale play to that of another. However, the challenge remains to determine which of these initial production metrics should be used. The O & G industry recognizes peak rate (the starting point of production decline); initial production (IP30) as a rolling average for 30-day production; and IP60, IP90, IP180, and IP365 as a measure of production performance. Some schools of thought use cumulative production in the first months along with EUR to judge performance and profitability of oil wells.

A key question is, what are production metrics?

Production metrics may be represented in initial production (IP) IP30, through IP365 initial production for a rolling average of 30 through 365 days; cumulative production for the first number of days or months of the life of the well, and Estimated Ultimate Recovery (EUR). The production metrics listed in Table 1 show benefits and challenges associated with each.

Table 1 Oil and gas production metrics

The IP rate of a well is established based on well type; reservoir considerations; historical and analogous performance; management and commercial strategy; operational considerations; target and economic reserves, and nodal analysis results (Olufemi and Ambastha 2017).

Empirical techniques to forecast oil production include decline curve analysis and capacitance/resistance modeling (CRM). CRM is a data-driven method for characterizing conventional reservoirs and optimizing oil production without complex reservoir simulations (Yousef et al. 2006).

Recent industry efforts established multistage fracture design optimization for unconventional shale oil and gas basins to locate the number of fracture stages that maximize NPV (Rashid et al. 2014). The effect of both spacing and perforation friction on the propagation of concurrent hydraulic fractures from a horizontal well was studied (Izadi et al. 2015). The optimization indicated that while limited-entry perforating could help equalize the lengths of fractures simultaneously growing from multiple clusters, it did not equalize their widths, but equalized placement of proppant across the entire series of perforation clusters.

The data-based technique, which relies on empirical correlations, also correlates the well production to reservoir characteristics. Due to the currently used downhole tool restrictions currently in place, multistage fracturing is often conducted consecutively, stage by stage from well toe to heel without real optimization (Xia et al. 2016; Roussel and Sharma 2011). Each stage usually contains several perforation sets, and thus, several fractures propagate simultaneously if each perforation set produces only one fracture (Wu and Olson 2014).

There difference is significant between data-driven reservoir modeling and empirical technologies. First, a model with a good predictivity is necessary; then internal interactions between the variables can be checked to see if the model makes sense. The physical and geological interactions can tell the prediction consistency of the built model (Mohaghegh 2017). A carefully worked-out effort to develop the unconventional reservoirs may change the trend of manufacturing approach (uniform spacing of wells). Utilization of stimulation software such as “Kinetix-Intersect” and comparison of the modeling outcome of fracture simulator for a selected fracture stage and rate-transient analytical results led to variable Enhanced Oil Recovery (EOR) (Rodriguez 2019). Acknowledgment of variable recovery and pressure depletion associated with the producers’ drainage volume should be the basis for the planning of EOR techniques (Rodriguez 2019). Spacing and placement of perforation clusters, stress distribution, and fracture mechanics are the most important input variables of production performance (Cheng 2012). A case study implemented an engineering approach to optimize multiple sets of parallel horizontal wells in the oil segment of the Eagle Ford Shale. The data was used to design stimulation-stage intervals and evaluate surface injection responses during fracture treatments (Stegent et al. 2013).

The correlation between the style of completions and the initial production rate is well established (Griffin et al. 2013; Lafollette et al. 2012a, b). Initial production is a crucial metric to measure profitability of wells. Estimated ultimate recovery of wells (EUR) usually increases with the number of stages. As the number of fracture stages increases, the efficiency of incremental stages decreases in the Bakken Shale formation (Ran and Kelkar 2015). Beyond a certain number, therefore, the incremental cost would exceed the incremental benefits. All the efforts led to a good understanding of the effect of individual variables on production without agreement on an optimum methodology. Microseismic monitoring in the unconventional monitoring has proven to be highly effective in many field runs (McClure 2012).

Table 1 lists the benefits and challenges of commonly used production metrics.

Why do O&G operators need production metrics?

  • To measure performance of new wells versus old wells.

  • To compare their own wells versus competitors’ wells, shales, plays.

  • To correlate production metrics with the EUR of wells.

Hydraulic fracture operations lead to an increase in production. This increase may depend on various metrics (shown in Table 2): completed length of the horizontal well; quantity of perforations; number of clusters per stage; cluster spacing in length; size of proppant in lbm; concentration of proppant in fluid; fluid in gal/ft; fluids in bbls; average injection rate; fluid type, and proppant type and size, along with production metrics (e.g., Initial Production, Yield).

Table 2 Range of completion parameters used to develop the model

The data ranges used to develop the model are summarized in Table 2, which also represents completion variables that might affect initial production performance of horizontal wells.

The paper is organized as follows:

  • Description of problem

  • Data analytics methodology to select optimum completion variables

  • Development of model

  • Testing and validation of model.

For lack of standard metrics, optimum completion variables are determined by trial and error approach (e.g., amount of fluid and proppant pumped per completed foot, cluster spacings and fracture stage length). One unanswered question is optimum number of completion variables needed per well for highest initial production. Another area of uncertainty is the degree to which each metric affects estimated ultimate recovery (EUR) from an individual well. One school of thought suggests the more, the better. Another suggests that fracture length and dimensionless fracture conductivity are the two primary variables that control the productivity index of a fractured well and thus initial production. For years, industry researchers have applied accepted methods to estimate post-fracture production increase. These approaches depend on a variety of fracture lengths and fracture conductivities. The procedures involve a combination of propped fracture lengths and conductivities that optimize post-fracture production response.

Graphical solutions rely on the interdependence between formation permeability, half-length of the fracture, and effective conductivity \(\left( {K_{\text{f}} \times w_{\text{fp}} } \right)\). Veatch et al. (2017) showed five commonly used charts for predicting semi-steady-state post-fracture production fold of increase, i.e., the ratio of production rate post-fracture cleanup to pre-fracture rate. The methods include Prats; McGuire and Sikora; Holditch; Tinsley et al., and Tannich and Nierode graphs. These charts may provide guidelines for determining approximate treatment size (fluid systems and proppants), injection rates and proppant staging schedules (Veatch et al. 2017). Economic analysis based on IP alone can be misleading for gas producers considering lateral placement in the reservoir at the bottom or midway (Taylor et al. 2011). On the other hand, ultimate production (EUR) from Marcellus wells was precisely predicted using initial well production (Male et al. 2016). The wells in the study, which showed a boundary-dominated flow, were used to forecast EUR of 5275 wells in Pennsylvania and West Virginia over a period of 25 years.

This paper focuses on use of data from the Wolfcamp play of the Permian Basin to introduce an effective model to predict initial production as a production performance metric from completion variables. The work utilizes data analytics for effective selection of key completion variables that lead to optimum IP. For a list of various analytics tools used in unconventional reservoirs and development, see Table 6 in “Appendix 2”.

As far as we know, no simple model is available to estimate production metrics utilizing key completion variables such as fluid in gal/ft, cluster spacing, proppant in Lbs./ft, reservoir type and stage length of horizontal wells. For this paper, we are optimizing the initial production of oil for 180 days (IP 180 Oil).


In this section, we first describe data sources, as well as the selection of input variables for optimizing IP 180 Oil. Summary measures commonly used in the machine learning literature to rank the relative merits of predictor variables are computed. Finally, a quadratic response surface model is fitted.

Data and nomenclature

The dataset consisted of measurements taken from 201 horizontal wells. The 209 originally listed variables of particular interest were analyzed. A list of 32 variables was chosen as given in Table 3. From this list, the following variables were chosen after the analyses described in the next subsection.

Table 3 List of chosen 32 variables
  1. 1.

    Oil: natural logarithmic of IP 180 of Oil (output variable). Range of values: 4.860–8.011.

  2. 2.

    Reservoir type: reservoir name (acts as blocking factor with 9 levels). Levels: WOLFCAMP, WOLFCAMP A, WOLFCAMP C, WOLFCAMP D, WOLFCAMP D4, WOLFCAMP LA, WOLFCAMP MA, WOLFCAMP UA, WOLFCAMP UAY.

  3. 3.

    Fluid.gal/ft: amount of fluid pumped per completed foot. Range: 643–2687.

  4. 4.

    Prop.lbs/ft: amount of proppant pumped per completed foot. Range: 724–2839.

  5. 5.

    Cluster spacing: distance between the clusters within the stage. Range: 13–132.

  6. 6.

    Stage length: length of each stage. Range: 156–404.

Selection of input variables

The selection of the five input variables discussed above is based primarily on expert opinion after extensive consultations with industry professionals. As a form of validation, we sought also a variety of purely machine-learning-based methods to rank the relative variable importance in predicting IP 180 Oil. The Random Forest Package in R, which fits regression tree predictive models, provides two such measures (Liaw and Weiner 2018): %IncMSE (Fig. 1) and IncNodePurity (Fig. 2). The third measure is the absolute value of the (Pearson) correlation coefficient of each variable with IP 180 Oil (Fig. 3). Note that, unlike the regression tree-based methods, correlations cannot be calculated between a categorical variable (in this case reservoir type) and a quantitative output.

Fig. 1

The first measure of importance calculated from regression trees

Fig. 2

The second measure of importance calculated from regression trees

Fig. 3

Abs correlations of IP 180 Oil with each of the 30 variables used in the study

We note that the five selected input variables, although not at the very top of these various rankings, do come out as being important. Namely: Reservoir is ranked 3rd in Fig. 1; Fluid gal/ft is ranked 12th in Fig. 3; Prop.lbs/ft; is ranked 7th in Fig. 3; Cluster spacing is ranked 9th in Fig. 3; Stage length is ranked 19th in Fig. 3.

Fitting a response surface model

Response surface methodology (RSM) is a group of statistical procedures used in the development of a flexible functional relationship between a relevant output variable, y, and a number of related input variables, \(x_{i}\), with a view toward optimization of the output (Khuri and Mukhopadhyay 2010). If we have just one categorical input variable \(x_{0}\) with L levels, and k numerical input variables \(x_{1} , \ldots ,x_{k}\), such a relationship is usually approximated by a low-degree polynomial model of the form (Eq. 1) listed in “Appendix 4”. Since the model is of the form of a multiple linear regression, it is straightforward to estimate the variables via least-squares and assess its goodness of fit. Denoting these estimates by \(\widehat{B}_{i }\) and \(\widehat{B}_{i,j }\), the resulting fitted model at a given value of the vector of input variables x is given by Eq. 2 listed in “Appendix 4”, which is the optimal predictor of y at x (in the least-squares sense).

The main goal of RSM is determination of the optimum model-predicted output ŷ(x), over all x in the data region D. As a result and for our data, this region would be the 4-dimensional rectangle delineated by the intervals (see above summary statistics):

$$\begin{aligned} 643 & < {\text{fluid}}.{\text{gal}}/{\text{ft}} < 2687 \\ 724 & < {\text{prop}}.{\text{lbs}}/{\text{ft}} < 2839 \\ 13 & < {\text{cluster}}\;{\text{spacing }} < 132 \\ 156 & < {\text{stage}}\;{\text{length}} < 404 \\ \end{aligned}$$

For ease of interpretation, we normalize these variables by subtracting the midpoint of the interval and dividing the result by the interval half-width:

$$\begin{aligned} x_{1} & = \frac{{{\text{fluid}}.{\text{gal}}/{\text{ft}} - 1665}}{1022} \\ x_{2} & = \frac{{{\text{prop}}.{\text{lbs}}/{\text{ft}} - 1781.5}}{1057.5} \\ x_{3} & = \frac{{{\text{cluster}}\;{\text{spacing}} - 72.5}}{59.5} \\ x_{4} & = \frac{{{\text{stage}}\;{\text{length}} - 280}}{124} \\ \end{aligned}$$

whence the (normalized) data region \(D_{\text{norm}}\) becomes the 4-dimensional cube delineated by the intervals \(- 1 < x_{i} < 1\), for i = 1,…,4. On this normalized scale, the fitted model coefficients are given in Table 5 of “Appendix 1”.

This model fits reasonably well when applied to our dataset with a (typical) quadratic order polynomial, resulting in \(R^{2} = 61\%\) and an F-statistic p value < 0. The predicted versus observed value plot in Fig. 4 shows relatively good in-sample predictions.

Fig. 4

Visualization of the model fit, predicted versus measured log (IP 180 Oil) of the Wolfcamp horizontal wells. The diagonal line indicates where the predicted and observed log IP 180 Oil values would be equal

Results and discussion

The selection of an appropriate pool of input (or predictor) variables is a perennial problem in supervised machine learning applications. One may begin to understand the difficulty of the issue by realizing that the exact functional relationship between a predictor and the outcome of interest is (and forever will be) unknown. Moreover, predictors are typically correlated, thus further confounding the issue of what is predicting what. And how should one define “appropriate”? In classical regression analysis (Neter et al. 2004), one can begin to untangle these inter-related issues by measuring the decrease in prediction error variance (the so-called MSE or “mean squared error”) when each variable by itself is added to the model in turn: important predictors should lead to large decreases in MSE. But this is not the full story, as the results depend on assumed functional relationships (typically linear), and the amount of “dependence” (correlation) among the predictors.

The selection of the five input variables listed in the previous section that were chosen for this study, was based primarily on expert opinion after extensive consultations with industry professionals. As a form of validation, the three metrics displayed in Figs. 1, 2 and 3 provide different instances of purely machine-learning-based rankings of importance. The two tree-based measures, %IncMSE (Fig. 1) and IncNodePurity (Fig. 2), consist of drop-in-MSE type of metrics that use frequency of splits along the variable to determine rank (see “Appendix 3” for a summary). The measure in Fig. 3 is the usual correlation coefficient. A plethora of other measures could be used, e.g., one of the many statistical variable selection information-based criteria, such as AIC or BIC, in connection with the fitted RSM model. Although the five inputs we chose are not the top 5 by any of these measures, they do rank in the top 50th‰ according to all of them. In any case, the aim of the present study is not to definitively “solve” this difficult ranking problem, but rather to sketch out a procedure for optimizing production that could be implemented following the selection of a few inputs based on pragmatic choices borne out of field experience.

Translation of this optimization goal into the RSM of “Methodology” section, converts into determination of the maximum model-predicted output ŷ(x), over all x in the data region D. Interestingly, the optimal x is obtained by maximizing all the variables, except fluid in gal/ft as shown in Table 4. Note that different reservoir types simply act as an intercept in the model, so that anything beyond WOLFCAMP increases this value; e.g., the largest increase is 1.0436 for WOLFCAMP UAY (see Table 5 of “Appendix 1”). Thus the location of the optimal point is unaffected by reservoir type, and all discussion and plots pertain to WOLFCAMP.

Table 4 Coordinates of the optimum point in the data region
Table 5 Model coefficents

Figure 5 shows contour plots of ŷ(x) in the vicinity of this optimal point (red dot). The variables not displayed on the axes are fixed at their optimal point values. Overlaid are the data points displayed in blue. The top left panel would seem to indicate that the optimal point is in the vicinity of the data, an error that is dispelled in the bottom left panel where the red dot is actually seen to be quite far from the data cloud.

Fig. 5

Contour plots of log (IP 180) versus 2D-slice of various input variables

Finally, Fig. 6 is a zoomed-out version of Fig. 5 plotted on the normalized variable scale. The data region \(D_{\text{norm}}\) is displayed as a red square, where the optimal point is still the red dot.

Fig. 6

Contour plots of normalized log (IP 180) versus 2D-slice of various input variables

From this we can see that the red dot is in a flat region, thereby suggesting that no dramatic improvements appear to be feasible beyond this point. (The fact that the global optimal solution simultaneously sets all inputs at infinity is not of interest here).


This paper devises an optimization procedure to maximize the initial production of horizontal wells considering a pool of reservoir variables, summarized as follows:

  1. 1.

    Choose an initial oil production metric, e.g., IP 180 Oil.

  2. 2.

    Select a pool of “important” predictors/variables, e.g., number of days in production; horizontal well completion configurations; proppant size; fluid type; stages count, and lateral completed interval, etc. This selection could (perhaps ideally) be made by combining pragmatic field experience with state-of-the art data analytics.

  3. 3.

    Use RSM to model the functional relationship between the pool of predictors and the metric.

  4. 4.

    Based on the fitted RSM model, determine the optimum model-predicted output ŷ(x), over all x in the feasible data region.

Response surface methodology suggests that, within the feasible space defined by this dataset, maximum values of IP 180 Oil may be obtained by setting fluid in gal/ft at approximately 1972, while simultaneously maximizing the remaining inputs (prop in lbs/ft, cluster spacing, stage length).

Note that this statement is not to be taken as absolute truth, but rather as suggestive of possible directions to be taken in seeking a global optimum value for the setting of inputs. More refined estimates may be obtained by collecting new data in the vicinity of this optimum, followed by a new analysis. Iteration of this scheme may lead to a near-optimum global solution.

The value of the new model proposed in the paper lies in its simplicity and the lower number of input variables needed for a practicing engineer to optimize oil production from shale horizontal wells. More importantly, it is based on completion design variables that may be chosen relatively quickly. It is also possible that the IP 180 Oil calculated from this model can be correlated with EUR.

It is projected that the calculated IP 180 Oil from this model may be used in several applications. First, it may work as a comparative scale to measure well performance of a key play versus that of a competitor. Second, it may be used as a tool to enable quick decision on certain completion variables needed for horizontal wells, which can facilitate further estimation of fluids gal/ft, proppants lbs/feet, cluster spacing and fracture stage length. Third, in case of multiple leases with a variety of IPs, the model may work as a quality check tool to optimize production of a lease. Estimation of IP will be used to differentiate among different reservoirs (Wolfcamp A, B, C, D, MA and LA).

Conclusion and recommendation

The objective of the paper was to propose a methodology for identifying the values of key completion variables that may lead to optimum production of horizontal wells. The study proceeded as follows.

  • Extensive literature review to identify all production metrics, understand the relationships between them, compare performance, and suggest which may be used.

  • Description of the problem.

  • The usage of data analytics methodology to select optimum values of key completion variables, after elimination of non-effective variables such as Avg. Rate (bpm) from the available pool.

  • Testing and validation of the procedure on publicly available data.

The following conclusions may be drawn.

  1. 1.

    Optimal production may be obtained by maximizing with respect to some pairs of variables (e.g., cluster spacing and proppants).

  2. 2.

    Maximum IP over domain is obtained by maximizing all the variables (e.g. proppants lbs/ft of 2839, cluster spacing of 132 ft, and stage length of 404 ft), except fluid in gal/ft.

  3. 3.

    The proposed model may be used to predict initial production of horizontal wells of the Wolfcamp using few input completion variables with reasonable accuracy.

  4. 4.

    Additional data might benefit the work for further testing of global areas of shale plays.


\(K_{\text{f}}\) :

Fracture permeability

\(w_{\text{fp}}\) :

Propped fracture closed width


The amount of oil produced by a new oil well, measured in B/D (barrels of oil per day) or BOE/D (barrels of oil equivalent per day)

Proppant Lbs.:

Mass of proppants


The difference in number of days between the time the well was completed and 01/01/2001

Avg. Prop. Concentration:

The lbs of proppant pumped per fluid gallons pumped

Avg. Rate:

The average rate at which the mixture is pumped downhole to create the fractures

Cluster spacing:

The distance between the clusters within the stage

Clusters per stage:

The number of clusters used in each stage


A set of perforations arranged in a certain pattern to achieve the optimal completion

Comp Date:

The date the well was completed

Completed feet:

The calculated distance between the top perf and bottom perf where the fractures occur

County Variable:

A numeric variable to distinguish between counties


The county where the well was drilled

Fluid Bbls:

The amount of fluid pumped downhole to initiate fracture and place proppant

Fluid Gal/Cluster:

The amount of fluid pumped per cluster

Fluid Gal/Ft.:

The amount of fluid pumped per completed foot

Fluid Gal/Perf:

The amount of fluid pumped per perforation


Initial production rates

IP 30, 90, & 180:

Calculations taking a rolling average by number of days described (30, 90, or 180) and then using the maximum value obtained for oil, gas and water


Instantaneous shut-in pressure once a stage has been completed and frac pumps are shut down

Linear Gel:

A fracturing fluid supplemented with different polymers which increase its ability to carry proppant

Max Prop. Concentration:

The proppant concentration begins with mostly fluid and then is built up to a maximum concentration of pounds of proppant per fluid gallon

Max. Rate:

The maximum rate achieved pumping mixture into fractures

Number of stages:

Whenever a plug is set, perforations are created, the reservoir is fractured with fluid, and then another plug is set, it is called a stage


The number of perforations used in each cluster


The number of holes created from the charges that are inserted downhole in sets of perforations called clusters designed in different patterns to achieve optimal completion

Prop. Lbs./Cluster:

The amount of proppant pumped per cluster

Prop. Lbs./Ft.:

The amount of proppant pumped per completed foot

Prop. Lbs./Perf:

The amount of proppant pumped per perforation

Prop. Lbs.:

The amount of proppant pumped with the fluid to keep the created fractures open


The average rate per cluster


The average rate per completed foot


The average rate per perf

Fluid Gal/Ft.:

Fluid volume in gallons per foot


The formation in which the lateral was drilled (i.e., Wolfcamp A, Wolfcamp LA, Wolfcamp MA)

Reservoir Variable:

A numeric variable distinguishing between reservoirs Wolfcamp C–D and Wolfcamp A


The formation that the lateral was drilled (i.e., Wolfcamp A, Wolfcamp LA, Wolfcamp MA)


Water with chemicals added to speed up the rate at which it can be pumped to create more fractures

Stage length:

The length of each stage, a good indicator for normalizing stages for lateral length


The furthest vertical depth drilled


A unique well identifier for every well; each set of numbers stands for a unique location and well identifier (i.e., county, state, horizontal, pilot)

GOR Ratios:

A metric used to determine the amount of gas produced per oil produced (SCF/STB)


Condensate yield, MMSCF/STB


Water cut; a metric used to determine the amount of water produced per oil produced. (Water/(Water + Oil))

Oil, Gas, MBOE EUR:

The estimated ultimate recovery of oil and gas, MBOE − Oil + Gas/6


The amount of overlap divided by total rectangular area


  1. Bakshi A, Uniacke E, Korjani M (2017) A novel adaptive non-linear regression method to predict shale oil well performance based on well completions and fracturing data. In: SPE-185695-MS presented at the SPE western regional meeting. Bakersfield, California, USA, pp 23–27. https://doi.org/10.2118/185695-MS

  2. Bolen M, Crkvenjakov V, Converset J (2018) The role of big data in operational excellence and real time fleet performance management—the key to deepwater thriving in a low-cost oil environment. Society of Petroleum Engineers. https://doi.org/10.2118/189603-MS

  3. Cheng Y (2012) Impacts of the number of perforation clusters and cluster spacing on production performance of horizontal shale gas wells. SPE Reserv Eval Eng 15(01):31–40. https://doi.org/10.2118/138843-PA

    Article  Google Scholar 

  4. Feblowitz J (2013) Analytics in oil and gas: the big deal about big data. Society of Petroleum Engineers. https://doi.org/10.2118/163717-MS

  5. Gaurav A (2017) Horizontal shale well EUR determination integrating geology, machine learning, pattern recognition and multivariate statistics focused on the Permian Basin. Society of Petroleum Engineers. https://doi.org/10.2118/187494-MS

  6. Grieser WV, Shelley RF, Johnson BJ, Fielder EO, Heinze JR, Werline JR (2006) Data analysis of Barnett shale completions. Society of Petroleum Engineers. https://doi.org/10.2118/100674-PA

  7. Griffin LG, Pearson CM, Strickland S, McChesney J, Wright CA, Mayer J,Weijers L (2013) The value proposition for applying advanced completion and stimulation designs to the Bakken Central Basin. Society of Petroleum Engineers. https://doi.org/10.2118/166479-MS

  8. Gupta S, Fuehrer F, Jeyachandra BC (2014) Production forecasting in unconventional resources using data mining and time series analysis. In: SPE-171588-MS presented at the SPE/CSUR unconventional resources conference, 30 September–2 October, Calgary. https://doi.org/10.2118/171588-MS

  9. https://www.investopedia.com/terms/e/estimated-ultimate-recovery.asp

  10. https://www.verdazo.com/blog/what-production-performance-measure-should-i-use/

  11. Izadi G, Settgast R, Moos D, Baba C, Jo H (2015) Fully 3D hydraulic fracture growth within multi-stage horizontal wells. International Society for Rock Mechanics and Rock Engineering

  12. Jaripatke OA, Barman I, Ndungu JG, Schein GW, Flumerfelt RW, Burnett N, Bello HD, Barzola GJ (2018) Review of permian completion designs and results. Soc Pet Eng. https://doi.org/10.2118/191560-MS

    Article  Google Scholar 

  13. Johnston J, Guichard A (2015) New findings in drilling and wells using big data analytics. In: Offshore technology conference. https://doi.org/10.4043/26021-MS

  14. Khuri AI, Mukhopadhyay S (2010) Response surface methodology. Wiley Interdiscip Rev Comput Stat 2(2):128–149

    Article  Google Scholar 

  15. Lafollette R, Holcomb WD, Aragon J (2012a) Impact of completion system, staging, and hydraulic fracturing trends in the Bakken formation of the Eastern Williston Basin. Society of Petroleum Engineers. https://doi.org/10.2118/152530-MS

  16. Lafollette R, Holcomb WD, Aragon J (2012b) Practical data mining: analysis of Barnett shale production results with emphasis on well completion and fracture stimulation. Society of Petroleum Engineers. https://doi.org/10.2118/152531-MS

  17. Lehman LV, Jackson K, Noblett B (2016) Big data yields completion optimization: using drilling data to optimize completion efficiency in a low permeability formation. Soc Pet Eng. https://doi.org/10.2118/181273-MS

    Article  Google Scholar 

  18. Liaw A, Weiner M (2018) “Randomforest”: Breiman and Cutler’s random forests for classification and regression (R package version 4.6-14). R Foundation for Statistical Computing, Vienna

    Google Scholar 

  19. Male F, Marder MP, Browning J et al (2016) Marcellus wells’ ultimate production accurately predicted from initial production. In: SPE-180234-MS paper presented at the SPE low perm symposium held in Denver, Colorado, USA

  20. Male F, Aiken C, Duncan IJ (2018) Using data analytics to assess the impact of technology change on production forecasting. Society of Petroleum Engineers. https://doi.org/10.2118/191536-MS

  21. McClure MW (2012) Modeling and characterization of hydraulic stimulation and induced seismicity in geothermal and shale gas reservoirs. Ph.D. thesis. Stanford University, Stanford, California

  22. Mohaghegh SD (2017) Data-driven reservoir modeling: top-down modeling (TDM): a paradigm shift in reservoir modeling, the art and science of building reservoir models based on field measurements. Society of Petroleum Engineers, Richardson

    Google Scholar 

  23. Neter J, Kutner MH, Nachtsheim CJ, Wasserman W (2004) Applied linear statistical models, 5th edn. McGraw-Hill/Irwin, New York

    Google Scholar 

  24. Olaoye O, Zakhour N (2018) Utilizing a data analytics workflow to optimize completion practices while leveraging public data: a Permian Basin case study. Society of Petroleum Engineers. https://doi.org/10.2118/191416-MS

  25. Olufemi A, Ambastha A (2017) Initial production rate estimation: impact on recovery and interplay of recovery process. In: SPE-189078-MS paper presented at the Nigeria annual international conference and exibition held in Lagos, Nigeria

  26. Pankaj P, Geetan S, MacDonald R, Shukla P, Sharma A, Menasria S et al (2018) Need for speed: data analytics coupled to reservoir characterization fast tracks well completion optimization. In: SPE Canada unconventional resources conference. Society of Petroleum Engineers, Alberta, Canada

  27. Passman A (2018) Modeling and data analytics workflow from delineation to development: a case study utilizing limited data in the extensional deep dry Utica shale play of Pennsylvania. Society of Petroleum Engineers. https://doi.org/10.2118/191468-MS

  28. Ran B, Kelkar M (2015) Fracture stages optimization in Bakken shale formation. In: Unconventional resources technology conference, no. July: 20–22. https://doi.org/10.15530/URTEC-2015-2154796

  29. Rashid K, Bailey W, Couet B, Hudgens P, Heim R (2014) Sector-wide multi-stage hydraulic fracture optimization. Society of Exploration Geophysicists

  30. Rodriguez A (2019) Inferences of two dynamic processes on recovery factor and well spacing for a shale oil reservoir. Society of Petroleum Engineers. https://doi.org/10.2118/197089-MS

  31. Roussel NP, Sharma MM (2011) Strategies to minimize frac spacing and stimulate natural fractures in horizontal completions. In: SPE annual technical conference and exhibition, 30 October–2 November, Denver, Colorado, USA. https://doi.org/10.2118/146104-MS

  32. Schuetter J, Mishra S, Zhong M, LaFollette R (2015) Data analytics for production optimization in unconventional reservoirs. In: Unconventional resources technology conference. https://doi.org/10.15530/URTEC-2015-2167005

  33. Shelley RF, Lehman LV, Shah K (2011) Survey of more than 1,000 fracture stage database with net pressure in the Barnett shale. Part 2. Society of Petroleum Engineers. https://doi.org/10.2118/143330-ms

  34. Stegent NA, Wagner A, Stringer C, Tompkins R, Smith N (2013) Engineering approach to optimize development strategy in the oil segment of the Eagle Ford shale: a case study. Society of Petroleum Engineers. https://doi.org/10.2118/158846-PA

  35. Taylor RS, Barree R, Aguilera R (2011) Why not to base economic evaluations on initial production alone. In: SPE 148680 paper presented at the Candian unconventional resources conference held in Calgary, Alberta, Canada

  36. Veatch RW, King GE, Holditch SA (2017) Essentials of hydraulic fracturing: vertical and horizontal wellbores. PennWell Corporation, Tulsa

    Google Scholar 

  37. Wu K, Olson JE (2014) Simultaneous multifracture treatments: fully coupled fluid flow and fracture mechanics for horizontal wells. SPE J 20(2):334–346. https://doi.org/10.2118/167626-PA

    Article  Google Scholar 

  38. Xia K, Fonseca E, Jones R, Zhan L (2016) A new perspective on multistage stimulation of multiple horizontal wells. In: The 50th US rock mechanics/geomechanics symposium, 26–29 June, Houston, Texas, USA

  39. Yousef AA, Gentil PH, Jensen JL, Lake LW (2006) A capacitance model to infer interwell connectivity from production and injection rate fluctuations. SPE Reserv Eval Eng J 9(5):630–646. https://doi.org/10.2118/95322-pa

    Article  Google Scholar 

Download references


The authors would like to acknowledge the kind assistance through fruitful discussion with Wade Baustian from Camino Natural Resources and Cimarex for providing the data used in this work.

Author information



Corresponding author

Correspondence to A. Alzahabi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix 1: Model coefficient details

See Table 5.

Appendix 2: Table of data analytics use in completion strategies in unconventional resources

See Table 6.

Table 6 Data analytics tools used for unconventional reservoirs

Appendix 3: Regression tree-based measures for ranking predictors

  1. 1.

    %Inc Mean Squared Error (MSE)

    It is calculated from the well data for each tree, the prediction error on the test (randomly rearranged values of the variable) is documented as MSE.

    $${\text{MSE}} = {\text{mean}}\left( {\left( {y_{\text{measured}} - y_{\text{predicted}} } \right)^{2} } \right)$$
  2. 2.

    Inc Node Purity

    It is considered from the test data as well, total reduction in node impurities from splitting on the variable, averaged over all trees. Then impurities are evaluated by residual sum of squares.

Appendix 4: Response surface model

$$y = \int\limits_{\ell = 1}^{L} {\beta_{0,\ell } x_{0,\ell } } + \int\limits_{i = 1}^{k} {\beta_{i} x_{i} } + \int\limits_{i < j} {\beta_{i,j} x_{i} x_{j} } + \int\limits_{i = 1}^{k} {\beta_{i,i} x_{i}^{2} + \varepsilon ,}$$

where \(x_{0,\ell }\) is a logic denoting the presence (\(x_{0,\ell } = 1\)) or absence (\(x_{0,\ell } = 0\)) of the factor at level in the model, \(\beta_{0,0} , \ldots ,\beta_{0,L} ,\;\beta_{1} , \ldots , \beta_{k} ,\; \beta_{1,1} ,\beta_{1,2} , \ldots \beta_{k,k}\) are unknown coefficients to be estimated, and ε is a random error term.

$$\widehat{y}\left( X \right) = \mathop \sum \limits_{l = 1}^{L} \widehat{B}_{0,l } x_{0,l} + \mathop \sum \limits_{l = 1}^{k} \widehat{B}_{i } x_{i} + \mathop \sum \limits_{i < j} \widehat{B}_{i,j } x_{i} x_{j} + \mathop \sum \limits_{i = 1}^{k} \widehat{B}_{i,j } x_{i}^{2}$$

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Alzahabi, A., Trindade, A.A., Kamel, A. et al. Optimizing initial oil production of horizontal Wolfcamp wells utilizing data analytics. J Petrol Explor Prod Technol 10, 2357–2371 (2020). https://doi.org/10.1007/s13202-020-00926-0

Download citation


  • Initial oil production
  • Wolfcamp
  • Horizontal wells
  • Completion variables
  • Data analytics
  • Response surface methodology