1 Introduction

Lithofacies identification plays a critical role in stratigraphic correlation and sedimentary facies analysis. Lithofacies are an indispensable component of sedimentary facies and can represent different lithologies or the same lithology containing different types of fluids. Lithofacies prediction provides important guidance for reservoir prediction and the subsequent prediction of reservoir properties. Accurate identification of lithofacies benefits exploration, development and the stable production of resources (Xiong et al. 2010; Li et al. 2012). Mining the abundant lithology and fluid information contained in seismic data helps improve lateral resolution and accuracy in the inter-well area. However, the relationship between lithofacies and seismic information is extremely complex and difficult to construct because numerous factors affect each other, which poses great challenges to seismic lithofacies identification (Liu et al. 2017; Huang et al. 2017a, b).

Traditional methods use rock physics and geostatistics to achieve reservoir characterization and prediction (Jalalalhosseini et al. 2014, 2015; Liu et al. 2018). Nowadays, machine learning has attracted wide attention in geoscience because of its advantages in addressing big data issues (e.g., Huang et al. 2016; Chen 2017, 2018; Chen et al. 2019). The neural network is one of the most widely used machine learning algorithms in geophysics (Kobrunov and Priezzhev 2016). However, it depends on the network topology, initial weights and thresholds and is sensitive to the learning rate. Improper parameters result in slow convergence or falling into a local minimum, and hence in unsatisfactory predictions (Caruana and Niculescu-Mizil 2006). The support vector machine (SVM) is an effective machine learning method that maximizes the distance from the support vectors to the separating hyperplane (Li et al. 2004; Zhang et al. 2005, 2018; Liu et al. 2020). It performs well on small-sample, nonlinear and high-dimensional problems. SVM is guided by the maximum-margin decision boundary and does not require the data to follow a specific distribution (Li et al. 2004; Vapnik 1999; Mou et al. 2015). Besides, it has a relatively simple mathematical form with strong generalization ability (Suykens and Vandewalle 1999). The SVM method can therefore be introduced to lithofacies identification (Abedi et al. 2012; Wang et al. 2016). Using SVM to identify lithofacies involves two steps. The first is training: the training attributes (input) and known lithofacies (output) are employed to obtain the relationship between attributes and lithofacies. The second step is prediction: the attributes of the target area are fed into the decision function to output lithofacies. Li et al. (2004) used SVM to recognize and predict reservoirs from seismic data, demonstrating the feasibility of SVM. Torres and Reveron (2013) integrated rock physics and simultaneous seismic inversion and successfully identified the reservoir zones by utilizing SVM in the Orinoco Oil Belt, Venezuela. Zhao et al. (2015) compared the artificial neural network with SVM for lithofacies recognition and showed that SVM was mathematically more robust and easier to train. Besides, Zhao et al. (2014) introduced proximal support vector machines (PSVM, see Fung and Mangasarian 2001, 2005; Mangasarian and Wild 2005) into lithofacies classification in the Barnett Shale to save computational cost, demonstrating the validity of the PSVM classifier in binary classification between shale and limestone.
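
For orientation, this two-step workflow can be reproduced with an off-the-shelf single-kernel SVM in a few lines; the attribute values and facies labels below are toy placeholders, and this baseline is not the method developed in this paper:

```python
import numpy as np
from sklearn.svm import SVC

# Step 1: train on known attribute-facies pairs from wells (toy placeholders:
# columns could be P-wave velocity in km/s and density in g/cm3).
X_train = np.array([[3.2, 2.45], [2.1, 2.30], [3.9, 2.55], [2.4, 2.35]])
y_train = np.array([1, 0, 1, 0])          # e.g. 1 = sandstone, 0 = mudstone
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

# Step 2: feed attributes from the target area into the learned decision function.
X_target = np.array([[3.0, 2.48], [2.2, 2.33]])
print(clf.predict(X_target))
```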

However, SVM was originally designed to solve binary classification problems, whereas lithofacies identification is a multi-class, high-dimensional and nonlinear problem. When solving nonlinear classification or prediction problems, a kernel function is used to map the original data into a higher-dimensional feature space (Qu et al. 2019). Then, according to the training data, a classification hyperplane is established as a decision surface to separate the data belonging to different categories in the high-dimensional space (Zhu et al. 2015). Unfortunately, a single kernel is often inadequate to distinguish the characteristics of different lithofacies in this situation, and the identification accuracy suffers as a result. Multi-kernel learning (MKL) is an available alternative with more flexibility than a single kernel function (Crammer and Singer 2001). Introducing the MKL method can improve the accuracy and stability of SVM-based lithofacies discrimination (Li et al. 2014). Through multi-kernel mapping, the high-dimensional space is divided into a combined space composed of several feature spaces, and each characteristic component can then be embedded in the corresponding kernel function. MKL-SVM is less explored in lithofacies identification and reservoir prediction applications. Qin (2017) analyzed the methods and principles of commonly used techniques for lithofacies identification and showed, based on logging data, that MKL-SVM could enhance the accuracy of lithofacies classification. Cheng et al. (2018) successfully accomplished lithofacies classification by the MKL-SVM method, but the calculation was time-consuming and the operation was cumbersome. The MKL algorithm faces difficulties in selecting appropriate kernel functions, determining how to combine them and calculating the coefficient of each kernel function. In general, substantial kernel matrices jointly participate in the computation, which is the main reason why the computational dimension is large and the spatial complexity and memory occupancy are high; the computational time then increases dramatically (Lin et al. 2007; Li et al. 2016). As a result, application of lithofacies identification based on MKL-SVM is severely limited in practice. Gönen and Alpaydin (2008) proposed a local multi-kernel learning (LMKL) SVM method that selects the appropriate kernel function locally to reduce the computational complexity. Although it effectively improves the sparsity of the kernels and reduces spatial complexity, it weakens the “complementarity” between the kernels and suffers from serious parameter redundancy (Gönen and Alpaydın 2013). It also ignores the global characteristics of the data. Generally, selecting proper kernel functions and determining their weights are difficult and depend heavily on experience. In addition, the computational efficiency of LMKL-SVM is inadequate (Ding 2014).

Jose et al. (2013) generalized LMKL to learn a tree-based primal feature map that is high-dimensional and sparse and put forward a local deep multi-kernel learning (LDMKL) SVM method (Bengio et al. 2010). It takes both the global and local features of the data into account and improves the efficiency of multi-kernel learning while ensuring accuracy. The method focuses on learning the best decision boundary in a sparse, high-dimensional representation and jointly learns both the kernel and the SVM parameters. Nevertheless, only a single kernel function was used for the global features, and the method was mainly aimed at binary classification. Following this research, we introduce LDMKL-SVM into lithofacies classification and extend it to multi-class problems. Several low- and high-dimensional kernel functions are combined to distinguish the attributes of different lithofacies more accurately, so as to effectively build the relationship between lithofacies and training attributes. Using the local deep kernel function with a tree structure promotes computational efficiency, while taking the global features into account maintains the recognition accuracy. Automatic learning of the kernel function parameters and decision parameters avoids the deviation caused by manual parameter selection. The goals are to supply a promising new method for lithofacies identification and reservoir prediction, to overcome the weaknesses of the existing SVM-based lithofacies identification methods and to promote the practicability of SVM-based lithofacies classification. A model data test and a field data application verify the validity of the proposed method.

2 Methodology

Compared with LMKL-SVM, LDMKL-SVM is an improved method that takes the global and local characteristics of the data into account at the same time. It concentrates on learning the best decision boundary in a sparse and high-dimensional representation. We introduce this method to lithofacies classification and extend it to multi-class facies. Multiple global kernel functions, which learn the global features, are set to be low-dimensional. The local kernel functions are composed of mapping functions that are tree-structured, high-dimensional and sparse. For these reasons, the LDMKL-SVM method improves both efficiency and accuracy when classifying lithofacies. The number of global and local kernel functions can be adjusted according to the complexity of the problem. Another advantage of LDMKL-SVM is that it learns the kernel function parameters and the SVM decision parameters at the same time. By learning from the training data, we establish the relationship between training attributes and lithofacies using the LDMKL algorithm. Inputting the measured data into the decision function of the SVM then realizes lithofacies recognition in other wells or in the inter-well area. In the present method, the decision function of the SVM can be expressed as (Jose et al. 2013):

$$\begin{aligned} y\left( {\mathbf{x}} \right) &= sign\left( {\sum\limits_{i} {\alpha_{i} y_{i} K\left( {{\mathbf{x}},{\mathbf{x}}_{i} } \right)} } \right) \hfill \\ &= sign\left( {\sum\limits_{ijk} {\alpha_{i} y_{i} \phi_{{G_{j} }} \left( {{\mathbf{x}}_{i} } \right)\phi_{{G_{j} }} \left( {\mathbf{x}} \right)\phi_{{L_{k} }} \left( {{\mathbf{x}}_{i} } \right)\phi_{{L_{k} }} \left( {\mathbf{x}} \right)} } \right) \hfill \\ &= sign\left( {{\mathbf{w}}^{T} \left( {\varPhi_{G} \left( {\mathbf{x}} \right) \otimes \varPhi_{L} \left( {\mathbf{x}} \right)} \right)} \right) \hfill \\ &= sign\left( {\varPhi_{L}^{T} \left( {\mathbf{x}} \right){\mathbf{W}}^{T} \varPhi_{G} \left( {\mathbf{x}} \right)} \right) \hfill \\ &= sign\left( {{\mathbf{W}}^{T} \left( {\mathbf{x}} \right)\varPhi_{G} \left( {\mathbf{x}} \right)} \right) \hfill \\ \end{aligned}$$
(1)

where \(K\left( {{\mathbf{x}},{\mathbf{x}}_{i} } \right) = \sum\nolimits_{j,k} {K_{Gj} } K_{Lk}\) represents the multi-kernel learning function, \({\mathbf{x}}_{i}\) represents the data, the subscript \(j = 1, \ldots ,J\) denotes the \(j{\text{th}}\) global kernel function, and \(K_{G} = \varPhi_{G} \otimes \varPhi_{G}\) and \(K_{L} = \varPhi_{L} \otimes \varPhi_{L}\) are the global and local kernel functions, respectively. \(K_{L}\) consists of sparse, tree-structured mapping functions \(\varPhi_{L}\) that contain high-dimensional local features, and \(\varPhi_{G}\) represents the global mapping relations that contain low-dimensional global features. \({\mathbf{w}}_{k} = \sum\nolimits_{i} {\alpha_{i} y_{i} \phi_{{L_{k} }} \left( {{\mathbf{x}}_{i} } \right)\varPhi_{G} \left( {{\mathbf{x}}_{i} } \right)}\), where the subscript \(k = 1, \ldots ,M\) denotes the \(k{\text{th}}\) dimension of \(\varPhi_{L}\), \(y_{i}\) represents the lithofacies type, \(\alpha_{i}\) is a coefficient, \({\mathbf{W}} = [{\mathbf{w}}_{1} , \ldots ,{\mathbf{w}}_{k} , \ldots ,{\mathbf{w}}_{M} ]\) and \({\mathbf{W}}\left( {\mathbf{x}} \right) = {\mathbf{W}}\varPhi_{L} \left( {\mathbf{x}} \right)\).
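
To make the decision rule concrete, the minimal sketch below evaluates the form \(y\left( {\mathbf{x}} \right) = sign\left( {\varPhi_{L}^{T} \left( {\mathbf{x}} \right){\mathbf{W}}^{T} \varPhi_{G} \left( {\mathbf{x}} \right)} \right)\) from Eq. (1) for the binary case. The feature maps, shapes and values used here are illustrative assumptions, not part of the original formulation:

```python
import numpy as np

def decision(x, W, Phi_G, Phi_L):
    """Decision rule of Eq. (1): sign(Phi_L(x)^T W^T Phi_G(x)).
    W = [w_1, ..., w_M] is stored with shape (D_G, M), one column per local
    dimension; Phi_G and Phi_L are the global and local feature maps."""
    return np.sign(Phi_L(x) @ W.T @ Phi_G(x))

# Toy usage with a linear-plus-quadratic global map and a fixed stand-in local map
rng = np.random.default_rng(0)
A = rng.standard_normal((7, 3))                     # placeholder local projection
Phi_G = lambda x: np.concatenate([x, x ** 2])       # low-dimensional global features
Phi_L = lambda x: np.tanh(A @ x)                    # stand-in for the tree-structured map
x = np.array([0.3, -1.2, 0.8])
W = rng.standard_normal((6, 7))                     # D_G = 6 global dims, M = 7 local dims
print(decision(x, W, Phi_G, Phi_L))
```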

In order to ensure computational efficiency, the global features are usually low-dimensional; in this case, the global mapping kernels can be set as linear and quadratic kernels. To make prediction efficient, \(\varPhi_{L}\) is tree-structured. Each dimension of \(\varPhi_{L}\) corresponds to a node in the tree, and a dimension of \(\varPhi_{L} \left( {\mathbf{x}} \right)\) is nonzero only if the corresponding node lies on the path traversed from the root to one of the leaves; otherwise, it is equal to 0. Thus, for any input, \(\varPhi_{L} \left( {\mathbf{x}} \right)\) has only \(\log \, M\) nonzero dimensions, which accelerates the computation (Jose et al. 2013). Figure 1 displays a schematic diagram of a four-layer tree structure. Only those dimensions of \(\varPhi_{L} \left( {\mathbf{x}} \right)\) that correspond to the path traversed by \({\mathbf{x}}\) from the root to a leaf are nonzero (as shown by the black nodes in Fig. 1), which reduces the number of features to be calculated.
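
To see why the tree structure keeps \(\varPhi_{L} \left( {\mathbf{x}} \right)\) sparse, the toy sketch below routes a sample through a complete binary tree stored in array form and lists the visited nodes; only these roughly \(\log \, M\) nodes out of all \(M\) tree nodes can carry nonzero local features. The hard left/right routing rule and the array layout are simplifying assumptions for illustration (the actual method uses the smooth indicator of Eq. 2 introduced below):

```python
import numpy as np

def active_nodes(x, Theta, depth):
    """Indices of tree nodes visited by x under a hypothetical hard routing:
    node 0 is the root, the children of node a are 2a+1 and 2a+2, and we go
    right when theta_a . x > 0 and left otherwise."""
    path, node = [], 0
    for _ in range(depth):
        path.append(node)
        node = 2 * node + (2 if Theta[node] @ x > 0 else 1)
    path.append(node)                       # the leaf reached by x
    return path

rng = np.random.default_rng(1)
Theta = rng.standard_normal((15, 3))        # 15 nodes of a four-layer binary tree
x = np.array([1.0, -0.5, 2.0])
print(active_nodes(x, Theta, depth=3))      # only 4 of the 15 dimensions are active
```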

Fig. 1
figure 1

Sketch map of a four-dimensional local feature with tree structure

The state value of the \(k{\text{th}}\)-dimensional local feature is controlled by the indicator variable \(I_{k} \left( {\mathbf{x}} \right)\) (Jose et al. 2013),

$$I_{k} \left( {\mathbf{x}} \right) = \prod\limits_{a \in ancestors\left( k \right)} {\frac{1}{2}} \left( {\tanh \left( {s_{I} \theta_{a}^{T} {\mathbf{x}}} \right) + ( - 1)^{C\left( a \right)} } \right)$$
(2)

where \(s_{I}\) represents a contraction factor. Jose et al. (2013) introduced the parameter \(\theta_{a}\) to make tree learning amenable to sub-gradient descent while keeping the sparsity. \(C\left( a \right)\) is 0 if a node is its parent’s left child and 1 if it is its parent’s right child. The highly nonlinear features are mainly embodied in \(\varPhi_{L}\). Jose et al. (2013) tested the performance of local deep mapping functions with different forms and concluded that the hyperbolic tangent function with a scale parameter \(\theta_{k}^{\prime T}\) yielded excellent results. Following this conclusion, we utilize the hyperbolic tangent function to construct the local kernel function. The \(k{\text{th}}\)-dimensional local feature mapping function is as follows:

$$\phi_{{L_{k} }} = \tanh \left( {\sigma \theta_{k}^{\prime T} {\mathbf{x}}} \right)I_{k} \left( {\mathbf{x}} \right)$$
(3)

where \(\sigma\) is a contraction factor and \(\theta_{k}^{\prime T}\) is a scale parameter (a schematic implementation of Eqs. 2 and 3 is sketched at the end of this section). \(\varTheta = [\theta_{1} , \ldots ,\theta_{M} ]\) and \(\varTheta^{\prime } = [\theta^{\prime }_{1} , \ldots ,\theta^{\prime }_{M} ]\) denote the learning parameters, which can be obtained by solving the following objective function (Jose et al. 2013):

$$\begin{aligned} \mathop {\hbox{min} }\limits_{{{\mathbf{W}},\varTheta ,\varTheta^{\prime } }} P\left( {{\mathbf{W}},\varTheta ,\varTheta^{\prime } } \right) = \frac{{\lambda_{W} }}{2}\sum\limits_{k} {\left\| {{\mathbf{w}}_{k} } \right\|}_{2}^{2} + \frac{{\lambda_{\theta } }}{2}\sum\limits_{k} {\left\| {\theta_{k} } \right\|}_{2}^{2} \hfill \\ \, + \frac{{\lambda_{{\theta^{\prime } }} }}{2}\sum\limits_{k} {\left\| {\theta^{\prime }_{k} } \right\|}_{2}^{2} + \sum\limits_{i} {L\left( {y_{i} ,\varPhi_{L}^{T} \left( {{\mathbf{x}}_{i} } \right){\mathbf{W}}^{T} {\mathbf{x}}_{i} } \right)} \hfill \\ \end{aligned}$$
(4)

where \(\lambda_{W}\), \(\lambda_{\theta }\) and \(\lambda_{{\theta^{\prime}}}\) represent the regularization coefficients, and \(L\) denotes the loss function, which measures the inconsistency between the predicted value \(y\left( {\mathbf{x}} \right)\) and the real value \(y\) at \({\mathbf{x}}\). To extend the binary SVM to multi-class classification, the common practice is to use a one-vs-all or one-vs-one strategy (Duan and Keerthi 2005). In contrast, we introduce a multi-class loss function to solve the multi-class problem directly. There are several variants of multi-class loss functions; we use the one proposed by Crammer and Singer (2001):

$$L = \hbox{max} \left( {0,1 + \mathop {\hbox{max} }\limits_{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{y}_{i} \ne y_{i} }} \left( {\varPhi_{L}^{T} \left( {{\mathbf{x}}_{i} } \right){\mathbf{W}}_{{\left( {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{y}_{i} } \right)}}^{T} {\mathbf{x}}_{i} } \right) - \varPhi_{L}^{T} \left( {{\mathbf{x}}_{i} } \right){\mathbf{W}}_{{\left( {y_{i} } \right)}}^{T} {\mathbf{x}}_{i} } \right)$$
(5)

where \(\mathop {\hbox{max} }\limits_{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{y}_{i} \ne y_{i} }} \left( {\varPhi_{L}^{T} \left( {{\mathbf{x}}_{i} } \right){\mathbf{W}}_{{\left( {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{y}_{i} } \right)}}^{T} {\mathbf{x}}_{i} } \right)\) represents the highest score obtained when the real facies \(y_{i}\) is misclassified into any other lithofacies. The minimum of the objective function (Eq. 4) can be found by primal stochastic sub-gradient descent, without keeping dual variables or maintaining dual sparsity (see Jose et al. 2013; Orabona et al. 2010). Equation 4 contains terms for both the kernel function parameters and the SVM decision parameters, so the method can learn them jointly (a numerical sketch of this loss and of the joint updates is given at the end of this section). For the \(t{\text{th}}\) iteration on training point \({\mathbf{x}}_{i}\), the corresponding update formulas are:

$${\mathbf{W}}^{{\left( {t + 1} \right)}} = {\mathbf{W}}^{\left( t \right)} - \beta_{t} \nabla_{{\mathbf{W}}} P\left( {{\mathbf{W}}^{\left( t \right)} ,\varTheta^{\left( t \right)} ,\varTheta^{\prime \left( t \right)} ,{\mathbf{x}}_{i} } \right)$$
(6)
$$\varTheta^{{\left( {t + 1} \right)}} = \varTheta^{\left( t \right)} - \beta_{t} \nabla_{\varTheta } P\left( {{\mathbf{W}}^{\left( t \right)} ,\varTheta^{\left( t \right)} ,\varTheta^{\prime \left( t \right)} ,{\mathbf{x}}_{i} } \right)$$
(7)
$$\varTheta^{{\prime \left( {t + 1} \right)}} = \varTheta^{\prime \left( t \right)} - \beta_{t} \nabla_{{\varTheta^{\prime } }} P\left( {{\mathbf{W}}^{\left( t \right)} ,\varTheta^{\left( t \right)} ,\varTheta^{\prime \left( t \right)} ,{\mathbf{x}}_{i} } \right)$$
(8)

where \(\beta_{t}\) denotes the iteration step size, and the sub-gradients are given by

$$\nabla_{{{\mathbf{w}}_{k} }} P\left( {{\mathbf{x}}_{i} } \right) = \lambda_{W} {\mathbf{w}}_{k} + \nabla_{{{\mathbf{w}}_{k} }} \left( {\mathop {\hbox{max} }\limits_{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{y}_{i} \ne y_{i} }} \varPhi_{L}^{T} \left( {{\mathbf{x}}_{i} } \right){\mathbf{W}}_{{\left( {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{y}_{i} } \right)}}^{T} {\mathbf{x}}_{i} } \right) - \phi_{{L_{k} }} \left( {{\mathbf{x}}_{i} } \right){\mathbf{x}}_{i}$$
(9)
$$\nabla_{{\theta_{k} }} P\left( {{\mathbf{x}}_{i} } \right) = \lambda_{\varTheta } \theta_{k} - \sum\limits_{a} {\tanh\left( {\sigma \theta_{a}^{\prime T} {\mathbf{x}}_{i} } \right)\nabla_{{\theta_{k} }} I_{a} \left( {{\mathbf{x}}_{i} } \right)} \left( {{\mathbf{w}}_{{\left( {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{y}_{i} } \right)a}}^{{T\left( {\mathop {\hbox{max} }\limits_{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{y}_{i} \ne y_{i} }} } \right)}} {\mathbf{x}}_{i} - {\mathbf{w}}_{{\left( {y_{i} } \right)a}}^{T} {\mathbf{x}}_{i} } \right)$$
(10)
$$\nabla_{{\theta^{\prime }_{k} }} P\left( {{\mathbf{x}}_{i} } \right) = \lambda_{{\varTheta^{\prime } }} \theta^{\prime }_{k} - \sigma \left[ {1 - \tanh^{2} \left( {\sigma \theta_{k}^{\prime T} {\mathbf{x}}_{i} } \right)} \right]I_{k} \left( {{\mathbf{x}}_{i} } \right)\left( {{\mathbf{w}}_{{\left( {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{y}_{i} } \right)k}}^{{T\left( {\mathop {\hbox{max} }\limits_{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{y}_{i} \ne y_{i} }} } \right)}} - {\mathbf{w}}_{{\left( {y_{i} } \right)k}}^{T} } \right){\mathbf{x}}_{i} {\mathbf{x}}_{i}$$
(11)

By solving the objective function iteratively, we obtain the local deep multi-kernel learning parameters and the SVM decision parameters. For seismic lithofacies classification problems with large amounts of data, the method can identify lithofacies with adequate accuracy while maintaining computational efficiency.
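
As noted above, the following schematic sketch strings these pieces together: the tree-structured local map of Eqs. (2)-(3), the per-sample objective of Eq. (4) with the multi-class loss of Eq. (5), and the joint updates of Eqs. (6)-(8). For brevity, the closed-form sub-gradients of Eqs. (9)-(11) are replaced by numerical central differences, and all shapes, hyper-parameter values and the tree-indexing convention are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def phi_L(x, Th, Tp, s_I=1.0, sigma=1.0):
    """Tree-structured local feature map of Eqs. (2)-(3).
    Node 0 is the root and the children of node a are 2a+1 and 2a+2 (an assumed
    array layout); Th[a] are routing parameters, Tp[k] are scale parameters."""
    phi = np.zeros(len(Th))
    for k in range(len(Th)):
        I_k, child = 1.0, k
        while child > 0:                                   # product over ancestors, Eq. (2)
            a = (child - 1) // 2
            C = 0 if child == 2 * a + 1 else 1             # left child: 0, right child: 1
            I_k *= 0.5 * (np.tanh(s_I * Th[a] @ x) + (-1.0) ** C)
            child = a
        phi[k] = np.tanh(sigma * Tp[k] @ x) * I_k          # Eq. (3)
    return phi

def objective_i(x, y, W, Th, Tp, lams=(1e-3, 1e-3, 1e-3)):
    """Per-sample objective: regularizers of Eq. (4) plus the loss of Eq. (5).
    W[c] stores the matrix W_(c)^T of facies class c, with shape (M, D)."""
    p = phi_L(x, Th, Tp)
    scores = np.array([p @ Wc @ x for Wc in W])
    loss = max(0.0, 1.0 + np.delete(scores, y).max() - scores[y])
    reg = 0.5 * (lams[0] * (W ** 2).sum()
                 + lams[1] * (Th ** 2).sum()
                 + lams[2] * (Tp ** 2).sum())
    return reg + loss

def sgd_epoch(X, Y, W, Th, Tp, beta=1e-2, eps=1e-5, seed=0):
    """One pass of the updates of Eqs. (6)-(8); numerical central differences
    stand in for the sub-gradients of Eqs. (9)-(11) purely for illustration."""
    rng = np.random.default_rng(seed)
    for i in rng.permutation(len(X)):
        grads = []
        for P in (W, Th, Tp):                              # gradients w.r.t. W, Theta, Theta'
            g = np.zeros_like(P)
            it = np.nditer(P, flags=["multi_index"])
            for _ in it:
                idx, old = it.multi_index, float(P[it.multi_index])
                P[idx] = old + eps; f_plus = objective_i(X[i], Y[i], W, Th, Tp)
                P[idx] = old - eps; f_minus = objective_i(X[i], Y[i], W, Th, Tp)
                P[idx] = old
                g[idx] = (f_plus - f_minus) / (2.0 * eps)
            grads.append(g)
        for P, g in zip((W, Th, Tp), grads):               # simultaneous update, Eqs. (6)-(8)
            P -= beta * g
    return W, Th, Tp

# Toy usage: 3 attributes, 4 facies, a 15-node (four-layer) local tree
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((20, 3)), rng.integers(0, 4, 20)
W = 0.01 * rng.standard_normal((4, 15, 3))
Th, Tp = 0.01 * rng.standard_normal((15, 3)), 0.01 * rng.standard_normal((15, 3))
W, Th, Tp = sgd_epoch(X, Y, W, Th, Tp)
```

In practice the analytical sub-gradients of Eqs. (9)-(11) would be used instead of finite differences; that is what keeps training tractable for seismic-scale data sets.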

3 Test on model data

First, we tested the present method on a modified fluvial channel system from the Stanford V reservoir model (Mao and Journel 1999), which has 120 CDPs in the X direction and 100 CDPs in the Y direction, with an interval of 25 m. The curved channel model includes channel, point bar, natural dike and floodplain subfacies, and the lithofacies developed in the different subfacies differ: the floodplain, channel, natural dike and point bar are composed of mudstone, sandstone, sandy mudstone and siltstone deposits, respectively. Some traces are extracted from the original model as training data. The test is performed on the whole model; here we show one slice for a clear comparison. Figure 2 shows the lithofacies of one slice in the model and the locations of the training data. The training attributes include density and P-wave velocity (displayed in Fig. 3). Figure 3c, d shows the probability density distribution of the elastic attributes in different facies. It should be mentioned that the elastic attributes of siltstone and sandy mudstone lie between those of mudstone and sandstone, and their values overlap with those of sandstone or mudstone over a wide range. The relationship between lithofacies and elastic attributes is established by the present method and the LMKL-SVM method, respectively. Figure 4a exhibits the lithofacies discriminated by LDMKL-SVM. This method better identifies channel sandstone and floodplain mudstone, while the siltstone and sandy mudstone are not fully recognized. Figure 4b shows the lithofacies discriminated by LMKL-SVM. Some locations that are sandstone in the model are identified as sandy mudstone, and the siltstone and sandy mudstone identified at some locations are not consistent with the model facies. Although both methods closely follow the defined facies and show promising results, the proposed method produces a more accurate result, which suggests that its ability to classify these facies is stronger than that of the conventional method. In order to compare and evaluate the performance of the two methods more intuitively and quantitatively, we computed the confusion matrices of the two methods (shown in Tables 1 and 2). The accuracy of LMKL-SVM is lower than that of the proposed method by about 2.03%, and the recognition accuracy of the proposed method for each lithofacies is also higher than that of LMKL-SVM. For the whole test model, the execution time of LMKL-SVM is 1674.81 s, longer than that of the proposed method (354.99 s). This demonstrates the effectiveness and superiority of the lithofacies identification method based on LDMKL-SVM.
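
For completeness, the overall and per-facies accuracies quoted here can be read off a confusion matrix in the standard way; the snippet below is a generic sketch with toy labels, not the actual values of Tables 1 and 2:

```python
import numpy as np

def confusion_matrix(true_facies, pred_facies, n_facies=4):
    """Rows: true facies (0-3); columns: predicted facies."""
    C = np.zeros((n_facies, n_facies), dtype=int)
    for t, p in zip(true_facies, pred_facies):
        C[t, p] += 1
    return C

C = confusion_matrix([0, 1, 2, 2, 3, 1], [0, 1, 2, 3, 3, 1])
overall_accuracy = C.trace() / C.sum()               # fraction of correctly classified samples
per_facies_accuracy = C.diagonal() / C.sum(axis=1)   # recognition rate of each facies
print(C, overall_accuracy, per_facies_accuracy)
```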

Fig. 2
figure 2

One slice of the test model, which contains channel, natural dike, point bar and floodplain subfacies. Different colors represent different facies: 0, 1, 2 and 3 represent mudstone, sandstone, sandy mudstone and siltstone, respectively. White circles mark the training data locations

Fig. 3
figure 3

Elastic attributes of the test model (a, b) and their probability density distributions (c, d): (a, c) P-wave velocity; (b, d) density. 0, 1, 2 and 3 represent mudstone, sandstone, sandy mudstone and siltstone, respectively. The elastic properties of mudstone and sandstone differ markedly and are easy to distinguish, while the attribute values of sandy mudstone and siltstone overlap with those of mudstone and sandstone, which makes them challenging to discriminate

Fig. 4
figure 4

Classified results of the test model by different methods: a lithofacies identified by LDMKL-SVM; b lithofacies identified by LMKL-SVM. The former is superior to the latter: mudstone and sandstone are well recognized, but the classification of the other two lithofacies is less satisfactory

Table 1 Confusion matrix of the test model with the LDMKL-SVM method
Table 2 Confusion matrix of the test model with the LMKL-SVM method

4 Application on field data

In order to further verify the validity of the LDMKL-SVM-based lithofacies discrimination method, we applied it to actual land logging and 2D seismic data from a work area in China. Figure 5 displays the test seismic section, which passes through two wells (Wells A and B). According to the preliminary study and log interpretation results, the main lithofacies in the study area include mudstone, water-bearing sandstone and oil-bearing sandstone. Black lines denote the locations of the two wells, which are 975 m apart. Black arrows in Fig. 5 indicate the target reservoirs.

Fig. 5
figure 5

2D seismic stack profile. Black lines denote locations of the two wells. Black arrows indicate target reservoirs

4.1 Lithofacies discriminant in wells

Experiments are first carried out in the wells. We use density and P- and S-wave velocities as training attributes. Figures 6 and 7 exhibit the logging curves of Wells A and B, and Fig. 8 displays the distribution of the elastic attributes in the two wells. All logging data of Well A are used as training data, and the lithofacies of Well A and Well B are then identified by the present method and by LMKL-SVM. Figures 9 and 10 show the interpreted lithofacies and the identification results of Well A and Well B, respectively. The result produced by the present method is in excellent agreement with the interpreted lithofacies and accurately picks out the oil-bearing sandstone, and its performance is better than that of the traditional method. Statistical analysis of the confusion matrices in Tables 3 and 4 indicates that the misjudgment rate of the proposed method is only 4.87% (Well A) and 6.69% (Well B), while that of the traditional method is 7.89% and 14.68%, respectively. The accuracy for each lithofacies type is also higher than that of LMKL-SVM. That is, the accuracy is improved by jointly considering the global low-dimensional and local high-dimensional features, which quantitatively indicates the reliability of the method.

Fig. 6
figure 6

Logging curves of Well A: a porosity; b shale content; c water saturation; d P-wave velocity; e S-wave velocity; and f density

Fig. 7
figure 7

Logging curves of Well B: a porosity; b shale content; c water saturation; d P-wave velocity; e S-wave velocity; and f density

Fig. 8
figure 8

Probability density distributions of elastic attributes in Wells A and B: a, d P-wave velocity; b, e S-wave velocity; c, f density. The upper panels correspond to Well A and the lower panels to Well B. 0, 1 and 2 represent mudstone, water sandstone and oil sandstone, respectively

Fig. 9
figure 9

Defined lithofacies a and facies identified by different methods for test Well A: b classified by the proposed method; c classified by the traditional method. 0, 1 and 2 represent mudstone, water sandstone and oil sandstone, respectively. Because the field data are more complex than the model data, jointly learning the global and high-dimensional local features better distinguishes the attributes of different lithofacies, and the accuracy is improved

Fig. 10
figure 10

Defined lithofacies a and facies identified by different methods for test Well B: b classified by the proposed method; c classified by the traditional method. 0, 1 and 2 represent mudstone, water sandstone and oil sandstone, respectively. Comparing panels b and c shows that the new method improves the accuracy and is more consistent with the defined lithofacies

Table 3 Confusion matrix of Well A with different methods
Table 4 Confusion matrix of Well B with different methods

4.2 Seismic facies classification

Finally, we applied the method to the 2D seismic profile. The characteristics of geological structures vary with the measurement method and the observation scale. Because the observation scales of logging data and seismic data are different, the logging data are first coarsened to seismic scale by Backus averaging. The elastic attributes of the area to be predicted are obtained by prestack seismic inversion, as shown in Fig. 11. With the coarsened well data (Wells A and B) as training data, the lithofacies classified by the different methods are displayed in Fig. 12. Both methods can effectively identify sandstone and mudstone in the inter-well area and distinguish oil-bearing sandstone from water-bearing sandstone. In this application, we cannot compare the accuracy of the two methods because the true seismic facies are unknown. However, LDMKL-SVM predicts much faster than LMKL-SVM: their running times are 107.5 s and 418.2 s, respectively. The predicted lithofacies of the traces near the wells are almost in agreement with the coarsened well facies, and the identified target reservoirs are consistent with the actual situation, which also suggests the good performance and practical value of the new method. It can provide reliable information for reservoir prediction and subsequent research. In addition, this method can be applied to any case where lithology and fluid need to be recognized.
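
Backus averaging replaces the finely layered log response with an effective medium over a running window, which is how the well data are brought to seismic scale here. The sketch below assumes isotropic layers and a simple boxcar window; the window length and weighting are assumptions, since the exact averaging scheme is not stated:

```python
import numpy as np

def backus_average(vp, vs, rho, window=11):
    """Coarsen well-log Vp, Vs and density to seismic scale (isotropic Backus).
    P-wave and shear moduli are averaged harmonically and density arithmetically
    over a boxcar window; the window length is an assumption."""
    M = rho * vp ** 2                               # P-wave modulus per layer
    mu = rho * vs ** 2                              # shear modulus per layer
    smooth = lambda a: np.convolve(a, np.ones(window) / window, mode="same")
    M_eff, mu_eff, rho_eff = 1.0 / smooth(1.0 / M), 1.0 / smooth(1.0 / mu), smooth(rho)
    return np.sqrt(M_eff / rho_eff), np.sqrt(mu_eff / rho_eff), rho_eff

# Toy usage on synthetic logs (values and units are placeholders)
rng = np.random.default_rng(4)
vp = 3000.0 + 200.0 * rng.standard_normal(500)
vs, rho = 0.55 * vp, 2.4 + 1e-4 * vp
vp_coarse, vs_coarse, rho_coarse = backus_average(vp, vs, rho)
```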

Fig. 11
figure 11

Profiles of seismic elastic attributes obtained by prestack seismic inversion: a P-wave velocity; b S-wave velocity; and c density. The inverted attributes reflect the shapes and characteristics of the strata in the study area. Good inversion results for the elastic attributes help identify the corresponding lithofacies

Fig. 12
figure 12

Classified lithofacies of the field data using different methods: a LDMKL-SVM; b traditional LMKL-SVM. 0, 1 and 2 represent mudstone, water sandstone and oil sandstone, respectively. The different lithofacies are well distinguished, and their shapes are similar to those of the strata. The target reservoirs are also well identified

5 Conclusion

We describe a new lithofacies identification method based on SVM. The present method draws support from a composite nonlinear kernel consisting of high-dimensional, sparse and computationally deep local features and low-dimensional global features, and it can classify multiple types of lithofacies. One advantage is that it automatically learns the parameters of the kernel functions and of the SVM at the same time, avoiding a weakness of the traditional SVM-based lithofacies discrimination methods. Another advantage is that the new method can effectively improve the classification accuracy while saving computing cost, because the local high-dimensional features are sparse and tree-structured. The numerical example and the field data application illustrate that the proposed method can generate preferable identification results in a relatively short time, and the comparison with traditional methods confirms the superiority of the LDMKL-SVM-based lithofacies discrimination method. Benefiting from its high accuracy and computational efficiency, the method provides a valuable approach for the practical application of seismic lithofacies identification. It is also of great significance to the exploration and development of reservoirs and has good application prospects.