Modeling of oxygen delignification process using a Kriging-based algorithm



Phenomenological models of cellulose production processes are limited by the presence of species and chemical reactions that are difficult to represent computationally. Modeling based on machine learning techniques is an alternative that overcomes this drawback. This paper applies the Gaussian process regressor (Kriging) method to model the oxygen delignification process in one of the largest pulp production plants in the world. Different correlation models were used to evaluate the method; furthermore, an optimization routine based on the constrained optimization by linear approximation (COBYLA) method was coupled to the model to minimize an objective function based on the input cost. The results show that combining the Kriging method with optimization routines performs well for non-linear industrial processes, yielding a representative model capable of providing optimized operating scenarios. A 36.5% reduction in NaOH consumption was obtained while the required constraints were satisfied.





\(\varvec{\alpha}\) :

Scale mixture factor

\(\varvec{\beta}_{\varvec{p}}\) :

Regression parameters for the polynomial function

\(\varvec{\sigma}_{\varvec{l}}\) :

Characteristic length-scale

\(\varvec{\theta}\) :

Vector of hyperparameters (parameters of the covariance function)

\({\mathbb{E}}\) :

Expected value operator

\(\varvec{Cov}\) :

Covariance operator

D :

Euclidean distance between x and \({\text{x}}^{*} \left( {\sqrt {\left( {x - x^{*} } \right)^{T} \left( {x - x^{*} } \right)} } \right) ;\;{\text{x}} \ne {\text{x}}^{*}\)

\(\varvec{F}\) :

Matrix of fixed-base functions

\(\varvec{f}_{\varvec{p}} (\varvec{x})\) :

Fixed-base functions

\(\varvec{Q}_{\varvec{k}}\) :

Median Q of fraction k of dataset (0.25 or 0.75)

\(\varvec{R}(\varvec{\theta},\varvec{ x}_{\varvec{i}} ,\varvec{x}_{\varvec{j}} )\) :

Covariance function (or kernel) evaluated at points x and \({\text{x}}^{ *} :\;{\text{x}} \ne {\text{x}}^{ *}\)

\(\varvec{R}\left[ {\varvec{R}(\varvec{\theta},\varvec{ x}_{\varvec{i}} ,\varvec{x}_{\varvec{j}} )} \right]\) :

Correlation matrix or spatial correlation function

\(\varvec{y}\) :

Set of response values for each sampling point

\(\varvec{y}(\varvec{x})\) :

Response value for one specific input

\(\hat{\varvec{y}}_{\varvec{i}}\) :

The ith estimated output

\(\varvec{y}_{\varvec{i}}\) :

The ith correct output

\(\bar{\varvec{y}}\) :

The arithmetic mean of the samples

\(\varvec{Var}\) :

Variance operator

\({\mathbf{Z}}\) :

Set of deviation functions

\(\varvec{z}(\varvec{x})\) :

Deviation function


  1. Akter T, Desai S (2018) Developing a predictive model for nanoimprint lithography using artificial neural networks. Mater Des 160:836–848

  2. De Caigny A, Coussement K, De Bock KW (2018) A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees. Eur J Oper Res 269:760–772

  3. Hall P, Phan W, Whitson K (2016) The evolution of analytics: opportunities and challenges for machine learning in business. O’Reilly Media, Newton

  4. Krige DG (1951) A statistical approach to some basic mine valuation problems on the Witwatersrand. J S Afr Inst Min Metall 52:119–139

  5. Lee H, Lee D, Kwon H (2018) Development of an optimized trend kriging model using regression analysis and selection process for optimal subset of basis functions. Aerosp Sci Technol 77:273–285

  6. Liu T, Wei H, Zhang K (2018) Wind power prediction with missing data using Gaussian process regression and multiple imputation. Appl Soft Comput 71:905–916

  7. Luciano RD, Silva BL, Rosa LM, Meier HF (2017) Multi-objective optimization of cyclone separators in series based on computational fluid dynamics. Powder Technol 325:452–466

  8. Luo L, Yao Y, Gao F, Zhao C (2018) Mixed-effects Gaussian process modeling approach with application in injection molding processes. J Process Control 62:37–43

  9. Martin JD, Simpson TW (2005) Use of Kriging models to approximate deterministic computer models. AIAA J 43:853–863

  10. Matheron G (1969) Le krigeage universel. École nationale supérieure des mines de Paris, Paris

  11. Pani AK, Vadlamudi V, Bhargavi RJ, Mohanta HK (2011) Neural network soft sensor application in cement industry: prediction of clinker quality parameters. In: International conference on process automation, control and computing

  12. Pedregosa F, Varoquaux G, Gramfort A (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

  13. Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge

  14. Richardson RR, Osborne MA, Howey DA (2017) Gaussian process regression for forecasting battery state of health. J Power Sources 357:209–219

  15. Sheng H, Xiao J, Cheng Y, Ni Q, Wang S (2016) Short-term solar power forecasting based on weighted Gaussian process regression. IEEE Trans Ind Electron 65:300–308

  16. Tayeb S, Pirouz M, Sun J, Hall K, Chang A, Li J, Song C, Chauhan A, Ferra M, Sager T, Zhan J, Latifi S (2017) Toward predicting medical conditions using k-nearest neighbors. In: IEEE International conference on big data, pp 3897–3903

  17. Zhou J, Guang F, Tang R (2017) Scenario analysis of carbon emissions of China’s power industry based on the improved particle swarm optimization-support vector machine model. Pol J Environ Stud 27:439–449



The authors thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for financial support for this work.

Author information

Correspondence to Karoline Brito.




The Kriging method was first proposed in the 1950s by the mining engineer Daniel G. Krige (1951) and later developed by Georges Matheron (1969). The Kriging model represents a process as the sum of a fixed-base function and a deviation function, as shown in Eq. 15.

$$y(x) = \sum\limits_{p = 1}^{k} f_{p} (x)\beta_{p} + z(x) \tag{15}$$
  • The first term on the right-hand side is the average trend of the true response (Lee et al. 2018) and can be regarded as a regression model formed by a linear combination of k chosen functions. The fixed-base functions, \(f_{p} (x)\), are usually formulated as polynomials of a given order, which together with the deviation term \(z(x)\) make up the so-called Universal Kriging (UKG) model of that order.

  • \(z(x)\) is the deviation between the true response and the fixed-base trend. It is usually modeled as a Gaussian random process with zero mean and variance \(\sigma^{2}\).

  • \(\varvec{\beta}\) contains the regression parameters of the polynomial function and is determined by the generalized least squares (GLS) method (Martin and Simpson 2005). Expanding the first term of Eq. 15 gives:

    $$\begin{aligned} \sum\limits_{p = 1}^{k} f_{p} (x)\beta_{p} & = f_{1} (x)\beta_{1} + f_{2} (x)\beta_{2} + \cdots + f_{k} (x)\beta_{k} \\ & = \left[ {f_{1} (x) \; \ldots \; f_{k} (x)} \right]\left[ {\begin{array}{c} {\beta_{1} } \\ \vdots \\ {\beta_{k} } \\ \end{array} } \right] = \varvec{f}(x)^{T}\varvec{\beta} \end{aligned} \tag{16}$$

    so that Eq. 15 can be rewritten as:

    $$y(x) = \varvec{f}(x)^{T}\varvec{\beta}+ z(x) \tag{17}$$
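As an illustration of the regression term \(\varvec{f}(x)^{T}\varvec{\beta}\), the following minimal Python sketch evaluates a polynomial trend at a single point. The monomial basis, the function names `polynomial_basis` and `trend`, and the values in `beta` are assumptions made for this example; they are not taken from the paper.

```python
def polynomial_basis(order):
    """Return fixed-base functions f_1..f_k for a 1-D polynomial trend.

    Assumption: monomials 1, x, x^2, ... up to `order`, so k = order + 1.
    """
    # Bind the exponent as a default argument so each lambda keeps its own power.
    return [lambda x, p=p: x ** p for p in range(order + 1)]


def trend(x, basis, beta):
    """Evaluate the regression term f(x)^T beta at one scalar input x."""
    if len(basis) != len(beta):
        raise ValueError("need one regression parameter per basis function")
    return sum(f(x) * b for f, b in zip(basis, beta))


basis = polynomial_basis(order=2)   # f(x) = [1, x, x^2]
beta = [1.0, 2.0, 3.0]              # hypothetical regression parameters
print(trend(2.0, basis, beta))      # 1*1 + 2*2 + 3*4 = 17.0
```

The deviation term \(z(x)\) is deliberately left out here: the sketch covers only the deterministic trend, which is the part that the GLS estimate of \(\varvec{\beta}\) controls.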

Applying this formulation to a set of n sampling points \(\left( {\varvec{x} = \left[ {x_{1} ,x_{2} , \ldots x_{n} } \right]^{T} } \right)\), with each \(x_{i} \in {\mathbb{R}}^{q}\) (q is the number of input variables), Eq. 17 can be expressed in the general form given by Eq. 18:

$$\varvec{y} = \varvec{F\beta } + {\mathbf{Z}} \tag{18}$$

where

$$\varvec{y} = \left[ {y(\varvec{x}_{1} ),y(\varvec{x}_{2} ), \ldots , y(\varvec{x}_{n} )} \right]^{T} \tag{19}$$
$$\varvec{F} = \left[ {f_{p} (\varvec{x}_{i} )} \right] = \left[ {\begin{array}{cccc} {f_{1} (\varvec{x}_{1} )} & {f_{2} (\varvec{x}_{1} )} & \ldots & {f_{k} (\varvec{x}_{1} )} \\ {f_{1} (\varvec{x}_{2} )} & {f_{2} (\varvec{x}_{2} )} & \ldots & {f_{k} (\varvec{x}_{2} )} \\ \vdots & \vdots & \ddots & \vdots \\ {f_{1} (\varvec{x}_{n} )} & {f_{2} (\varvec{x}_{n} )} & \ldots & {f_{k} (\varvec{x}_{n} )} \\ \end{array} } \right] \tag{20}$$
$$\varvec{\beta}= \left[ {\beta_{1} ,\beta_{2} , \ldots ,\beta_{k} } \right]^{T} \tag{21}$$
$${\mathbf{Z}} = \left[ {Z(\varvec{x}_{1} ),Z(\varvec{x}_{2} ), \ldots ,Z(\varvec{x}_{n} )} \right]^{T} \tag{22}$$
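The matrix of fixed-base functions can be assembled row by row, one row per sampling point. The sketch below is a minimal illustration that assumes a first-order polynomial basis \([1, x_{1}, \ldots, x_{q}]\); the helper names (`linear_basis`, `design_matrix`, `mat_vec`), the sampling points and the regression parameters are hypothetical and serve only to show the shapes involved.

```python
def linear_basis(x):
    """First-order basis for one sample x in R^q: [1, x_1, ..., x_q] (k = q + 1)."""
    return [1.0] + [float(v) for v in x]


def design_matrix(samples, basis=linear_basis):
    """Stack f(x_i)^T row by row to form the n-by-k matrix F."""
    return [basis(x) for x in samples]


def mat_vec(F, beta):
    """Compute the trend part F beta of y = F beta + Z."""
    return [sum(f_ip * b_p for f_ip, b_p in zip(row, beta)) for row in F]


# Three hypothetical sampling points with q = 2 input variables each.
samples = [(0.0, 1.0), (1.0, 2.0), (2.0, 0.5)]
beta = [0.5, 2.0, -1.0]          # hypothetical regression parameters

F = design_matrix(samples)       # n = 3 rows, k = q + 1 = 3 columns
trend_values = mat_vec(F, beta)  # F beta, i.e. y without the deviations Z
print(F)
print(trend_values)
```

Any other set of fixed-base functions could be passed through the `basis` argument; only the number of columns k changes.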

The vector \(\varvec{y}\) contains the response values at each sampling point \(x_{i}\), where \(i = 1,2,3 \ldots n\). The deviation functions (Eq. 22) are usually modeled as a Gaussian random process with zero mean and variance \(\sigma^{2}\). The covariance model is defined by Eq. 23 and describes, from the point of view of the Gaussian process, how the response at a point \(x_{i}\) is affected by the responses at other points \(x_{j}\), where \(i \ne j\) (Rasmussen and Williams 2006).

$$Cov\left( {Z(\varvec{x}_{i} ),Z(\varvec{x}_{j} )} \right) = {\mathbb{E}}\left[ {\left( {Z(\varvec{x}_{i} ) - m(\varvec{x}_{i} )} \right)\left( {Z(\varvec{x}_{j} ) - m(\varvec{x}_{j} )} \right)} \right] = \sigma^{2} \varvec{R}\left[ {R(\varvec{\theta},\varvec{x}_{i} ,\varvec{x}_{j} )} \right] \tag{23}$$


$$\varvec{R}\left[ {R(\varvec{\theta},\varvec{x}_{i} ,\varvec{x}_{j} )} \right] = \left[ {\begin{array}{cccc} {R(\varvec{\theta},\varvec{x}_{1} ,\varvec{x}_{1} )} & {R(\varvec{\theta},\varvec{x}_{1} ,\varvec{x}_{2} )} & \ldots & {R(\varvec{\theta},\varvec{x}_{1} ,\varvec{x}_{n} )} \\ {R(\varvec{\theta},\varvec{x}_{2} ,\varvec{x}_{1} )} & {R(\varvec{\theta},\varvec{x}_{2} ,\varvec{x}_{2} )} & \ldots & {R(\varvec{\theta},\varvec{x}_{2} ,\varvec{x}_{n} )} \\ \vdots & \vdots & \ddots & \vdots \\ {R(\varvec{\theta},\varvec{x}_{n} ,\varvec{x}_{1} )} & {R(\varvec{\theta},\varvec{x}_{n} ,\varvec{x}_{2} )} & \ldots & {R(\varvec{\theta},\varvec{x}_{n} ,\varvec{x}_{n} )} \\ \end{array} } \right] \tag{24}$$
  • The \(\varvec{\theta}\) term represents a set of free parameters or hyperparameters associated with the correlation model that are directly tied to the smoothness of the Kriging response.
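To make the role of \(\varvec{\theta}\) and of the correlation matrix concrete, the sketch below builds the matrix for a Gaussian correlation model and uses it in the standard GLS estimate \(\varvec{\beta} = (\varvec{F}^{T}\varvec{R}^{-1}\varvec{F})^{-1}\varvec{F}^{T}\varvec{R}^{-1}\varvec{y}\). The Gaussian form \(R = \exp(-\sum_{k}\theta_{k}(x_{i,k} - x_{j,k})^{2})\) is only one common choice and is an assumption here, since this section does not state which correlation models the authors compared. All function names, data and parameter values are hypothetical, and the linear solver is written with the standard library only to keep the example self-contained.

```python
import math


def gaussian_corr(theta, xi, xj):
    """Gaussian correlation exp(-sum_k theta_k (xi_k - xj_k)^2) (assumed model)."""
    return math.exp(-sum(t * (a - b) ** 2 for t, a, b in zip(theta, xi, xj)))


def corr_matrix(theta, samples):
    """Build the n-by-n matrix with entries R(theta, x_i, x_j)."""
    return [[gaussian_corr(theta, xi, xj) for xj in samples] for xi in samples]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def gls_beta(F, R, y):
    """GLS estimate beta = (F^T R^-1 F)^-1 F^T R^-1 y."""
    n, k = len(F), len(F[0])
    Rinv_y = solve(R, y)
    # R^-1 F, computed one column of F at a time.
    Rinv_F = [solve(R, [row[p] for row in F]) for p in range(k)]
    A = [[sum(F[i][a] * Rinv_F[b][i] for i in range(n)) for b in range(k)]
         for a in range(k)]
    rhs = [sum(F[i][a] * Rinv_y[i] for i in range(n)) for a in range(k)]
    return solve(A, rhs)


# Hypothetical 1-D data lying exactly on the trend y = 2 + 3x.
samples = [(0.0,), (1.0,), (2.0,), (3.0,)]
theta = [0.5]                                  # hypothetical hyperparameter
R = corr_matrix(theta, samples)
F = [[1.0, x[0]] for x in samples]             # basis [1, x]
y = [2.0 + 3.0 * x[0] for x in samples]
beta_hat = gls_beta(F, R, y)
print(beta_hat)                                # close to [2.0, 3.0]

# A smaller theta keeps distant points correlated, giving a smoother response.
print(corr_matrix([0.1], samples)[0][1], corr_matrix([10.0], samples)[0][1])
```

Because the data lie exactly on the trend, GLS recovers the generating parameters up to rounding; with real plant data the residual \(\varvec{y} - \varvec{F\beta}\) would be what the deviation process models.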


Cite this article

Euler, G., Nayef, G., Fialho, D. et al. Modeling of oxygen delignification process using a Kriging-based algorithm. Cellulose (2020) doi:10.1007/s10570-020-02991-4



  • Kraft process
  • Cellulose
  • Pre-bleaching
  • Gaussian process regressor
  • Kriging