Abstract
Online Soft Computing Models (OSCMs) based on ensemble methods are novel and effective data-driven tools for predicting key variables. The current challenge they face is how to enhance reliability, which is degraded both by noise-induced uncertainty and by unsuitable model specifications, while preserving high prediction accuracy and low computational cost. To meet this challenge, the OSCM based on the Boundary Forest (OSCM-BF) is proposed in this paper. The BF combines a set of Tree-Structure Ensemble (TSE) models. By using different values of θ (i.e., the minimum size of leaf nodes), the BF enhances the reliability of a single TSE not only by overlapping the gap segments of the output range (i.e., connecting the discontinuous boundaries of leaf nodes), but also by gaining stronger robustness through sufficient diversity. Moreover, a theoretical range for the value of θ used to construct the BF is provided. Owing to its simplicity, good interpretability, and flexibility on large-scale data, the moving-window strategy is adopted to update the BF models. Experiments on noisy data from the industrial Ladle Furnace process reveal that the OSCM-BF can enhance the reliability of the OSCM-TSE while maintaining high prediction accuracy and low computational cost.
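The construction described in the abstract, an ensemble of trees grown with different minimum-leaf-size values θ and refit over a moving window, can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D tree-growing routine, the default θ values, and the plain averaging of member trees are simplifying assumptions for illustration (the paper weights the TSE models).

```python
from statistics import mean

def grow_tree(X, y, theta):
    """Grow a 1-D regression tree; `theta` plays the role of θ,
    the minimum size of leaf nodes in a TSE model."""
    n = len(y)
    if n < 2 * theta:                      # any split would create a leaf < theta
        return mean(y)                     # leaf node: predict the mean output
    order = sorted(range(n), key=lambda i: X[i])
    Xs, ys = [X[i] for i in order], [y[i] for i in order]
    best = None
    for split in range(theta, n - theta + 1):  # both children keep >= theta samples
        left, right = ys[:split], ys[split:]
        sse = (sum((v - mean(left)) ** 2 for v in left) +
               sum((v - mean(right)) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, split)
    split = best[1]
    thresh = 0.5 * (Xs[split - 1] + Xs[split])
    return (thresh,
            grow_tree(Xs[:split], ys[:split], theta),
            grow_tree(Xs[split:], ys[split:], theta))

def predict(tree, x):
    while isinstance(tree, tuple):         # descend until a leaf value is reached
        thresh, left, right = tree
        tree = left if x <= thresh else right
    return tree

def boundary_forest(X, y, thetas=(2, 3)):
    """Combine trees grown with different θ values so that their leaf
    boundaries fall in different places; here they are simply averaged."""
    trees = [grow_tree(X, y, t) for t in thetas]
    return lambda x: mean(predict(t, x) for t in trees)

def moving_window_update(stream, width, step, thetas=(2, 3)):
    """Moving-window update: refit on the newest `width` samples
    every `step` arrivals, returning the most recent model."""
    X, y = zip(*stream)
    model = None
    for t in range(width, len(stream) + 1, step):
        model = boundary_forest(list(X[t - width:t]), list(y[t - width:t]), thetas)
    return model
```

Because each member tree places its leaf boundaries at different sample counts, the averaged prediction varies more smoothly across the boundaries where any single tree jumps, which is the intuition behind "connecting the discontinuous boundaries of leaf nodes".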
Abbreviations
BF: Boundary Forest
CART: Classification and Regression Tree
ELM: Extreme Learning Machine
GRNN: General Regression Neural Network
LF: Ladle Furnace
LSSVR: Least Squares Support Vector Regression
MAE: Maximum Absolute Error
MSE: Mean Square Error
NN: Neural Network
OSCM: Online SCM
OSCM-BF: OSCM based on the Boundary Forest
OSCM-TSE: OSCM based on the TSE
pENsemble: Parsimonious Ensemble
RF: Random Forest
RMSE: Root-Mean-Square Error
SCM: Soft Computing Model
SVM: Support Vector Machine
TSE: Tree-Structure Ensemble
\( \varpi \): The width of a window
\( \vartheta \): The step for updating
Θ: A learning set, \( \varTheta = \{ ({\mathbf{X}}, y)_{n} \}_{n=1}^{N} \)
\( ({\mathbf{X}}, y) \): A sample pair
y: The output variable (the real output), \( y \in {\mathbb{R}}^{1} \)
\( \hat{y} \): The prediction of a model
\( {\mathbf{X}} \): The input vector (a sample), \( {\mathbf{X}} = (x_{1}, \ldots, x_{M}) \in {\mathbb{R}}^{M} \)
x_i, i = 1, 2, …, M: The ith input variable
N: The number of samples in Θ
M: The dimension of the input variables
p(X): The mapping of the piecewise function to X
\( \hbar_{i} \), i = 1, …, M: The threshold of the input variable \( x_{i} \)
Θ_left, Θ_right: The sample subsets of the left and right sub-branches
MSE_left, MSE_right: The MSEs of the outputs in Θ_left and Θ_right
\( \bar{y}_{\text{left}} \), \( \bar{y}_{\text{right}} \): The mean values of the real outputs in Θ_left and Θ_right
N_left, N_right: The numbers of samples in Θ_left and Θ_right
MSE_min: The minimum sum of MSE_left and MSE_right
J: The number of possible thresholds of an input variable
θ: The minimum size of leaf nodes in a TSE model
K: The number of TSE models in a BF model
T_k: The kth TSE model in a BF model, k = 1, …, K
θ_k: The minimum size of leaf nodes in the TSE model T_k
Φ_k: The set of leaf nodes in the TSE submodel T_k, \( \varPhi_{k} = \{ \varTheta_{1k}^{\text{leaf}}, \varTheta_{2k}^{\text{leaf}}, \ldots, \varTheta_{\varGamma_{k} k}^{\text{leaf}} \} \)
Γ_k: The number of leaf nodes in Φ_k
\( g_{1k}^{\text{leaf}}({\mathbf{X}}), g_{2k}^{\text{leaf}}({\mathbf{X}}), \ldots, g_{\varGamma_{k} k}^{\text{leaf}}({\mathbf{X}}) \): The mappings of the local TSE models learnt on Φ_k
f^BF(X): The mapping of a BF model
ω = [ω_1, ω_2, …, ω_K]: The weight vector of the TSE models {T_1, T_2, …, T_K}
ω_k: The weight of the TSE submodel T_k
f^TSE_k(X): The mapping of the TSE submodel T_k
Ω: The covariance matrix of size K × K
Ω_kj: The elements of Ω, j, k = 1, …, K
\( \hat{y}_{ki} \): The prediction of the sample X_i from the TSE submodel T_k, k = 1, …, K
y_i: The real output of the sample X_i
\( {\hat{\mathbf{\varLambda}}} \): The prediction matrix of the training samples from the K TSE models
X_q: The query sample
\( \hat{y}_{1q}, \hat{y}_{2q}, \ldots, \hat{y}_{Kq} \): The predictions of X_q from the K TSE models in a BF model
χ_jk: The size of the jth leaf node in T_k, j = 1, …, Γ_k, k = 1, …, K
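The symbols ω and Ω in the notation above are consistent with the variance-minimizing linear combination of ensemble members in the style of Perrone and Cooper (1993), ω = Ω⁻¹1 / (1ᵀΩ⁻¹1), where Ω is the covariance matrix of the members' training errors. Assuming that scheme (the paper's exact weighting may differ), a minimal sketch:

```python
def combination_weights(errors):
    """Weights ω for combining K ensemble members from the covariance
    matrix Ω of their training errors: ω = Ω⁻¹1 / (1ᵀ Ω⁻¹ 1).
    `errors[k][i]` is the error ŷ_ki − y_i of member T_k on sample i."""
    K, N = len(errors), len(errors[0])
    # Ω_kj = (1/N) Σ_i e_ki e_ji  (the error covariance matrix)
    omega = [[sum(errors[k][i] * errors[j][i] for i in range(N)) / N
              for j in range(K)] for k in range(K)]
    # Solve Ω v = 1 by Gauss-Jordan elimination with partial pivoting,
    # then normalise v so the weights sum to one.
    A = [row[:] + [1.0] for row in omega]
    for c in range(K):
        p = max(range(c, K), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(K):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    v = [A[r][K] / A[r][r] for r in range(K)]
    s = sum(v)
    return [vk / s for vk in v]

# Two members with uncorrelated errors: the weights come out
# proportional to the inverse of each member's error variance.
w = combination_weights([[1, -1, 1, -1],    # T_1: error variance 1
                         [2, 2, -2, -2]])   # T_2: error variance 4
```

Given a query X_q, the combined BF prediction is then ŷ = Σ_k ω_k ŷ_kq, with ŷ_kq the prediction of the kth TSE submodel.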
Acknowledgements
The authors would like to thank Professor Zhi-Zhong Mao for providing the data and suggestions. He is a PhD supervisor at Northeastern University, and his research interests include control and optimization in complex industrial systems.
Funding
This study was funded by the National Natural Science Foundation of China (No. 61702070) and the Research Projects of Liaoning Marine Fisheries Office (No. 201512).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Communicated by V. Loia.
About this article
Cite this article
Deng, C., Wang, X., Gu, J. et al. The Online Soft Computing Models of key variables based on the Boundary Forest method. Soft Comput 24, 10815–10828 (2020). https://doi.org/10.1007/s00500-019-04584-1
Keywords
 Industrial process
 Key variables
 Soft computing
 Machine learning
 Online prediction
 Big data