Neural Computing and Applications, Volume 31, Issue 1, pp 11–25

A novel logistic-NARX model as a classifier for dynamic binary classification

  • Jose Roberto Ayala Solares
  • Hua-Liang Wei
  • Stephen A. Billings
Original Article


System identification and data-driven modeling techniques have been widely applied in recent decades. In particular, parametric modeling methodologies such as linear and nonlinear autoregressive with exogenous input models (ARX and NARX), together with related model types, have often been preferred for data-driven modeling problems because their linear-in-the-parameters structure is easy to compute and makes the resulting models easy to interpret. In recent years, several variations of the NARX methodology have been proposed that improve the performance of the original algorithm. Nevertheless, in most cases NARX models are applied to regression problems where all output variables involve continuous or discrete-time sequences sampled from a continuous process, and little attention has been paid to classification problems where the output signal is a binary sequence. We therefore developed a novel classification algorithm that combines the NARX methodology with logistic regression; the proposed method is referred to as the logistic-NARX model. The combination is advantageous because the NARX methodology helps to deal with the multicollinearity problem, while the logistic regression produces a model that predicts categorical outcomes. Furthermore, the NARX approach allows lagged terms and the interactions between them to be included in a straightforward manner. The resulting models are interpretable: users can identify which input variables play an important role, individually and/or interactively, in the classification process, something that is not achievable with other classification techniques such as random forests, support vector machines, and k-nearest neighbors. The efficiency of the proposed method is tested with five case studies.
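As a rough illustration of the idea summarized above, the following Python sketch builds lagged input and output terms together with their pairwise interactions, then fits a logistic regression to a binary output sequence. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function names, the lag orders, the toy data, and the plain gradient-ascent fit are all hypothetical choices made for this example, and the term-selection step that a NARX procedure would normally perform is omitted.

```python
import math
import random
from itertools import combinations_with_replacement


def build_regressors(u, y, n_u=2, n_y=1, degree=2):
    """Build polynomial terms from lagged inputs u and lagged binary outputs y.

    Hypothetical helper: returns (names, rows, targets), one row per time step k.
    """
    start = max(n_u, n_y)
    base_names = [f"u(k-{i})" for i in range(1, n_u + 1)]
    base_names += [f"y(k-{j})" for j in range(1, n_y + 1)]
    # Index tuples for every monomial of degree 1..degree (lags and interactions).
    monomials = []
    for d in range(1, degree + 1):
        monomials.extend(combinations_with_replacement(range(len(base_names)), d))
    names = ["1"] + ["*".join(base_names[i] for i in m) for m in monomials]
    rows, targets = [], []
    for k in range(start, len(u)):
        base = [u[k - i] for i in range(1, n_u + 1)]
        base += [y[k - j] for j in range(1, n_y + 1)]
        rows.append([1.0] + [math.prod(base[i] for i in m) for m in monomials])
        targets.append(y[k])
    return names, rows, targets


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, targets, lr=0.5, epochs=200):
    """Fit logistic-regression weights by batch gradient ascent on the log-likelihood."""
    theta = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for row, t in zip(rows, targets):
            p = sigmoid(sum(w * x for w, x in zip(theta, row)))
            for j, x in enumerate(row):
                grad[j] += (t - p) * x
        theta = [w + lr * g / n for w, g in zip(theta, grad)]
    return theta


def accuracy(theta, rows, targets):
    """Fraction of rows whose predicted class (threshold 0.5) matches the target."""
    hits = 0
    for row, t in zip(rows, targets):
        p = sigmoid(sum(w * x for w, x in zip(theta, row)))
        hits += int((p >= 0.5) == (t == 1))
    return hits / len(rows)


if __name__ == "__main__":
    # Toy data (an assumption for illustration): the class depends on an
    # interaction between two lagged inputs.
    rng = random.Random(0)
    u = [rng.uniform(-1, 1) for _ in range(200)]
    y = [0, 0] + [1 if u[k - 1] * u[k - 2] > 0 else 0 for k in range(2, 200)]
    names, rows, targets = build_regressors(u, y)
    theta = fit_logistic(rows, targets)
    for name, w in zip(names, theta):
        print(f"{name:>16s}  {w:+.3f}")
    print("training accuracy:", accuracy(theta, rows, targets))
```

Because each fitted weight is attached to a named lagged term or interaction, the printed table can be read directly to see which terms drive the classification, which is the interpretability argument made in the abstract.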


Keywords: Nonlinear system identification; Dynamic systems; Binary classification; NARX models; Logistic regression



The authors acknowledge the financial support to J. R. Ayala Solares from the University of Sheffield and the Mexican National Council of Science and Technology (CONACYT). The authors gratefully acknowledge that part of this work was supported by the Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/I011056/1 and Platform Grant EP/H00453X/1, and ERC Horizon 2020 Research and Innovation Action Framework Programme under Grant No 637302 (PROGRESS).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.


Copyright information

© The Natural Computing Applications Forum 2017

Authors and Affiliations

  1. Department of Automatic Control and Systems Engineering, Faculty of Engineering, The University of Sheffield, Sheffield, UK
