Limited Data Modelling Approaches for Engineering Applications

Abstract

In real-world situations, data collection can be challenging and resource-intensive: it is often costly, time-consuming, and computationally demanding. The amount of data available for building accurate models is therefore frequently limited. System identification, decision-making, and prediction based on limited data can reduce production yields, increase production costs, and weaken the competitiveness of enterprises; developing data models that achieve good accuracy and a small variance of forecasting error from such small data sets therefore helps enterprises meet a competitive environment. Deterministic mathematical approaches that solve problems from existing theory with small amounts of data are outside the scope of this chapter. Instead, the chapter reviews common heuristic data modelling techniques for limited data and provides an overview of research to date on limited-data modelling across various engineering application areas.
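Among the heuristic techniques surveyed for limited data are resampling methods such as the bootstrap, which estimate the uncertainty of a statistic by repeatedly resampling a small data set with replacement. The following is a minimal, self-contained sketch of that idea; the data values, sample size, and resample count are hypothetical illustrations, not taken from the chapter.

```python
import random
import statistics

# A small hypothetical data set: e.g. ten process-yield measurements
data = [4.2, 3.9, 4.5, 4.1, 3.8, 4.4, 4.0, 4.3, 3.7, 4.6]

def bootstrap_means(sample, n_resamples=2000, seed=42):
    """Draw resamples with replacement and collect the mean of each one."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(sample) for _ in range(len(sample))]
        means.append(statistics.mean(resample))
    return means

means = bootstrap_means(data)

# Standard error of the mean, estimated from the bootstrap distribution
se = statistics.stdev(means)

# A simple 95% percentile interval for the mean
means.sort()
lo = means[int(0.025 * len(means))]
hi = means[int(0.975 * len(means))]
print(f"bootstrap SE ~ {se:.3f}, 95% interval ~ ({lo:.2f}, {hi:.2f})")
```

The same resampling loop can wrap any estimator (a regression coefficient, a model's prediction error) in place of the mean, which is what makes the bootstrap attractive when too few observations exist to hold out a separate validation set.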



Author information

Correspondence to Hamid Khayyam.


Copyright information

© 2018 Springer International Publishing AG

About this chapter


Cite this chapter

Khayyam, H., Golkarnarenji, G., Jazar, R.N. (2018). Limited Data Modelling Approaches for Engineering Applications. In: Dai, L., Jazar, R. (eds) Nonlinear Approaches in Engineering Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-69480-1_12


  • DOI: https://doi.org/10.1007/978-3-319-69480-1_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69479-5

  • Online ISBN: 978-3-319-69480-1

  • eBook Packages: Engineering (R0)
