Abstract
Data sets are growing rapidly, driven by increasing storage capacities and by the desire to equip more and more devices with sensors and connectivity. In mechanical engineering, Big Data offers new possibilities to gain knowledge from existing data for product design, manufacturing, maintenance and failure prevention. Typical goals when analyzing Big Data are the identification of clusters, the assignment of data points to classes and the development of regression models for prediction. This paper assesses various Big Data approaches and selects suitable clustering and classification solutions for a data set of simulated jet engine signals and life spans. These solutions include k-means clustering, linear discriminant analysis and neural networks. MATLAB is chosen as the programming environment for implementation because of its widespread use in engineering disciplines, and its suitability as a tool for Big Data analysis is evaluated. The results of all applied clustering and classification approaches are discussed, and prospects for further adaptation and transferability to other scenarios are pointed out.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Pitz, K., Anderl, R. (2019). Implementing Clustering and Classification Approaches for Big Data with MATLAB. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Proceedings of the Future Technologies Conference (FTC) 2018. FTC 2018. Advances in Intelligent Systems and Computing, vol 880. Springer, Cham. https://doi.org/10.1007/978-3-030-02686-8_35
DOI: https://doi.org/10.1007/978-3-030-02686-8_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02685-1
Online ISBN: 978-3-030-02686-8
eBook Packages: Intelligent Technologies and Robotics (R0)