Abstract
In recent years, Big Data (BD) has attracted researchers in many domains as a new concept that provides opportunities to improve research applications, including business, science, and engineering. Big Data Analytics is becoming a practice that many researchers adopt to extract valuable information from BD. This paper presents BD technologies and shows how BD is useful in cluster analysis. A clustering approach named multi-SOM is then studied. To this end, a banking dataset is analyzed by integrating the R statistical tool with BD technologies, namely the Hadoop Distributed File System, HBase, and MapReduce. We aim to decrease the execution time of the multi-SOM clustering method when determining the number of clusters using R and Hadoop. Results illustrate the performance of integrating R and Hadoop to handle big data with the multi-SOM clustering algorithm and to overcome the weaknesses of R.
Acknowledgement
We are grateful to Mohamed Rahal for his helpful comments and suggestions.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Khanchouch, I., Limam, M. (2018). Adapting a Multi-SOM Clustering Algorithm to Large Banking Data. In: Rocha, Á., Adeli, H., Reis, L.P., Costanzo, S. (eds) Trends and Advances in Information Systems and Technologies. WorldCIST'18 2018. Advances in Intelligent Systems and Computing, vol 745. Springer, Cham. https://doi.org/10.1007/978-3-319-77703-0_17
DOI: https://doi.org/10.1007/978-3-319-77703-0_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77702-3
Online ISBN: 978-3-319-77703-0
eBook Packages: Engineering, Engineering (R0)