Adapting a Multi-SOM Clustering Algorithm to Large Banking Data
Abstract
In recent years, Big Data (BD) has attracted researchers in many domains as a new concept that offers opportunities to improve research applications in business, science, and engineering. Big Data Analytics is becoming a practice that many researchers adopt to extract valuable information from BD. This paper presents BD technologies and shows how BD is useful in cluster analysis. A clustering approach named multi-SOM is then studied. To this end, a banking dataset is analyzed by integrating the R statistical tool with BD technologies, namely the Hadoop Distributed File System (HDFS), HBase, and MapReduce. We aim to decrease the execution time of the multi-SOM clustering method when determining the number of clusters using R and Hadoop. Results show the performance of integrating R and Hadoop for handling big data with the multi-SOM clustering algorithm and for overcoming the weaknesses of R.
Keywords
Big data, Big data analytics, Clustering, multiSOM, RHadoop
Acknowledgement
We are grateful to Mohamed Rahal for his helpful comments and suggestions.