Adapting a Multi-SOM Clustering Algorithm to Large Banking Data

  • Imèn Khanchouch
  • Mohamed Limam
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 745)


In recent years, Big Data (BD) has attracted researchers in many domains as a new concept that offers opportunities to improve research applications in business, science, and engineering. Big Data Analytics is becoming a practice that many researchers adopt to extract valuable information from BD. This paper presents BD technologies and shows how BD is useful in cluster analysis. A clustering approach named multi-SOM is then studied. To this end, a banking dataset is analyzed by integrating the R statistical tool with BD technologies, namely the Hadoop Distributed File System, HBase, and MapReduce. The aim is to decrease the execution time of the multi-SOM clustering method when determining the number of clusters using R and Hadoop. Results show the performance of integrating R and Hadoop to handle big data with the multi-SOM clustering algorithm and to overcome the weaknesses of R.


Keywords: Big data, Big data analytics, Clustering, multiSOM, RHadoop



We are grateful to Mohamed Rahal for his helpful comments and suggestions.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. ISG, University of Tunis, Tunis, Tunisia
  2. University of Dhofar, Salalah, Oman
