Data Mining Algorithms Parallelizing in Functional Programming Language for Execution in Cluster

  • Ivan Kholod
  • Aleksey Malov
  • Sergey Rodionov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9247)


This article describes an approach to parallelizing data mining algorithms implemented in a functional programming language, for distributed data processing on a cluster. It states the requirements that the functions forming these algorithms must satisfy so that the algorithms can be converted into a parallel form. As an example, we describe an implementation of the Naive Bayes algorithm in Common Lisp, its conversion into a parallel form, and its execution on a cluster using MPI.
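
As a rough illustration of the general idea (not the authors' Common Lisp implementation, and without MPI), the Python sketch below assumes that Naive Bayes training reduces to counting, so a pure per-partition counting function combined with an associative merge is enough to split the work. The names `count_partition`, `merge_counts`, `train` and the toy data are hypothetical. The per-partition calls are independent and could be dispatched to cluster nodes; here they simply run one after another.

```python
from collections import Counter
from functools import reduce


def count_partition(rows):
    """Pure function: count class labels and (attribute index, value, label)
    triples in one data partition. It touches no shared state."""
    class_counts = Counter()
    feature_counts = Counter()
    for *features, label in rows:
        class_counts[label] += 1
        for i, value in enumerate(features):
            feature_counts[(i, value, label)] += 1
    return class_counts, feature_counts


def merge_counts(a, b):
    """Associative, commutative merge of two partial results."""
    return a[0] + b[0], a[1] + b[1]


def train(partitions):
    """Map the counting function over partitions, then fold the partial results."""
    partials = map(count_partition, partitions)
    return reduce(merge_counts, partials, (Counter(), Counter()))


# Hypothetical toy data: (outlook, windy, play)
data = [
    ("sunny", "no", "yes"),
    ("sunny", "yes", "no"),
    ("rainy", "yes", "no"),
    ("rainy", "no", "yes"),
]

whole = train([data])                # a single partition
split = train([data[:2], data[2:]])  # two partitions, merged afterwards
```

Because `merge_counts` is associative and commutative, `whole` and `split` hold the same counts regardless of how the data is partitioned, which is the property that makes the counting step safe to distribute.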


Data mining · Distributed data mining · Distributed information processing · Functional language





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Saint Petersburg Electrotechnical University “LETI”, Saint Petersburg, Russia
  2. Motorola Solutions, Business Centre “T4”, Saint Petersburg, Russia
