A Functional Approach to Parallelizing Data Mining Algorithms in Java

  • Conference paper
Parallel Computing Technologies (PaCT 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10421)

Abstract

We describe a new approach to parallelizing data mining algorithms. We represent an algorithm as a sequence of functions and use higher-order functions to express parallel execution. Our approach generalizes the popular MapReduce programming model by supporting not only data-parallel but also task-parallel implementations, as well as combinations of both. We implement our approach as an extension of the industrial-strength library Xelopes, and we illustrate it by developing a multi-threaded Java program for the 1R classification algorithm, with experiments on a multi-core processor.
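As a rough illustration of the idea in the abstract, the following minimal sketch expresses 1R with higher-order functions in plain Java. It assumes nominal attributes stored as string arrays and uses the standard `java.util.stream` API in place of the paper's own parallel functions and of Xelopes, neither of which is shown on this page. The class `OneRSketch` and its methods `countClasses`, `buildRule`, and `train` are hypothetical names chosen for this example. Counting class labels over the rows is the data-parallel part, and evaluating each attribute independently is the task-parallel part.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Illustrative sketch only: all names are hypothetical and are not the Xelopes API.
final class OneRSketch {

    // A 1R rule: the chosen attribute, a value -> predicted class table, and its training errors.
    static final class Rule {
        final int attribute;
        final Map<String, String> prediction;
        final long errors;

        Rule(int attribute, Map<String, String> prediction, long errors) {
            this.attribute = attribute;
            this.prediction = prediction;
            this.errors = errors;
        }
    }

    // Data-parallel step: count class labels per attribute value across all rows.
    static Map<String, Map<String, Long>> countClasses(List<String[]> rows, int attr, int classIdx) {
        return rows.parallelStream()
                .collect(Collectors.groupingBy(r -> r[attr],
                        Collectors.groupingBy(r -> r[classIdx], Collectors.counting())));
    }

    // Sequential step: predict the majority class for each value and count the misclassified rows.
    static Rule buildRule(int attr, Map<String, Map<String, Long>> counts) {
        Map<String, String> prediction = new TreeMap<>();
        long errors = 0;
        for (Map.Entry<String, Map<String, Long>> e : counts.entrySet()) {
            Map.Entry<String, Long> best =
                    e.getValue().entrySet().stream().max(Map.Entry.comparingByValue()).get();
            long total = e.getValue().values().stream().mapToLong(Long::longValue).sum();
            prediction.put(e.getKey(), best.getKey());
            errors += total - best.getValue();
        }
        return new Rule(attr, prediction, errors);
    }

    // Task-parallel step: build one candidate rule per attribute, then reduce to the best one.
    static Rule train(List<String[]> rows, int classIdx) {
        return IntStream.range(0, rows.get(0).length)
                .parallel()
                .filter(a -> a != classIdx)
                .mapToObj(a -> buildRule(a, countClasses(rows, a, classIdx)))
                .min(Comparator.comparingLong((Rule r) -> r.errors)
                        .thenComparingInt(r -> r.attribute))
                .get();
    }

    public static void main(String[] args) {
        // Toy data, made up for this example: columns are outlook, windy, play.
        List<String[]> rows = Arrays.asList(
                new String[]{"sunny", "false", "no"},
                new String[]{"sunny", "true", "no"},
                new String[]{"overcast", "false", "yes"},
                new String[]{"overcast", "true", "yes"},
                new String[]{"rainy", "false", "yes"},
                new String[]{"rainy", "true", "no"});
        Rule rule = train(rows, 2);
        System.out.println("attribute " + rule.attribute + ", errors " + rule.errors);
    }
}
```

Removing either `parallelStream()` or `.parallel()` leaves a purely task-parallel or purely data-parallel variant, which is the kind of combination the abstract refers to. Ties between equally frequent classes are broken arbitrarily in this sketch.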

Acknowledgments

This work was supported by the Ministry of Education and Science of the Russian Federation in the framework of the state order “Organization of Scientific Research”, task #2.6113.2017/BУ, and by the German Research Agency (DFG) in the framework of the Cluster of Excellence Cells-in-Motion at the University of Muenster.

Author information

Corresponding author

Correspondence to Ivan Kholod.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kholod, I., Shorov, A., Gorlatch, S. (2017). A Functional Approach to Parallelizing Data Mining Algorithms in Java. In: Malyshkin, V. (ed.) Parallel Computing Technologies. PaCT 2017. Lecture Notes in Computer Science, vol 10421. Springer, Cham. https://doi.org/10.1007/978-3-319-62932-2_44

  • DOI: https://doi.org/10.1007/978-3-319-62932-2_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62931-5

  • Online ISBN: 978-3-319-62932-2

  • eBook Packages: Computer Science, Computer Science (R0)
