Skip to main content

A Parallel Implementation of the K Nearest Neighbours Classifier in Three Levels: Threads, MPI Processes and the Grid

  • Conference paper
High Performance Computing for Computational Science - VECPAR 2006 (VECPAR 2006)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4395))

Abstract

The work described in this paper tackles the problem of data mining and classification of large amounts of data using the K nearest neighbours classifier (KNN) [1]. The large computing demand of this process is solved with a parallel computing implementation specially designed to work in Grid environments of multiprocessor computer farms. The different parallel computing approaches (intra-node, inter-node and inter-organisations) are not sufficient by themselves to face the computing demand of such a big problem. Instead of using parallel techniques separately, we propose to combine the three of them considering the parallelism grain of the different parts of the problem. The main purpose is to complete a 1 month-CPU job in a few hours. The technologies that are being used are the EGEE Grid Computing Infrastructure running the Large Hadron Collider Computing Grid (LCG 2.6) middleware [3], MPI [4] [5] and POSIX [6] threads. Finally, we compare the results obtained with the most popular and used tools to understand the importance of this strategy.

Topics: Grid, Parallel Computing, Threads and Data Mining.

The authors wish to thank the financial support received from the Spanish Ministry of Science and Technology to develop the GRID-IT project (TIC2003-01318).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Cover, T.M., Hart, P.E.: Nearest neighbour pattern recognition. IEEE Trans. on Information Theory 13(1), 2127 (1967)

    Article  Google Scholar 

  2. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications 15(3) (2001), http://www.globus.org/research/papers/anatomy.pdf

  3. LCG: World Wide Web Computing Grid. Distributed Production Environment of Physics Data Processing. http://lcg.web.cern.ch/LCG

  4. Message Passing Interface Forum: MPI: A message-passing interface standard (2003), http://www.mpi-forum.org/

  5. Gropp, W., et al.: MPI: The Complete Reference. MIT Press, Cambridge (1998)

    Google Scholar 

  6. Drepper, U., Molnar, I.: The Native POSIX Thread Library for Linux (2003), http://people.redhat.com/drepper/nptl-design.pdf

  7. Frank, E., Hall, M., L.T.: Weka 3: Data Mining Software in Java (2005), http://www.cs.waikato.ac.nz/ml/weka

Download references

Author information

Authors and Affiliations

Authors

Editor information

Michel Daydé José M. L. M. Palma Álvaro L. G. A. Coutinho Esther Pacitti João Correia Lopes

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Aparício, G., Blanquer, I., Hernández, V. (2007). A Parallel Implementation of the K Nearest Neighbours Classifier in Three Levels: Threads, MPI Processes and the Grid. In: Daydé, M., Palma, J.M.L.M., Coutinho, Á.L.G.A., Pacitti, E., Lopes, J.C. (eds) High Performance Computing for Computational Science - VECPAR 2006. VECPAR 2006. Lecture Notes in Computer Science, vol 4395. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71351-7_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-71351-7_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71350-0

  • Online ISBN: 978-3-540-71351-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics