Data Mining in a Parallel Environment
In this section, we give a very brief introduction to parallel computing, with the aim of giving the reader the basic knowledge needed to understand the parallel versions of some of the data mining techniques discussed in this book. A very simple example of a parallel algorithm is presented in Section 9.2. Parallel versions of the k-means algorithm, the k-nearest neighbor decision rule, and the training phases of a neural network and a support vector machine are presented in Section 9.3.
When a large amount of data must be analyzed, parallel computing can be used to carry out the analysis while reducing both the computational time and the memory required on each processor. A parallel environment is a machine or a set of machines in which multiple processors can work simultaneously on the same task. In such an environment, the time needed to carry out a standard algorithm can be reduced, because the work is performed in parallel on several processors. The basic idea is to split the problem at hand into smaller subproblems that can be solved on different processors simultaneously. Each processor can also have a private memory in which it stores its own data, so that no single processor needs to hold the whole data set.
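The idea of splitting a problem into subproblems can be illustrated with a minimal sketch. It is not the example from Section 9.2. It assumes Python's standard multiprocessing module in place of a message-passing library, and the function names (split_into_chunks, partial_sum_of_squares, parallel_sum_of_squares) are hypothetical names chosen for this illustration. Each worker process receives only its own chunk of the data, computes a partial result, and the partial results are then combined.

```python
from multiprocessing import Pool


def split_into_chunks(data, n_chunks):
    """Split data into n_chunks contiguous pieces of nearly equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """Subproblem solved independently by one worker on its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    """Send one chunk to each worker process and combine the partial results."""
    chunks = split_into_chunks(data, n_workers)
    with Pool(processes=n_workers) as pool:
        partials = pool.map(partial_sum_of_squares, chunks)
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1000))
    # Compare the parallel result with the serial one on the same data.
    print(parallel_sum_of_squares(data) == partial_sum_of_squares(data))
```

In this sketch the combination step is a plain sum. The same split, solve, and combine pattern applies whenever the partial results can be merged cheaply.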
Keywords: Parallel Computing, Parallel Algorithm, Parallel Machine, Message Passing Interface, Data Mining Technique