Abstract
In this paper we describe a new hybrid distributed/shared-memory parallel software package for support vector machine learning on large data sets. The support vector machine (SVM) method is a well-known and reliable machine learning technique for classification and regression tasks. Building on a recently developed shared-memory decomposition algorithm for support vector machine classifier design, we increased the level of parallelism by implementing a cross-validation routine based on message passing. With this extension we obtained flexible parallel SVM software that can be used on high-end machines with SMP architectures to process the large data sets that increasingly arise in bioinformatics and other fields of research.
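The two levels of parallelism described above can be illustrated with a minimal sketch of the outer level only: splitting the data into cross-validation folds and assigning those folds to message-passing processes. The abstract does not specify how HyParSVM partitions folds, so the function names `fold_indices` and `folds_for_rank`, the contiguous split, and the round-robin assignment are all assumptions made for illustration. The inner shared-memory SVM training that each process would run on its folds is not shown.

```python
from typing import List, Tuple


def fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k (train, validation) pairs.

    Hypothetical helper: contiguous folds are an assumption, not the
    partitioning used by the software described in the abstract.
    """
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    # The first n_samples % k folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, validation))
        start += size
    return folds


def folds_for_rank(k: int, rank: int, n_ranks: int) -> List[int]:
    """Fold numbers handled by one process (round-robin, an assumption)."""
    if not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    return list(range(rank, k, n_ranks))


if __name__ == "__main__":
    k, n_ranks = 5, 2
    folds = fold_indices(n_samples=10, k=k)
    for rank in range(n_ranks):
        for f in folds_for_rank(k, rank, n_ranks):
            train, validation = folds[f]
            # In a hybrid setup, each process would train here with a
            # multi-threaded (shared-memory) SVM solver on `train`.
            print(f"rank {rank}: fold {f}, validate on {validation}")
```

In a real message-passing program the rank and process count would come from the communication library rather than a loop, and the per-fold validation results would be gathered on one process to compute the cross-validation estimate.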
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Eitrich, T., Frings, W., Lang, B. (2006). HyParSVM – A New Hybrid Parallel Software for Support Vector Machine Learning on SMP Clusters. In: Nagel, W.E., Walter, W.V., Lehner, W. (eds) Euro-Par 2006 Parallel Processing. Euro-Par 2006. Lecture Notes in Computer Science, vol 4128. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823285_36
DOI: https://doi.org/10.1007/11823285_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37783-2
Online ISBN: 978-3-540-37784-9
eBook Packages: Computer Science (R0)