Parallel Computation of a MMDBM Algorithm on GPU Mining with Big Data

  • S. Sivakumar
  • S. Vidyanandini
  • Soumya Ranjan Nayak (email author)
  • S. Sundar
Chapter
Part of the Studies in Big Data book series (SBD, volume 49)

Abstract

Big data refers to collections of data sets that are large and complex. It contains both structured and unstructured data, drawn for example from financial services, retail, manufacturing, healthcare, social networks (Twitter, Facebook, LinkedIn and Google), digital pictures and videos. To extract useful information from big data, several classifiers such as SLIQ, SPRINT and MMDBM are used. Among these, one of the fastest is the Mixed Mode data Based Miner (MMDBM) using Graphics Processing Unit (GPU) mining. This chapter outlines high-performance parallel computing with a radix sort algorithm for multicore GPUs, programmed in the Compute Unified Device Architecture (CUDA). The classifier can deal with both categorical and numerical attributes in a simple manner. The classification method handles big data with a huge number of attributes, here taken from a medical database. It can be parallelized on the GPU to achieve higher speed and better performance than the CPU radix sort algorithm. We propose a parallelized radix sort algorithm for GPU computing on the CUDA platform developed by NVIDIA Corporation. In this chapter, we discuss the performance of the fast classifier and the radix sort algorithm, relate the processing times of MMDBM and SLIQ on the CPU to those with GPU computing, and compute the acceleration ratio (speed-up). The classifiers (SLIQ, SPRINT and MMDBM) are also evaluated and compared on the CPU and the GPU. The GPU provides quick and accurate results with the least processing time and supports real-time applications.
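
To make the radix sort and speed-up ideas in the abstract concrete, here is a minimal sequential sketch in Python. It is not the authors' CUDA implementation. It only illustrates the three phases of each least-significant-digit pass (digit histogram, exclusive prefix sum, stable scatter), which are the phases GPU radix sorts typically distribute across threads. The function names, the 4-bit digit width and the 32-bit key width are assumptions chosen for illustration, and the speed-up is taken here as CPU time divided by GPU time.

```python
def radix_sort(keys, bits_per_pass=4, key_bits=32):
    """LSD radix sort for non-negative integers below 2**key_bits.

    Illustrative sketch only: bits_per_pass and key_bits are assumed
    parameters, not values taken from the chapter.
    """
    radix = 1 << bits_per_pass
    mask = radix - 1
    keys = list(keys)
    for shift in range(0, key_bits, bits_per_pass):
        # Phase 1: histogram of the current digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1

        # Phase 2: exclusive prefix sum turns counts into start offsets.
        offsets = [0] * radix
        for d in range(1, radix):
            offsets[d] = offsets[d - 1] + counts[d - 1]

        # Phase 3: stable scatter of each key to its bucket position.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys


def speed_up(cpu_seconds, gpu_seconds):
    """Acceleration ratio, assumed here to be CPU time over GPU time."""
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
    print(speed_up(10.0, 2.5))
```

Each pass is stable, so sorting digit by digit from least to most significant yields a fully sorted array. Because the histogram and scatter phases work on each key independently and the prefix sum is a standard parallel primitive, this kind of sort is commonly parallelized on GPUs.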

Keywords

Classification, GPU mining, Decision tree, Radix sort

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • S. Sivakumar (1)
  • S. Vidyanandini (2)
  • Soumya Ranjan Nayak (1), email author
  • S. Sundar (3)

  1. Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, India
  2. Department of Mathematics, SRM Institute of Science and Technology, Chennai, India
  3. Department of Mathematics, Indian Institute of Technology Madras, Chennai, India