Cluster Computing, Volume 22, Supplement 5, pp. 12381–12388

Appliance of effective clustering technique for gene expression datasets using GPU

  • V. Saveetha
  • S. Sophia
  • P. D. R. Vijayakumar


Advances in data mining techniques have made it possible to analyze medical datasets. Microarrays allow many genes to be monitored simultaneously under several conditions. Identifying co-expressed genes and coherent expression patterns is a main goal of bioinformatics research, and cluster analysis of gene expression data has proven to be a valuable tool for finding biologically meaningful groups of genes. The proposed algorithm uses a mutual information criterion to measure the dependency between gene variables, and simulated annealing is applied to help K-means escape local minima. The algorithm is further accelerated through parallelization: graphics processing units (GPUs) can perform data mining computations efficiently, so an optimized K-means implementation is developed for the GPU, using NVIDIA's Compute Unified Device Architecture (CUDA) as the programming environment. The emphasis is on optimizations that target the data-parallel architecture directly, to make the best use of the available computational capability. The algorithm runs in a hybrid manner, parallelizing simulated annealing K-means based on the mutual information criterion (MIK). A performance study on a medical dataset demonstrates a maximum speedup of 7×. Experimental analysis shows that the proposed method performs well on gene expression data. The performance of the new clustering methods is compared with that of existing methods, and the clustering algorithm based on a combined mutual information and Euclidean distance metric achieves the best performance.
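As a rough illustration of how simulated annealing can help K-means avoid local minima, the sketch below anneals a cluster labelling by moving one gene profile at a time to a different cluster and accepting a cost increase Δ with probability exp(−Δ/T), then finishes with ordinary K-means (Lloyd) iterations. It is a minimal CPU sketch under stated assumptions, not the authors' MIK method: it uses plain squared Euclidean distance rather than the paper's mutual information criterion or combined metric, it omits the CUDA parallelization, and the function names (`sa_kmeans`, `refine`, `sse`, `centroids`, `sq_dist`) and the cooling parameters (`t0`, `cooling`, `steps_per_t`, `t_min`) are hypothetical choices.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids(data, labels, k):
    """Mean profile of each cluster (callers keep every cluster non-empty)."""
    dim = len(data[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for row, c in zip(data, labels):
        counts[c] += 1
        for j, v in enumerate(row):
            sums[c][j] += v
    return [[s / counts[c] for s in sums[c]] for c in range(k)]


def sse(data, labels, k):
    """K-means objective: within-cluster sum of squared distances."""
    cents = centroids(data, labels, k)
    return sum(sq_dist(row, cents[c]) for row, c in zip(data, labels))


def refine(data, labels, k, max_iter=100):
    """Plain Lloyd (K-means) iterations starting from the annealed labelling."""
    for _ in range(max_iter):
        cents = centroids(data, labels, k)
        new = [min(range(k), key=lambda c: sq_dist(row, cents[c])) for row in data]
        if new == labels or len(set(new)) < k:
            break  # converged, or a cluster would become empty
        labels = new
    return labels


def sa_kmeans(data, k, t0=1.0, cooling=0.95, steps_per_t=50, t_min=1e-3, seed=0):
    """Hypothetical simulated-annealing K-means (Euclidean cost only, no MI term).

    Assumes 2 <= k <= len(data).
    """
    rng = random.Random(seed)
    n = len(data)
    labels = [i % k for i in range(n)]  # round-robin start: no empty clusters
    cost = sse(data, labels, k)
    best, best_cost = labels[:], cost
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            i = rng.randrange(n)
            old = labels[i]
            if labels.count(old) == 1:
                continue  # moving this gene would empty its cluster
            new = rng.randrange(k - 1)
            labels[i] = new if new < old else new + 1  # any cluster except old
            new_cost = sse(data, labels, k)
            delta = new_cost - cost
            # Metropolis rule: keep improvements, keep worse moves with prob exp(-delta/t)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = labels[:], cost
            else:
                labels[i] = old  # reject: undo the move
        t *= cooling  # geometric cooling schedule
    return refine(data, best, k)
```

For example, `sa_kmeans([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], 2)` should put the first three profiles in one cluster and the last three in the other. Recomputing the full cost after every move keeps the sketch short but costs O(n·d) per move; a practical version would update the cost incrementally, and the per-gene distance computations are the natural candidates for GPU parallelization.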


Keywords: Clustering; K-means; Gene microarray; Simulated annealing; Parallelization; GPGPU

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • V. Saveetha (1), corresponding author
  • S. Sophia (2)
  • P. D. R. Vijayakumar (1)

  1. Department of IT, Info Institute of Engineering, Coimbatore, India
  2. Department of ECE, Sri Krishna College of Engineering and Technology, Coimbatore, India