Parallel Gene Clustering Using MapReduce

  • Conference paper
Web-Age Information Management (WAIM 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8597)

Abstract

Data clustering is one of the most important techniques for unsupervised learning in diverse applications. Gene clustering aims to find groups of genes with similar expression patterns in large microarray datasets. Recent developments in microarray technology generate very large volumes of microarray data at low cost and measure more than 10,000 genes simultaneously on a single chip. High-performance computing for gene clustering has therefore become increasingly important in microarray data analysis. In this paper, we propose a scalable parallel gene clustering method using the MapReduce programming model. The proposed method uses the k-means algorithm to identify groups of similar genes. Experimental results show that the proposed method scales well as the data size and the number of nodes increase, and that it produces effective clustering results on real microarray data.
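
The abstract does not spell out how k-means is split into map and reduce steps, so the following is only a minimal single-process sketch of the commonly used decomposition, not the authors' implementation. It assumes the map step assigns each gene's expression profile to its nearest centroid by Euclidean distance and the reduce step averages the profiles in each cluster. All function names, the tuple-based data layout, and the stopping rule are hypothetical choices made for illustration.

```python
from collections import defaultdict
import math


def nearest_centroid(profile, centroids):
    """Return the index of the centroid closest to one expression profile."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(profile, centroids[i]))


def map_phase(genes, centroids):
    """Map: emit one (cluster index, expression profile) pair per gene."""
    for _gene_id, profile in genes:
        yield nearest_centroid(profile, centroids), profile


def reduce_phase(pairs, centroids):
    """Reduce: replace each centroid with the mean of its assigned profiles."""
    groups = defaultdict(list)
    for idx, profile in pairs:
        groups[idx].append(profile)
    # Assumption: a cluster that receives no genes keeps its old centroid.
    new_centroids = list(centroids)
    for idx, profiles in groups.items():
        n = len(profiles)
        new_centroids[idx] = tuple(sum(col) / n for col in zip(*profiles))
    return new_centroids


def kmeans_mapreduce(genes, centroids, max_iter=20, tol=1e-9):
    """Repeat map and reduce rounds until no centroid moves more than tol."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(genes, centroids), centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


# Toy data: four made-up genes, each with a two-sample expression profile.
genes = [
    ("g1", (0.0, 0.0)),
    ("g2", (0.0, 2.0)),
    ("g3", (10.0, 10.0)),
    ("g4", (10.0, 12.0)),
]
final_centroids = kmeans_mapreduce(genes, [(0.0, 0.0), (10.0, 10.0)])
```

In a real MapReduce deployment the map calls would run on separate nodes over partitions of the gene list, and the grouping done here with a dictionary would be performed by the framework's shuffle stage; each iteration would be one job, with the updated centroids distributed to all mappers before the next round.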

Acknowledgment

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (NRF-2013R1A1A2006236).

Corresponding author

Correspondence to Byeong-Soo Jeong.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Islam, A.K.M.T., Lim, CG., Jeong, BS. (2014). Parallel Gene Clustering Using MapReduce. In: Chen, Y., et al. Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8597. Springer, Cham. https://doi.org/10.1007/978-3-319-11538-2_34

  • DOI: https://doi.org/10.1007/978-3-319-11538-2_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11537-5

  • Online ISBN: 978-3-319-11538-2

  • eBook Packages: Computer Science, Computer Science (R0)
