Parallel Gene Clustering Using MapReduce

Islam, A. K. M. Tauhidul; Lim, Chae-Gyun; Jeong, Byeong-Soo

doi:10.1007/978-3-319-11538-2_34


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8597)

Included in the following conference series:

International Conference on Web-Age Information Management


Abstract

Data clustering is regarded as one of the most important techniques for unsupervised learning in diverse applications. Gene clustering aims to find groups of genes with similar expression patterns in large microarray data sets. Meanwhile, recent advances in microarray technology produce very large volumes of microarray data at low cost, and a single chip can measure more than 10,000 genes simultaneously. High-performance computing for gene clustering has therefore become increasingly important in microarray data analysis. In this paper, we propose a scalable parallel gene clustering method using the MapReduce programming model. The proposed method uses the k-means algorithm to identify groups of similar genes. Experimental results show that the proposed method scales well as the data size and the number of nodes increase, and that it produces effective clustering results on real microarray data.
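As a rough illustration of how k-means can be expressed in the MapReduce model, the following sketch simulates the map, shuffle, and reduce phases in a single Python process. It is not the authors' implementation: the function names (`map_assign`, `reduce_update`, `kmeans_mapreduce`), the use of Euclidean distance, and the stopping rule are all assumptions made for this example, and a real deployment would run the map and reduce steps on a cluster framework such as Hadoop.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(profile, centroids):
    """Map step (hypothetical): emit (nearest centroid index, (profile, 1))."""
    nearest = min(range(len(centroids)),
                  key=lambda i: dist(profile, centroids[i]))
    return nearest, (profile, 1)


def reduce_update(pairs):
    """Reduce step (hypothetical): average all profiles sharing one key."""
    total, count = None, 0
    for profile, n in pairs:
        if total is None:
            total = list(profile)
        else:
            total = [a + b for a, b in zip(total, profile)]
        count += n
    return [value / count for value in total]


def kmeans_mapreduce(profiles, centroids, max_iter=20):
    """Repeat map -> shuffle -> reduce until the centroids stop changing."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iter):
        # Shuffle: group the map output by centroid index.
        groups = defaultdict(list)
        for profile in profiles:
            key, value = map_assign(profile, centroids)
            groups[key].append(value)
        # A centroid that received no profiles keeps its old position.
        updated = [reduce_update(groups[i]) if i in groups else centroids[i]
                   for i in range(len(centroids))]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Toy expression profiles (made-up numbers, two conditions per gene).
profiles = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
result = kmeans_mapreduce(profiles, [[0.0, 0.0], [10.0, 10.0]])
```

Each iteration corresponds to one MapReduce job: the map step depends only on one gene profile and the current centroids, which is what would allow the profiles to be split across nodes, while the reduce step needs only the profiles grouped under a single centroid.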



Acknowledgment

This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2013R1A1A2006236).

Author information

Authors and Affiliations

Department of Computer Engineering, Kyung Hee University, 1-Seocheon-dong, Yongin-si, Gyeonggi-do, 446-701, Korea
A. K. M. Tauhidul Islam, Chae-Gyun Lim & Byeong-Soo Jeong


Corresponding author

Correspondence to Byeong-Soo Jeong.

Editor information

Editors and Affiliations

DEKE Lab., Renmin University of China, Beijing, China
Yueguo Chen
Institute for Information Systems, Technical University Braunschweig, Braunschweig, Germany
Wolf-Tilo Balke
Hong Kong Baptist University Dept. Computer Science, Kowloon Tong, Hong Kong SAR
Jianliang Xu
School of Information, Renmin University of China, Beijing, China
Wei Xu
School of Computer Science and Technology, Hefei, China
Peiquan Jin
Department of Computer Science, East China Normal University, Shanghai, China
Xin Lin
Department of Computer Science, Kean University, Wenzhou, China
Tiffany Tang
School of Electrical Engineering, Korea University, Seoul, Korea, Republic of (South Korea)
Eenjun Hwang


About this paper

Cite this paper

Islam, A.K.M.T., Lim, CG., Jeong, BS. (2014). Parallel Gene Clustering Using MapReduce. In: Chen, Y., et al. Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8597. Springer, Cham. https://doi.org/10.1007/978-3-319-11538-2_34


DOI: https://doi.org/10.1007/978-3-319-11538-2_34
Published: 10 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11537-5
Online ISBN: 978-3-319-11538-2
eBook Packages: Computer Science, Computer Science (R0)
