Scalable Co-clustering Algorithms

Kwon, Bongjune; Cho, Hyuk

doi:10.1007/978-3-642-13119-6_3

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6081)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

Co-clustering has been used extensively in a variety of applications because it can discover latent local patterns that are not apparent to standard unsupervised algorithms such as k-means. Recently, a unified view of co-clustering algorithms, called Bregman co-clustering (BCC), has provided a general framework that subsumes several existing co-clustering algorithms, so we expect this framework to be applied to more data types. However, the amount of data collected from real-life application domains easily grows too large to fit in the main memory of a single-processor machine. Enhancing the scalability of BCC is therefore a critical challenge in practice. To address this challenge, and ultimately to make BCC easier to deploy rapidly in a wider range of applications with larger data, we parallelize all twelve co-clustering algorithms in the BCC framework using the Message Passing Interface (MPI). In addition, we validate their scalability on eleven synthetic datasets and one real-life dataset, and we report their speedup under varied parameter settings.
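
The abstract does not show how the algorithms are parallelized, so the following is only a minimal sketch of one common data-parallel pattern, not the authors' implementation. It assumes the rows of the data matrix are split across workers, that each worker computes per-block sums and counts for its own rows, and that the partial results are added together to obtain co-cluster (block) averages. All function names (`partial_block_stats`, `combine`, `block_means`) and the toy data are hypothetical, and the workers are simulated in plain Python rather than run under MPI.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Stats = Tuple[List[List[float]], List[List[int]]]


def partial_block_stats(rows: Matrix, row_labels: List[int],
                        col_labels: List[int], k: int, l: int) -> Stats:
    """Sum and count the entries of each (row cluster, column cluster)
    block, using only the rows held by one worker (assumed row split)."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for row, r in zip(rows, row_labels):
        for value, c in zip(row, col_labels):
            sums[r][c] += value
            counts[r][c] += 1
    return sums, counts


def combine(parts: List[Stats]) -> Stats:
    """Element-wise sum of per-worker statistics.
    This stands in for a sum reduction across processes."""
    k, l = len(parts[0][0]), len(parts[0][0][0])
    sums = [[sum(p[0][r][c] for p in parts) for c in range(l)]
            for r in range(k)]
    counts = [[sum(p[1][r][c] for p in parts) for c in range(l)]
              for r in range(k)]
    return sums, counts


def block_means(sums: List[List[float]],
                counts: List[List[int]]) -> Matrix:
    """Average of each block; empty blocks are reported as 0.0."""
    return [[s / n if n else 0.0 for s, n in zip(srow, nrow)]
            for srow, nrow in zip(sums, counts)]


# Toy 4x3 matrix with 2 row clusters and 2 column clusters (made-up data).
data = [[1.0, 2.0, 9.0],
        [3.0, 4.0, 7.0],
        [10.0, 12.0, 0.0],
        [14.0, 16.0, 2.0]]
row_labels = [0, 0, 1, 1]
col_labels = [0, 0, 1]
k, l = 2, 2

# Single-worker reference result.
serial = block_means(*partial_block_stats(data, row_labels, col_labels, k, l))

# Two simulated workers holding 1 and 3 rows respectively.
parts = [
    partial_block_stats(data[:1], row_labels[:1], col_labels, k, l),
    partial_block_stats(data[1:], row_labels[1:], col_labels, k, l),
]
distributed = block_means(*combine(parts))
```

In an actual MPI program, the role of `combine` would typically be played by a sum reduction such as `MPI_Allreduce` with `MPI_SUM`, so that every process ends up with the same global block statistics while holding only its own slice of the data.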

Author information

Authors and Affiliations

Biomedical Engineering, The University of Texas at Austin, Austin, TX, 78712, USA
Bongjune Kwon
Computer Science, Sam Houston State University, Huntsville, TX, 77341-2090, USA
Hyuk Cho

Editor information

Editors and Affiliations

Department of Computer Science and Information Engineering, Chung Hua University, 300, Hsinchu, Taiwan, China
Ching-Hsien Hsu
Department of Computer Science, St. Francis Xavier University, B2G 2W5, Antigonish, NS, Canada
Laurence T. Yang
Department of Computer Science and Engineering, Seoul National University of Technology, 172 Gongreund 2-dong, Nowon-gou, 139-742, Seoul, Korea
Jong Hyuk Park
Division of Computer Engineering, Mokwon University, 302-729, Daejeon, Korea
Sang-Soo Yeo

About this paper

Cite this paper

Kwon, B., Cho, H. (2010). Scalable Co-clustering Algorithms. In: Hsu, CH., Yang, L.T., Park, J.H., Yeo, SS. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2010. Lecture Notes in Computer Science, vol 6081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13119-6_3

DOI: https://doi.org/10.1007/978-3-642-13119-6_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13118-9
Online ISBN: 978-3-642-13119-6
eBook Packages: Computer Science, Computer Science (R0)
