A MapReduce-Based Parallel Clustering Algorithm for Large Protein-Protein Interaction Networks

Liu, Li; Fan, Dangping; Liu, Ming; Xu, Guandong; Chen, Shiping; Zhou, Yuan; Chen, Xiwei; Wang, Qianru; Wei, Yufeng

doi:10.1007/978-3-642-35527-1_12

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7713)

Abstract

Clustering proteins, or identifying functionally related proteins, in Protein-Protein Interaction (PPI) networks is one of the most computation-intensive problems in the proteomics community. Most research has focused on improving the accuracy of clustering algorithms. However, the high computational cost of these algorithms, such as Girvan and Newman's clustering algorithm, has been an obstacle to their use on large-scale PPI networks. In this paper, we propose an algorithm, called Clustering-MR, to address this problem. Our solution uses MapReduce to parallelize Girvan and Newman's clustering algorithm, which is based on edge betweenness. We evaluated the performance of Clustering-MR in a cloud environment with testing datasets of different sizes and different numbers of worker nodes. The experimental results show that Clustering-MR can achieve high performance for large-scale PPI networks with more than 1000 proteins or 5000 interactions.
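The abstract does not describe how Clustering-MR divides the work, so the following is only a minimal single-machine sketch of one plausible decomposition, not the authors' implementation. It assumes an unweighted, undirected graph stored as a dictionary of neighbour sets, computes per-source edge-betweenness contributions with a Brandes-style breadth-first search (the "map" step), and sums them per edge (the "reduce" step). All function names (`map_source`, `reduce_scores`, `edge_betweenness`, `girvan_newman_split`) are hypothetical and chosen for illustration.

```python
from collections import defaultdict, deque
from functools import reduce


def map_source(graph, source):
    """Map step (assumed): edge scores from shortest paths starting at one source."""
    dist = {source: 0}
    sigma = defaultdict(float)   # number of shortest paths from source to each node
    sigma[source] = 1.0
    preds = defaultdict(list)    # predecessors on shortest paths
    order = []                   # nodes in BFS visiting order
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = defaultdict(float)   # dependency accumulated from the far end of the BFS
    contrib = {}
    for w in reversed(order):
        for v in preds[w]:
            c = sigma[v] / sigma[w] * (1.0 + delta[w])
            edge = tuple(sorted((v, w)))
            contrib[edge] = contrib.get(edge, 0.0) + c
            delta[v] += c
    return contrib


def reduce_scores(a, b):
    """Reduce step (assumed): sum partial scores that share the same edge key."""
    out = dict(a)
    for edge, score in b.items():
        out[edge] = out.get(edge, 0.0) + score
    return out


def edge_betweenness(graph):
    partials = map(lambda s: map_source(graph, s), graph)
    total = reduce(reduce_scores, partials, {})
    # Undirected graph: every node pair is seen from both endpoints, so halve.
    return {edge: score / 2.0 for edge, score in total.items()}


def components(graph):
    """Connected components via depth-first search."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(graph[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_split(graph):
    """Remove highest-betweenness edges until the graph gains a component."""
    g = {v: set(nbrs) for v, nbrs in graph.items()}
    start = len(components(g))
    while len(components(g)) == start and any(g.values()):
        scores = edge_betweenness(g)
        u, v = max(scores, key=scores.get)
        g[u].discard(v)
        g[v].discard(u)
    return components(g)
```

In this sketch each call to `map_source` is independent of the others, which is the property a MapReduce job could exploit by assigning source nodes to different workers; the betweenness scores still have to be recomputed after every edge removal, which is where the cost of the sequential algorithm comes from.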

Author information

Authors and Affiliations

School of Information Science and Engineering, Lanzhou University, Gansu, 730000, P.R. China
Li Liu, Dangping Fan, Yuan Zhou, Xiwei Chen & Qianru Wang
School of Electrical and Information Engineering, The University of Sydney, NSW, 2006, Australia
Ming Liu & Shiping Chen
Advanced Analytics Institute, University of Technology Sydney, NSW, 2008, Australia
Guandong Xu
CSIRO ICT Centre, Australia
Shiping Chen
The Third People's Hospital of Lanzhou, Gansu, 730050, P.R. China
Yufeng Wei

Editor information

Editors and Affiliations

School of Computer Science, Fudan University, Handan Road 220, 200433, Shanghai, China
Shuigeng Zhou
Chinese Academy of Sciences, Academy of Mathematics and Systems Science, Zhongguancun East Road 55, 100190, Beijing, China
Songmao Zhang
Department of Computer Science and Engineering, University of Minnesota, Union Street SE 200, 55455, Minneapolis, MN, USA
George Karypis

About this paper

Cite this paper

Liu, L. et al. (2012). A MapReduce-Based Parallel Clustering Algorithm for Large Protein-Protein Interaction Networks. In: Zhou, S., Zhang, S., Karypis, G. (eds) Advanced Data Mining and Applications. ADMA 2012. Lecture Notes in Computer Science, vol 7713. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35527-1_12

DOI: https://doi.org/10.1007/978-3-642-35527-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35526-4
Online ISBN: 978-3-642-35527-1
eBook Packages: Computer Science, Computer Science (R0)
