Agglomerative Approaches for Partitioning of Networks in Big Data Scenarios

Biswas, Anupam; Arora, Gourav; Tiwari, Gaurav; Khare, Srijan; Agrawal, Vyankatesh; Biswas, Bhaskar

doi:10.1007/978-3-319-31861-5_3


Abstract

Big Data systems routinely face storage and processing problems. Data in many domains is now growing so large and so quickly that storage and processing have become the two key concerns in such systems. Beyond sheer size, complex relationships within the data make these systems highly sophisticated. Such relationships are often represented as a network of data objects. Parallel processing, external memory algorithms, and data partitioning are at the forefront of techniques for dealing with Big Data issues. This chapter discusses these techniques in relation to the storage and processing of Big Data, and studies Big Data partitioning techniques, agglomerative approaches in particular. Network data partitioning, or clustering, is common to most network-related applications, where the objective is to group similar objects based on the connectivity among them. Application areas include social network analysis, the World Wide Web, image processing, biological networks, supply chain networks, and many others. We discuss the relevant agglomerative approaches and present their relative advantages in Big Data scenarios. The discussion also covers how strategic changes in the presented agglomerative approaches affect Big Data scenarios, and addresses the tuning of their various parameters.



Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, India
Anupam Biswas, Gourav Arora, Gaurav Tiwari, Srijan Khare, Vyankatesh Agrawal & Bhaskar Biswas


Corresponding author

Correspondence to Anupam Biswas.

Editor information

Editors and Affiliations

Department of Computing and Mathematics, University of Derby, Derby, United Kingdom
Zaigham Mahmood

About this chapter

Cite this chapter

Biswas, A., Arora, G., Tiwari, G., Khare, S., Agrawal, V., Biswas, B. (2016). Agglomerative Approaches for Partitioning of Networks in Big Data Scenarios. In: Mahmood, Z. (eds) Data Science and Big Data Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-31861-5_3

DOI: https://doi.org/10.1007/978-3-319-31861-5_3
Published: 06 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31859-2
Online ISBN: 978-3-319-31861-5
eBook Packages: Computer Science, Computer Science (R0)
