Abstract
Big Data systems are often confronted with storage- and processing-related issues. Nowadays, data in various domains is growing so large and so quickly that storage and processing have become the two key concerns in such large data systems. In addition to sheer size, complex relationships within the data make these systems highly sophisticated. Such relationships are often represented as networks of data objects. Parallel processing, external-memory algorithms, and data partitioning are at the forefront of techniques for dealing with Big Data issues. This chapter discusses these techniques in relation to the storage and processing of Big Data, and studies Big Data partitioning techniques, agglomerative approaches in particular. Network data partitioning, or clustering, is common to most network-related applications, where the objective is to group similar objects based on the connectivity among them. Application areas include social network analysis, the World Wide Web, image processing, biological networks, supply chain networks, and many others. In this chapter, we discuss the relevant agglomerative approaches and present their relative advantages in Big Data scenarios. The discussion also covers how strategic changes to these agglomerative approaches affect their behavior in Big Data scenarios, and addresses the tuning of their various parameters.
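The abstract does not commit to one specific agglomerative method, so the following is only an illustrative sketch of what "grouping objects based on connectivity" by agglomeration can look like. It assumes an undirected, unweighted graph given as an edge list and uses greedy modularity-based merging (in the spirit of Newman's fast greedy algorithm, not necessarily the approach studied in this chapter): every node starts as its own cluster, and the pair of connected clusters whose merge gives the largest modularity gain, ΔQ = e_ab/m − d_a·d_b/(2m²), is joined repeatedly. Here e_ab is the number of edges between clusters a and b, d_a and d_b are their total degrees, and m is the number of edges. The function name `greedy_agglomerate` and its `target_k` parameter are hypothetical.

```python
from collections import defaultdict


def greedy_agglomerate(edges, target_k=None):
    """Greedy modularity-based agglomeration of an undirected, unweighted graph.

    Hypothetical helper for illustration. Assumes `edges` lists each undirected
    edge once, with no self-loops, and that node ids are mutually comparable.
    If `target_k` is None, merging stops when no merge increases modularity;
    otherwise merging continues until `target_k` clusters remain (or no two
    clusters are connected).
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges

    # Each node starts as its own cluster, keyed by a representative node id.
    members = {node: {node} for node in adj}
    degree = {node: len(adj[node]) for node in adj}  # d_c: total degree of cluster c
    # between[a][b]: number of edges joining clusters a and b (kept symmetric).
    between = defaultdict(lambda: defaultdict(int))
    for u in adj:
        for v in adj[u]:
            between[u][v] += 1

    while len(members) > (target_k or 1):
        # Scan every pair of adjacent clusters for the largest modularity gain
        # dQ = e_ab / m - d_a * d_b / (2 m^2).
        best_gain, best_pair = None, None
        for a in members:
            for b, e_ab in between[a].items():
                if a < b:  # visit each unordered pair once
                    gain = e_ab / m - degree[a] * degree[b] / (2 * m * m)
                    if best_gain is None or gain > best_gain:
                        best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # remaining clusters share no edges
        if target_k is None and best_gain <= 0:
            break  # no merge improves modularity
        a, b = best_pair

        # Merge cluster b into cluster a and update the cluster-level graph.
        members[a] |= members.pop(b)
        degree[a] += degree.pop(b)
        for c, e_bc in between.pop(b).items():
            del between[c][b]
            if c != a:  # edges between a and b become internal edges
                between[a][c] += e_bc
                between[c][a] += e_bc

    return sorted(sorted(group) for group in members.values())


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(greedy_agglomerate(two_triangles))
    print(greedy_agglomerate(two_triangles, target_k=1))
```

On the two-triangle example the sketch returns `[[0, 1, 2], [3, 4, 5]]`, cutting only the bridge edge; with `target_k=1` it keeps merging past the modularity optimum and returns a single cluster. Rescanning all adjacent pairs after every merge costs O(n·m) time overall, so the sketch is not suited to Big Data as written; heap-based bookkeeping of the gains or multi-level schemes such as the Louvain method are the usual ways to scale this kind of agglomeration.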
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Biswas, A., Arora, G., Tiwari, G., Khare, S., Agrawal, V., Biswas, B. (2016). Agglomerative Approaches for Partitioning of Networks in Big Data Scenarios. In: Mahmood, Z. (ed.) Data Science and Big Data Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-31861-5_3
DOI: https://doi.org/10.1007/978-3-319-31861-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31859-2
Online ISBN: 978-3-319-31861-5
eBook Packages: Computer Science, Computer Science (R0)