Agglomerative Approaches for Partitioning of Networks in Big Data Scenarios

Data Science and Big Data Computing

Abstract

Big Data systems routinely face storage and processing problems. Data in many domains is now growing so large and so fast that storage and processing have become the two key concerns in such systems. Beyond sheer size, complex relationships within the data add further difficulty; these relationships are often represented as a network of data objects. Parallel processing, external memory algorithms, and data partitioning are at the forefront of techniques for dealing with these issues. This chapter discusses these techniques in relation to the storage and processing of Big Data, and studies Big Data partitioning techniques, agglomerative approaches in particular. Network data partitioning, or clustering, is common to most network-related applications, where the objective is to group similar objects based on the connectivity among them. Application areas include social network analysis, the World Wide Web, image processing, biological networks, supply chain networks, and many others. We discuss the relevant agglomerative approaches and present their relative advantages in Big Data scenarios. The discussion also covers how strategic changes to these approaches affect Big Data scenarios, and addresses the tuning of their parameters.
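
As a rough illustration of what an agglomerative approach to network partitioning does, the Python sketch below starts with every node in its own group and repeatedly merges the pair of groups whose union most increases modularity, stopping when no merge helps. It is not the chapter's method: the function name `agglomerate`, the choice of modularity gain as the merge criterion, and the toy edge list are assumptions made for illustration, and the naive pairwise search ignores the scale concerns that Big Data scenarios raise.

```python
from itertools import combinations


def agglomerate(edges):
    """Greedy bottom-up partitioning of an undirected, unweighted graph.

    `edges` is a non-empty list of (u, v) pairs with sortable node labels.
    Hypothetical helper for illustration only.
    """
    m = len(edges)
    nodes = sorted({x for edge in edges for x in edge})
    community = {v: {v} for v in nodes}  # label -> set of member nodes
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    def gain(a, b):
        # Modularity change from merging groups a and b:
        # (edges between a and b) / m - deg(a) * deg(b) / (2 * m^2)
        between = sum(
            1 for u, v in edges
            if (u in community[a] and v in community[b])
            or (u in community[b] and v in community[a])
        )
        deg_a = sum(degree[x] for x in community[a])
        deg_b = sum(degree[x] for x in community[b])
        return between / m - deg_a * deg_b / (2 * m * m)

    while len(community) > 1:
        # Pick the pair of groups with the largest modularity gain.
        best_gain, a, b = max(
            (gain(a, b), a, b) for a, b in combinations(sorted(community), 2)
        )
        if best_gain <= 0:
            break  # no merge improves modularity, so stop
        community[a] |= community.pop(b)

    return sorted(sorted(members) for members in community.values())


# Toy network (assumed): two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = agglomerate(toy_edges)
```

On this toy network the sketch returns the two triangles, `[[0, 1, 2], [3, 4, 5]]`, as separate groups. Recomputing every pairwise gain at each step is far too slow for large networks; it is kept here only because it makes the merge rule easy to read.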

Author information

Corresponding author

Correspondence to Anupam Biswas.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Biswas, A., Arora, G., Tiwari, G., Khare, S., Agrawal, V., Biswas, B. (2016). Agglomerative Approaches for Partitioning of Networks in Big Data Scenarios. In: Mahmood, Z. (ed.) Data Science and Big Data Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-31861-5_3

  • DOI: https://doi.org/10.1007/978-3-319-31861-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31859-2

  • Online ISBN: 978-3-319-31861-5

  • eBook Packages: Computer Science, Computer Science (R0)
