Distributed Computing Patterns Useful in Big Data Analytics

  • Chapter
Distributed Computing in Big Data Analytics

Abstract

Distributed computing has a history of more than 40 years. Over that time, many concepts have been created and applied in different computing models, system architectures, and platforms for developing distributed systems. Several of today's Big Data frameworks implement these distributed-systems concepts for data synchronization, message exchange, real-time data processing, and transaction control in architectures for Big Data Analytics applications. This chapter discusses basic patterns of distributed systems that abstract these concepts and can be used in homogeneous, heterogeneous, or hybrid environments for Big Data Analytics implementations.

Author information

Corresponding author

Correspondence to Julio César Santos dos Anjos.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

dos Anjos, J.C.S., Geyer, C.F.R., Barbosa, J.L.V. (2017). Distributed Computing Patterns Useful in Big Data Analytics. In: Mazumder, S., Singh Bhadoria, R., Deka, G. (eds) Distributed Computing in Big Data Analytics. Scalable Computing and Communications. Springer, Cham. https://doi.org/10.1007/978-3-319-59834-5_3

  • DOI: https://doi.org/10.1007/978-3-319-59834-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59833-8

  • Online ISBN: 978-3-319-59834-5

  • eBook Packages: Computer Science (R0)
