Abstract
Distributed computing has a history of more than 40 years. Over that time, many concepts have been created and applied in different computing models, system architectures, and platforms for developing distributed systems. Several of today's Big Data frameworks implement these distributed-systems concepts for data synchronization, message exchange, real-time data processing, and transaction control in architectures for Big Data Analytics applications. This chapter discusses basic patterns of distributed systems that abstract these concepts and can be used in homogeneous, heterogeneous, or hybrid environments for Big Data Analytics implementations.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
dos Anjos, J.C.S., Geyer, C.F.R., Barbosa, J.L.V. (2017). Distributed Computing Patterns Useful in Big Data Analytics. In: Mazumder, S., Singh Bhadoria, R., Deka, G. (eds) Distributed Computing in Big Data Analytics. Scalable Computing and Communications. Springer, Cham. https://doi.org/10.1007/978-3-319-59834-5_3
DOI: https://doi.org/10.1007/978-3-319-59834-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59833-8
Online ISBN: 978-3-319-59834-5
eBook Packages: Computer Science, Computer Science (R0)