Abstract
Large-scale cloud enterprises operate tens to hundreds of datacenters, running a variety of services that produce enormous amounts of data, such as search clicks and infrastructure operation logs. A recent research direction in both academia and industry is to process this "big data" across multiple datacenters, as the alternative of centralized processing might be too slow and costly (e.g., due to transferring all the data to a single location). Running such geo-distributed analytics jobs at scale gives rise to key resource management decisions: Where should each computation take place? Accordingly, which data should be moved to which location, and when? Which network paths should be used for moving the data? These decisions are complicated not only because they involve the scheduling of multiple types of resources (e.g., compute and network), but also because of the complicated internal data flow of the jobs, which are typically structured as a DAG of tens of stages, each with up to thousands of tasks. Recent work [17, 22, 25] has dealt with the resource management problem by abstracting away certain aspects of the problem, such as the physical network connecting the datacenters, the DAG structure of the jobs, and/or the compute capacity constraints at the (possibly heterogeneous) datacenters. In this paper, we provide the first analytical model that includes all aspects of the problem, with the objective of minimizing the makespan of multiple geo-distributed jobs. We provide exact and approximate algorithms for certain practical scenarios and suggest principled heuristics for other scenarios of interest.
References
[1] Hadoop YARN Project. http://bit.ly/1iS8xvP.
[2] Seattle Department of Transportation live traffic videos. http://web6.seattle.gov/travelers/.
[3] TPC-H Benchmark. http://bit.ly/1KRK5gl.
[4] TPC-DS Benchmark. http://bit.ly/1J6uDap, 2012.
[5] A. Greenberg, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. VL2: A scalable and flexible data center network. In SIGCOMM, 2009.
[6] Sameer Agarwal, Srikanth Kandula, Nicolas Bruno, Ming-Chuan Wu, Ion Stoica, and Jingren Zhou. Re-optimizing data parallel computing. In NSDI, 2012.
[7] Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. A scalable, commodity data center network architecture. In SIGCOMM, 2008.
[8] Michael Armbrust et al. Spark SQL: Relational data processing in Spark. In SIGMOD, 2015.
[9] Peter Bodík, Ishai Menache, Joseph Seffi Naor, and Jonathan Yaniv. Brief announcement: Deadline-aware scheduling of big-data processing jobs. In SPAA, pages 211–213, 2014.
[10] Ronnie Chaiken et al. SCOPE: Easy and efficient parallel processing of massive datasets. In VLDB, 2008.
[11] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, 2004.
[12] Pierre-François Dutot, Grégory Mounié, and Denis Trystram. Scheduling parallel tasks: Approximation algorithms. In Handbook of Scheduling: Algorithms, Models, and Performance Analysis. 2004.
[13] Ronald L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics, 1969.
[14] Robert Grandl, Mosharaf Chowdhury, Aditya Akella, and Ganesh Ananthanarayanan. Altruistic scheduling in multi-resource clusters. In OSDI, 2016.
[15] Chi-Yao Hong, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Vijay Gill, Mohan Nanduri, and Roger Wattenhofer. Achieving high utilization with software-driven WAN. In SIGCOMM, 2013.
[16] Chien-Chun Hung, Ganesh Ananthanarayanan, Leana Golubchik, Minlan Yu, and Mingyang Zhang. Wide-area analytics with multiple resources. In EuroSys, 2018.
[17] Chien-Chun Hung, Leana Golubchik, and Minlan Yu. Scheduling jobs across geo-distributed datacenters. In SoCC, 2015.
[18] IDC. Network video surveillance: Addressing storage challenges. http://bit.ly/1OGOtzA, 2012.
[19] Michael Isard. Autopilot: Automatic data center management. OSR, 41(2), 2007.
[20] Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong, Leon Poutievski, Arjun Singh, Subbaiah Venkata, Jim Wanderer, Junlan Zhou, Min Zhu, et al. B4: Experience with a globally-deployed software defined WAN. In SIGCOMM, 2013.
[21] Klaus Jansen and Hu Zhang. Scheduling malleable tasks with precedence constraints. J. Comput. Syst. Sci., 78(1):245–259, 2012.
[22] Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, and Ion Stoica. Low latency geo-distributed analytics. In SIGCOMM, 2015.
[23] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: Flexible, scalable schedulers for large compute clusters. In EuroSys, 2013.
[24] Ashish Thusoo et al. Hive: A warehousing solution over a map-reduce framework. In VLDB, 2009.
[25] Ashish Vulimiri, Carlo Curino, P. Brighten Godfrey, Thomas Jungblut, Jitu Padhye, and George Varghese. Global analytics in the face of bandwidth and regulatory constraints. In NSDI, 2015.
[26] M. Zaharia et al. Spark: Cluster computing with working sets. Technical Report UCB/EECS-2010-53, EECS Department, University of California, Berkeley, 2010.
6 Appendix: Linearizing Multiplicative Constraints
Recall that our multiplicative constraints are of the form
The Taylor series is expanded around the estimated values \(\widehat{OUT}_{i,u,T},\widehat{IN}_{j,v,T}\). We solve the LP iteratively, using the values the LP found for the variables in the previous iteration as the estimates. After several iterations the values converge, and thus the linearized multiplications become more accurate.
Recall that the first-order Taylor expansion of the multiplication \(x\cdot y\) around the point \((\hat{x},\hat{y})\) is \(x\cdot y\simeq \hat{x}\cdot \hat{y}+(x-\hat{x})\cdot \hat{y}+(y-\hat{y})\cdot \hat{x}\). Dividing both sides by the constant \(\frac{D_{e}}{D_{OUT,u}D_{IN,v}}\) and approximating the multiplication using the first-order Taylor expansion, we obtain \(\forall i,j\in [n],e=(u,v)\in E\):
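As a sanity check, the expansion can be verified numerically. The sketch below (plain Python, with arbitrary sample numbers) confirms that the residual of the first-order approximation is exactly \((x-\hat{x})\cdot (y-\hat{y})\), which is small whenever the estimates are close to the true values.

```python
def taylor_xy(x, y, xh, yh):
    """First-order Taylor expansion of x*y around the point (xh, yh)."""
    return xh * yh + (x - xh) * yh + (y - yh) * xh

# Arbitrary illustrative values: true point (3.1, 4.2), estimates (3.0, 4.0).
exact = 3.1 * 4.2
approx = taylor_xy(3.1, 4.2, 3.0, 4.0)
residual = exact - approx  # algebraically equals (x - xh) * (y - yh)
print(residual)            # 0.1 * 0.2 = 0.02, up to floating point
```

The residual is quadratic in the estimation error, which is why the approximation tightens as the iterated estimates converge.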
Note that we have turned this equality constraint into an inequality, since we have other constraints for the total flow from previous sections. Our linearized estimate of the flow might be slightly more or less than the true product value. If it is more than the true value, we simply send less flow, and no constraints are violated. If it is less than the true value, we need to send more data than planned, which might violate capacity constraints. In this case, we simply use a little more time for the entire flow to be sent. As long as the approximation is reasonable, this extra time will be small.
To obtain relatively accurate values for \(\widehat{OUT}_{i,u,T},\widehat{IN}_{j,v,T}\), we solve the LP iteratively and use the previous iteration's values as our estimates. For the first iteration only, we use \(\widehat{OUT}_{i,u,T}=0,\widehat{IN}_{j,v,T}=0\). We use the same approach for our second set of multiplicative constraints (see Sect. 3.4), \(\forall i,j\in [n],e=(u,v)\in E,t\in [T]\,:\,\frac{\sum _{t'\le t}r_{i,j,e,t'}}{\sum _{t'\le T}r_{i,j,e,t'}}\ge \frac{COMP_{j,v,t}}{COMP_{j,v,T}}\); details omitted for brevity. The resulting LP is then solved iteratively, as described, to obtain a nearly feasible solution. The small infeasibility translates into some extra time required for completing flows that are larger than anticipated by the LP.
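The behavior of this successive-linearization loop can be illustrated on a toy problem (a sketch with made-up numbers, not the paper's LP): maximize \(x+y\) subject to \(x\cdot y\le 4\) and \(0\le x,y\le 4\), relinearizing the product around the previous solution each round. With zero initial estimates the linearized constraint is vacuous in the first round, as in the text. By symmetry the solution keeps \(x=y\), so each linearized "LP solve" reduces to the closed form in the loop below.

```python
# Toy illustration of the iterative linearization scheme.
# Linearizing x*y around (xh, xh) with x = y gives:
#   xh^2 + (x - xh)*xh + (x - xh)*xh = 2*xh*x - xh^2  <=  CAP
U, CAP = 4.0, 4.0   # box bound and capacity (hypothetical toy numbers)

xh = 0.0            # first iteration: estimates are zero, as in the text
history = []
for _ in range(8):
    if xh == 0.0:
        # Linearized constraint is vacuous (0 <= CAP); hit the box bound.
        x = U
    else:
        # Solve the linearized constraint at equality: x = (CAP + xh^2) / (2*xh).
        x = min(U, (CAP + xh * xh) / (2.0 * xh))
    history.append(x)
    xh = x          # re-expand around the new solution

print(history[-1])        # converges toward 2.0
print(history[-1] ** 2)   # the product x*y approaches the capacity 4.0
```

The first round overshoots (x = 4, so x*y = 16 violates the capacity), and each subsequent relinearization shrinks the violation, matching the text's observation that the residual infeasibility, and hence the extra time needed, becomes small once the estimates converge.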
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Kandula, S., Menache, I., Naor, J. (Seffi), Timnat, E. (2019). An Algorithmic Framework for Geo-Distributed Analytics. In: Walrand, J., Zhu, Q., Hayel, Y., Jimenez, T. (eds) Network Games, Control, and Optimization. Static & Dynamic Game Theory: Foundations & Applications. Birkhäuser, Cham. https://doi.org/10.1007/978-3-030-10880-9_6
DOI: https://doi.org/10.1007/978-3-030-10880-9_6
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-030-10879-3
Online ISBN: 978-3-030-10880-9
eBook Packages: Mathematics and Statistics (R0)