Geo-Scale Transaction Processing
Geo-Scale Transaction Processing considers the processing of transactions on nodes that are separated by wide-area links.
Replication and distribution of data across nodes have been used for various objectives, such as fault tolerance, load balancing, and read availability. This practice dates back to the early days of computing (Kemme et al. 2010; Bernstein and Goodman 1981) and continues to evolve to accommodate new and advancing computing technologies. The availability of publicly accessible cloud resources dispersed around the world has enabled the replication and distribution of data across large distances, potentially spanning many continents. Such a deployment is denoted a geo-scale deployment and achieves higher levels of the objectives of replication and distribution. For example, geo-scale deployments can tolerate data center-scale failures and disasters that cover large geographic regions, and they can provide faster access to users across the globe. In addition to these classical objectives, geo-scale deployment allows better utilization of cloud resources, as it enables the use of resources from multiple cloud providers and cloud locations. Different cloud providers and locations typically offer varying services with diverse characteristics, and a geo-scale deployment allows leveraging these heterogeneous resources to optimize various performance and monetary objectives.
With the increasing distance between nodes, the data management tasks that provide access to data become more challenging. One essential such task is transaction processing. A transaction is an abstraction of data access that consists of a collection of operations, typically reads and writes, and that is guaranteed to execute with certain defined properties. These properties include consistency, recoverability, and others, and each property has many flavors that define the strictness of the provided guarantees. The cost of providing these guarantees increases significantly in a geo-scale deployment due to the wide-area links between nodes, since nodes must communicate to coordinate access while preserving the defined guarantees. This has stimulated interest in developing new transaction processing technologies for geo-scale deployments that ameliorate the cost of wide-area communication, including new designs that achieve classical guarantees more efficiently in geo-scale environments and new explorations of transactional access guarantees that are more feasible in such environments.
Key Research Findings
Geo-Scale Transaction Processing includes work that covers various consistency guarantees. Next is an overview of key research findings for different levels of transactional consistency guarantees, followed by related research findings on Geo-Scale Transaction Processing.
Strongly Consistent Transactions
Serializability (Bernstein et al. 1987) is a strong consistency guarantee that ensures that the outcome of concurrent transaction execution is equivalent to some serial execution of these transactions. It has been extensively studied in the context of database systems and is considered one of the strongest consistency guarantees, as it abstracts away the complexities of concurrency and replication, thus freeing users from reasoning about them. However, serializability requires extensive coordination between nodes executing transactions and nodes hosting data copies. This makes it especially challenging in a geo-scale deployment, where coordination crosses wide-area links. To overcome the expensive cost of wide-area coordination, well-known transaction processing protocols and approaches have been redesigned, and novel transaction processing protocols have been designed specifically for geo-scale deployments.
Paxos (Lamport 1998) is a fault-tolerant consensus protocol that has been used widely to implement transaction processing systems that provide strong guarantees. In these implementations, Paxos provides a state machine replication (SMR) component that guarantees the totally ordered execution of requests (e.g., transactions). Due to its wide use in data management systems, there have been many attempts to build Paxos-based protocols for Geo-Scale Transaction Processing, such as Megastore (Baker et al. 2011), Paxos-CP (Patterson et al. 2012), Egalitarian Paxos (Moraru et al. 2013), and MDCC (Kraska et al. 2013; Pang et al. 2014).
Megastore (Baker et al. 2011) uses Paxos to execute transactions on multiple nodes in different data centers. Paxos is used to assign transactions to log positions and to make all nodes agree on the assigned order. Then, each node executes the transactions according to the assigned order. This guarantees that all nodes maintain a consistent copy of data, since they all execute transactions in the same order. Assigning the position for a request requires at least a round of communication between a Paxos leader and the other nodes. Paxos-CP (Patterson et al. 2012) extends Megastore by increasing the utilization of Paxos communication. Specifically, in the presence of concurrent transactions, Paxos-CP allows combining transactions in the same log position rather than starting a new round of messaging.
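The log-ordering idea above can be illustrated with a minimal sketch (not Megastore's actual implementation; all class and function names here are hypothetical): a leader assigns each transaction a log position, and every replica applies transactions strictly in position order, so all replicas converge to the same state.

```python
class Replica:
    def __init__(self):
        self.log = {}          # log position -> transaction
        self.applied_upto = 0  # highest contiguously applied position
        self.state = {}        # key -> value

    def accept(self, position, txn):
        self.log[position] = txn

    def apply_ready(self):
        # Apply only contiguous positions; a gap stalls execution.
        while self.applied_upto + 1 in self.log:
            self.applied_upto += 1
            for key, value in self.log[self.applied_upto]:
                self.state[key] = value

def leader_order(replicas, txns):
    # The leader assigns consecutive log positions; in a real system each
    # assignment would be agreed on via a Paxos round.
    for position, txn in enumerate(txns, start=1):
        for r in replicas:
            r.accept(position, txn)
    for r in replicas:
        r.apply_ready()

replicas = [Replica() for _ in range(3)]
leader_order(replicas, [[("x", 1)], [("x", 2), ("y", 7)]])
# All replicas executed the transactions in the same order.
assert all(r.state == {"x": 2, "y": 7} for r in replicas)
```

The key invariant is that replicas never apply a position before all earlier positions, which is what makes identical logs imply identical states.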
MDCC (Kraska et al. 2013; Pang et al. 2014) leverages a Paxos variant, called Fast Paxos (Lamport 2006), that exhibits better performance characteristics for geo-scale deployments compared to the original Paxos protocol. Fast Paxos, unlike the original Paxos protocol, is a leaderless protocol, meaning that leader election, a major component of Paxos's design, is not needed. Therefore, users at different locations can all process transactions without going through a leader. This comes at the cost of larger quorums (i.e., groups of nodes) needed to process requests. In Paxos, a simple majority suffices to process a request (from a leader), whereas Fast Paxos requires a fast quorum that is typically larger than a majority of nodes. Egalitarian Paxos (EPaxos) (Moraru et al. 2013) proposes a new variant of Paxos that combines many attractive features for geo-scale deployments. First, EPaxos is a leaderless protocol; thus, it enjoys the advantages of leaderless protocols over the original Paxos protocol, as discussed previously for Fast Paxos. Additionally, EPaxos reduces the quorum size to be smaller than what is typically needed by MDCC and achieves the optimal quorum size (i.e., three nodes) for the widely used case of five nodes. EPaxos also utilizes many useful features to increase concurrency, such as allowing nonconflicting operations to execute concurrently. This is done via a multidimensional log, similar to generalized Paxos (Lamport 2005), rather than a simple log of requests that enforces ordering of events even if they do not conflict.
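The quorum-size trade-off above can be made concrete with a back-of-the-envelope sketch using the commonly cited formulas (the function names are illustrative; consult the cited papers for the exact quorum-intersection requirements):

```python
from math import ceil

def classic_quorum(n):
    # Plain Paxos: a simple majority of n acceptors.
    return n // 2 + 1

def fast_paxos_quorum(n):
    # Fast Paxos: a common choice of fast quorum is ceil(3n/4), so that
    # any two fast quorums intersect together with any classic quorum.
    return ceil(3 * n / 4)

def epaxos_fast_quorum(n):
    # EPaxos with n = 2F + 1 replicas tolerating F failures:
    # fast-path quorum of F + floor((F + 1) / 2) replicas.
    f = (n - 1) // 2
    return f + (f + 1) // 2

# For the widely used case of five nodes:
assert classic_quorum(5) == 3      # Paxos: simple majority
assert fast_paxos_quorum(5) == 4   # Fast Paxos: larger than a majority
assert epaxos_fast_quorum(5) == 3  # EPaxos: back down to three nodes
```

This shows why EPaxos is attractive at geo-scale: it keeps the leaderless property while contacting no more replicas than a plain majority in the five-node case.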
Paxos was also shown to be suitable as a synchronous communication layer integrated with a transaction commit protocol (Corbett et al. 2012; Glendenning et al. 2011; Mahmoud et al. 2013). For example, Spanner (Corbett et al. 2012) and Replicated Commit (Mahmoud et al. 2013) commit using variants of two-phase commit (2PC) and strict two-phase locking (S2PL) and leverage Paxos to replicate across data centers. Spanner partitions data and assigns a leader for each partition. Each partition is synchronously replicated, for fault tolerance, to a set of replicas using Paxos. Transactions that span more than one partition are executed by coordinating between the corresponding partition leaders. This architecture separates consistency from durability, allowing each to be optimized in isolation. Also, transactions that access a single partition are not subject to multi-partition coordination. Replicated Commit executes transactions by collecting votes from a majority of data centers. A data center's positive vote is a promise not to allow the execution of conflicting transactions. Once a majority of positive votes is collected, the transaction proceeds to commit. Replicated Commit conflates consistency and durability, which reduces the amount of redundant messages. Additionally, since there are no leaders, Replicated Commit exhibits better load balancing characteristics. However, a majority vote is always needed for all transactions, even ones that access a single partition.
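The majority-vote commit described for Replicated Commit can be sketched as follows. This is an assumed simplification, not the paper's protocol: each data center votes yes only if none of the transaction's keys are locked, and locking the keys is its promise to block conflicting transactions.

```python
class DataCenter:
    def __init__(self):
        self.locks = set()  # keys promised to committing transactions

    def vote(self, keys):
        # Vote no on any conflict; otherwise lock the keys as a promise.
        if self.locks & set(keys):
            return False
        self.locks.update(keys)
        return True

def replicated_commit(datacenters, keys):
    votes = sum(dc.vote(keys) for dc in datacenters)
    # Commit once a majority of positive votes is collected.
    return votes > len(datacenters) // 2

dcs = [DataCenter() for _ in range(5)]
assert replicated_commit(dcs, ["x"]) is True   # first writer of "x" commits
assert replicated_commit(dcs, ["x"]) is False  # conflicting writer is blocked
```

Note how even this toy version exhibits the drawback mentioned above: a transaction touching a single key still needs a majority of data centers to answer.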
To ameliorate the cost of wide-area communication, proactive coordination techniques were developed that exchange data continuously to enable early detection of conflicts (Nawab et al. 2013, 2015b). Helios (Nawab et al. 2015b), a work based on proactive coordination, derives the optimal transaction commit latency via a theoretical model of coordination and proposes a protocol design that achieves it. Helios shows that for any pair of nodes, the sum of their transaction commit latencies cannot be lower than the round-trip time between them (Nawab et al. 2015b).
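The Helios lower bound lends itself to a small feasibility check: given a round-trip-time matrix, an assignment of per-node commit latencies is achievable only if every pair of nodes satisfies the bound. The RTT numbers below are hypothetical, and the function is only an illustration of the bound, not of the Helios protocol itself.

```python
def feasible(latencies, rtt):
    # latencies[a]: proposed commit latency at node a
    # rtt[a][b]: round-trip time between nodes a and b
    # Helios bound: latencies[a] + latencies[b] >= rtt[a][b] for all pairs.
    n = len(latencies)
    return all(latencies[a] + latencies[b] >= rtt[a][b]
               for a in range(n) for b in range(n) if a != b)

# Hypothetical RTTs (ms) between three data centers.
rtt = [[0, 100, 140],
       [100, 0, 80],
       [140, 80, 0]]

assert feasible([70, 40, 70], rtt)      # meets every pairwise bound
assert not feasible([50, 40, 70], rtt)  # 50 + 40 < RTT(0, 1) = 100
```

Intuitively, two conflicting transactions at opposite ends of a link cannot both commit before news of each other has had time to cross that link, which is exactly what the pairwise sum captures.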
Timestamp-based and time synchronization techniques have been explored for Geo-Scale Transaction Processing systems (Corbett et al. 2012; Du et al. 2013a,b). Although the distance between data centers may be large, Spanner (Corbett et al. 2012) shows that accurate time synchronization is achievable using specialized infrastructure such as atomic clocks and GPS. Without specialized hardware, accurate time synchronization is challenging; however, other geo-scale systems have explored the use of loosely synchronized clocks (Du et al. 2013a,b; Nawab et al. 2015b).
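A central idea behind Spanner's use of synchronized clocks is "commit wait": if every clock reading comes with an uncertainty bound, a node can wait out the uncertainty before releasing a transaction, so that timestamp order matches real-time order across nodes. The sketch below is an assumed simplification (the `EPSILON` bound and function names are illustrative, not Spanner's TrueTime API):

```python
import time

EPSILON = 0.005  # assumed clock uncertainty bound, in seconds

def now_interval():
    # Return [earliest, latest] possible true time for this reading.
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def commit_wait(commit_ts):
    # Wait until the earliest possible true time has passed commit_ts,
    # so no node can later observe a time before this commit.
    while now_interval()[0] <= commit_ts:
        time.sleep(EPSILON / 10)

# Assign a commit timestamp at the latest possible current time...
commit_ts = now_interval()[1]
commit_wait(commit_ts)
# ...after waiting, the timestamp is surely in the past everywhere.
assert now_interval()[0] > commit_ts
```

The smaller the uncertainty bound, the shorter the wait, which is why specialized hardware (atomic clocks, GPS) matters: it shrinks `EPSILON` and hence the added commit latency.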
Weak and Relaxed Transactional Semantics
A major source of performance overhead in Geo-Scale Transaction Processing protocols is the coordination between nodes to guarantee consistent outcomes. To overcome this overhead, many lines of work investigate the trade-off between consistency and performance. In such investigations, consistency is relaxed or weakened to enable higher levels of performance. Central to this study is understanding the implications of relaxing and weakening consistency and what this means for users and application developers. For example, application developers may have to cope with anomalies that arise due to the weaker consistency levels.
An extreme point in the performance-consistency trade-off spectrum is the weakening of consistency and isolation guarantees in favor of performance (Cooper et al. 2008; DeCandia et al. 2007; Bailis and Ghodsi 2013). Adopting this approach entails weakening the guarantees on how conflicts between requests are handled. For example, eventual consistency delays the arbitration of conflicts and only guarantees that conflicts are resolved eventually, thus creating the possibility of temporary inconsistency. This approach may also entail weakening the access abstractions, such as providing single-key access rather than multi-object transactions. This limits the interface exposed to application developers; however, it is easier to manage and enables higher levels of performance.
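Eventual conflict arbitration can be sketched with a last-writer-wins (LWW) scheme, a common choice in Dynamo-style stores (the class below is a hypothetical illustration, not Dynamo's design): replicas accept writes independently, temporarily diverge, and converge once they exchange all updates, keeping the value with the highest (timestamp, replica id) tag.

```python
class LWWReplica:
    def __init__(self, rid):
        self.rid = rid
        self.store = {}  # key -> (timestamp, replica_id, value)

    def put(self, key, value, ts):
        self.merge_entry(key, (ts, self.rid, value))

    def merge_entry(self, key, entry):
        # Keep the lexicographically largest tag: last writer wins.
        if key not in self.store or entry > self.store[key]:
            self.store[key] = entry

    def anti_entropy(self, other):
        # Background exchange of updates; drives replicas to convergence.
        for key, entry in other.store.items():
            self.merge_entry(key, entry)

a, b = LWWReplica(0), LWWReplica(1)
a.put("x", "old", ts=1)  # conflicting writes accepted without coordination,
b.put("x", "new", ts=2)  # so the replicas temporarily disagree on "x"
a.anti_entropy(b)
b.anti_entropy(a)
assert a.store["x"][2] == b.store["x"][2] == "new"  # eventual convergence
```

The anomaly mentioned above is visible here: between the writes and the anti-entropy exchange, a client can read "old" from one replica and "new" from the other.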
In addition to the two extremes in the consistency-performance trade-off spectrum, there are consistency levels in between that exhibit various performance and consistency characteristics. Exploring consistency levels in the context of Geo-Scale Transaction Processing aims to overcome the high cost of wide-area coordination while providing some consistency guarantees for application developers. To this end, there have been many efforts to study weaker consistency guarantees and extend them to geo-scale environments. Causal consistency, which is inspired by the causal ordering principle (Lamport 1978), is one such consistency level. Causal consistency guarantees that events (e.g., transactions) are ordered at all nodes according to their causal relations. Causal relations in this context denote ordering guarantees between events at the same node (total order), ordering of events at a node before any subsequently received events (happens-before relation), and ordering due to transitive combinations of these relations. There have been many Geo-Scale Transaction Processing systems based on causal consistency (Lloyd et al. 2011, 2013; Bailis et al. 2013; Nawab et al. 2015a; Du et al. 2013a). An example is COPS (Lloyd et al. 2011, 2013), which extends causal consistency with a convergence guarantee and calls the result Causal+ consistency. COPS provides a read-only transactional interface that guarantees Causal+ execution via a two-round protocol.
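Causal ordering at a replica can be sketched with COPS-style dependency tracking. This is an assumed simplification of the idea (names are hypothetical): every replicated write carries the versions it depends on, and a replica makes a write visible only after all of its dependencies are visible.

```python
class CausalReplica:
    def __init__(self):
        self.visible = {}   # key -> (version, value) visible to readers
        self.pending = []   # writes whose dependencies are not yet visible

    def replicate(self, key, version, value, deps):
        # deps: list of (key, version) pairs this write causally follows.
        self.pending.append((key, version, value, deps))
        self._flush()

    def _flush(self):
        # Repeatedly expose writes whose dependencies are satisfied.
        progress = True
        while progress:
            progress = False
            for w in list(self.pending):
                key, version, value, deps = w
                if all(self.visible.get(k, (0,))[0] >= v for k, v in deps):
                    self.visible[key] = (version, value)
                    self.pending.remove(w)
                    progress = True

r = CausalReplica()
# The reply causally depends on the post. It arrives first over the
# wide-area link but stays invisible until its dependency is applied.
r.replicate("reply", 1, "nice post!", deps=[("post", 1)])
assert "reply" not in r.visible
r.replicate("post", 1, "hello world", deps=[])
assert r.visible["reply"] == (1, "nice post!")
```

This captures why causal consistency avoids wide-area coordination on the write path: replicas never wait for agreement, they only delay visibility locally until causality is satisfied.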
Another relaxed consistency guarantee that was explored for Geo-Scale Transaction Processing is snapshot isolation (SI) (Berenson et al. 1995) which guarantees that a transaction reads a consistent snapshot of data and that write-write conflicts are detected. Geo-Scale Transaction Processing systems have been built by extending the concept of SI (Sovran et al. 2011; Daudjee and Salem 2006; Lin et al. 2005, 2007; Du et al. 2013b). An example is Walter (Sovran et al. 2011) which builds upon SI and proposes Parallel SI (PSI). PSI relaxes the snapshot and write-write conflict definitions to allow replicating data asynchronously between data centers, which is not possible in SI.
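The write-write conflict detection that defines SI can be sketched with a first-committer-wins check (a minimal illustration with hypothetical names, not Walter's PSI protocol): a transaction reads from the snapshot taken at its start and aborts if any key it wrote was committed by someone else after that snapshot.

```python
class SIStore:
    def __init__(self):
        self.commit_ts = {}  # key -> timestamp of its last committed write
        self.clock = 0       # logical commit clock

    def begin(self):
        # The snapshot timestamp fixes what the transaction may read.
        return self.clock

    def commit(self, snapshot_ts, write_keys):
        # Write-write conflict: someone committed one of our keys after
        # our snapshot was taken. First committer wins; we abort.
        if any(self.commit_ts.get(k, 0) > snapshot_ts for k in write_keys):
            return False
        self.clock += 1
        for k in write_keys:
            self.commit_ts[k] = self.clock
        return True

db = SIStore()
t1, t2 = db.begin(), db.begin()       # two concurrent transactions
assert db.commit(t1, ["x"]) is True   # first committer of "x" wins
assert db.commit(t2, ["x"]) is False  # concurrent writer of "x" aborts
assert db.commit(t2, ["y"]) is True   # disjoint write sets commit freely
```

PSI's relaxation can be understood against this baseline: it loosens the single global snapshot and conflict-check point so that each data center can commit locally and replicate asynchronously.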
Another approach to the consistency-performance trade-off is to explore classes of transactions that can be executed without coordination while maintaining consistency. This approach is called coordination-free processing and has been explored in the context of Geo-Scale Transaction Processing (Bailis et al. 2014; Roy et al. 2015; Zhang et al. 2013). An example of this work is transaction chains (Zhang et al. 2013), which analyzes an application's transactions and builds an execution plan for distributed transactions with the objective of fast response time, equivalent to executing at a single data center with no wide-area communication.
The topology of the geo-scale system plays a significant role in transaction performance. Therefore, placing data and workers on carefully chosen data centers has the potential to optimize performance. Placement has been tackled in the context of geo-scale systems (Zakhary et al. 2016, 2018; Wu et al. 2013; Vulimiri et al. 2015; Agarwal et al. 2010; Endo et al. 2011; Pu et al. 2015; Shankaranarayanan et al. 2014; Sharov et al. 2015). Some of these placement frameworks specifically target optimizing placement for Geo-Scale Transaction Processing (Zakhary et al. 2016, 2018; Sharov et al. 2015). For example, Sharov et al. (2015) propose an optimization formulation for placement with the objective of minimizing latency for leader-based transaction processing systems such as Spanner (Corbett et al. 2012). Zakhary et al. (2016, 2018) propose placement formulations for quorum-based transaction processing systems such as Paxos-based systems.
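A toy instance of leader placement conveys the flavor of these formulations (the latency matrix and weights below are hypothetical, and real formulations such as Sharov et al. (2015) optimize far richer objectives): pick the leader data center that minimizes the request-weighted client latency.

```python
def best_leader(latency, weights):
    # latency[c][d]: latency from client region c to data center d (ms)
    # weights[c]: request volume originating in client region c
    n_dcs = len(latency[0])
    def cost(d):
        return sum(w * latency[c][d] for c, w in enumerate(weights))
    return min(range(n_dcs), key=cost)

latency = [[10, 90, 200],   # clients in region 0
           [80, 15, 150],   # clients in region 1
           [210, 160, 20]]  # clients in region 2
weights = [5, 3, 1]         # most traffic comes from region 0

assert best_leader(latency, weights) == 0  # place the leader near the load
```

Quorum-based formulations (Zakhary et al. 2016, 2018) generalize this: instead of one leader, the cost of a candidate placement is driven by the latency to reach the nearest quorum of replicas.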
Managing the Consistency-Performance Trade-Off
Rather than fixing a single Geo-Scale Transaction Processing protocol, there have been studies on dynamic protocols that adapt their consistency level and protocol characteristics based on current performance (Wu et al. 2015; Terry et al. 2013; Ardekani and Terry 2014). For example, CosTLO (Wu et al. 2015) introduces redundancy in messaging to lower the variance of latency. Other examples are Pileus and Tuba (Terry et al. 2013; Ardekani and Terry 2014), which enable applications to dynamically control the consistency level by declaring consistency and latency priorities. According to these priorities, the system adapts its protocol.
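Priority-driven adaptation in the style of Pileus can be sketched as follows. This is an assumed simplification (the SLA format and numbers are illustrative): the application ranks (consistency, latency-target, utility) entries, and the system serves the highest-utility entry whose latency target the current replica configuration can meet.

```python
def choose_subsla(sla, achievable_latency):
    # sla: list of (consistency, max_latency_ms, utility), best first.
    # achievable_latency: consistency level -> currently observed read
    # latency (ms) for the closest replica able to serve that level.
    for consistency, max_latency, utility in sla:
        if achievable_latency[consistency] <= max_latency:
            return consistency, utility
    # Fall back to the weakest entry when nothing meets its target.
    return sla[-1][0], 0.0

sla = [("strong", 100, 1.0), ("causal", 80, 0.6), ("eventual", 50, 0.1)]
latency = {"strong": 250, "causal": 70, "eventual": 30}

# Strong reads would take 250 ms (over the 100 ms target), so the system
# degrades to the best achievable level: causal.
assert choose_subsla(sla, latency) == ("causal", 0.6)
```

As wide-area conditions change, re-evaluating the same SLA yields different choices, which is the sense in which the protocol adapts rather than being fixed.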
Geo-Scale Transaction Processing provides easy-to-use abstractions of access to data that is globally replicated or distributed. One of the major challenges faced by Geo-Scale Transaction Processing is the large wide-area latency between nodes. To overcome this challenge, redesigns of transaction processing systems are proposed, and studies of the consistency-performance trade-off are conducted. These approaches are shown to optimize the performance of Geo-Scale Transaction Processing by considering the wide-area latency as the main bottleneck.
Examples of Application
Geo-Scale Transaction Processing is used by web and cloud applications that serve users around the globe. Systems discussed in this entry, such as Megastore (Baker et al. 2011) and Spanner (Corbett et al. 2012) at Google and Dynamo (DeCandia et al. 2007) at Amazon, back production interactive web services and e-commerce platforms.
Future Directions for Research
An avenue of future research is to expand beyond data centers and build Geo-Scale Transaction Processing protocols that utilize edge resources.
- Agarwal S, Dunagan J, Jain N, Saroiu S, Wolman A, Bhogan H (2010) Volley: automated data placement for geo-distributed cloud services. In: Proceedings of the 7th USENIX conference on networked systems design and implementation, USENIX association, NSDI’10, Berkeley, pp 2–2. http://dl.acm.org/citation.cfm?id=1855711.1855713
- Ardekani MS, Terry DB (2014) A self-configurable geo-replicated cloud storage system. In: Proceedings of the 11th USENIX conference on operating systems design and implementation, USENIX association, OSDI’14, Berkeley, pp 367–381. http://dl.acm.org/citation.cfm?id=2685048.2685077
- Bailis P, Ghodsi A (2013) Eventual consistency today: limitations, extensions, and beyond. Queue 11(3):20:20–20:32. http://doi.acm.org/10.1145/2460276.2462076
- Baker J, Bond C, Corbett JC, Furman JJ, Khorlin A, Larson J, Leon J, Li Y, Lloyd A, Yushprakh V (2011) Megastore: providing scalable, highly available storage for interactive services. In: CIDR 2011, Fifth biennial conference on innovative data systems research, Asilomar, pp 223–234, 9–12 Jan 2011. Online Proceedings. http://cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
- Berenson H, Bernstein P, Gray J, Melton J, O’Neil E, O’Neil P (1995) A critique of ANSI SQL isolation levels, pp 1–10. http://doi.acm.org/10.1145/223784.223785
- Bernstein PA, Hadzilacos V, Goodman N (1987) Concurrency control and recovery in database systems. Addison-Wesley, Reading
- Corbett JC, Dean J, Epstein M, Fikes A, Frost C, Furman JJ, Ghemawat S, Gubarev A, Heiser C, Hochschild P, Hsieh W, Kanthak S, Kogan E, Li H, Lloyd A, Melnik S, Mwaura D, Nagle D, Quinlan S, Rao R, Rolig L, Saito Y, Szymaniak M, Taylor C, Wang R, Woodford D (2012) Spanner: Google’s globally-distributed database, pp 251–264. http://dl.acm.org/citation.cfm?id=2387880.2387905
- Daudjee K, Salem K (2006) Lazy database replication with snapshot isolation. In: Proceedings of the 32nd international conference on very large data bases, VLDB Endowment, VLDB’06, pp 715–726. http://dl.acm.org/citation.cfm?id=1182635.1164189
- DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, Sivasubramanian S, Vosshall P, Vogels W (2007) Dynamo: Amazon’s highly available key-value store. In: Proceedings of twenty-first ACM SIGOPS symposium on operating systems principles, SOSP’07. ACM, New York, pp 205–220. http://doi.acm.org/10.1145/1294261.1294281
- Du J, Elnikety S, Roy A, Zwaenepoel W (2013a) Orbe: scalable causal consistency using dependency matrices and physical clocks. In: Proceedings of the 4th annual symposium on cloud computing, SOCC’13. ACM, New York, pp 11:1–11:14. http://doi.acm.org/10.1145/2523616.2523628
- Du J, Elnikety S, Zwaenepoel W (2013b) Clock-SI: snapshot isolation for partitioned data stores using loosely synchronized clocks. In: Proceedings of the 2013 IEEE 32nd international symposium on reliable distributed systems, SRDS’13. IEEE Computer Society, Washington, pp 173–184. http://dx.doi.org/10.1109/SRDS.2013.26
- Kemme B, Jimenez-Peris R, Patino-Martinez M (2010) Database replication. Synth Lect Data Manage 2(1):1–153. http://www.morganclaypool.com/doi/abs/10.2200/S00296ED1V01Y201008DTM007
- Lamport L (2005) Generalized consensus and Paxos. Technical report, MSR-TR-2005-33, Microsoft Research
- Lin Y, Kemme B, Patiño Martínez M, Jiménez-Peris R (2005) Middleware based data replication providing snapshot isolation. In: Proceedings of the 2005 ACM SIGMOD international conference on management of data, SIGMOD’05. ACM, New York, pp 419–430. http://doi.acm.org/10.1145/1066157.1066205
- Lin Y, Kemme B, Patino-Martinez M, Jimenez-Peris R (2007) Enhancing edge computing with database replication. In: Proceedings of the 26th IEEE international symposium on reliable distributed systems, SRDS’07. IEEE Computer Society, Washington, pp 45–54. http://dl.acm.org/citation.cfm?id=1308172.1308219
- Lloyd W, Freedman MJ, Kaminsky M, Andersen DG (2011) Don’t settle for eventual: scalable causal consistency for wide-area storage with COPS. In: Proceedings of the twenty-third ACM symposium on operating systems principles, SOSP’11. ACM, New York, pp 401–416. http://doi.acm.org/10.1145/2043556.2043593
- Lloyd W, Freedman MJ, Kaminsky M, Andersen DG (2013) Stronger semantics for low-latency geo-replicated storage. In: Proceedings of the 10th USENIX conference on networked systems design and implementation, NSDI’13. USENIX Association, Berkeley, pp 313–328. http://dl.acm.org/citation.cfm?id=2482626.2482657
- Nawab F, Agrawal D, El Abbadi A (2013) Message futures: fast commitment of transactions in multi-datacenter environments. In: CIDR 2013, sixth biennial conference on innovative data systems research, Asilomar, 6–9 Jan 2013. Online Proceedings. http://cidrdb.org/cidr2013/Papers/CIDR13_Paper103.pdf
- Nawab F, Arora V, Agrawal D, El Abbadi A (2015a) Chariots: a scalable shared log for data management in multi-datacenter cloud environments. In: Proceedings of the 18th international conference on extending database technology, EDBT 2015, Brussels, 23–27 Mar 2015, pp 13–24. https://doi.org/10.5441/002/edbt.2015.03
- Nawab F, Arora V, Agrawal D, El Abbadi A (2015b) Minimizing commit latency of transactions in geo-replicated data stores. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, SIGMOD’15. ACM, New York, pp 1279–1294. http://doi.acm.org/10.1145/2723372.2723729
- Pang G, Kraska T, Franklin MJ, Fekete A (2014) Planet: making progress with commit processing in unpredictable environments. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data, SIGMOD’14. ACM, New York, pp 3–14. http://doi.acm.org/10.1145/2588555.2588558
- Pu Q, Ananthanarayanan G, Bodik P, Kandula S, Akella A, Bahl P, Stoica I (2015) Low latency geo-distributed data analytics. In: Proceedings of the 2015 ACM conference on special interest group on data communication, SIGCOMM’15. ACM, New York, pp 421–434. http://doi.acm.org/10.1145/2785956.2787505
- Roy S, Kot L, Bender G, Ding B, Hojjat H, Koch C, Foster N, Gehrke J (2015) The homeostasis protocol: avoiding transaction coordination through program analysis. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, SIGMOD’15. ACM, New York, pp 1311–1326. http://doi.acm.org/10.1145/2723372.2723720
- Shankaranarayanan PN, Sivakumar A, Rao S, Tawarmalani M (2014) Performance sensitive replication in geo-distributed cloud datastores. In: Proceedings of the 2014 44th annual IEEE/IFIP international conference on dependable systems and networks, DSN’14. IEEE Computer Society, Washington, pp 240–251. http://dx.doi.org/10.1109/DSN.2014.34
- Terry DB, Prabhakaran V, Kotla R, Balakrishnan M, Aguilera MK, Abu-Libdeh H (2013) Consistency-based service level agreements for cloud storage. In: Proceedings of the twenty-fourth ACM symposium on operating systems principles, SOSP’13. ACM, New York, pp 309–324. http://doi.acm.org/10.1145/2517349.2522731
- Vulimiri A, Curino C, Godfrey PB, Jungblut T, Padhye J, Varghese G (2015) Global analytics in the face of bandwidth and regulatory constraints. In: Proceedings of the 12th USENIX conference on networked systems design and implementation, NSDI’15. USENIX Association, Berkeley, pp 323–336. http://dl.acm.org/citation.cfm?id=2789770.2789793
- Wu Z, Butkiewicz M, Perkins D, Katz-Bassett E, Madhyastha HV (2013) SPANStore: cost-effective geo-replicated storage spanning multiple cloud services. In: Proceedings of the twenty-fourth ACM symposium on operating systems principles, SOSP’13. ACM, New York, pp 292–308. http://doi.acm.org/10.1145/2517349.2522730
- Wu Z, Yu C, Madhyastha HV (2015) CosTLO: cost-effective redundancy for lower latency variance on cloud storage services. In: Proceedings of the 12th USENIX conference on networked systems design and implementation, NSDI’15. USENIX Association, Berkeley, pp 543–557. http://dl.acm.org/citation.cfm?id=2789770.2789808
- Zakhary V, Nawab F, Agrawal D, El Abbadi A (2018) Global-scale placement of transactional data stores. In: Proceedings of the 2018 international conference on extending database technology, EDBT’18
- Zhang Y, Power R, Zhou S, Sovran Y, Aguilera MK, Li J (2013) Transaction chains: achieving serializability with low latency in geo-distributed storage systems. In: Proceedings of the twenty-fourth ACM symposium on operating systems principles, SOSP’13. ACM, New York, pp 276–291. http://doi.acm.org/10.1145/2517349.2522729