Distributed Versioning: Consistent Replication for Scaling Back-End Databases of Dynamic Content Web Sites
Dynamic content Web sites consist of a front-end Web server, an application server and a back-end database. In this paper we introduce distributed versioning, a new method for scaling the back-end database through replication. Distributed versioning provides both the consistency guarantees of eager replication and the scaling properties of lazy replication. It does so by combining a novel concurrency control method based on explicit versions with conflict-aware query scheduling that reduces the number of lock conflicts. We evaluate distributed versioning using three dynamic content applications: the TPC-W e-commerce benchmark with its three workload mixes, an auction site benchmark, and a bulletin board benchmark. We demonstrate that distributed versioning scales better than previous methods that provide consistency. Furthermore, we demonstrate that the benefits of relaxing consistency are limited, except for the conflict-heavy TPC-W ordering mix.
Keywords: Application Server, Consistency Model, Bulletin Board, Version Number, Dynamic Content