On the Expressiveness and Trade-Offs of Large Scale Tuple Stores

  • Ricardo Vilaça
  • Francisco Cruz
  • Rui Oliveira
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6427)


Massive-scale distributed computing is a challenge at our doorstep. The current exponential growth of data calls for massive-scale storage and processing capabilities. Several major Internet players have acknowledged this by embracing the cloud computing model and offering first-generation distributed tuple stores.

Having all started from similar requirements, these systems ended up providing a similar service: a simple tuple store interface that allows applications to insert, query, and remove individual elements. Furthermore, while availability is commonly assumed to be sustained by the massive scale itself, data consistency and freshness are usually severely hindered. In doing so, these services settle on a specific, narrow trade-off between consistency, availability, performance, scale, and migration cost that is much less attractive for common business needs.
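To make the "simple tuple store interface" concrete, the following is a minimal sketch under our own assumptions: the class `SimpleTupleStore`, its methods, and the in-memory "nodes" are hypothetical and do not reproduce the API of any of the systems discussed. Each operation names exactly one key and therefore touches exactly one node. Placement uses plain modulo hashing rather than the consistent hashing a real DHT would use, and there is no replication, so the sketch says nothing about consistency or availability.

```python
from hashlib import sha1
from typing import Any, Dict, List, Optional


class SimpleTupleStore:
    """Hypothetical single-element tuple store, hash-partitioned over in-memory nodes."""

    def __init__(self, num_nodes: int = 4) -> None:
        # Each "node" is just a dict here; a real system would use remote servers.
        self.nodes: List[Dict[str, Dict[str, Any]]] = [{} for _ in range(num_nodes)]

    def _node_for(self, key: str) -> Dict[str, Dict[str, Any]]:
        # Modulo hash placement: a simplification of DHT-style partitioning.
        digest = int(sha1(key.encode("utf-8")).hexdigest(), 16)
        return self.nodes[digest % len(self.nodes)]

    def insert(self, key: str, value: Dict[str, Any]) -> None:
        # Store a copy so later caller mutations do not leak into the store.
        self._node_for(key)[key] = dict(value)

    def query(self, key: str) -> Optional[Dict[str, Any]]:
        # Point lookup by key; returns a copy, or None if the key is absent.
        tup = self._node_for(key).get(key)
        return dict(tup) if tup is not None else None

    def remove(self, key: str) -> bool:
        # True if the key existed and was removed, False otherwise.
        return self._node_for(key).pop(key, None) is not None


store = SimpleTupleStore()
store.insert("user:42", {"name": "alice", "followers": 10})
print(store.query("user:42"))   # {'name': 'alice', 'followers': 10}
print(store.remove("user:42"))  # True
print(store.query("user:42"))   # None
```

With only these three operations, a question that spans many tuples (for example, all users with more than five followers) forces the application either to know every relevant key in advance or to scan every node itself.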

In this paper, we introduce DataDroplets, a novel tuple store that shifts the current trade-off towards the needs of common business users, providing additional consistency guarantees and higher-level data processing primitives that smooth the migration path for existing applications. We present a detailed comparison between DataDroplets and existing systems regarding their data model, architecture, and trade-offs. Preliminary results of the system’s performance under a realistic workload are also presented.
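As an illustration of what a "higher-level data processing primitive" can mean in contrast to single-element access, here is a minimal sketch of one such primitive, a range query over ordered keys. It is only an assumption for illustration: `OrderedTupleStore` and `range_query` are hypothetical names, and this is not the DataDroplets interface. Keeping keys sorted lets one call return every tuple in an interval instead of one lookup per key.

```python
import bisect
from typing import Any, Dict, List, Tuple


class OrderedTupleStore:
    """Hypothetical store that keeps keys sorted so it can offer a range primitive."""

    def __init__(self) -> None:
        self._keys: List[str] = []                  # keys kept in sorted order
        self._data: Dict[str, Dict[str, Any]] = {}  # key -> tuple

    def insert(self, key: str, value: Dict[str, Any]) -> None:
        # Only add the key to the sorted index the first time it is seen.
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = dict(value)

    def remove(self, key: str) -> bool:
        if key not in self._data:
            return False
        del self._data[key]
        # The key is present, so bisect_left returns its exact position.
        del self._keys[bisect.bisect_left(self._keys, key)]
        return True

    def range_query(self, lo: str, hi: str) -> List[Tuple[str, Dict[str, Any]]]:
        # Half-open interval [lo, hi): a single call instead of one get per key.
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return [(k, dict(self._data[k])) for k in self._keys[start:end]]


store = OrderedTupleStore()
for i in range(1, 6):
    store.insert(f"msg:{i:04d}", {"seq": i})
print([k for k, _ in store.range_query("msg:0002", "msg:0004")])
# ['msg:0002', 'msg:0003']
```

The design choice is the ordered index: hash placement alone, as in a plain tuple store, scatters adjacent keys and cannot answer such a query without contacting every node.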


Keywords: Peer-to-Peer, DHT, Cloud Computing, Dependability


Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Ricardo Vilaça (1)
  • Francisco Cruz (1)
  • Rui Oliveira (1)
  1. Computer Science and Technology Center, Universidade do Minho, Braga, Portugal