
Social Locality in Data Storage

Chapter in: Data Storage for Social Networks

Part of the book series: SpringerBriefs in Optimization (BRIEFSOPTI)


Abstract

The locality property in data storage can be interpreted in different ways. In Cassandra, a column family is a group of columns that are frequently accessed together, e.g., name, address, phone number, and email address. These columns therefore share the same row key, so they are stored on the same machine. Data locality of this kind is content-based. By social locality, we refer to data that are accessed by users who share some social relationship. Although such data may be unrelated in content, they are frequently queried together in an online social network and should therefore be stored in close proximity on disk. Locality can also be viewed in terms of geography: it may be desirable to store on the same server the data of users who reside in the same geographic region (an approach reminiscent of content delivery networks such as Akamai).
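To make the contrast between content-based and social locality concrete, here is a minimal Python sketch. It assumes a simple modulo-MD5 hash of the row key as a stand-in for Cassandra's partitioner (Cassandra itself places rows on a token ring), a hypothetical six-user friendship graph, and a simple social-locality score defined as the fraction of friend pairs whose data share a server. The names `server_for_row` and `social_locality` are illustrative only and do not come from the chapter.

```python
import hashlib
from typing import Dict, List, Tuple


def server_for_row(row_key: str, num_servers: int) -> int:
    """Content-based locality (simplified): the server is chosen from the row key
    alone, so every column stored under that row key lands on the same server.
    Modulo-MD5 hashing is an illustrative stand-in, not Cassandra's token ring."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers


def social_locality(edges: List[Tuple[str, str]], placement: Dict[str, int]) -> float:
    """Fraction of social links whose two users' data sit on the same server.
    An empty graph is treated as vacuously fully local (score 1.0)."""
    if not edges:
        return 1.0
    local = sum(1 for u, v in edges if placement[u] == placement[v])
    return local / len(edges)


if __name__ == "__main__":
    # Hypothetical profile row: every column is routed by the row key only,
    # so name, address, phone and email all map to one server.
    columns = ["name", "address", "phone", "email"]
    print({col: server_for_row("user:alice", 4) for col in columns})

    # Hypothetical friendship graph: two tight friend groups joined by one link.
    edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
    users = sorted({u for edge in edges for u in edge})

    # Hash placement ignores who is friends with whom.
    hashed = {u: server_for_row(u, 2) for u in users}
    # Socially aware placement: each friend group gets its own server.
    grouped = {u: 0 if u in {"a", "b", "c"} else 1 for u in users}

    print("hash placement:  ", social_locality(edges, hashed))
    print("social placement:", social_locality(edges, grouped))  # 6/7, about 0.857
```

Under hash placement the score depends on where the digests happen to fall, while the grouped placement keeps every link local except the single c-d bridge between the two friend groups. The point of the sketch is only that content-based placement is blind to the social graph, which is why a separate notion of social locality is needed.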




Copyright information

© 2012 Duc A. Tran

About this chapter

Cite this chapter

Tran, D.A. (2012). Social Locality in Data Storage. In: Data Storage for Social Networks. SpringerBriefs in Optimization. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4636-1_2
