Abstract
The locality property in data storage can be interpreted in different ways. In Cassandra, a column family is a group of columns that are frequently accessed together, e.g., name, address, phone number, and email address. These columns share the same row key and are therefore stored on the same machine. Data locality of this kind is content-based. By social locality, we refer to data accessed by users who share some social relationship. Although these data may be unrelated in content, they are frequently queried together in an online social network and should therefore be stored in close proximity on disk. Another way to look at locality is in terms of geography: it may be desirable to store on the same server the data for users who reside in the same geographic region (e.g., think Akamai).
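The three notions of locality can be contrasted with a minimal Python sketch. It is an illustration under stated assumptions, not the chapter's method and not Cassandra's actual partitioner: the cluster size, the function names, and the greedy "follow the majority of friends" rule are all hypothetical, and load balancing is ignored.

```python
import hashlib
from collections import Counter

NUM_SERVERS = 4  # hypothetical cluster size, chosen only for illustration


def server_for_key(key: str, num_servers: int = NUM_SERVERS) -> int:
    """Map a key to a server index with a stable hash (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers


def content_placement(row_key: str) -> int:
    """Content-based locality: all columns of a row share the row key,
    so they all hash to the same server."""
    return server_for_key(row_key)


def geographic_placement(region: str) -> int:
    """Geographic locality: users of the same region map to the same server."""
    return server_for_key(region)


def social_placement(user: str, friends: dict, placement: dict) -> int:
    """Social locality (assumed greedy heuristic): put the user on the server
    that already stores most of their friends; fall back to hashing the user."""
    counts = Counter(
        placement[f] for f in friends.get(user, []) if f in placement
    )
    if not counts:
        return server_for_key(user)
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    friends = {"alice": ["bob", "carol", "dave"]}
    placement = {"bob": 2, "carol": 2, "dave": 1}
    print(social_placement("alice", friends, placement))
```

In this sketch, content-based and geographic placement depend only on a key, whereas social placement depends on where related users already live, which is why it cannot be reduced to hashing a single attribute.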
Copyright information
© 2012 Duc A. Tran
About this chapter
Cite this chapter
Tran, D.A. (2012). Social Locality in Data Storage. In: Data Storage for Social Networks. SpringerBriefs in Optimization. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4636-1_2
DOI: https://doi.org/10.1007/978-1-4614-4636-1_2
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4635-4
Online ISBN: 978-1-4614-4636-1
eBook Packages: Mathematics and Statistics (R0)