
Semantic-Aware Data Cube for Cloud Networks

  • Yu Hua
  • Xue Liu
Chapter

Abstract

Today’s cloud data centers contain millions of servers and offer high bandwidth. A fundamental problem is how to significantly improve the scalability of such large-scale systems so that they can interconnect a large number of servers while supporting various online services in cloud computing. One way to address this problem is to resolve the potential mismatch between the network architecture and the data placement. To this end, we present ANTELOPE, a scalable, distributed, data-centric scheme for cloud data centers that systematically takes into account both the properties of the network architecture and the optimization of data placement. The basic idea behind ANTELOPE is to leverage a precomputation-based data cube to support online cloud services. Because fully materializing a data cube incurs high costs, we use a semantic-aware partial materialization solution to significantly reduce the operation and space overheads. Extensive experiments on real system implementations demonstrate the efficacy and efficiency of our proposed scheme. (© 2014 IEEE. Reprinted, with permission, from Ref. [1].)
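
To make the materialization trade-off concrete, the following is a minimal Python sketch of a generic data cube, not ANTELOPE itself. It assumes a small hypothetical fact table; the dimension names `site`, `service`, and `hour`, the helper functions, and the class `PartialCube` are all illustrative inventions. A cube over d dimensions has 2^d group-bys (cuboids), and full materialization precomputes every one of them. The sketch instead stores only the base cuboid plus a hand-picked subset, and answers any other group-by by rolling up from the smallest stored cuboid that contains the requested dimensions. This is exact here because SUM is a distributive aggregate. The hand-written `chosen` list merely stands in for the selection step; the sketch does not model ANTELOPE's semantic-aware choice of which cuboids to materialize.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (site, service, hour, requests). Illustrative only.
DIMS = ("site", "service", "hour")
ROWS = [
    ("dc1", "search", 0, 5),
    ("dc1", "search", 1, 3),
    ("dc1", "mail", 0, 2),
    ("dc2", "search", 0, 7),
    ("dc2", "mail", 1, 4),
]


def all_cuboids(dims):
    """Every subset of the dimensions is one group-by (cuboid): 2**d in total."""
    return [c for k in range(len(dims) + 1) for c in combinations(dims, k)]


def aggregate(rows, dims, group_by):
    """SUM the last column (the measure), grouped by the given dimensions."""
    idx = [dims.index(d) for d in group_by]
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[-1]
    return dict(out)


class PartialCube:
    """Stores the base cuboid plus a chosen subset of cuboids (partial materialization)."""

    def __init__(self, rows, dims, chosen):
        # The base cuboid (all dimensions) is always kept, so every query is answerable.
        self.views = {tuple(dims): aggregate(rows, dims, dims)}
        for c in chosen:
            self.views[tuple(c)] = aggregate(rows, dims, c)

    def query(self, group_by):
        group_by = tuple(group_by)
        if group_by in self.views:
            return dict(self.views[group_by])  # precomputed: direct lookup
        # Otherwise roll up from the smallest stored cuboid that contains all
        # requested dimensions; this is exact because SUM is distributive.
        ancestors = [v for v in self.views if set(group_by) <= set(v)]
        src = min(ancestors, key=lambda v: len(self.views[v]))
        idx = [src.index(d) for d in group_by]
        out = defaultdict(int)
        for key, total in self.views[src].items():
            out[tuple(key[i] for i in idx)] += total
        return dict(out)


cube = PartialCube(ROWS, DIMS, chosen=[("service",)])
print(len(all_cuboids(DIMS)))    # 8 cuboids for 3 dimensions; only 2 are stored
print(cube.query(("service",)))  # {('search',): 15, ('mail',): 6}
print(cube.query(("site",)))     # {('dc1',): 10, ('dc2',): 11}
```

Here the `service` query is a direct lookup, while the `site` query is computed on the fly from the base cuboid. Storing fewer cuboids saves precomputation and space at the cost of extra roll-up work at query time, which is the trade-off that any partial materialization policy has to balance.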

References

  1. Y. Hua, X. Liu, H. Jiang, ANTELOPE: a semantic-aware data cube scheme for cloud data center networks. IEEE Trans. Comput. (TC) 63(9), 2146–2159 (2014)
  2. IDC iView, The Digital Universe Decade - Are You Ready? (May 2010)
  3. A. Thusoo, Z. Shao, S. Anthony, D. Borthakur, N. Jain, J. Sen Sarma, R. Murthy, H. Liu, Data warehousing and analytics infrastructure at Facebook, in Proceedings of the SIGMOD (2010), pp. 1013–1020
  4. Science Staff, Dealing with data - challenges and opportunities. Science 331(6018), 692–693 (2011)
  5. J. Dean, S. Ghemawat, MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
  6.
  7. M. Isard, M. Budiu, Y. Yu, A. Birrell, D. Fetterly, Dryad: distributed data-parallel programs from sequential building blocks, in Proceedings of the ACM SIGOPS/EuroSys (2007), pp. 59–72
  8. C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins, Pig Latin: a not-so-foreign language for data processing, in Proceedings of the ACM SIGMOD (2008), pp. 1099–1110
  9. A. Thusoo, J. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, R. Murthy, Hive - a petabyte scale data warehouse using Hadoop, in Proceedings of the ICDE (2010)
  10. G. Bell, T. Hey, A. Szalay, Beyond the data deluge. Science 323(5919), 1297–1298 (2009)
  11. J. Dai, J. Huang, S. Huang, B. Huang, Y. Liu, HiTune: dataflow-based performance analysis for big data cloud, in Proceedings of the USENIX Annual Technical Conference (2011)
  12. R. Katz, Tech titans building boom. IEEE Spectr. 46(2), 40–54 (2009)
  13. A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, B. Maggs, Cutting the electric bill for internet-scale systems. ACM SIGCOMM Comput. Commun. Rev. 39(4), 123–134 (2009)
  14. J. Dean, Evolution and future directions of large-scale storage and computation systems at Google, in Keynote in ACM Symposium on Cloud Computing (ACM SOCC) (2010)
  15. J. Sobel, Building Facebook: performance at massive scale, in Keynote in ACM Symposium on Cloud Computing (ACM SOCC) (2010)
  16. D. Kossmann, How new is the cloud?, in Keynote in ICDE (2010)
  17. C. Lu, G. Alvarez, J. Wilkes, Aqueduct: online data migration with performance guarantees, in Proceedings of the FAST (2002), pp. 219–230
  18. C. Pu, A. Leff, Replica control in distributed systems: as asynchronous approach. ACM SIGMOD Rec. 20(2), 377–386 (1991)
  19. N. Mysore et al., PortLand: a scalable fault-tolerant layer 2 data center network fabric, in Proceedings of the ACM SIGCOMM (2009)
  20. D. Li, C. Guo, H. Wu, K. Tan, Y. Zhang, S. Lu, FiConn: using backup port for server interconnection in data centers, in Proceedings of the IEEE INFOCOM (2009)
  21. A. Greenberg, J. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. Maltz, P. Patel, S. Sengupta, VL2: a scalable and flexible data center network, in Proceedings of the ACM SIGCOMM (2009)
  22. C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, S. Lu, DCell: a scalable and fault-tolerant network structure for data centers, in Proceedings of the ACM SIGCOMM (2008)
  23. C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, S. Lu, BCube: a high performance, server-centric network architecture for modular data centers, in Proceedings of the ACM SIGCOMM (2009)
  24. J. Mudigonda, P. Yalagandula, M. Al-Fares, J. Mogul, SPAIN: COTS data-center Ethernet for multipathing over arbitrary topologies, in Proceedings of the USENIX NSDI (2010)
  25. M. Al-Fares, A. Loukissas, A. Vahdat, A scalable, commodity data center network architecture, in Proceedings of the ACM SIGCOMM (2008)
  26. A. Shieh, S. Kandula, A. Greenberg, C. Kim, B. Saha, Sharing the data center network, in Proceedings of the USENIX NSDI (2011)
  27. K. Chen, C. Guo, H. Wu, J. Yuan, Z. Feng, Y. Chen, S. Lu, W. Wu, Generic and automatic address configuration for data center networks, in Proceedings of the ACM SIGCOMM (2010)
  28. A. Viswanathan, A. Hussain, J. Mirkovic, S. Schwab, J. Wroclawski, A semantic framework for data analysis in networked systems, in Proceedings of the USENIX NSDI (2011)
  29. S. Ghemawat, H. Gobioff, S. Leung, The Google file system. ACM SIGOPS Oper. Syst. Rev. 37(5), 43 (2003)
  30. F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, R. Gruber, Bigtable: a distributed storage system for structured data, in Proceedings of the OSDI (2006)
  31. J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, H. Pirahesh, Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min. Knowl. Discov. 1(1), 29–53 (1997)
  32. J. Hamilton, Internet scale storage, in Keynote in SIGMOD (2011)
  33. J. Larus, The cloud will change everything, in Keynote in ASPLOS (2011)
  34. R. Weber, H. Schek, S. Blott, A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces, in Proceedings of the VLDB (1998), pp. 194–205
  35. P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of the ACM Symposium on Theory of Computing (1998), pp. 604–613
  36. A. Andoni, P. Indyk, Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)
  37. Y. Hua, B. Xiao, D. Feng, B. Yu, Bounded LSH for similarity search in peer-to-peer file systems, in Proceedings of the International Conference on Parallel Processing (ICPP) (2008), pp. 644–651
  38. Los Alamos National Lab (LANL) File System Data, http://institute.lanl.gov/data/archive-data/
  39. E. Riedel, M. Kallahalla, R. Swaminathan, A framework for evaluating storage system security, in Proceedings of the FAST (2002)
  40. S. Kavalanekar, B. Worthington, Q. Zhang, V. Sharda, Characterization of storage workload traces from production Windows servers, in Proceedings of the IEEE International Symposium on Workload Characterization (IISWC) (2008)
  41. J.L. Hellerstein, Google Cluster Data, http://googleresearch.blogspot.com/2010/01/google-cluster-data.html (Jan 2010)
  42. B. Babcock, S. Chaudhuri, G. Das, Dynamic sample selection for approximate query processing, in Proceedings of the ACM SIGMOD (2003)
  43. R. Missaoui, C. Goutte, A. Choupo, A. Boujenoui, A probabilistic model for data cube compression and query approximation, in Proceedings of the ACM Data Warehousing and OLAP (2007), pp. 33–40
  44. J. Shanmugasundaram, U. Fayyad, P. Bradley, Compressed data cubes for OLAP aggregate query approximation on continuous dimensions, in Proceedings of the ACM SIGKDD (1999), pp. 223–232
  45. D. Barbara, X. Wu, Loglinear-based quasi cubes. J. Intell. Inf. Syst. 16(3), 255–276 (2001)
  46. T. Wu, D. Xin, J. Han, ARCube: supporting ranking aggregate queries in partially materialized data cubes, in Proceedings of the ACM SIGMOD (2008), pp. 79–92
  47. D. Xin, J. Han, H. Cheng, X. Li, Answering top-k queries with multi-dimensional selections: the ranking cube approach, in Proceedings of the VLDB (2006), pp. 463–474
  48. M. Riedewald, D. Agrawal, A. El Abbadi, pCube: update-efficient online aggregation with progressive feedback and error bounds, in Proceedings of the SSDBM (2000), pp. 95–108
  49. W. Lu, J. Yu, Condensed cube: an effective approach to reducing data cube size, in Proceedings of the ICDE (2002), pp. 155–165
  50. Y. Feng, D. Agrawal, A. El Abbadi, A. Metwally, Range cube: efficient cube computation by exploiting data correlation, in Proceedings of the ICDE (2004), pp. 658–669
  51. X. Jin, J. Han, L. Cao, J. Luo, B. Ding, C. Lin, Visual cube and on-line analytical processing of images, in Proceedings of the 19th ACM International Conference on Information and Knowledge Management (2010), pp. 849–858
  52. P. Zhao, X. Li, D. Xin, J. Han, Graph cube: on warehousing and OLAP multidimensional networks, in Proceedings of the SIGMOD (2011), pp. 853–864
  53. B. Ding, B. Zhao, C. Lin, J. Han, C. Zhai, TopCells: keyword-based search of top-k aggregated documents in text cube, in Proceedings of the ICDE (2010), pp. 381–384
  54. Y. Yu, C. Lin, Y. Sun, C. Chen, J. Han, B. Liao, T. Wu, C. Zhai, D. Zhang, B. Zhao, iNextCube: information network-enhanced text cube, in Proceedings of the VLDB (2009)
  55. B. Bi, S. Lee, B. Kao, R. Cheng, CubeLSI: an effective and efficient method for searching resources in social tagging systems, in Proceedings of the ICDE (2011), pp. 27–38
  56. M. Liu, E. Rundensteiner, K. Greenfield, C. Gupta, S. Wang, I. Ari, A. Mehta, E-Cube: multi-dimensional event sequence processing using concept and pattern hierarchies, in Proceedings of the ICDE (2010), pp. 1097–1100
  57. J. Lee, S. Hwang, Z. Nie, J. Wen, Product EntityCube: a recommendation and navigation system for product search, in Demonstrations in ICDE (2010)
  58. G. Salton, A. Wong, C. Yang, A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)
  59. J. Hartigan, M. Wong, Algorithm AS 136: a K-means clustering algorithm. Appl. Stat., 100–108 (1979)
  60. S. Deerwester, S. Dumais, G. Furnas, T. Landauer, R. Harshman, Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41, 391–407 (1990)
  61. M.W. Berry, S. Dumais, G. O'Brien, Using linear algebra for intelligent information retrieval. SIAM Rev. 37, 573–595 (1995)
  62. C. Papadimitriou, P. Raghavan, H. Tamaki, S. Vempala, Latent semantic indexing: a probabilistic analysis. J. Comput. Syst. Sci. 61(2), 217–235 (2000)
  63. G. Golub, C. Van Loan, Matrix Computations (Johns Hopkins University Press, 1996)
  64. Y. Hua, H. Jiang, Y. Zhu, D. Feng, L. Tian, Semantic-aware metadata organization paradigm in next-generation file systems. IEEE Trans. Parallel Distrib. Syst. 23(2), 337–344 (2012)
  65. C. Tang, S. Dwarkadas, Z. Xu, On scaling latent semantic indexing for large peer-to-peer systems, in Proceedings of the ACM SIGIR (2004), pp. 112–121
  66. S. Lee, S. Chun, D. Kim, J. Lee, C. Chung, Similarity search for multidimensional data sequences, in Proceedings of the ICDE (2000)
  67. A. Guttman, R-trees: a dynamic index structure for spatial searching, in Proceedings of the ACM SIGMOD (1984), pp. 47–57
  68. G. Salton, A. Wong, C. Yang, A vector space model for information retrieval. J. Am. Soc. Inf. Retr., 613–620 (1975)
  69. S. Weil, S.A. Brandt, E.L. Miller, D.D.E. Long, C. Maltzahn, Ceph: a scalable, high-performance distributed file system, in Proceedings of the OSDI (2006)
  70. C. Tang, Z. Xu, S. Dwarkadas, Peer-to-peer information retrieval using self-organizing semantic overlay networks, in Proceedings of the SIGCOMM (2003)
  71. Z. Xu, C. Tang, Z. Zhang, Building topology-aware overlays using global soft-state, in Proceedings of the ICDCS (2003)
  72. C. Buckley, Implementation of the SMART information retrieval system. Technical Report, Cornell University (1985)
  73. M.W. Berry, Large-scale sparse singular value computations. Int. J. Supercomput. Appl. 6(1), 13–49 (1992)
  74. G.H. Golub, C. Reinsch, Singular value decomposition and least squares solutions. Numer. Math. 14(5), 403–420 (1970)
  75. L. De Lathauwer, B. De Moor, J. Vandewalle, A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21(4), 1253–1278 (2000)
  76. H. Wu, G. Lu, D. Li, C. Guo, Y. Zhang, MDCube: a high performance network structure for modular data center interconnection, in Proceedings of the CoNEXT (2009), pp. 25–36
  77. M. Casado, D. Erickson, I.A. Ganichev, R. Griffith, B. Heller, N. Mckeown, D. Moon, T. Koponen, S. Shenker, K. Zarifis, Ripcord: a modular platform for data center networking. Technical Report No. UCB/EECS-2010-93, EECS Department, University of California, Berkeley (2010)
  78. D. Li, M. Xu, H. Zhao, X. Fu, Building mega data center from heterogeneous containers, in Proceedings of the IEEE ICNP (2011)
  79. D. Li, H. Cui, Y. Hu, Y. Xia, X. Wang, Scalable data center multicast using multi-class Bloom filter, in Proceedings of the IEEE ICNP (2011)
  80. J. Mudigonda, P. Yalagandula, J.C. Mogul, Taming the flying cable monster: a topology design and optimization framework for data-center networks, in Proceedings of the USENIX Annual Technical Conference (2011)

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Huazhong University of Science and Technology, Wuhan, China
  2. McGill University, Montreal, Canada