Evolution of Cloud Operating System: From Technology to Ecosystem

  • Survey
  • Journal of Computer Science and Technology

Abstract

A cloud operating system (cloud OS) manages cloud resources so that they can be used effectively and efficiently. It must also provide convenient interfaces for users and applications. These two goals often conflict, because convenient abstractions usually consume more computing resources. The cloud OS therefore has its own characteristic approaches to resource management and task scheduling in order to support various kinds of cloud applications. The evolution of the cloud OS is in fact driven by these two often conflicting goals, and each phase of that evolution results from finding the right tradeoff between them. In this paper, we investigate the evolution of the cloud OS from three aspects: enabling technology evolution, OS architecture evolution, and cloud ecosystem evolution. We show that finding appropriate APIs (application programming interfaces) is critical for the next phase of cloud OS evolution: when APIs are chosen, convenient interfaces must be provided without sacrificing efficiency. We present an API-driven cloud OS practice, showing the great capability of APIs for developing a better cloud OS and for helping build and run a healthy cloud ecosystem.

Author information

Corresponding author

Correspondence to Zuo-Ning Chen.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1

(PDF 91 kb)

About this article

Cite this article

Chen, ZN., Chen, K., Jiang, JL. et al. Evolution of Cloud Operating System: From Technology to Ecosystem. J. Comput. Sci. Technol. 32, 224–241 (2017). https://doi.org/10.1007/s11390-017-1717-z

  • DOI: https://doi.org/10.1007/s11390-017-1717-z
