Journal of Network and Systems Management, Volume 28, Issue 1, pp 160–192

Building Autonomic Elements from Video-Streaming Servers

  • Carlos Cunha

Abstract

HTTP streaming is now the main approach for delivering video on the Internet. As a consequence, widely deployed HTTP infrastructures face new challenges posed by the sensitivity of video-streaming users to service-quality degradation and by the specific characteristics of video-streaming workloads. Performance issues are one of the main classes of problems in the server infrastructure; they can significantly degrade the end-users' quality of experience (QoE), with an impact that grows with the time users have already invested in watching the videos. This paper addresses the development of autonomic HTTP streaming servers organized into Autonomic Elements (AEs), the building blocks of Autonomic Computing (AC) systems. AEs are structured using container-based virtualization and are provided with monitoring, failure prediction, failure diagnosis and repair features. These features are incorporated into SHStream, a self-healing framework we developed. SHStream relies on online learning algorithms to dynamically build and evaluate classification models for the prediction and diagnosis of performance anomalies. The results of our experimental analysis show that: (1) failure prediction can be performed with approximately 98% recall and 99% precision; (2) the diagnosis activity can localize and identify the resource responsible for performance failures without misclassifications; (3) the classifiers' performance stabilizes after a small number of learning instances; and (4) container-based virtualization technologies enable recovery times shorter than 1 s through rebooting and shorter than 3 s using server migration techniques.
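
To make the prediction and evaluation workflow described above concrete, the sketch below shows an online classifier that labels monitored server-metric samples as normal or anomalous and is scored prequentially (test-then-train) with precision and recall, the measures reported in the abstract. This is a minimal illustration, not the SHStream implementation: it assumes the third-party Python library river, and the feature names (cpu_util, mem_util, io_wait), thresholds and synthetic labels are hypothetical.

    # Minimal sketch (NOT SHStream itself): an incremental classifier over a
    # stream of server-metric samples, evaluated prequentially: each sample is
    # first used for prediction, then for training.
    # Assumes the third-party `river` library; data and labels are synthetic.
    import random

    from river import metrics, tree

    model = tree.HoeffdingTreeClassifier()   # incremental (online) decision tree
    precision = metrics.Precision()
    recall = metrics.Recall()

    random.seed(1)

    def sample():
        """Generate one synthetic monitoring sample and its ground-truth label."""
        anomalous = random.random() < 0.1    # ~10% of samples precede a failure
        x = {
            "cpu_util": random.uniform(0.7, 1.0) if anomalous else random.uniform(0.1, 0.6),
            "mem_util": random.uniform(0.8, 1.0) if anomalous else random.uniform(0.2, 0.7),
            "io_wait":  random.uniform(0.3, 0.9) if anomalous else random.uniform(0.0, 0.2),
        }
        return x, anomalous

    for _ in range(5000):
        x, y = sample()
        y_pred = model.predict_one(x)        # test on the instance first ...
        if y_pred is not None:               # (None until the model has seen data)
            precision.update(y, y_pred)
            recall.update(y, y_pred)
        model.learn_one(x, y)                # ... then use it for training

    print(f"precision={precision.get():.3f} recall={recall.get():.3f}")

In SHStream the inputs would instead be the metrics collected from the server containers, with labels derived from observed performance anomalies; the sketch only illustrates the incremental train/evaluate cycle that lets classifier performance stabilize after a small number of learning instances.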

Keywords

Autonomic Computing · Multimedia · Self-awareness · Self-recovery · Machine learning · Online learning · System modelling · Self-healing

Notes

Acknowledgements

This research was supported by FCT-Portugal under Grant SFRH/BD/35784 and by the Center of Studies in Education, Technologies and Health of the Polytechnic Institute of Viseu.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Informatics, Polytechnic Institute of Viseu, Viseu, Portugal
  2. Department of Informatics, University of Coimbra, Coimbra, Portugal
