Abstract
People’s computing lives are moving into the cloud, making understanding cloud availability increasingly critical. Prior studies of Internet outages have used ICMP-based pings and traceroutes. While these studies can detect network availability, we show that they can be inaccurate at estimating cloud availability. Without care, ICMP probes can underestimate availability because ICMP is not as robust as application-level measurements such as HTTP. They can overestimate availability if they measure reachability of the cloud’s edge, missing failures in the cloud’s back-end. We develop methodologies sensitive to five “nines” of reliability, and then we compare ICMP and end-to-end measurements for both cloud VM and storage services. We show case studies where one fails and the other succeeds, and our results highlight the importance of application-level retries to reach high precision. When possible, we recommend end-to-end measurement with application-level protocols to evaluate the availability of cloud services.
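The role of application-level retries in an end-to-end measurement can be illustrated with a minimal sketch. It is not the authors' measurement code: the function names `http_probe`, `probe_with_retries`, and `availability`, the timeout, and the attempt count are all assumptions chosen for illustration. A round counts as available if any attempt in it succeeds, and availability is the fraction of successful rounds. Resolving five nines (99.999%) takes at least 100,000 rounds, since one failed round among 100,000 already corresponds to that level.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Sequence


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """One end-to-end attempt: True if the service answers with a 2xx status.

    Hypothetical helper; the timeout value is an assumption.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # DNS failure, refused connection, timeout, HTTP error status, etc.
        return False


def probe_with_retries(probe: Callable[[], bool], attempts: int = 3) -> bool:
    """Treat a round as successful if any of up to `attempts` tries succeeds.

    any() short-circuits, so no further tries are made after a success.
    """
    return any(probe() for _ in range(attempts))


def availability(rounds: Sequence[bool]) -> float:
    """Fraction of measurement rounds in which the service was reachable."""
    if not rounds:
        raise ValueError("need at least one measurement round")
    return sum(rounds) / len(rounds)
```

A round against a real service could be run as `probe_with_retries(lambda: http_probe("http://example.com/"))`, where the URL is a placeholder. Passing the probe as a callable keeps the retry logic independent of the protocol, so the same wrapper could in principle be applied to an ICMP-based probe for comparison.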
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hu, Z. et al. (2014). The Need for End-to-End Evaluation of Cloud Availability. In: Faloutsos, M., Kuzmanovic, A. (eds) Passive and Active Measurement. PAM 2014. Lecture Notes in Computer Science, vol 8362. Springer, Cham. https://doi.org/10.1007/978-3-319-04918-2_12
DOI: https://doi.org/10.1007/978-3-319-04918-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04917-5
Online ISBN: 978-3-319-04918-2
eBook Packages: Computer Science, Computer Science (R0)