Abstract
There is growing interest in carefully observing the reliability of the Internet’s edge. Outage information can inform our understanding of Internet reliability and planning, and it can help guide operations. Active outage detection methods provide results for more than 3M blocks, and passive methods for more than 2M, but both are challenged by sparse blocks where few addresses respond or send traffic. We propose a new Full Block Scanning (FBS) algorithm that improves the coverage of active scanning by gathering more information before making a decision, so that it can provide reliable results for sparse blocks. FBS identifies sparse blocks and takes additional time before making decisions about their outages, thereby addressing previous concerns about false outages while preserving strict limits on probe rates. We show that FBS can improve coverage by correcting 1.2M blocks that would otherwise be too sparse to report correctly, and by potentially adding 1.7M additional blocks. FBS can be applied retroactively to existing datasets to improve prior coverage and accuracy.
References
IODA: Internet outage detection & analysis. https://ioda.caida.org
Baltra, G., Heidemann, J.: Improving the optics of active outage detection (extended). Technical report ISI-TR-733, May 2019. https://www.isi.edu/%7ejohnh/PAPERS/Baltra19a.html
Dainotti, A., et al.: Lost in space: improving inference of IPv4 address space utilization. IEEE J. Sel. Areas Commun. (JSAC) 34(6), 1862–1876 (2016)
Dainotti, A., et al.: Analysis of country-wide Internet outages caused by censorship. In: Proceedings of the ACM Internet Measurement Conference, Berlin, Germany, pp. 1–18. ACM, November 2011. https://doi.org/10.1145/2068816.2068818
Madory, D.: Iraq downs internet to combat cheating...again! (2017). https://dyn.com/blog/iraq-downs-internet-to-combat-cheating-again/. Accessed 01 Aug 2019
Guillot, A., et al.: Chocolatine: outage detection for internet background radiation. In: Proceedings of the IFIP International Workshop on Traffic Monitoring and Analysis. IFIP, Paris, France, June 2019. https://clarinet.u-strasbg.fr/~pelsser/publications/Guillot-chocolatine-TMA2019.pdf
Heidemann, J., Pradkin, Y., Govindan, R., Papadopoulos, C., Bartlett, G., Bannister, J.: Census and survey of the visible Internet. In: Proceedings of the ACM Internet Measurement Conference, Vouliagmeni, Greece, pp. 169–182. ACM, October 2008. https://doi.org/10.1145/1452520.1452542
Internet Addresses Survey dataset, PREDICT ID: USC-LANDER/internet-address-survey-reprobing-it75w-20170427
MaxMind: GeoIP Geolocation Products (2017). http://www.maxmind.com/en/city
Padmanabhan, R., Dhamdhere, A., Aben, E., Claffy, K.C., Spring, N.: Reasons dynamic addresses change. In: Proceedings of the ACM Internet Measurement Conference, Santa Monica, CA, USA. ACM, November 2016. https://doi.org/10.1145/2987443.2987461
Padmanabhan, R., Schulman, A., Levin, D., Spring, N.: Residential links under the weather. In: Proceedings of the ACM Special Interest Group on Data Communication, pp. 145–158. ACM (2019)
Quan, L., Heidemann, J., Pradkin, Y.: Trinocular: understanding Internet reliability through adaptive probing. In: Proceedings of the ACM SIGCOMM Conference, Hong Kong, China, pp. 255–266. ACM, August 2013. https://doi.org/10.1145/2486001.2486017
Quan, L., Heidemann, J., Pradkin, Y.: When the Internet sleeps: correlating diurnal networks with external factors. In: Proceedings of the ACM Internet Measurement Conference, Vancouver, BC, Canada, pp. 87–100. ACM, November 2014. https://doi.org/10.1145/2663716.2663721
Richter, P., Padmanabhan, R., Spring, N., Berger, A., Clark, D.: Advancing the art of Internet edge outage detection. In: Proceedings of the ACM Internet Measurement Conference, Boston, Massachusetts, USA. ACM, October 2018. https://doi.org/10.1145/3278532.3278563
Schulman, A., Spring, N.: Pingin’ in the rain. In: Proceedings of the ACM Internet Measurement Conference, pp. 19–25. Berlin, Germany. ACM, November 2011. https://doi.org/10.1145/2068816.2068819
Shah, A., Fontugne, R., Aben, E., Pelsser, C., Bush, R.: Disco: fast, good, and cheap outage detection. In: Proceedings of the IEEE International Conference on Traffic Monitoring and Analysis, Dublin, Ireland, pp. 1–9. IEEE, June 2017. https://doi.org/10.23919/TMA.2017.8002902
USC/ISI ANT Project. https://ant.isi.edu/datasets/outage/index.html
Acknowledgments
We thank Yuri Pradkin for his input on the algorithms and paper.
We thank Philipp Richter and Arthur Berger for discussions about their work, and Philipp for re-running his comparison with CDN data.
The work is supported in part by the National Science Foundation, CISE Directorate, award CNS-1806785; by the Department of Homeland Security (DHS) Science and Technology Directorate, Cyber Security Division (DHS S&T/CSD) via contract number 70RSAT18CB0000014; and by Air Force Research Laboratory under agreement number FA8750-18-2-0280. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.
Appendices
A Other Block Examples
Section 2.1 described the problem of sparse blocks and why FBS is needed. Here we provide examples of other blocks where sparsity changes to illustrate when FBS is required.
The block in the left part of Fig. 6 has no activity for three weeks, then sparse use for a week, then moderate use, and back to sparse use for the last two weeks. Reverse DNS suggests this block uses DHCP, and gradual changes in use suggest the ISP is migrating users. The block was provably reachable after the first three weeks. Before then it may have been reachable but unused, in which case reporting it as down is a false outage caused by inactivity.
In the third bar from the top (c) of Fig. 6 (left), we show that Trinocular often marks the block unknown (in red) for the week starting 2017-10-30, and again for the weeks after 2017-12-12. Every address in this block has responded in the past, but during these two periods only a few are actually used, making the block temporarily sparse. Figure 6 (left, bar b) shows how FBS is able to accurately fix Trinocular’s pitfalls in such a DHCP scenario.
Figure 6 (right) shows an example of a block with a lone address. This block has three phases of use: before 2017-02-16, many addresses are in use; then for about 9 days, nothing replies; then, starting on 2017-02-25, only the .1 address replies. During the last phase, Trinocular (Fig. 6 (right, bar c)) completely ignores that there is one address responding, while FBS (Fig. 6 (right, bar b)) sets the block status depending on the responses of this lone address. However, LABR (Fig. 6 (right, bar a)) changes all the down events detected by FBS to unknown, as there is not enough information to claim a down event, in contrast to what the end of phase one shows.
B Block Usage Change
As mentioned in Sect. 2.1, when blocks become temporarily sparse (showing a small A(E(b))), the number of false outages increases. On the other hand, denser blocks offer higher inference correctness.
Our prior work dynamically estimated A [13], but Richter et al. showed that block usage changes dramatically, so blocks can become overly sparse even with tracking [14].
We first show that sparse blocks cause the majority of outage events. In Fig. 7 (left) we compare the number of outages in all 4M responsive blocks with their measured A(E(b)) value during 2017q4. Blocks with a higher number of outages tend to have a lower A(E(b)) value, in particular those closer to the lower bound. Trinocular does not track blocks with long-term \(A(E(b))<0.1\); however, as block sparseness changes, this value changes during the measurement period.
The correlation between sparse blocks and frequent outage events is clearer when we look at a cumulative distribution. Figure 7 (right) shows the cumulative distribution of A for all 4M responsive blocks (light green, the lower line), and for blocks with 10 or more down events (the red, upper line) as measured during 2017q4. These lines are after merging observations obtained from six Trinocular vantage points. We find that 80% of blocks with 10 or more down events have an \(A<0.2\), at around the knee of the curve, and yet these sparse blocks represent only 22% of all blocks. The figure suggests a correlation between a high number of down events and a low A(E(b)) per block, because the line representing blocks with multiple down events converges faster. (It confirms the heuristic of “more than 5 events” that was used to filter sparse Trinocular blocks in the 2017 CDN comparison [14].)
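The comparison behind the two lines of Fig. 7 (right) can be sketched as an empirical CDF evaluated at a threshold, once over all blocks and once over blocks with many down events. The following is a minimal illustration, not the authors' analysis code: the function names, the `(availability, down_event_count)` input layout, and the toy data are all assumptions made for this example.

```python
from typing import Sequence, Tuple

def fraction_below(values: Sequence[float], threshold: float) -> float:
    """Empirical CDF at `threshold`: share of values strictly below it."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(1 for v in values if v < threshold) / len(values)

def compare_sparse_share(
    blocks: Sequence[Tuple[float, int]],
    min_events: int = 10,       # "10 or more down events", as in the text
    a_threshold: float = 0.2,   # the A < 0.2 knee discussed in the text
) -> Tuple[float, float]:
    """Return (share of all blocks with A < threshold,
               share of frequently-down blocks with A < threshold).

    `blocks` holds hypothetical (availability, down_event_count) pairs.
    """
    all_a = [a for a, _ in blocks]
    frequent_a = [a for a, n in blocks if n >= min_events]
    return (fraction_below(all_a, a_threshold),
            fraction_below(frequent_a, a_threshold))

# Toy data invented for illustration; it is not taken from the paper.
toy_blocks = [
    (0.05, 14), (0.12, 11), (0.15, 3), (0.18, 12), (0.35, 10),
    (0.40, 0), (0.55, 1), (0.60, 0), (0.80, 0), (0.90, 2),
]
share_all, share_frequent = compare_sparse_share(toy_blocks)
print(share_all, share_frequent)
```

In this toy set, sparse blocks are a minority of all blocks but a majority of the blocks with frequent down events, which is the shape of the gap between the two curves described above.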
Although we observe from multiple locations, merging results from different vantage points is not sufficient to deal with sparse blocks, because these multiple sites all face the same problem of sparseness leading to inconsistent results. Addressing this problem is a goal of FBS, and it also allows us to grow coverage.
C Comparing Trinocular and FBS
In Sect. 4.2 we discuss how often FBS changes outages when compared to Trinocular. We examine two different metrics: total block down time and number of down events. Here we provide further information about the distribution of these metrics.
In Fig. 8 (left) we show the distribution of blocks by the difference in down time fraction between FBS and Trinocular. The majority of blocks (91%) have little or no change. Blocks on the left side of the figure, representing 9% of the total, have a higher down time fraction when processed only with Trinocular than when processed with FBS. For example, a \(-1\) shows a block that was down for Trinocular during the whole quarter, while FBS was able to completely recover it. This outcome occurs when a historically high |E(b)| block has temporarily dropped to just a few stable addresses.
We also see a small percentage (0.5%) of blocks where FBS has a higher down fraction than Trinocular. This increase in outage fraction happens when Trinocular erroneously marks a block as UP. With more information, FBS is able to correctly change the block state and more accurately reflect the truth.
In Fig. 8 (right) we look at the distribution of blocks when compared by the number of down events observed in FBS and Trinocular. Similarly, the number of down events remains mostly unchanged for the majority of blocks (94%). Trinocular has more down events for 6% of blocks, and FBS shows more events for 0.1%. FBS can increase the absolute number of events in a block when breaking long events into shorter pieces.
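The down time fraction difference plotted in Fig. 8 (left) can be illustrated with a short sketch. It assumes the difference is computed as FBS minus Trinocular, a sign convention inferred from the \(-1\) example above. The per-round boolean timelines, the function names, and the `tolerance` used for "little or no change" are hypothetical choices for this illustration.

```python
from typing import Sequence

def down_fraction(states: Sequence[bool]) -> float:
    """Fraction of observation rounds in which the block is down (True)."""
    if not states:
        raise ValueError("states must be non-empty")
    return sum(states) / len(states)

def fraction_difference(trinocular: Sequence[bool], fbs: Sequence[bool]) -> float:
    """FBS down fraction minus Trinocular down fraction (assumed convention)."""
    return down_fraction(fbs) - down_fraction(trinocular)

def classify(diff: float, tolerance: float = 0.01) -> str:
    """Bucket a block by its difference; `tolerance` is a hypothetical cutoff."""
    if diff < -tolerance:
        return "trinocular_higher"    # left side of the figure
    if diff > tolerance:
        return "fbs_higher"           # right side of the figure
    return "little_or_no_change"

# Toy timelines invented for illustration (True = down in that round).
recovered = fraction_difference([True] * 8, [False] * 8)
unchanged = fraction_difference([False, True, False, False],
                                [False, True, False, False])
corrected_up = fraction_difference([False] * 4, [False, True, False, False])

for diff in (recovered, unchanged, corrected_up):
    print(diff, classify(diff))
```

The first toy block corresponds to the \(-1\) case, where Trinocular reports the block down for the whole period and FBS recovers it completely.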
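To see why breaking a long event into shorter pieces can raise the event count even while total down time falls, consider the following minimal sketch. It counts a down event as a maximal run of consecutive down rounds; that definition, the function name, and the toy timelines are assumptions for illustration and are not taken from the paper's implementation.

```python
from typing import Sequence

def count_down_events(states: Sequence[bool]) -> int:
    """Count maximal runs of consecutive down rounds (True = down)."""
    events = 0
    previous_down = False
    for down in states:
        if down and not previous_down:
            events += 1          # an up-to-down transition starts a new event
        previous_down = down
    return events

# Toy timelines invented for illustration.
# Trinocular: one long down event spanning five rounds.
trinocular_timeline = [False, True, True, True, True, True, False]
# FBS: the middle round is corrected to up, splitting the event in two.
fbs_timeline = [False, True, True, False, True, True, False]

print(count_down_events(trinocular_timeline), sum(trinocular_timeline))
print(count_down_events(fbs_timeline), sum(fbs_timeline))
```

Here the corrected timeline has more events but fewer down rounds, so a higher event count under FBS does not by itself imply more down time.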
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Baltra, G., Heidemann, J. (2020). Improving Coverage of Internet Outage Detection in Sparse Blocks. In: Sperotto, A., Dainotti, A., Stiller, B. (eds) Passive and Active Measurement. PAM 2020. Lecture Notes in Computer Science(), vol 12048. Springer, Cham. https://doi.org/10.1007/978-3-030-44081-7_2
DOI: https://doi.org/10.1007/978-3-030-44081-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44080-0
Online ISBN: 978-3-030-44081-7
eBook Packages: Computer Science, Computer Science (R0)