Improving Coverage of Internet Outage Detection in Sparse Blocks

  • Conference paper
Passive and Active Measurement (PAM 2020)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 12048)

Abstract

There is a growing interest in carefully observing the reliability of the Internet’s edge. Outage information can inform our understanding of Internet reliability and planning, and it can help guide operations. Active outage detection methods provide results for more than 3M blocks, and passive methods more than 2M, but both are challenged by sparse blocks where few addresses respond or send traffic. We propose a new Full Block Scanning (FBS) algorithm to improve coverage for active scanning by providing reliable results for sparse blocks by gathering more information before making a decision. FBS identifies sparse blocks and takes additional time before making decisions about their outages, thereby addressing previous concerns about false outages while preserving strict limits on probe rates. We show that FBS can improve coverage by correcting 1.2M blocks that would otherwise be too sparse to correctly report, and potentially adding 1.7M additional blocks. FBS can be applied retroactively to existing datasets to improve prior coverage and accuracy.

References

  1. IODA: Internet outage detection & analysis. https://ioda.caida.org

  2. Baltra, G., Heidemann, J.: Improving the optics of active outage detection (extended). Technical report ISI-TR-733, May 2019. https://www.isi.edu/%7ejohnh/PAPERS/Baltra19a.html

  3. Dainotti, A., et al.: Lost in space: improving inference of IPv4 address space utilization. IEEE J. Sel. Areas Commun. (JSAC) 34(6), 1862–1876 (2016)

  4. Dainotti, A., et al.: Analysis of country-wide Internet outages caused by censorship. In: Proceedings of the ACM Internet Measurement Conference, Berlin, Germany, pp. 1–18. ACM, November 2011. https://doi.org/10.1145/2068816.2068818

  5. Madory, D.: Iraq downs internet to combat cheating...again! (2017). https://dyn.com/blog/iraq-downs-internet-to-combat-cheating-again/. Accessed 01 Aug 2019

  6. Guillot, A., et al.: Chocolatine: outage detection for internet background radiation. In: Proceedings of the IFIP International Workshop on Traffic Monitoring and Analysis. IFIP, Paris, France, June 2019. https://clarinet.u-strasbg.fr/~pelsser/publications/Guillot-chocolatine-TMA2019.pdf

  7. Heidemann, J., Pradkin, Y., Govindan, R., Papadopoulos, C., Bartlett, G., Bannister, J.: Census and survey of the visible Internet. In: Proceedings of the ACM Internet Measurement Conference, Vouliagmeni, Greece, pp. 169–182. ACM, October 2008. https://doi.org/10.1145/1452520.1452542

  8. Internet Addresses Survey dataset, PREDICT ID: USC-LANDER/internet-address-survey-reprobing-it75w-20170427

  9. MaxMind: GeoIP Geolocation Products (2017). http://www.maxmind.com/en/city

  10. Padmanabhan, R., Dhamdhere, A., Aben, E., Claffy, K.C., Spring, N.: Reasons dynamic addresses change. In: Proceedings of the ACM Internet Measurement Conference, Santa Monica, CA, USA. ACM, November 2016. https://doi.org/10.1145/2987443.2987461

  11. Padmanabhan, R., Schulman, A., Levin, D., Spring, N.: Residential links under the weather. In: Proceedings of the ACM Special Interest Group on Data Communication, pp. 145–158. ACM (2019)

  12. Quan, L., Heidemann, J., Pradkin, Y.: Trinocular: understanding Internet reliability through adaptive probing. In: Proceedings of the ACM SIGCOMM Conference, Hong Kong, China, pp. 255–266. ACM, August 2013. https://doi.org/10.1145/2486001.2486017

  13. Quan, L., Heidemann, J., Pradkin, Y.: When the Internet sleeps: correlating diurnal networks with external factors. In: Proceedings of the ACM Internet Measurement Conference, Vancouver, BC, Canada, pp. 87–100. ACM, November 2014. https://doi.org/10.1145/2663716.2663721

  14. Richter, P., Padmanabhan, R., Spring, N., Berger, A., Clark, D.: Advancing the art of Internet edge outage detection. In: Proceedings of the ACM Internet Measurement Conference, Boston, Massachusetts, USA. ACM, October 2018. https://doi.org/10.1145/3278532.3278563

  15. Schulman, A., Spring, N.: Pingin’ in the rain. In: Proceedings of the ACM Internet Measurement Conference, pp. 19–25. Berlin, Germany. ACM, November 2011. https://doi.org/10.1145/2068816.2068819

  16. Shah, A., Fontugne, R., Aben, E., Pelsser, C., Bush, R.: Disco: fast, good, and cheap outage detection. In: Proceedings of the IEEE International Conference on Traffic Monitoring and Analysis, Dublin, Ireland, pp. 1–9. IEEE, June 2017. https://doi.org/10.23919/TMA.2017.8002902

  17. USC/ISI ANT Project. https://ant.isi.edu/datasets/outage/index.html

Acknowledgments

We thank Yuri Pradkin for his input on the algorithms and paper.

We thank Philipp Richter and Arthur Berger for discussions about their work, and Philipp for re-running his comparison with CDN data.

The work is supported in part by the National Science Foundation, CISE Directorate, award CNS-1806785; by the Department of Homeland Security (DHS) Science and Technology Directorate, Cyber Security Division (DHS S&T/CSD) via contract number 70RSAT18CB0000014; and by Air Force Research Laboratory under agreement number FA8750-18-2-0280. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.

Author information

Correspondence to Guillermo Baltra.

Appendices

A Other Block Examples

Fig. 6. Sample blocks over time (columns). The bottom panel (d) shows individual addresses as rows, with colored dots when the address responds to Trinocular. Bar (c) shows Trinocular status (up, unknown, and down), bar (b) shows Full Block Scanning, and the top bar (a) shows Lone Address Block Recovery. (Color figure online)

Section 2.1 described the problem of sparse blocks and why FBS is needed. Here we provide examples of other blocks where sparsity changes to illustrate when FBS is required.

The block in the left part of Fig. 6 has no activity for three weeks, then sparse use for a week, then moderate use, and then sparse use again for the last two weeks. Reverse DNS suggests this block uses DHCP, and gradual changes in use suggest the ISP is migrating users. The block was provably reachable after the first three weeks. Before then it may have been reachable but unused, in which case reporting it as down would be a false outage caused by inactivity.

In the third bar from the top (c) of Fig. 6 (left), Trinocular often marks the block unknown (in red) for the week starting 2017-10-30, and again for the weeks after 2017-12-12. Every address in this block has responded in the past, but during these two periods only a few are actually in use, making the block temporarily sparse. Figure 6 (left, bar b) shows how FBS corrects Trinocular’s errors in such a DHCP scenario.

Figure 6 (right) shows an example block with a lone address. This block has three phases of use: before 2017-02-16, many addresses are in use; then for about 9 days, nothing replies; then, starting on 2017-02-25, only the .1 address replies. During the last phase, Trinocular (Fig. 6 (right, bar c)) completely ignores that there is one address responding, while FBS (Fig. 6 (right, bar b)) sets block status depending on the responses of this lone address. However, LABR (Fig. 6 (right, bar a)) changes all the down events detected by FBS to unknown, as there is not enough information to claim a down event, in contrast to what the end of phase one shows.

B Block Usage Change

As mentioned in Sect. 2.1, when blocks become temporarily sparse (showing a small A(E(b))), the number of false outages increases. On the other hand, denser blocks offer higher inference correctness.

Our prior work dynamically estimated A [13], but Richter et al. showed that block usage changes dramatically, so blocks can become overly sparse even with tracking [14].
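The availability value discussed above can be illustrated with a minimal sketch. It assumes, following Trinocular [12], that E(b) is the set of ever-active addresses in block b and that A(E(b)) is the fraction of those addresses that currently respond. The function name and the sample addresses are hypothetical and serve only as illustration.

```python
def availability(ever_active, responding):
    """Fraction of ever-active addresses E(b) that currently respond.

    Assumption: A(E(b)) = |responding ∩ E(b)| / |E(b)|.
    """
    ever_active = set(ever_active)
    if not ever_active:
        raise ValueError("E(b) is empty; availability is undefined")
    return len(ever_active & set(responding)) / len(ever_active)


# Hypothetical /24 block in which 20 addresses have ever responded.
e_b = {f"192.0.2.{i}" for i in range(1, 21)}

# Dense period: 16 of the 20 ever-active addresses respond.
dense = availability(e_b, {f"192.0.2.{i}" for i in range(1, 17)})

# Temporarily sparse period: only 2 of the 20 respond.
sparse = availability(e_b, {"192.0.2.1", "192.0.2.7"})

print(dense, sparse)
```

In this toy case the same block moves from a high A(E(b)) to a low one without any change in E(b), which is the kind of usage change that makes a block temporarily sparse.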

Fig. 7. Blocks distributed according to the number of outages versus their A(E(b)) (left), and cumulative distribution function of the A value per block (right) as collected during 2017q4 for the whole responsive IPv4 address scope. Dataset A30. (Color figure online)

We first show that sparse blocks cause the majority of outage events. In Fig. 7 (left) we compare the number of outages in all 4M responsive blocks with their measured A(E(b)) value during 2017q4. Blocks with a higher number of outages tend to have a lower A(E(b)) value, in particular those closer to the lower bound. Trinocular does not track blocks with long-term \(A(E(b))<0.1\); however, as block sparseness changes, this value does change during the measurement period.

The correlation between sparse blocks and frequent outage events is clearer when we look at a cumulative distribution. Figure 7 (right) shows the cumulative distribution of A for all 4M responsive blocks (light green, the lower line), and for blocks with 10 or more down events (the red, upper line) as measured during 2017q4. These lines are after merging observations obtained from six Trinocular vantage points. We find that 80% of blocks with 10 or more down events have an \(A<0.2\), at around the knee of the curve, and yet these sparse blocks represent only 22% of all blocks. The figure suggests a correlation between a high number of down events and a low A(E(b)) per block, because the line representing blocks with multiple down events converges faster. (It confirms the heuristic of “more than 5 events” that was used to filter sparse Trinocular blocks in the 2017 CDN comparison [14].)
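The comparison of the two cumulative distributions can be sketched as follows. This is only an illustration on made-up data: the per-block records, the function name, and the resulting shares are hypothetical and do not reproduce the 80% and 22% figures measured on the real dataset. Only the thresholds (A below 0.2, and 10 or more down events) come from the text.

```python
def fraction_below(values, threshold):
    """Empirical CDF at `threshold`: share of values strictly below it."""
    values = list(values)
    if not values:
        raise ValueError("no values")
    return sum(v < threshold for v in values) / len(values)


# Hypothetical per-block records: (A value, number of down events).
blocks = [
    (0.05, 14), (0.12, 11), (0.15, 25), (0.18, 10), (0.45, 12),
    (0.10, 0), (0.30, 1), (0.55, 0), (0.70, 2), (0.90, 0),
]

all_a = [a for a, _ in blocks]
frequent_a = [a for a, events in blocks if events >= 10]

# Share of blocks with A < 0.2, for all blocks and for blocks
# with 10 or more down events.
share_all = fraction_below(all_a, 0.2)
share_frequent = fraction_below(frequent_a, 0.2)

print(share_all, share_frequent)
```

A larger share for the blocks with many down events than for all blocks corresponds to the upper line rising faster than the lower line in the cumulative plot.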

Although we observe from multiple locations, merging results from different vantage points is not sufficient to deal with sparse blocks, because these multiple sites all face the same problem of sparseness leading to inconsistent results. Addressing this problem is a goal of FBS, and it also allows us to grow coverage.

C Comparing Trinocular and FBS

In Sect. 4.2 we discuss how often FBS changes outages when compared to Trinocular. We examine two different metrics: total block down time and number of down events. Here we provide further information about the distribution of these metrics.

Fig. 8. Cumulative distribution of down fraction difference (left) and number of down events difference (right) between Trinocular and FBS for 2017q4. Dataset A30.

In Fig. 8 (left) we show the distribution of blocks by the difference in down time fraction between Trinocular and FBS. The majority of blocks (91%) have little or no change. Blocks on the left side of the figure, representing 9% of the total, have a higher down time fraction when processed only with Trinocular than when processed with FBS. For example, a \(-1\) shows a block that was down for Trinocular during the whole quarter, while FBS was able to completely recover it. This outcome occurs when a historically high |E(b)| block has temporarily dropped to just a few stable addresses.

We also see a small percentage (0.5%) where FBS has a higher down fraction than Trinocular. This increase in outage fraction happens when Trinocular erroneously marks a block as UP. With more information, FBS is able to correctly change the block state and more accurately reflect the truth.

In Fig. 8 (right) we look at the distribution of blocks when compared by the number of down events observed in FBS and Trinocular. Similarly, the number of down events remains mostly unchanged for the majority of blocks (94%). Trinocular has more down events for 6% of blocks, and FBS shows more events for 0.1%. FBS can increase the absolute number of events in a block when breaking long events into shorter pieces.
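The two per-block metrics compared above can be sketched as follows. This is a simplified illustration under stated assumptions: each block is represented as a hypothetical list of per-round states ("up", "down", "unknown"), down fraction is counted over rounds rather than wall-clock time, and differences are taken as FBS minus Trinocular so that \(-1\) matches the fully recovered case described in the text. The function names and sample timelines are not from the paper.

```python
from itertools import groupby


def down_fraction(timeline):
    """Share of observation rounds in which the block is marked down."""
    return timeline.count("down") / len(timeline)


def down_events(timeline):
    """Number of maximal consecutive runs of 'down' rounds."""
    return sum(1 for state, _ in groupby(timeline) if state == "down")


def compare(trinocular, fbs):
    """Return (down-fraction diff, down-event diff), FBS minus Trinocular."""
    return (
        down_fraction(fbs) - down_fraction(trinocular),
        down_events(fbs) - down_events(trinocular),
    )


# Hypothetical block that Trinocular marks down for the whole period
# while FBS recovers it completely.
recovered = compare(["down"] * 8, ["up"] * 8)

# Hypothetical block where FBS splits one long down event into two
# shorter ones: less down time overall, but one more event.
split = compare(
    ["up", "down", "down", "down", "down", "up"],
    ["up", "down", "down", "up", "down", "up"],
)

print(recovered, split)
```

The second example shows how a block can have a lower down fraction under FBS while still counting more down events, which is the effect of breaking long events into shorter pieces.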

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Baltra, G., Heidemann, J. (2020). Improving Coverage of Internet Outage Detection in Sparse Blocks. In: Sperotto, A., Dainotti, A., Stiller, B. (eds) Passive and Active Measurement. PAM 2020. Lecture Notes in Computer Science, vol 12048. Springer, Cham. https://doi.org/10.1007/978-3-030-44081-7_2

  • DOI: https://doi.org/10.1007/978-3-030-44081-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-44080-0

  • Online ISBN: 978-3-030-44081-7

  • eBook Packages: Computer Science, Computer Science (R0)