Controlled Data Sharing for Collaborative Predictive Blacklisting

  • Conference paper
Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 2015)

Abstract

Although data sharing across organizations is often advocated as a promising way to enhance cybersecurity, collaborative initiatives are rarely put into practice owing to confidentiality, trust, and liability challenges. We investigate whether collaborative threat mitigation can be realized via controlled data sharing. With such an approach, organizations make informed decisions as to whether or not to share data, and how much. We propose using cryptographic tools for entities to estimate the benefits of collaboration and agree on what to share without having to disclose their datasets (i.e., in a privacy-preserving way). We focus on collaborative predictive blacklisting: forecasting attack sources based on one’s own logs and those contributed by other organizations. We study the impact of different sharing strategies by experimenting on a real-world dataset of two billion suspicious IP addresses collected from DShield over two months. We find that controlled data sharing yields up to 105% accuracy improvement on average, while also reducing the false positive rate.
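As a rough illustration of what "estimating the benefits of collaboration" could involve, the Python sketch below computes two plaintext overlap measures between the attacker IP sets of two organizations: the intersection size and the Jaccard index. This is a sketch under stated assumptions, not the paper's protocol: the helper name `overlap_stats` and the sample addresses (drawn from the RFC 5737 documentation ranges) are hypothetical, and the computation is deliberately NOT privacy-preserving, since both sets are visible in the clear. In the setting described above, such quantities would instead be obtained with cryptographic tools (for example, private set intersection cardinality protocols) so that neither organization discloses its dataset.

```python
# Hypothetical helper: plaintext overlap between two organizations' attacker-IP sets.
# NOT privacy-preserving: both sets are processed in the clear.
def overlap_stats(ours: set[str], theirs: set[str]) -> tuple[int, float]:
    """Return (|A ∩ B|, Jaccard index |A ∩ B| / |A ∪ B|)."""
    intersection = len(ours & theirs)
    union = len(ours | theirs)
    jaccard = intersection / union if union else 0.0
    return intersection, jaccard


# Illustrative sets built from the RFC 5737 documentation address ranges.
org_a = {"192.0.2.1", "192.0.2.7", "198.51.100.4", "203.0.113.9"}
org_b = {"192.0.2.7", "198.51.100.4", "198.51.100.200"}

print(overlap_stats(org_a, org_b))  # → (2, 0.4)
```

A higher overlap is one plausible signal that two parties observe similar attackers; how such a score maps to a sharing decision is a policy choice this sketch does not model.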

References

  1. Facebook ThreatExchange. https://threatexchange.fb.com (2015)

  2. Ackerman, S.: Privacy experts question Obama’s plan for new agency to counter cyber threats - the Guardian. http://gu.com/p/45yvz (2015)

  3. Adar, E.: User 49: anonymizing query logs. In: Query Log Analysis Workshop (2007)

  4. Applebaum, B., Ringberg, H., Freedman, M.J., Caesar, M., Rexford, J.: Collaborative, privacy-preserving data aggregation at scale. In: Atallah, M.J., Hopper, N.J. (eds.) PETS 2010. LNCS, vol. 6205, pp. 56–74. Springer, Heidelberg (2010)

  5. Bilogrevic, I., Freudiger, J., De Cristofaro, E., Uzun, E.: What’s the gist? Privacy-preserving aggregation of user profiles. In: Kutyłowski, M., Vaidya, J. (eds.) ICAIS 2014, Part II. LNCS, vol. 8713, pp. 128–145. Springer, Heidelberg (2014)

  6. Blundo, C., De Cristofaro, E., Gasti, P.: EsPRESSo: Efficient privacy-preserving evaluation of sample set similarity. JCS 22(3), 355–381 (2014)

  7. Burkhart, M., Strasser, M., Many, D., Dimitropoulos, X.: SEPIA: Privacy-preserving aggregation of multi-domain network events and statistics. In: Usenix Security (2010)

  8. Coull, S.E., Wright, C.V., Monrose, F., Collins, M.P., Reiter, M.K.: Playing devil’s advocate: inferring sensitive information from anonymized network traces. In: NDSS (2007)

  9. CSRIC Working Group 7: U.S. anti-bot code of conduct for Internet service providers: barriers and metrics considerations (2013)

  10. Damiani, E., De Capitani di Vimercati, S., Paraboschi, S., Samarati, P.: P2P-based collaborative spam detection and filtering. In: P2P (2004)

  11. De Cristofaro, E., Gasti, P., Tsudik, G.: Fast and private computation of cardinality of set intersection and union. In: Pieprzyk, J., Sadeghi, A.-R., Manulis, M. (eds.) CANS 2012. LNCS, vol. 7712, pp. 218–231. Springer, Heidelberg (2012)

  12. De Cristofaro, E., Tsudik, G.: Practical private set intersection protocols with linear complexity. In: Sion, R. (ed.) FC 2010. LNCS, vol. 6052, pp. 143–159. Springer, Heidelberg (2010)

  13. De Cristofaro, E., Tsudik, G.: Experimenting with fast private set intersection. In: Katzenbeisser, S., Weippl, E., Camp, L.J., Volkamer, M., Reiter, M., Zhang, X. (eds.) Trust 2012. LNCS, vol. 7344, pp. 55–73. Springer, Heidelberg (2012)

  14. Freedman, M.J., Nissim, K., Pinkas, B.: Efficient private matching and set intersection. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004. LNCS, vol. 3027, pp. 1–19. Springer, Heidelberg (2004)

  15. Freudiger, J., Rane, S., Brito, A.E., Uzun, E.: Privacy preserving data quality assessment for high-fidelity data sharing. In: WISCS (2014)

  16. Gusfield, D., Irving, R.W.: The Stable Marriage Problem: Structure and Algorithms. MIT Press, Cambridge (1989)

  17. Hailpern, B.T., Malkin, P.K., Schloss, R.: Collaborative server processing of content and meta-information with application to virus checking in a server network, US Patent 6,275,937 (2001)

  18. Huang, Y., Evans, D., Katz, J.: Private set intersection: are garbled circuits better than custom protocols? In: NDSS (2012)

  19. Huang, Y., Evans, D., Katz, J., Malka, L.: Faster secure two-party computation using garbled circuits. In: Usenix Security (2011)

  20. Jaccard, P.: Etude comparative de la distribution florale dans une portion des Alpes et du Jura

  21. Katti, S., Krishnamurthy, B., Katabi, D.: Collaborating against common enemies. In: IMC (2005)

  22. Kenneally, E., Claffy, K.: Dialing privacy and utility: a proposed data-sharing framework to advance internet research. IEEE Secur. Priv. 8(4), 31–39 (2010)

  23. Lakkaraju, K., Slagell, A.: Evaluating the utility of anonymized network traces for intrusion detection. In: Securecomm (2008)

  24. Lincoln, P., Porras, P., Shmatikov, V.: Privacy-preserving sharing and correction of security alerts. In: Usenix Security (2004)

  25. Locasto, M.E., Parekh, J.J., Keromytis, A.D., Stolfo, S.J.: Towards collaborative security and P2P intrusion detection. In: Information Assurance Workshop (2005)

  26. Oikonomou, G., Mirkovic, J., Reiher, P., Robinson, M.: A framework for a collaborative DDoS defense. In: ACSAC (2006)

  27. Pinkas, B., Schneider, T., Zohner, M.: Faster private set intersection based on OT extension. In: Usenix Security (2014)

  28. Porras, P., Shmatikov, V.: Large-scale collection and sanitization of network security data: risks and challenges. In: New Security Paradigms Workshop (NSPW) (2006)

  29. Pouget, F., Dacier, M., Pham, V.H.: Leurré.com: on the advantages of deploying a large scale distributed honeypot platform. In: E-Crime and Computer Conference (2005)

  30. Red Sky Alliance. http://redskyalliance.org/

  31. SANS Technology Institute: DShield Data. https://www.dshield.org/

  32. Slagell, A., Yurcik, W.: Sharing computer network logs for security and privacy: a motivation for new methodologies of anonymization. In: Securecomm (2005)

  33. Soldo, F., Le, A., Markopoulou, A.: Predictive blacklisting as an implicit recommendation system. In: INFOCOM (2010)

  34. Song, C., Qu, Z., Blumm, N., Barabási, A.-L.: Limits of predictability in human mobility. Sci. 327, 1018–1021 (2010)

  35. The White House: Executive order promoting private sector cybersecurity information sharing (2015). http://1.usa.gov/1vISfBO

  36. Worldwide observatory of malicious behaviors and attack threats (2013). http://www.wombat-project.eu/

  37. Xu, J., Fan, J., Ammar, M.H., Moon, S.B.: Prefix-preserving IP address anonymization: measurement-based security evaluation and a new cryptography-based scheme. In: ICNP (2002)

  38. Yao, A.: Protocols for secure computations. In: 23rd Annual Symposium on Foundations of Computer Science, FOCS, pp. 160–164 (1982)

  39. Yegneswaran, V., Barford, P., Jha, S.: Global intrusion detection in the DOMINO overlay system. In: NDSS (2004)

  40. Zhang, J., Porras, P.A., Ullrich, J.: Highly predictive blacklisting. In: Usenix Security (2008)

Acknowledgments

We wish to thank DShield.org and Johannes Ullrich for providing the dataset used in our experiments, as well as Ersin Uzun, Marshall Bern, Craig Saunders, and Anton Chuvakin for their useful comments and feedback. Work done in part while Emiliano De Cristofaro was with PARC.

Author information

Corresponding author

Correspondence to Julien Freudiger.

A. Additional Analysis of the DShield Dataset

General Statistics. Fig. 5(a) shows the histogram of the number of attacks per day, indicating about 30M attacks per day. Around day 50, we observe a significant increase to about 100M attacks. Careful analysis reveals that a set of IP addresses starts attacking aggressively around day 50, possibly indicating the onset of a DoS attack.
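The per-day counts behind Fig. 5(a), and one simple way to flag a jump like the one around day 50, can be sketched as follows. This assumes a hypothetical, simplified log layout of `(day, source_ip, target_id, port)` tuples (the DShield schema itself may differ), and the "`factor` times the median" spike rule is an illustrative heuristic, not the analysis used in the paper.

```python
from collections import Counter
from statistics import median

# Hypothetical simplified log record: (day, source_ip, target_id, port).
Record = tuple[int, str, str, int]


def attacks_per_day(records: list[Record]) -> dict[int, int]:
    """Count log entries (attacks) per day, i.e. the data behind a per-day histogram."""
    counts = Counter(day for day, _src, _dst, _port in records)
    return dict(sorted(counts.items()))


def spike_days(daily: dict[int, int], factor: float = 2.0) -> list[int]:
    """Days whose count exceeds `factor` times the median daily count (illustrative rule)."""
    baseline = median(daily.values())
    return [day for day, n in daily.items() if n > factor * baseline]


def top_sources(records: list[Record], day: int, k: int = 3) -> list[tuple[str, int]]:
    """Most active source IPs on a given day, to inspect which sources drive a spike."""
    return Counter(src for d, src, _dst, _port in records if d == day).most_common(k)


records = [
    (1, "192.0.2.1", "t1", 22), (1, "192.0.2.2", "t2", 80),
    (2, "192.0.2.1", "t1", 23), (2, "192.0.2.3", "t3", 80),
] + [(3, "198.51.100.9", f"t{i}", 23) for i in range(6)]

daily = attacks_per_day(records)          # {1: 2, 2: 2, 3: 6}
print(spike_days(daily))                  # → [3]
print(top_sources(records, day=3, k=1))   # → [('198.51.100.9', 6)]
```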

Fig. 5. General DShield characteristics: (a) histogram of the number of attacks per day; (b) number of unique targets and sources.

Fig. 6. Fraction of days each target contributes.

Fig. 7. CDF of entropy of different attack parameters.

Fig. 8. CDF of the inter-arrival time of attacks: (a) in hours and (b) in seconds. “All” denotes the inter-arrival time between any two attacks, “/8” between attacks within a common /8 subnetwork, “/24” between attacks within a common /24 subnetwork, and “IP” between attacks involving the same IP address.

Figure 5(b) shows the number of unique targets and sources over time. Both counts remain stable over the observation period. This stability suggests that it should be possible to predict attackers’ tactics based on past observations. An analysis of attacked ports shows that the top 10 attacked ports (each with more than 10M hits) belong to Telnet, HTTP, SSH, DNS, FTP, BGP, Active Directory, and NetBIOS. This points to a clear trend towards misuse of popular network services.
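Both statistics in this paragraph, distinct sources and targets per day (Fig. 5(b)) and the most-attacked ports above a hit threshold, reduce to simple counting. The sketch below assumes a hypothetical `(day, source_ip, target_id, port)` record layout; the `min_hits` parameter stands in for the "more than 10M hits" cut-off and is an illustrative name, not part of any DShield tooling.

```python
from collections import Counter, defaultdict

# Hypothetical simplified log record: (day, source_ip, target_id, port).
Record = tuple[int, str, str, int]


def uniques_per_day(records: list[Record]) -> dict[int, tuple[int, int]]:
    """Map each day to (number of distinct sources, number of distinct targets)."""
    sources: dict[int, set[str]] = defaultdict(set)
    targets: dict[int, set[str]] = defaultdict(set)
    for day, src, dst, _port in records:
        sources[day].add(src)
        targets[day].add(dst)
    return {day: (len(sources[day]), len(targets[day])) for day in sorted(sources)}


def top_ports(records: list[Record], k: int = 10, min_hits: int = 0) -> list[tuple[int, int]]:
    """Top-k ports by hit count, keeping only ports with more than `min_hits` hits."""
    counts = Counter(port for _day, _src, _dst, port in records)
    return [(port, n) for port, n in counts.most_common(k) if n > min_hits]


records = [
    (1, "192.0.2.1", "t1", 23), (1, "192.0.2.2", "t1", 22),
    (2, "192.0.2.1", "t2", 23), (2, "192.0.2.1", "t3", 80),
]
print(uniques_per_day(records))             # → {1: (2, 1), 2: (1, 2)}
print(top_ports(records, k=3, min_hits=1))  # → [(23, 2)]
```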

In Fig. 6, we plot the CDF of the fraction of days on which each victim contributes logs to DShield over the two months, and observe that few victims contribute daily.
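The quantity plotted in Fig. 6 can be computed by recording, for each victim, the set of days on which it appears in the logs and dividing by the length of the observation window; an empirical CDF over those fractions then gives the plotted curve. The record layout `(day, source_ip, target_id, port)` and the helper names below are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical simplified log record: (day, source_ip, target_id, port).
Record = tuple[int, str, str, int]


def contribution_fraction(records: list[Record], num_days: int) -> dict[str, float]:
    """Fraction of the observation window on which each target (victim) contributes logs."""
    days_seen: dict[str, set[int]] = defaultdict(set)
    for day, _src, target, _port in records:
        days_seen[target].add(day)
    return {target: len(days) / num_days for target, days in days_seen.items()}


def empirical_cdf(values: list[float]) -> list[tuple[float, float]]:
    """Points (x, F(x)), where F(x) is the fraction of values <= x, at each distinct x."""
    xs = sorted(values)
    n = len(xs)
    return [(x, bisect_right(xs, x) / n) for x in sorted(set(xs))]


records = [(d, "192.0.2.1", "t1", 80) for d in range(1, 5)]  # t1 reports every day
records += [(1, "192.0.2.2", "t2", 22)]                        # t2 reports once
records += [(1, "192.0.2.3", "t3", 23), (2, "192.0.2.3", "t3", 23)]

fractions = contribution_fraction(records, num_days=4)  # {'t1': 1.0, 't2': 0.25, 't3': 0.5}
daily_contributors = [t for t, f in fractions.items() if f == 1.0]
print(daily_contributors)  # → ['t1']
print(empirical_cdf(list(fractions.values())))
```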

Predictability. Figure 7 shows the CDF of the Shannon entropy of the different log entry elements. Since lower entropy implies higher achievable predictability (following Fano’s inequality [34]), entropy helps estimate our ability to predict a given IP address, port number, or target appearing in the logs. To obtain this figure, we estimate, for each day, the probability that each victim, source, or port appears in that day’s attacks. For example, for each port i, we compute:

$$\begin{aligned} \text {Pr}(\text {Port } i \text { on day } j) = \frac{\text {Attacks on Port } i \text { on day } j}{\text {Attacks on day } j} \end{aligned} \tag{1}$$
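Eq. (1) and the per-day entropy can be sketched as below. The sketch assumes input as `(day, value)` pairs, where `value` is a port here but could equally be a source IP or a victim, and it measures entropy in bits (base-2 logarithm), matching the \(2^k\) set sizes quoted in the next paragraph. Function names are illustrative, not taken from the paper.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable, Iterable
from math import log2


def daily_probabilities(pairs: Iterable[tuple[int, Hashable]]) -> dict[int, dict[Hashable, float]]:
    """Eq. (1): Pr(value i on day j) = attacks involving i on day j / attacks on day j."""
    per_day: dict[int, Counter] = defaultdict(Counter)
    for day, value in pairs:
        per_day[day][value] += 1
    probs: dict[int, dict[Hashable, float]] = {}
    for day, counts in per_day.items():
        total = sum(counts.values())
        probs[day] = {value: n / total for value, n in counts.items()}
    return probs


def shannon_entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in bits: H = sum of -p * log2(p), skipping zero-probability terms."""
    return sum((-p * log2(p) for p in probs if p > 0), 0.0)


def daily_entropy(pairs: Iterable[tuple[int, Hashable]]) -> dict[int, float]:
    """One entropy value per day; collecting these over all days yields a CDF like Fig. 7."""
    return {day: shannon_entropy(p.values()) for day, p in daily_probabilities(pairs).items()}


port_hits = [(1, 80), (1, 80), (1, 22), (1, 23), (2, 80), (2, 80)]
print(daily_entropy(port_hits))  # → {1: 1.5, 2: 0.0}
```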

We compute the entropy for each day and then aggregate the daily values into a CDF. We observe that port numbers have the lowest entropy distribution, indicating a small set of targeted ports: \(80\,\%\) of attacks target a set of \(2^7=128\) ports, which makes ports highly predictable. We also observe that victims are more predictable than sources, as \(90\,\%\) of victims lie within a set of \(2^{12}=4096\) victims, compared to \(90\,\%\) of sources lying within a set of \(2^{14}=16{,}384\) sources. The victims’ set is thus significantly smaller, and more predictable, than the attackers’ set.
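The step from an entropy value to a set size in the paragraph above relies on a standard identity, stated here under the assumption that entropy is measured in bits: a uniform distribution over \(2^k\) values has entropy exactly \(k\) bits, so \(2^{H}\) can be read as the number of equally likely values that would produce entropy \(H\). Because real daily distributions are skewed, this is an effective set size rather than a literal count of distinct values.

$$\begin{aligned} H_j &= -\sum_{i} \text{Pr}(\text{Port } i \text{ on day } j)\,\log_2 \text{Pr}(\text{Port } i \text{ on day } j) \\ H\big(\text{uniform over } 2^k \text{ values}\big) &= -\sum_{m=1}^{2^k} 2^{-k}\log_2 2^{-k} = k \text{ bits} \\ H = 7,\ 12,\ 14 \text{ bits} &\;\Longrightarrow\; 2^{H} = 128,\ 4096,\ 16{,}384 \end{aligned}$$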

Intensity. Figure 8(a) shows the inter-arrival time of attacks in hours, and Fig. 8(b) shows it in seconds. We observe that almost all attacks occur within 3-minute windows. IP addresses and /24 subnetworks behave similarly. In particular, Fig. 8(b) shows that \(85\,\%\) of /8 subnetworks have short attack inter-arrival times, indicating bursty attacks on such networks: attackers target subnetworks for a short time and then disappear.
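The inter-arrival curves in Fig. 8 group attacks at four granularities ("All", /8, /24, single IP) and measure the gaps between consecutive attacks within each group. The sketch below assumes events given as hypothetical `(timestamp_seconds, ip)` pairs; this paragraph does not specify whether the grouped address is the attacker's or the victim's, so `ip` is left generic. The 180-second default threshold mirrors the 3-minute window mentioned above; helper names are illustrative.

```python
from collections import defaultdict
from ipaddress import ip_network
from typing import Optional


def group_key(ip: str, prefix_len: Optional[int]) -> str:
    """Group key: one global group ("All"), or the enclosing /prefix_len network."""
    if prefix_len is None:
        return "all"
    return str(ip_network(f"{ip}/{prefix_len}", strict=False))


def inter_arrival_times(events: list[tuple[float, str]], prefix_len: Optional[int]) -> list[float]:
    """Sorted gaps between consecutive events that fall in the same group."""
    by_group: dict[str, list[float]] = defaultdict(list)
    for ts, ip in events:
        by_group[group_key(ip, prefix_len)].append(ts)
    gaps: list[float] = []
    for times in by_group.values():
        times.sort()
        gaps.extend(b - a for a, b in zip(times, times[1:]))
    return sorted(gaps)


def fraction_below(gaps: list[float], threshold: float = 180.0) -> float:
    """Share of gaps shorter than `threshold` seconds (180 s = 3 minutes)."""
    return sum(g < threshold for g in gaps) / len(gaps) if gaps else 0.0


events = [(0, "192.0.2.1"), (30, "192.0.2.200"), (100, "192.0.2.1"), (400, "198.51.100.5")]
for label, prefix in [("All", None), ("/8", 8), ("/24", 24), ("IP", 32)]:
    print(label, inter_arrival_times(events, prefix))
```

With these sample events, "All" yields gaps of 30, 70 and 300 seconds, while the single-IP grouping only sees the 100-second gap between the two attacks involving 192.0.2.1.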

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Freudiger, J., De Cristofaro, E., Brito, A.E. (2015). Controlled Data Sharing for Collaborative Predictive Blacklisting. In: Almgren, M., Gulisano, V., Maggi, F. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2015. Lecture Notes in Computer Science, vol 9148. Springer, Cham. https://doi.org/10.1007/978-3-319-20550-2_17

  • DOI: https://doi.org/10.1007/978-3-319-20550-2_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20549-6

  • Online ISBN: 978-3-319-20550-2

  • eBook Packages: Computer Science, Computer Science (R0)
