Controlled Data Sharing for Collaborative Predictive Blacklisting

Freudiger, Julien; De Cristofaro, Emiliano; Brito, Alejandro E.

doi:10.1007/978-3-319-20550-2_17

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 9148)

Included in the following conference series:

International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment

Abstract

Although data sharing across organizations is often advocated as a promising way to enhance cybersecurity, collaborative initiatives are rarely put into practice owing to confidentiality, trust, and liability challenges. We investigate whether collaborative threat mitigation can be realized via controlled data sharing. With such an approach, organizations make informed decisions as to whether or not to share data, and how much. We propose using cryptographic tools for entities to estimate the benefits of collaboration and agree on what to share without having to disclose their datasets (i.e., in a privacy-preserving way). We focus on collaborative predictive blacklisting: forecasting attack sources based on one’s logs and those contributed by other organizations. We study the impact of different sharing strategies by experimenting on a real-world dataset of two billion suspicious IP addresses collected from DShield over two months. We find that controlled data sharing yields up to 105% accuracy improvement on average, while also reducing the false positive rate.

References

1. Facebook ThreatExchange. https://threatexchange.fb.com (2015)
2. Ackerman, S.: Privacy experts question Obama’s plan for new agency to counter cyber threats - the Guardian. http://gu.com/p/45yvz (2015)
3. Adar, E.: User 49: anonymizing query logs. In: Query Log Analysis Workshop (2007)
4. Applebaum, B., Ringberg, H., Freedman, M.J., Caesar, M., Rexford, J.: Collaborative, privacy-preserving data aggregation at scale. In: Atallah, M.J., Hopper, N.J. (eds.) PETS 2010. LNCS, vol. 6205, pp. 56–74. Springer, Heidelberg (2010)
5. Bilogrevic, I., Freudiger, J., De Cristofaro, E., Uzun, E.: What’s the gist? privacy-preserving aggregation of user profiles. In: Kutyłowski, M., Vaidya, J. (eds.) ICAIS 2014, Part II. LNCS, vol. 8713, pp. 128–145. Springer, Heidelberg (2014)
6. Blundo, C., De Cristofaro, E., Gasti, P.: EsPRESSo: Efficient privacy-preserving evaluation of sample set similarity. JCS 22(3), 355–381 (2014)
7. Burkhart, M., Strasser, M., Many, D., Dimitropoulos, X.: SEPIA: Privacy-preserving aggregation of multi-domain network events and statistics. In: Usenix Security (2010)
8. Coull, S.E., Wright, C.V., Monrose, F., Collins, M.P., Reiter, M.K.: Playing devil’s advocate: inferring sensitive information from anonymized network traces. In: NDSS (2007)
9. CSRIC Working Group 7: U.S. anti-bot code of conduct for Internet service providers: barriers and metrics considerations (2013)
10. Damiani, E., De Capitani di Vimercati, S., Paraboschi, S., Samarati, P.: P2P-based collaborative spam detection and filtering. In: P2P (2004)
11. De Cristofaro, E., Gasti, P., Tsudik, G.: Fast and private computation of cardinality of set intersection and union. In: Pieprzyk, J., Sadeghi, A.-R., Manulis, M. (eds.) CANS 2012. LNCS, vol. 7712, pp. 218–231. Springer, Heidelberg (2012)
12. De Cristofaro, E., Tsudik, G.: Practical private set intersection protocols with linear complexity. In: Sion, R. (ed.) FC 2010. LNCS, vol. 6052, pp. 143–159. Springer, Heidelberg (2010)
13. De Cristofaro, E., Tsudik, G.: Experimenting with fast private set intersection. In: Katzenbeisser, S., Weippl, E., Camp, L.J., Volkamer, M., Reiter, M., Zhang, X. (eds.) Trust 2012. LNCS, vol. 7344, pp. 55–73. Springer, Heidelberg (2012)
14. Freedman, M.J., Nissim, K., Pinkas, B.: Efficient private matching and set intersection. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004. LNCS, vol. 3027, pp. 1–19. Springer, Heidelberg (2004)
15. Freudiger, J., Rane, S., Brito, A.E., Uzun, E.: Privacy preserving data quality assessment for high-fidelity data sharing. In: WISCS (2014)
16. Gusfield, D., Irving, R.W.: The Stable Marriage Problem: Structure and Algorithms. MIT Press, Cambridge (1989)
17. Hailpern, B.T., Malkin, P.K., Schloss, R.: Collaborative server processing of content and meta-information with application to virus checking in a server network, US Patent 6,275,937 (2001)
18. Huang, Y., Evans, D., Katz, J.: Private set intersection: are garbled circuits better than custom protocols? In: NDSS (2012)
19. Huang, Y., Evans, D., Katz, J., Malka, L.: Faster secure two-party computation using garbled circuits. In: Usenix Security (2011)
20. Jaccard, P.: Etude comparative de la distribution florale dans une portion des Alpes et du Jura
21. Katti, S., Krishnamurthy, B., Katabi, D.: Collaborating against common enemies. In: IMC (2005)
22. Kenneally, E., Claffy, K.: Dialing privacy and utility: a proposed data-sharing framework to advance internet research. IEEE Secur. Priv. 8(4), 31–39 (2010)
23. Lakkaraju, K., Slagell, A.: Evaluating the utility of anonymized network traces for intrusion detection. In: Securecomm (2008)
24. Lincoln, P., Porras, P., Shmatikov, V.: Privacy-preserving sharing and correction of security alerts. In: Usenix Security (2004)
25. Locasto, M.E., Parekh, J.J., Keromytis, A.D., Stolfo, S.J.: Towards collaborative security and P2P intrusion detection. In: Information Assurance Workshop (2005)
26. Oikonomou, G., Mirkovic, J., Reiher, P., Robinson, M.: A framework for a collaborative DDoS defense. In: ACSAC (2006)
27. Pinkas, B., Schneider, T., Zohner, M.: Faster private set intersection based on OT extension. In: Usenix Security (2014)
28. Porras, P., Shmatikov, V.: Large-scale collection and sanitization of network security data: risks and challenges. In: New Security Paradigms Workshop (NSPW) (2006)
29. Pouget, F., Dacier, M., Pham, V.H.: Leurre.com: on the advantages of deploying a large scale distributed honeypot platform. In: E-Crime and Computer Conference (2005)
30. Red Sky Alliance. http://redskyalliance.org/
31. SANS Technology Institute: DShield Data. https://www.dshield.org/
32. Slagell, A., Yurcik, W.: Sharing computer network logs for security and privacy: a motivation for new methodologies of anonymization. In: Securecomm (2005)
33. Soldo, F., Le, A., Markopoulou, A.: Predictive blacklisting as an implicit recommendation system. In: INFOCOM (2010)
34. Song, C., Qu, Z., Blumm, N., Barabási, A.-L.: Limits of predictability in human mobility. Science 327, 1018–1021 (2010)
35. The White House: Executive order promoting private sector cybersecurity information sharing (2015). http://1.usa.gov/1vISfBO
36. Worldwide observatory of malicious behaviors and attack threats (2013). http://www.wombat-project.eu/
37. Xu, J., Fan, J., Ammar, M.H., Moon, S.B.: Prefix-preserving IP address anonymization: measurement-based security evaluation and a new cryptography-based scheme. In: ICNP (2002)
38. Yao, A.: Protocols for secure computations. In: 23rd Annual Symposium on Foundations of Computer Science, FOCS, pp. 160–164 (1982)
39. Yegneswaran, V., Barford, P., Jha, S.: Global intrusion detection in the DOMINO overlay system. In: NDSS (2004)
40. Zhang, J., Porras, P.A., Ullrich, J.: Highly predictive blacklisting. In: Usenix Security (2008)

Acknowledgments

We wish to thank DShield.org and Johannes Ullrich for providing the dataset used in our experiments, as well as Ersin Uzun, Marshall Bern, Craig Saunders, and Anton Chuvakin for their useful comments and feedback. Work done in part while Emiliano De Cristofaro was with PARC.

Author information

Authors and Affiliations

PARC (a Xerox Company), Palo Alto, USA
Julien Freudiger & Alejandro E. Brito
University College London, London, UK
Emiliano De Cristofaro

Corresponding author

Correspondence to Julien Freudiger.

Editor information

Editors and Affiliations

Chalmers University of Technology, Gothenburg, Sweden
Magnus Almgren
Chalmers University of Technology, Gothenburg, Sweden
Vincenzo Gulisano
Politecnico di Milano, Milan, Italy
Federico Maggi

A. Additional Analysis of the DShield Dataset

General Statistics. Figure 5(a) shows the histogram of the number of attacks per day, which is about 30M. Around day 50, the daily count rises sharply to about 100M. Closer analysis reveals that a series of IP addresses starts to attack aggressively around day 50, possibly indicating the start of a DoS attack.

Figure 5(b) shows the number of unique targets and sources over time; both remain stable. This stability suggests that it should be possible to predict attackers’ tactics from past observations. An analysis of attacked ports shows that the top 10 attacked ports (each with more than 10M hits) include the Telnet, HTTP, SSH, DNS, FTP, BGP, Active Directory, and NetBIOS ports. This shows a clear trend towards misuse of popular web services.

In Fig. 6, we plot the CDF of the fraction of victims that contribute logs to DShield over two months, and observe that few victims contribute daily.

Predictability. Figure 7 shows the CDF of the Shannon entropy of the different log entry elements. Since entropy correlates with predictability (following Fano’s inequality [34]), it helps estimate our ability to predict a given IP address, port number, or target appearing in the logs. To obtain this figure, we estimate, for each day, the probability of each victim, source, or port appearing in an attack. For example, for each port i, we compute:

$$\Pr(\text{Port } i \text{ on day } j) = \frac{\text{Attacks on Port } i \text{ on day } j}{\text{Attacks on day } j} \tag{1}$$

We compute the entropy for each day and then aggregate the daily values into a CDF. We observe that port numbers have the lowest entropy distribution, indicating a small set of targeted ports: $80\,\%$ of attacks target a set of $2^7=128$ ports, indicating high predictability. We also observe that victims are more predictable than sources, as $90\,\%$ of victims lie within a set of $2^{12}=4096$ victims, compared to $90\,\%$ of sources lying within a set of $2^{14}=16{,}384$ sources. The set of victims is thus significantly smaller and more predictable than the set of attackers.
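The computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `daily_entropy` and `empirical_cdf`, the `(day, key)` record layout, and the toy data are all hypothetical. It applies Eq. (1) to obtain the per-day distribution, computes the Shannon entropy in bits, and builds an empirical CDF over days.

```python
import math
from collections import Counter, defaultdict


def daily_entropy(records):
    """records: iterable of (day, key) pairs, one per logged attack.

    `key` is whichever log element is being studied (port, source, victim).
    Returns {day: Shannon entropy in bits of that day's key distribution}.
    """
    counts = defaultdict(Counter)
    for day, key in records:
        counts[day][key] += 1

    entropies = {}
    for day, counter in counts.items():
        total = sum(counter.values())
        # Eq. (1): Pr(key i on day j) = attacks on i on day j / attacks on day j
        probs = [n / total for n in counter.values()]
        entropies[day] = sum(-p * math.log2(p) for p in probs)
    return entropies


def empirical_cdf(values):
    """Return sorted (value, cumulative fraction) pairs."""
    xs = sorted(values)
    return [(x, (i + 1) / len(xs)) for i, x in enumerate(xs)]


# Hypothetical toy log: (day, destination port) per attack.
records = [(0, 80), (0, 80), (0, 22), (0, 23),
           (1, 80), (1, 80), (1, 80), (1, 80)]
ent = daily_entropy(records)
cdf = empirical_cdf(ent.values())
```

An entropy of H bits corresponds to an effective set size of 2^H, which is how a value of 7 bits is read as a set of 128 ports in the text above.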

Intensity. Figure 8(a) shows the inter-arrival time of attacks in hours, and Fig. 8(b) shows the inter-arrival time of attacks in seconds. We observe that almost all attacks occur within 3-minute windows. IP addresses and /24 subnetworks behave similarly. In particular, Fig. 8(b) shows that, over short time intervals, $85\,\%$ of /8 subnetworks have short attack inter-arrival times, indicating that attacks on such networks are bursty. Attackers target subnetworks for a short time and then disappear.
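As a rough illustration of how such inter-arrival times could be derived from a log, the sketch below groups attack timestamps by individual address, /24, or /8 and measures the gaps between consecutive attacks. The record layout `(timestamp_seconds, ip)`, the function names, and the sample events are assumptions made for this example. Which address field (source or target) the original analysis groups by is not stated here, so `ip` is left generic.

```python
import ipaddress
from collections import defaultdict


def prefix_key(ip, prefix_len):
    """Map an IPv4 address string to its enclosing network (e.g. /24 or /8)."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def inter_arrival_times(events, prefix_len=32):
    """events: iterable of (timestamp_seconds, ip) pairs.

    Returns {network: [gaps in seconds between consecutive attacks]}.
    """
    times = defaultdict(list)
    for ts, ip in events:
        times[prefix_key(ip, prefix_len)].append(ts)

    gaps = {}
    for net, ts_list in times.items():
        ts_list.sort()
        gaps[net] = [b - a for a, b in zip(ts_list, ts_list[1:])]
    return gaps


def fraction_within(gaps, window_seconds):
    """Fraction of all observed gaps that are at most window_seconds long."""
    all_gaps = [g for gs in gaps.values() for g in gs]
    if not all_gaps:
        return 0.0
    return sum(g <= window_seconds for g in all_gaps) / len(all_gaps)


# Hypothetical events: (timestamp in seconds, IPv4 address).
events = [(0, "10.1.2.3"), (60, "10.1.2.9"),
          (7260, "10.1.2.3"), (30, "192.0.2.1")]
gaps_24 = inter_arrival_times(events, prefix_len=24)
share_3min = fraction_within(gaps_24, 180)
```

Passing `prefix_len=8` or `prefix_len=32` to the same function gives the per-/8 and per-address views that the paragraph compares.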

Cite this paper

Freudiger, J., De Cristofaro, E., Brito, A.E. (2015). Controlled Data Sharing for Collaborative Predictive Blacklisting. In: Almgren, M., Gulisano, V., Maggi, F. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2015. Lecture Notes in Computer Science, vol 9148. Springer, Cham. https://doi.org/10.1007/978-3-319-20550-2_17

DOI: https://doi.org/10.1007/978-3-319-20550-2_17
Published: 23 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20549-6
Online ISBN: 978-3-319-20550-2
eBook Packages: Computer Science, Computer Science (R0)
