Abstract
Although data sharing across organizations is often advocated as a promising way to enhance cybersecurity, collaborative initiatives are rarely put into practice owing to confidentiality, trust, and liability challenges. We investigate whether collaborative threat mitigation can be realized via controlled data sharing. With such an approach, organizations make informed decisions as to whether or not to share data, and how much. We propose using cryptographic tools for entities to estimate the benefits of collaboration and agree on what to share without having to disclose their datasets (i.e., in a privacy-preserving way). We focus on collaborative predictive blacklisting: forecasting attack sources based on one’s logs and those contributed by other organizations. We study the impact of different sharing strategies by experimenting on a real-world dataset of two billion suspicious IP addresses collected from DShield over two months. We find that controlled data sharing yields up to 105% accuracy improvement on average, while also reducing the false positive rate.
References
Facebook ThreatExchange. https://threatexchange.fb.com (2015)
Ackerman, S.: Privacy experts question Obama’s plan for new agency to counter cyber threats - the Guardian. http://gu.com/p/45yvz (2015)
Adar, E.: User 49: anonymizing query logs. In: Query Log Analysis Workshop (2007)
Applebaum, B., Ringberg, H., Freedman, M.J., Caesar, M., Rexford, J.: Collaborative, privacy-preserving data aggregation at scale. In: Atallah, M.J., Hopper, N.J. (eds.) PETS 2010. LNCS, vol. 6205, pp. 56–74. Springer, Heidelberg (2010)
Bilogrevic, I., Freudiger, J., De Cristofaro, E., Uzun, E.: What’s the gist? privacy-preserving aggregation of user profiles. In: Kutyłowski, M., Vaidya, J. (eds.) ICAIS 2014, Part II. LNCS, vol. 8713, pp. 128–145. Springer, Heidelberg (2014)
Blundo, C., De Cristofaro, E., Gasti, P.: EsPRESSo: Efficient privacy-preserving evaluation of sample set similarity. JCS 22(3), 355–381 (2014)
Burkhart, M., Strasser, M., Many, D., Dimitropoulos, X.: SEPIA: Privacy-preserving aggregation of multi-domain network events and statistics. In: Usenix Security (2010)
Coull, S.E., Wright, C.V., Monrose, F., Collins, M.P., Reiter, M.K.: Playing devil’s advocate: inferring sensitive information from anonymized network traces. In: NDSS (2007)
CSRIC Working Group 7.: U.S. anti-bot code of conduct for Internet service providers: barriers and metrics considerations (2013)
Damiani, E., De Capitani di Vimercati, S., Paraboschi, S., Samarati, P.: P2P-based collaborative spam detection and filtering. In: P2P (2004)
De Cristofaro, E., Gasti, P., Tsudik, G.: Fast and private computation of cardinality of set intersection and union. In: Pieprzyk, J., Sadeghi, A.-R., Manulis, M. (eds.) CANS 2012. LNCS, vol. 7712, pp. 218–231. Springer, Heidelberg (2012)
De Cristofaro, E., Tsudik, G.: Practical private set intersection protocols with linear complexity. In: Sion, R. (ed.) FC 2010. LNCS, vol. 6052, pp. 143–159. Springer, Heidelberg (2010)
De Cristofaro, E., Tsudik, G.: Experimenting with fast private set intersection. In: Katzenbeisser, S., Weippl, E., Camp, L.J., Volkamer, M., Reiter, M., Zhang, X. (eds.) Trust 2012. LNCS, vol. 7344, pp. 55–73. Springer, Heidelberg (2012)
Freedman, M.J., Nissim, K., Pinkas, B.: Efficient private matching and set intersection. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004. LNCS, vol. 3027, pp. 1–19. Springer, Heidelberg (2004)
Freudiger, J., Rane, S., Brito, A.E., Uzun, E.: Privacy preserving data quality assessment for high-fidelity data sharing. In: WISCS (2014)
Gusfield, D., Irving, R.W.: The Stable Marriage Problem: Structure and Algorithms. MIT Press, Cambridge (1989)
Hailpern, B.T., Malkin, P.K., Schloss, R.: Collaborative server processing of content and meta-information with application to virus checking in a server network, US Patent 6,275,937 (2001)
Huang, Y., Evans, D., Katz, J.: Private set intersection: are garbled circuits better than custom protocols? In: NDSS (2012)
Huang, Y., Evans, D., Katz, J., Malka, L.: Faster secure two-party computation using garbled circuits. In: Usenix Security (2011)
Jaccard, P.: Etude comparative de la distribution florale dans une portion des Alpes et du Jura
Katti, S., Krishnamurthy, B., Katabi, D.: Collaborating against common enemies. In: IMC (2005)
Kenneally, E., Claffy, K.: Dialing privacy and utility: a proposed data-sharing framework to advance internet research. IEEE Secur. Priv. 8(4), 31–39 (2010)
Lakkaraju, K., Slagell, A.: Evaluating the utility of anonymized network traces for intrusion detection. In: Securecomm (2008)
Lincoln, P., Porras, P., Shmatikov, V.: Privacy-preserving sharing and correction of security alerts. In: Usenix Security (2004)
Locasto, M.E., Parekh, J.J., Keromytis, A.D., Stolfo, S.J.: Towards collaborative security and P2P intrusion detection. In: Information Assurance Workshop (2005)
Oikonomou, G., Mirkovic, J., Reiher, P., Robinson, M.: A framework for a collaborative DDoS defense. In: ACSAC (2006)
Pinkas, B., Schneider, T., Zohner, M.: Faster private set intersection based on OT extension. In: Usenix Security (2014)
Porras, P., Shmatikov, V.: Large-scale collection and sanitization of network security data: risks and challenges. In: New Security Paradigms Workshop (NSPW) (2006)
Pouget, F., Dacier, M., Pham, V.H.: Leurre.com: on the advantages of deploying a large scale distributed honeypot platform. In: E-Crime and Computer Conference (2005)
Red Sky Alliance. http://redskyalliance.org/
SANS Technology Institute.: DShield Data. https://www.dshield.org/
Slagell, A., Yurcik, W.: Sharing computer network logs for security and privacy: a motivation for new methodologies of anonymization. In: Securecomm (2005)
Soldo, F., Le, A., Markopoulou, A.: Predictive blacklisting as an implicit recommendation system. In: INFOCOM (2010)
Song, C., Qu, Z., Blumm, N., Barabási, A.-L.: Limits of predictability in human mobility. Science 327(5968), 1018–1021 (2010)
The White House.: Executive order promoting private sector cybersecurity information sharing (2015). http://1.usa.gov/1vISfBO
Worldwide observatory of malicious behaviors and attack threats (2013). http://www.wombat-project.eu/
Xu, J., Fan, J., Ammar, M.H., Moon, S.B.: Prefix-preserving IP address anonymization: measurement-based security evaluation and a new cryptography-based scheme. In: ICNP (2002)
Yao, A.: Protocols for secure computations. In: 23rd Annual Symposium on Foundations of Computer Science, FOCS, pp. 160–164 (1982)
Yegneswaran, V., Barford, P., Jha, S.: Global intrusion detection in the DOMINO overlay system. In: NDSS (2004)
Zhang, J., Porras, P.A., Ullrich, J.: Highly predictive blacklisting. In: Usenix Security (2008)
Acknowledgments
We wish to thank DShield.org and Johannes Ullrich for providing the dataset used in our experiments, as well as Ersin Uzun, Marshall Bern, Craig Saunders, and Anton Chuvakin for their useful comments and feedback. Work done in part while Emiliano De Cristofaro was with PARC.
A. Additional Analysis of the DShield Dataset
General Statistics. Figure 5(a) shows the histogram of the number of attacks per day, indicating about 30 M daily attacks. Around day 50, the daily count rises sharply to about 100 M attacks. Closer analysis reveals that a series of IP addresses starts to attack aggressively around day 50, suggesting the onset of a DoS attack.
Figure 5(b) shows the number of unique targets and sources over time. Both numbers remain stable, which suggests that it should be possible to predict attackers’ tactics based on past observations. An analysis of attacked ports shows that the top 10 attacked ports (each with more than 10 M hits) include the Telnet, HTTP, SSH, DNS, FTP, BGP, Active Directory, and NetBIOS ports. This shows a clear trend towards misuse of popular web services.
In Fig. 6, we plot the CDF of the fraction of victims that contribute logs to DShield over two months, and observe that few victims contribute daily.
Predictability. Figure 7 shows the CDF of the Shannon entropy of the different log entry elements. Since entropy correlates with predictability (following Fano’s inequality [34]), it helps estimate our ability to predict a given IP address, port number or target appearing in the logs. To obtain this figure, we estimate the probability of each victim, source or port being attacked each day. For example, for each port i, we compute:
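The displayed equation is not reproduced in this copy. A plausible form, assuming the daily probability is estimated as a relative frequency (the symbols \(n_i^{(d)}\), \(p_i^{(d)}\), and \(H^{(d)}\) are illustrative notation, not necessarily the authors'), is:

\[
p_i^{(d)} = \frac{n_i^{(d)}}{\sum_{j} n_j^{(d)}},
\qquad
H^{(d)} = -\sum_{i} p_i^{(d)} \log_2 p_i^{(d)},
\]

where \(n_i^{(d)}\) denotes the number of attacks logged against port \(i\) on day \(d\), and \(H^{(d)}\) is the Shannon entropy (in bits) of the port distribution on that day. The same estimate would apply to victims and sources.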
We compute the entropy for each day and then aggregate over all days using the CDF. We observe that port numbers have the lowest entropy distribution, indicating a small set of targeted ports: \(80\,\%\) of attacks target a set of \(2^7=128\) ports, indicating high predictability. We also observe that victims are more predictable than sources, as \(90\,\%\) of victims lie within a set of \(2^{12}=4096\) victims, compared to \(90\,\%\) of sources lying within a set of \(2^{14}=16,384\) sources. The set of victims is thus significantly smaller and more predictable than the set of attackers.
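The per-day entropy and its CDF can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes probabilities are estimated as relative frequencies within each day, and the record layout `(day, item)`, the function names, and the toy log are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def daily_entropy(records):
    """Return {day: Shannon entropy in bits} for (day, item) records.

    Assumption: the probability of an item on a given day is its
    relative frequency among that day's log entries.
    """
    per_day = defaultdict(Counter)
    for day, item in records:
        per_day[day][item] += 1

    entropies = {}
    for day, counts in per_day.items():
        total = sum(counts.values())
        # H = sum_i p_i * log2(1 / p_i), with p_i = count_i / total
        entropies[day] = sum(
            (c / total) * math.log2(total / c) for c in counts.values()
        )
    return entropies


def empirical_cdf(values):
    """Return sorted (value, fraction of values <= value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (k + 1) / n) for k, v in enumerate(ordered)]


# Hypothetical toy log of (day, attacked port) pairs.
toy_log = [
    (1, 80), (1, 80), (1, 22), (1, 23),
    (2, 80), (2, 80), (2, 80), (2, 80),
]
entropy_by_day = daily_entropy(toy_log)
port_cdf = empirical_cdf(entropy_by_day.values())
```

An entropy of \(H\) bits corresponds to roughly \(2^H\) equally likely items, which is one way to read set sizes such as \(2^7=128\) off the entropy CDF.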
Intensity. Figure 8(a) shows the inter-arrival time of attacks in hours, and Fig. 8(b) shows it in seconds. We observe that almost all attacks occur within 3-minute windows. Individual IP addresses and /24 subnetworks behave similarly. In particular, Fig. 8(b) shows that, over short time intervals, \(85\,\%\) of /8 subnetworks have short attack inter-arrival times, indicating that attacks on such networks are bursty: attackers target a subnetwork for a short time and then disappear.
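Inter-arrival times at the IP, /24, and /8 levels could be derived along these lines. This is a sketch under stated assumptions rather than the procedure used in the paper: the event layout `(unix_timestamp_seconds, ip)`, the function names, and the toy events are hypothetical, and whether the grouped address is the source or the target is left to the caller.

```python
import ipaddress
from collections import defaultdict


def prefix_of(ip, prefix_len):
    """Map an IPv4 address string to its enclosing network (e.g. a /24)."""
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)


def inter_arrival_times(events, prefix_len=32):
    """Return gaps (seconds) between consecutive events of each prefix.

    events: iterable of (unix_timestamp_seconds, ip) pairs.
    prefix_len: 32 groups by single address, 24 by /24, 8 by /8.
    """
    by_prefix = defaultdict(list)
    for ts, ip in events:
        by_prefix[prefix_of(ip, prefix_len)].append(ts)

    gaps = []
    for stamps in by_prefix.values():
        stamps.sort()
        gaps.extend(b - a for a, b in zip(stamps, stamps[1:]))
    return gaps


def fraction_within(gaps, window_seconds):
    """Fraction of gaps no longer than the given window."""
    if not gaps:
        return 0.0
    return sum(g <= window_seconds for g in gaps) / len(gaps)


# Hypothetical events using documentation-only address ranges.
toy_events = [
    (0, "198.51.100.7"), (60, "198.51.100.7"), (150, "198.51.100.9"),
    (30, "203.0.113.5"), (4000, "203.0.113.5"),
]
per_ip = inter_arrival_times(toy_events)       # group by single address
per_24 = inter_arrival_times(toy_events, 24)   # group by /24 subnetwork
share_3min = fraction_within(per_24, 180)      # share of gaps <= 3 minutes
```

Grouping by a shorter prefix merges events from neighbouring addresses, so gaps can only shrink or stay the same as the prefix gets coarser.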
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Freudiger, J., De Cristofaro, E., Brito, A.E. (2015). Controlled Data Sharing for Collaborative Predictive Blacklisting. In: Almgren, M., Gulisano, V., Maggi, F. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2015. Lecture Notes in Computer Science(), vol 9148. Springer, Cham. https://doi.org/10.1007/978-3-319-20550-2_17
DOI: https://doi.org/10.1007/978-3-319-20550-2_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20549-6
Online ISBN: 978-3-319-20550-2
eBook Packages: Computer Science, Computer Science (R0)