Abstract
Traditionally, service providers that want to track the activities of Internet users rely on explicit tracking techniques such as HTTP cookies. From a privacy perspective, behavior-based tracking is even more dangerous, because it allows service providers to track users passively, i.e., without cookies. In this case, multiple sessions of a user are linked by exploiting characteristic patterns mined from network traffic.
In this paper, we study the feasibility of behavior-based tracking in a real-world setting, which has so far been unknown. In principle, behavior-based tracking can be carried out by any attacker that can observe the activities of users on the Internet. We design and implement a behavior-based tracking technique that consists of a Naive Bayes classifier supported by a cosine similarity decision engine. We evaluate our technique using a large-scale dataset that contains all queries received by a DNS resolver used by more than 2,100 concurrent users per day on average. Our technique correctly links 88.2% of the surfing sessions on a day-to-day basis. We also discuss various countermeasures that reduce the effectiveness of our technique.
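The abstract names the two components of the technique but not their details. The following is a minimal Python sketch under our own assumptions, not the authors' implementation: each surfing session is represented as a bag of hostnames queried from the DNS resolver during one day; a multinomial Naive Bayes classifier with Laplace smoothing and uniform priors ranks the users seen on the previous day; and the cosine similarity between the new session and the top-ranked user's previous session decides whether the link is accepted. The function and class names, the smoothing parameter `alpha`, and the acceptance `threshold` of 0.5 are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two hostname-frequency vectors."""
    dot = sum(count * b[host] for host, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over hostname counts (hypothetical sketch)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing parameter (assumed value)
        self.profiles: Dict[str, Counter] = {}
        self.vocabulary: set = set()

    def fit(self, sessions: Dict[str, Counter]) -> None:
        # One training session per user, e.g. that user's traffic on the previous day.
        self.profiles = {user: Counter(s) for user, s in sessions.items()}
        self.vocabulary = set()
        for s in sessions.values():
            self.vocabulary.update(s)

    def log_likelihood(self, user: str, session: Counter) -> float:
        # With a uniform prior, ranking by log-likelihood equals ranking by posterior.
        profile = self.profiles[user]
        total = sum(profile.values())
        v = len(self.vocabulary | set(session))  # same for every user, so scores are comparable
        return sum(
            count * math.log((profile[host] + self.alpha) / (total + self.alpha * v))
            for host, count in session.items()
        )

    def rank(self, session: Counter) -> List[str]:
        # Most likely user first; ties keep the training order.
        return sorted(self.profiles,
                      key=lambda u: self.log_likelihood(u, session),
                      reverse=True)


def link_session(model: MultinomialNaiveBayes, session: Counter,
                 threshold: float = 0.5) -> Optional[str]:
    """Link a new session to a known user, or return None if the match is weak."""
    ranking = model.rank(session)
    if not ranking:
        return None
    best = ranking[0]
    # Decision step (assumed): accept the classifier's top candidate only if the
    # sessions are also similar in the cosine sense.
    if cosine_similarity(session, model.profiles[best]) >= threshold:
        return best
    return None


# Usage with made-up hostnames: train on day 1, then link a day-2 session.
day1 = {
    "alice": Counter({"news.example": 5, "mail.example": 3}),
    "bob": Counter({"video.example": 6, "shop.example": 2}),
}
model = MultinomialNaiveBayes()
model.fit(day1)
print(link_session(model, Counter({"news.example": 4, "mail.example": 1})))  # alice
print(link_session(model, Counter({"forum.example": 3})))  # None
```

Combining the two scores this way lets the classifier pick the most plausible candidate while the similarity check rejects sessions that resemble no known profile, such as the second day-2 session above, which only contains an unseen hostname.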
Copyright information
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Banse, C., Herrmann, D., Federrath, H. (2012). Tracking Users on the Internet with Behavioral Patterns: Evaluation of Its Practical Feasibility. In: Gritzalis, D., Furnell, S., Theoharidou, M. (eds) Information Security and Privacy Research. SEC 2012. IFIP Advances in Information and Communication Technology, vol 376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30436-1_20
DOI: https://doi.org/10.1007/978-3-642-30436-1_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30435-4
Online ISBN: 978-3-642-30436-1
eBook Packages: Computer Science, Computer Science (R0)