The Princeton Web Transparency and Accountability Project
Abstract
When you browse the web, hidden “third parties” collect a large amount of data about your behavior. This data feeds algorithms that target ads to you, tailor your news recommendations, and sometimes vary the prices of online products. The network of trackers comprises hundreds of entities, but consumers have little awareness of its pervasiveness and sophistication. This chapter discusses the findings and experiences of the Princeton Web Transparency Project (https://webtap.princeton.edu/), which continually monitors the web to uncover what user data companies collect, how they collect it, and what they do with it. We do this via a largely automated monthly “census” of the top 1 million websites, in effect “tracking the trackers”. Our tools and findings have proven useful to regulators and investigative journalists, and have led to greater public awareness, the cessation of some privacy-infringing practices, and the creation of new consumer privacy tools. But the work raises many new questions. For example, should we hold websites accountable for the privacy breaches caused by third parties? The chapter concludes with a discussion of such tricky issues and makes recommendations for public policy and regulation of privacy.
Keywords
Price Discrimination · Federal Trade Commission · Real Identity · Word Embedding · Online Tracking
Acknowledgements
Numerous graduate and undergraduate students and collaborators have contributed to the WebTAP project and to the findings reported here. In particular, Steven Englehardt is the primary student investigator and the lead developer of the OpenWPM measurement tool. We are grateful to Brian Kernighan, Vincent Toubiana, and the anonymous reviewer for useful feedback on a draft.
WebTAP is supported by NSF grant CNS 1526353, a grant from the Data Transparency Lab, and by Amazon AWS Cloud Credits for Research.