The Princeton Web Transparency and Accountability Project

  • Arvind Narayanan
  • Dillon Reisman
Part of the Studies in Big Data book series (SBD, volume 32)


When you browse the web, hidden “third parties” collect a large amount of data about your behavior. This data feeds algorithms that target ads to you, tailor your news recommendations, and sometimes vary the prices of online products. The network of trackers comprises hundreds of entities, but consumers have little awareness of its pervasiveness and sophistication. This chapter discusses the findings and experiences of the Princeton Web Transparency Project, which continually monitors the web to uncover what user data companies collect, how they collect it, and what they do with it. We do this via a largely automated monthly “census” of the top 1 million websites, in effect “tracking the trackers”. Our tools and findings have proven useful to regulators and investigative journalists, and have led to greater public awareness, the cessation of some privacy-infringing practices, and the creation of new consumer privacy tools. But the work raises many new questions. For example, should we hold websites accountable for the privacy breaches caused by third parties? The chapter concludes with a discussion of such tricky issues and makes recommendations for public policy and regulation of privacy.


Price Discrimination; Federal Trade Commission; Real Identity; Word Embedding; Online Tracking
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Numerous graduate and undergraduate students and collaborators have contributed to the WebTAP project and to the findings reported here. In particular, Steven Englehardt is the primary student investigator and the lead developer of the OpenWPM measurement tool. We are grateful to Brian Kernighan, Vincent Toubiana, and the anonymous reviewer for useful feedback on a draft.

WebTAP is supported by NSF grant CNS 1526353, a grant from the Data Transparency Lab, and by Amazon AWS Cloud Credits for Research.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Princeton University, Princeton, USA
