Homeland Insecurity

Data Mining, Privacy, Disclosure Limitation, and the Hunt for Terrorists
  • Stephen E. Fienberg
Part of the Integrated Series In Information Systems book series (ISIS, volume 18)

Following the events of September 11, 2001, there has been heightened attention in the United States and elsewhere to the use of multiple government and private databases to identify possible perpetrators of future attacks, as well as an unprecedented expansion of federal government data mining activities, many involving databases containing personal information. There have also been claims that prospective data mining could be used to find the "signature" of terrorist cells embedded in larger networks. We present an overview of why the public has concerns about such activities and describe some proposals for searching multiple databases that supposedly do not compromise pledges of confidentiality made to the individuals whose data are included. We also explore the links between these proposals and the related literature on privacy-preserving data mining. In particular, we focus on the problem of matching records across databases, on the concept of "selective revelation", and on their confidentiality implications.
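To make the cross-database matching problem concrete, here is a minimal sketch, not taken from the chapter, of one simple idea that appears in discussions of privacy-preserving matching: each database holder applies a keyed hash (HMAC-SHA256 under a shared secret key) to normalized identifiers, so that only digests, not names, need to be compared. The function names `normalize`, `blind`, and `blinded_intersection`, the crude normalization rule, and the example names and key are all hypothetical.

```python
import hashlib
import hmac


def normalize(name: str) -> str:
    # Hypothetical, crude normalization: lowercase and collapse runs of whitespace.
    return " ".join(name.lower().split())


def blind(identifiers, key: bytes) -> dict:
    """Map the HMAC-SHA256 hex digest of each normalized identifier to the original value."""
    return {
        hmac.new(key, normalize(x).encode("utf-8"), hashlib.sha256).hexdigest(): x
        for x in identifiers
    }


def blinded_intersection(db_a, db_b, key: bytes) -> list:
    """Return the entries of db_a whose blinded identifiers also occur in db_b.

    In an actual deployment each holder would blind its own data and share only
    the digests; here both steps run in one process purely for illustration.
    """
    a = blind(db_a, key)
    b_digests = set(blind(db_b, key))  # only B's digests are used, never its raw names
    return sorted(a[d] for d in a.keys() & b_digests)


if __name__ == "__main__":
    key = b"hypothetical-shared-secret"
    passengers = ["ana  ruiz", "Lee Chen", "john q. smith"]
    watch_list = ["John Q. Smith", "Ana Ruiz"]
    print(blinded_intersection(passengers, watch_list, key))
    # prints ['ana  ruiz', 'john q. smith']
```

Two caveats bear on the confidentiality questions raised above. Exact matching on digests fails on any spelling variation ("Jon Smith" vs. "John Smith"), which is why statistical record linkage relies on approximate, probabilistic comparisons instead. And anyone who holds the key can hash a list of candidate names and test them for membership, so keyed hashing by itself offers only limited protection.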







Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Stephen E. Fienberg, Department of Statistics, Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA
