Data Distortion Methods and Metrics in a Terrorist Analysis System

  • Shuting Xu
  • Jun Zhang
Part of the Integrated Series In Information Systems book series (ISIS, volume 18)

Preserving privacy is a major concern when data mining techniques are applied to datasets containing personal, sensitive, or confidential information. Data distortion is a critical component of privacy preservation in security-related data mining applications, such as data mining-based terrorist analysis systems. This chapter introduces a sparsified Singular Value Decomposition (SVD) method for data distortion. It also explains in detail several metrics that measure the difference between the distorted and original datasets and the degree of privacy protection. Experimental results on synthetic and real-world datasets show that the sparsified SVD method preserves privacy well while maintaining the utility of the datasets.
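
As a rough illustration of the idea, the sketch below distorts a data matrix with a truncated SVD whose factors are then sparsified. The abstract does not spell out the procedure, so several things here are assumptions: that "sparsified" means zeroing small-magnitude entries of the truncated factors, the function names `sparsified_svd` and `relative_difference`, the `rank` and `threshold` parameters, and the use of a relative Frobenius-norm difference as one possible distortion metric.

```python
import numpy as np


def sparsified_svd(A, rank, threshold):
    """Return a distorted copy of A built from a sparsified truncated SVD.

    Assumption: sparsification zeroes every entry of the truncated
    factors U_k and V_k^T whose magnitude is below `threshold`.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the leading `rank` singular triplets.
    U_k, s_k, Vt_k = U[:, :rank], s[:rank], Vt[:rank, :]
    # Drop small entries in both factors (hypothetical sparsification rule).
    U_k = np.where(np.abs(U_k) < threshold, 0.0, U_k)
    Vt_k = np.where(np.abs(Vt_k) < threshold, 0.0, Vt_k)
    return U_k @ np.diag(s_k) @ Vt_k


def relative_difference(A, A_distorted):
    """Relative Frobenius-norm difference between original and distorted data.

    Assumption: an illustrative metric, not necessarily one from the chapter.
    """
    return np.linalg.norm(A - A_distorted, "fro") / np.linalg.norm(A, "fro")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(20, 5))          # synthetic data matrix
    D = sparsified_svd(A, rank=2, threshold=0.05)
    print("relative difference:", relative_difference(A, D))
```

Lowering `rank` or raising `threshold` increases the distortion; with full rank and a zero threshold the input matrix is recovered up to rounding.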


Support Vector Machine, Singular Value Decomposition, Privacy Protection, Data Mining Technique, Real World Dataset
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Shuting Xu (1)
  • Jun Zhang (2)
  1. Department of Computer Information Systems, Virginia State University, Petersburg, USA
  2. Department of Computer Science, University of Kentucky, Lexington, USA
