Interval Approach to Preserving Privacy in Statistical Databases: Related Challenges and Algorithms of Computational Statistics

  • Luc Longpré
  • Gang Xiang
  • Vladik Kreinovich
  • Eric Freudenthal
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1)

Abstract

In many practical situations, it is important to store large amounts of data and to be able to process the data statistically. Much of this data is confidential, so while we welcome statistical data processing, we do not want to reveal sensitive individual data. If we allow researchers to ask all kinds of statistical queries, this can lead to violations of people’s privacy. A reliable way to avoid these privacy violations is to store ranges of values (e.g., between 40 and 50 for age) instead of the actual values. This idea solves the privacy problem, but it leads to a computational challenge: traditional statistical algorithms need exact data, but now we only know the data with interval uncertainty. In this paper, we describe new algorithms designed for processing such interval data.
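As a rough illustration of the idea in the abstract (not the algorithms from the paper itself), the sketch below replaces exact ages with fixed-width ranges and then computes the range of possible values of the sample mean. The function names `to_interval` and `mean_range`, the bin width of 10, and the sample ages are all assumptions made for this example. The mean is the easy case because it increases with each data value, so its bounds come directly from the interval endpoints; statistics that are not monotone in each value, such as the variance, need more involved algorithms.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def to_interval(value: float, width: float = 10.0) -> Interval:
    """Replace an exact value with the fixed-width range that contains it.

    Hypothetical helper: the bin width of 10 mirrors the
    "between 40 and 50 for age" example, but is an assumption.
    """
    lo = (value // width) * width
    return (lo, lo + width)


def mean_range(intervals: List[Interval]) -> Interval:
    """Return the smallest and largest possible sample mean.

    The mean is increasing in every data value, so the lower bound
    uses all lower endpoints and the upper bound all upper endpoints.
    """
    n = len(intervals)
    lo = sum(a for a, _ in intervals) / n
    hi = sum(b for _, b in intervals) / n
    return (lo, hi)


# Made-up exact ages; only the ranges would be stored in the database.
ages = [43, 47, 51, 38]
stored = [to_interval(a) for a in ages]

print(stored)              # the ranges kept instead of the exact ages
print(mean_range(stored))  # interval guaranteed to contain the true mean
```

With these sample values, the true mean (44.75) is never revealed, but any query for the mean can still be answered with an interval that is guaranteed to contain it.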

Keywords

privacy, statistical databases, interval uncertainty, computational statistics

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Luc Longpré (1)
  • Gang Xiang (1)
  • Vladik Kreinovich (1)
  • Eric Freudenthal (1)
  1. Department of Computer Science, University of Texas at El Paso, El Paso, USA