Computing Covariance and Correlation in Optimally Privacy-Protected Statistical Databases: Feasible Algorithms

  • Conference paper

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 312)

Abstract

In many real-life situations, e.g., in medicine, it is necessary to process data while preserving the patients’ confidentiality. One of the most efficient methods of preserving privacy is to replace the exact values with intervals that contain these values. For example, instead of an exact age, a privacy-protected database only contains the information that the age is, e.g., between 10 and 20, or between 20 and 30, etc. Based on this data, it is important to compute correlation and covariance between different quantities. For privacy-protected data, different values from the intervals lead, in general, to different estimates for the desired statistical characteristic. Our objective is then to compute the range of possible values of these estimates.

Algorithms for efficiently computing such ranges have been developed for situations in which the intervals come from the original surveys, e.g., when a person indicates whether his or her age is between 10 and 20, between 20 and 30, etc. These intervals, however, do not always lead to optimal privacy protection; it turns out that more complex, computer-generated “intervalization” can lead to better privacy under the same accuracy or, alternatively, to more accurate estimates of statistical characteristics under the same privacy constraints. In this paper, we extend the existing efficient algorithms for computing covariance and correlation based on privacy-protected data to this more general case of interval data.
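
To make the notion of a "range of possible values" concrete, the following is a minimal brute-force sketch, not the feasible algorithm developed in the paper. It assumes the population covariance C = (1/n) Σ x_i y_i − ((1/n) Σ x_i)((1/n) Σ y_i), which is linear in each x_i and in each y_i separately, so its minimum and maximum over a box of intervals are attained at interval endpoints. The function names `covariance` and `covariance_range` are illustrative only, and the enumeration is exponential in the number of records, so it is usable only for very small samples.

```python
from itertools import product


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y


def covariance_range(x_intervals, y_intervals):
    """Range of the covariance when each value is only known up to an interval.

    Each interval is a (lower, upper) pair. Because the covariance is linear
    in every single x_i and y_i, its extremes over the box lie at endpoint
    combinations, so we enumerate all of them (exponential time, small n only).
    """
    lowest, highest = float("inf"), float("-inf")
    for xs in product(*x_intervals):        # every choice of x endpoints
        for ys in product(*y_intervals):    # every choice of y endpoints
            c = covariance(xs, ys)
            lowest = min(lowest, c)
            highest = max(highest, c)
    return lowest, highest
```

For instance, `covariance_range([(10, 20), (20, 30)], [(0, 1), (1, 2)])` treats two records whose ages are known only as the intervals [10, 20] and [20, 30], paired with a second quantity known as [0, 1] and [1, 2]. The exponential cost of this enumeration is exactly what motivates the search for feasible algorithms.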

Author information

Corresponding author

Correspondence to Joshua Day.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Day, J., Jalal-Kamali, A., Kreinovich, V. (2014). Computing Covariance and Correlation in Optimally Privacy-Protected Statistical Databases: Feasible Algorithms. In: Jamshidi, M., Kreinovich, V., Kacprzyk, J. (eds) Advance Trends in Soft Computing. Studies in Fuzziness and Soft Computing, vol 312. Springer, Cham. https://doi.org/10.1007/978-3-319-03674-8_35

  • DOI: https://doi.org/10.1007/978-3-319-03674-8_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03673-1

  • Online ISBN: 978-3-319-03674-8

  • eBook Packages: Engineering, Engineering (R0)
