Understanding the Effects of Mitigation on De-identified Data

  • Conference paper
Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices (IEA/AIE 2021)

Abstract

Machine learning algorithms can play a significant role in people's lives. The data used in these algorithms often contains sensitive information and can have inherent biases. We investigate the effects of the interaction between mechanisms designed to preserve privacy and mechanisms designed to mitigate bias. The mechanisms employed for this investigation were k-anonymity for de-identification and re-weighting for bias mitigation. The experiments were threefold. First, we investigated the effects of mitigation on de-identified data. Second, we measured the effects of three data parameters: class imbalance ratio, privileged positive outcome ratio and unprivileged positive outcome ratio. Third, we assessed the utility of the mitigation mechanism in a healthcare context. Using real-world data, we tested the effects of different levels of de-identification. We primarily utilised three groups of measures to indicate the procedures' effects. First, for accuracy, we analysed simple accuracy and balanced accuracy rate. Second, for fairness, we measured the number of positive outcomes for the privileged and unprivileged classes and the disparate impact. Third, for utility, we measured the recall rate as well as a novel metric, recall ratio. We display these two metrics as a function of the classification threshold to indicate the trade-off between achieving a high number of true positives and limiting overall positive outcomes. This trade-off is analogous to a medical testing scenario where the objective is to identify true positives accurately while minimising cost, given the overall percentage of positives.
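
To make the fairness measure and the mitigation step named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common definition of disparate impact (the positive-outcome rate of the unprivileged group divided by that of the privileged group) and the standard re-weighting scheme from the pre-processing literature, in which each (group, label) cell receives the weight P(group) · P(label) / P(group, label). The function names and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def disparate_impact(labels, groups, privileged=1, positive=1):
    """Positive-outcome rate of the unprivileged group divided by that of the privileged group."""
    def rate(is_privileged):
        members = [y for y, g in zip(labels, groups) if (g == privileged) == is_privileged]
        return sum(1 for y in members if y == positive) / len(members)
    return rate(False) / rate(True)


def reweighing_weights(labels, groups):
    """Weight for each (group, label) cell: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    cell_counts = Counter(zip(groups, labels))
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * cell_counts[(g, y)])
        for (g, y) in cell_counts
    }


def weighted_disparate_impact(labels, groups, weights, privileged=1, positive=1):
    """Disparate impact computed with per-instance weights instead of raw counts."""
    def rate(is_privileged):
        total = 0.0
        positives = 0.0
        for y, g in zip(labels, groups):
            if (g == privileged) == is_privileged:
                w = weights[(g, y)]
                total += w
                if y == positive:
                    positives += w
        return positives / total
    return rate(False) / rate(True)


# Hypothetical toy data: group 1 is privileged, label 1 is the positive outcome.
toy_groups = [1, 1, 1, 1, 0, 0, 0, 0]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]

toy_weights = reweighing_weights(toy_labels, toy_groups)
print(disparate_impact(toy_labels, toy_groups))
print(weighted_disparate_impact(toy_labels, toy_groups, toy_weights))
```

On this toy data the unweighted disparate impact is 1/3 (positive rates of 1/4 versus 3/4), and the weighted value is 1 up to floating-point rounding, because the weights make group membership and label statistically independent in the weighted sample. The sketch does not cover k-anonymity or the recall ratio, since the abstract does not define them in enough detail to reproduce.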

Acknowledgments

This research was supported by Precision Driven Health (PDH).

Author information

Corresponding author

Correspondence to Andrew Chester.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Chester, A., Koh, Y.S., Lee, J. (2021). Understanding the Effects of Mitigation on De-identified Data. In: Fujita, H., Selamat, A., Lin, J.CW., Ali, M. (eds) Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices. IEA/AIE 2021. Lecture Notes in Computer Science, vol 12798. Springer, Cham. https://doi.org/10.1007/978-3-030-79457-6_12

  • DOI: https://doi.org/10.1007/978-3-030-79457-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79456-9

  • Online ISBN: 978-3-030-79457-6

  • eBook Packages: Computer Science (R0)
