Abstract
Machine learning algorithms can play a significant role in people's lives. The data used by these algorithms often contains sensitive information and can carry inherent biases. We investigate the interaction between mechanisms designed to preserve privacy and mechanisms designed to mitigate bias, employing k-anonymity for de-identification and re-weighting for bias mitigation. The experiments were threefold. First, we investigated the effects of mitigation on de-identified data. Second, we measured the effects of three data parameters: the class imbalance ratio, the privileged positive outcome ratio, and the unprivileged positive outcome ratio. Third, we assessed the utility of the mitigation mechanism in a healthcare context. Using real-world data, we tested the effects of different levels of de-identification. We primarily used three groups of measures to capture the procedures' effects. First, for accuracy, we analysed simple accuracy and the balanced accuracy rate. Second, for fairness, we measured the number of positive outcomes for the privileged and unprivileged classes, as well as the disparate impact. Third, for utility, we measured the recall rate together with a novel metric, the recall ratio. We plot these two metrics against the classification threshold to show the trade-off between achieving a high true positive rate and limiting the overall number of positive outcomes. This trade-off is analogous to a medical testing scenario where the objective is to correctly identify true positives while minimising cost, which grows with the overall percentage of positive predictions.
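For concreteness, the sketch below (Python with NumPy) illustrates the re-weighting mechanism cited in the references (Kamiran and Calders, 2012) and the metric families named in the abstract. It is a minimal illustration, not the authors' pipeline; in particular, the definition of the recall ratio here (recall divided by the overall predicted-positive rate) is an assumption, since the abstract does not spell out the formula. The other metrics follow standard definitions.

import numpy as np

def reweighing_weights(s, y):
    # Kamiran-Calders re-weighting: w(s, y) = P(s) * P(y) / P(s, y),
    # so that group membership s and outcome y are statistically
    # independent under the weighted data.
    s, y = np.asarray(s), np.asarray(y)
    w = np.empty(len(y), dtype=float)
    for sv in np.unique(s):
        for yv in np.unique(y):
            mask = (s == sv) & (y == yv)
            if mask.any():
                w[mask] = ((s == sv).mean() * (y == yv).mean()) / mask.mean()
    return w

def disparate_impact(y_pred, s, unprivileged=0, privileged=1):
    # P(y_hat = 1 | unprivileged) / P(y_hat = 1 | privileged); 1.0 is parity.
    return y_pred[s == unprivileged].mean() / y_pred[s == privileged].mean()

def balanced_accuracy(y_true, y_pred):
    # Mean of the true positive rate and true negative rate.
    tpr = y_pred[y_true == 1].mean()
    tnr = 1 - y_pred[y_true == 0].mean()
    return (tpr + tnr) / 2

def recall_and_ratio(y_true, scores, threshold):
    # Recall at a threshold, and an ASSUMED recall ratio: recall divided
    # by the overall predicted-positive rate, so higher values mean more
    # true positives per positive prediction (the cost proxy above).
    y_pred = (scores >= threshold).astype(int)
    recall = y_pred[y_true == 1].mean()
    pos_rate = y_pred.mean()
    return recall, recall / pos_rate if pos_rate > 0 else np.nan

A toy usage, sweeping the threshold as the abstract describes (all data here is synthetic and hypothetical):

rng = np.random.default_rng(0)
s = rng.integers(0, 2, 1000)                          # 1 = privileged group
y = (rng.random(1000) < 0.3 + 0.2 * s).astype(int)    # biased outcomes
scores = 0.5 * y + 0.5 * rng.random(1000)             # toy classifier scores
w = reweighing_weights(s, y)                          # weights for training
for t in (0.3, 0.5, 0.7):
    print(t, recall_and_ratio(y, scores, t))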
References
Medical expenditure panel survey (MEPS) (2017). https://www.meps.ahrq.gov/mepsweb/
Bellamy, R.K., et al.: AI fairness 360: an extensible toolkit for detecting and mitigating algorithmic bias. IBM J. Res. Dev. 63(4/5), 4:1–4:15 (2019). https://doi.org/10.1147/JRD.2019.2942287
Calmon, F., Wei, D., Vinzamuri, B., Ramamurthy, K.N., Varshney, K.R.: Optimized pre-processing for discrimination prevention. In: Advances in Neural Information Processing Systems, pp. 3992–4001 (2017)
Chester, A., Koh, Y.S., Wicker, J., Sun, Q., Lee, J.: Balancing utility and fairness against privacy in medical data. In: 2020 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1226–1233. IEEE (2020)
Cummings, R., Kimpara, D., Gupta, V., Morgenstern, J.: On the compatibility of privacy and fairness. In: Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization (UMAP 2019 Adjunct), pp. 309–315 (2019)
Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
Ekstrand, M.D., Joshaghani, R., Mehrpouyan, H.: Privacy for all: ensuring fair and equitable privacy protections. Proc. Mach. Learn. Res. 81, 1–13 (2018)
Fung, B.C.M., Wang, K., Chen, R., Yu, P.S.: Privacy-preserving data publishing: a survey of recent developments. ACM Comput. Surv. 42(4), 1–53 (2010). https://doi.org/10.1145/1749603.1749605
Iosifidis, V., Ntoutsi, E.: FABBOO: online fairness-aware learning under class imbalance. In: Discovery Science (2020)
Kamiran, F., Calders, T.: Data preprocessing techniques for classification without discrimination. Knowl. Inf. Syst. 33(1), 1–33 (2012)
Kohavi, R.: Scaling up the accuracy of Naive-Bayes classifiers: a decision-tree hybrid. In: KDD, vol. 96, pp. 202–207 (1996)
LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Mondrian multidimensional K-anonymity. In: International Conference on Data Engineering, p. 25 (2006)
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., Galstyan, A.: A survey on bias and fairness in machine learning. arXiv preprint arXiv:1908.09635 (2019)
Pujol, D., McKenna, R., Kuppam, S., Hay, M., Machanavajjhala, A., Miklau, G.: Fair decision making using privacy-protected data. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 189–199 (2020)
Zemel, R., Wu, Y., Swersky, K., Pitassi, T., Dwork, C.: Learning fair representations. In: International Conference on Machine Learning, pp. 325–333 (2013)
Acknowledgments
This research was supported by Precision Driven Health (PDH).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Chester, A., Koh, Y.S., Lee, J. (2021). Understanding the Effects of Mitigation on De-identified Data. In: Fujita, H., Selamat, A., Lin, J.C.-W., Ali, M. (eds.) Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices. IEA/AIE 2021. Lecture Notes in Computer Science, vol. 12798. Springer, Cham. https://doi.org/10.1007/978-3-030-79457-6_12
DOI: https://doi.org/10.1007/978-3-030-79457-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79456-9
Online ISBN: 978-3-030-79457-6