Abstract
Machine learning algorithms can play a significant role in people's lives. The data used by these algorithms often contains sensitive information and can carry inherent biases. We investigate the interaction between mechanisms designed to preserve privacy and mechanisms designed to mitigate bias, employing k-anonymity for de-identification and re-weighting for bias mitigation. The experiments were threefold. First, we investigated the effects of mitigation on de-identified data. Second, we measured the effects of three data parameters: the class imbalance ratio, the privileged positive outcome ratio, and the unprivileged positive outcome ratio. Third, we assessed the utility of the mitigation mechanism in a healthcare context. Using real-world data, we tested the effects of different levels of de-identification. We primarily used three groups of measures to capture the procedures' effects. First, for accuracy, we analysed simple accuracy and the balanced accuracy rate. Second, for fairness, we measured the number of positive outcomes for the privileged and unprivileged classes, as well as the disparate impact. Third, for utility, we measured the recall rate together with a novel metric, the recall ratio. We plot these two metrics against the classification threshold to show the trade-off between achieving a high true positive rate and limiting the overall number of positive outcomes. This trade-off is analogous to a medical testing scenario where the objective is to correctly identify true positives while minimising cost, which grows with the overall percentage of positive predictions.
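For concreteness, the sketch below (Python with NumPy) illustrates the re-weighting mechanism cited in the references (Kamiran and Calders, 2012) and the metric families named in the abstract. It is a minimal illustration, not the authors' pipeline; in particular, the definition of the recall ratio here (recall divided by the overall predicted-positive rate) is an assumption, since the abstract does not spell out the formula. The other metrics follow standard definitions.

import numpy as np

def reweighing_weights(s, y):
    # Kamiran-Calders re-weighting: w(s, y) = P(s) * P(y) / P(s, y),
    # so that group membership s and outcome y are statistically
    # independent under the weighted data.
    s, y = np.asarray(s), np.asarray(y)
    w = np.empty(len(y), dtype=float)
    for sv in np.unique(s):
        for yv in np.unique(y):
            mask = (s == sv) & (y == yv)
            if mask.any():
                w[mask] = ((s == sv).mean() * (y == yv).mean()) / mask.mean()
    return w

def disparate_impact(y_pred, s, unprivileged=0, privileged=1):
    # P(y_hat = 1 | unprivileged) / P(y_hat = 1 | privileged); 1.0 is parity.
    return y_pred[s == unprivileged].mean() / y_pred[s == privileged].mean()

def balanced_accuracy(y_true, y_pred):
    # Mean of the true positive rate and true negative rate.
    tpr = y_pred[y_true == 1].mean()
    tnr = 1 - y_pred[y_true == 0].mean()
    return (tpr + tnr) / 2

def recall_and_ratio(y_true, scores, threshold):
    # Recall at a threshold, and an ASSUMED recall ratio: recall divided
    # by the overall predicted-positive rate, so higher values mean more
    # true positives per positive prediction (the cost proxy above).
    y_pred = (scores >= threshold).astype(int)
    recall = y_pred[y_true == 1].mean()
    pos_rate = y_pred.mean()
    return recall, recall / pos_rate if pos_rate > 0 else np.nan

A toy usage, sweeping the threshold as the abstract describes (all data here is synthetic and hypothetical):

rng = np.random.default_rng(0)
s = rng.integers(0, 2, 1000)                          # 1 = privileged group
y = (rng.random(1000) < 0.3 + 0.2 * s).astype(int)    # biased outcomes
scores = 0.5 * y + 0.5 * rng.random(1000)             # toy classifier scores
w = reweighing_weights(s, y)                          # weights for training
for t in (0.3, 0.5, 0.7):
    print(t, recall_and_ratio(y, scores, t))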
References
Medical expenditure panel survey (MEPS) (2017). https://www.meps.ahrq.gov/mepsweb/
Bellamy, R.K., et al.: AI fairness 360: an extensible toolkit for detecting and mitigating algorithmic bias. IBM J. Res. Dev. 63(4/5), 4:1–4:15 (2019). https://doi.org/10.1147/JRD.2019.2942287
Calmon, F., Wei, D., Vinzamuri, B., Ramamurthy, K.N., Varshney, K.R.: Optimized pre-processing for discrimination prevention. In: Advances in Neural Information Processing Systems, pp. 3992–4001 (2017)
Chester, A., Koh, Y.S., Wicker, J., Sun, Q., Lee, J.: Balancing utility and fairness against privacy in medical data. In: 2020 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1226–1233. IEEE (2020)
Cummings, R., Kimpara, D., Gupta, V., Morgenstern, J.: On the compatibility of privacy and fairness. In: Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization (UMAP 2019 Adjunct), pp. 309–315 (2019)
Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
Ekstrand, M.D., Joshaghani, R., Mehrpouyan, H.: Privacy for all: ensuring fair and equitable privacy protections. Proc. Mach. Learn. Res. 81, 1–13 (2018)
Fung, B.C.M., Wang, K., Chen, R., Yu, P.S.: Privacy-preserving data publishing: a survey of recent developments. ACM Comput. Surv. 42(4), 1–53 (2010). https://doi.org/10.1145/1749603.1749605
Iosifidis, V., Ntoutsi, E.: FABBOO: online fairness-aware learning under class imbalance. In: Discovery Science (2020)
Kamiran, F., Calders, T.: Data preprocessing techniques for classification without discrimination. Knowl. Inf. Syst. 33(1), 1–33 (2012)
Kohavi, R.: Scaling up the accuracy of Naive-Bayes classifiers: a decision-tree hybrid. In: KDD, vol. 96, pp. 202–207 (1996)
LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Mondrian multidimensional K-anonymity. In: International Conference on Data Engineering, p. 25 (2006)
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., Galstyan, A.: A survey on bias and fairness in machine learning. arXiv preprint arXiv:1908.09635 (2019)
Pujol, D., McKenna, R., Kuppam, S., Hay, M., Machanavajjhala, A., Miklau, G.: Fair decision making using privacy-protected data. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 189–199 (2020)
Zemel, R., Wu, Y., Swersky, K., Pitassi, T., Dwork, C.: Learning fair representations. In: International Conference on Machine Learning, pp. 325–333 (2013)
Acknowledgments
This research was supported by Precision Driven Health (PDH).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Chester, A., Koh, Y.S., Lee, J. (2021). Understanding the Effects of Mitigation on De-identified Data. In: Fujita, H., Selamat, A., Lin, J.C.-W., Ali, M. (eds.) Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices. IEA/AIE 2021. Lecture Notes in Computer Science, vol. 12798. Springer, Cham. https://doi.org/10.1007/978-3-030-79457-6_12
DOI: https://doi.org/10.1007/978-3-030-79457-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79456-9
Online ISBN: 978-3-030-79457-6