Biased Embeddings from Wild Data: Measuring, Understanding and Removing

Sutton, Adam; Lansdall-Welfare, Thomas; Cristianini, Nello

doi:10.1007/978-3-030-01768-2_27

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11191)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

Many modern Artificial Intelligence (AI) systems make use of data embeddings, particularly in the domain of Natural Language Processing (NLP). These embeddings are learnt from data that has been gathered “from the wild” and have been found to contain unwanted biases. In this paper we make three contributions towards measuring, understanding and removing this problem. We present a rigorous way to measure some of these biases, based on the use of word lists created for social psychology applications; we observe how gender bias in occupations reflects actual gender bias in the same occupations in the real world; and finally we demonstrate how a simple projection can significantly reduce the effects of embedding bias. All this is part of an ongoing effort to understand how trust can be built into AI systems.
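The abstract names two technical steps: measuring bias with word lists and reducing it with a projection. The sketch below illustrates one common way to do both, and it should be read as an assumption-laden illustration rather than the authors' procedure, since the abstract gives no formulas. It assumes the bias direction is the normalised difference between the mean vectors of two word lists, and that association is measured as a difference of mean cosine similarities. All function names and the toy three-dimensional vectors are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))

def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def bias_direction(group_a, group_b):
    """Unit vector pointing from the mean of group_b towards the mean of group_a.

    Assumption: the bias subspace is one-dimensional and defined by two word lists.
    """
    diff = [a - b for a, b in zip(mean_vector(group_a), mean_vector(group_b))]
    length = norm(diff)
    return [x / length for x in diff]

def association(word, group_a, group_b):
    """Mean cosine similarity to group_a minus mean cosine similarity to group_b."""
    sim_a = sum(cosine(word, a) for a in group_a) / len(group_a)
    sim_b = sum(cosine(word, b) for b in group_b) / len(group_b)
    return sim_a - sim_b

def project_out(word, direction):
    """Remove the component of `word` lying along the unit vector `direction`."""
    scale = dot(word, direction)
    return [w - scale * d for w, d in zip(word, direction)]

# Toy vectors invented for illustration; real embeddings have hundreds of dimensions.
male = [[1.0, 0.2, 0.0], [0.9, 0.1, 0.1]]
female = [[-1.0, 0.2, 0.0], [-0.9, 0.1, 0.1]]
occupation = [0.5, 0.8, 0.3]

g = bias_direction(male, female)
before = association(occupation, male, female)
debiased = project_out(occupation, g)
after = association(debiased, male, female)

print("association before projection:", before)
print("association after projection: ", after)
```

In this deliberately symmetric toy example the projected vector is orthogonal to the assumed bias direction, so its association with the two lists cancels. With real embeddings the word lists are not mirror images of each other, so a single projection would be expected to shrink the measured association rather than remove it entirely.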

Notes

1. Baby names were taken from http://bit.ly/2Dmqjco and separated into two gendered lists.

Acknowledgements

AS is supported by EPSRC Centre for Communications. TLW and NC are supported by the FP7 Ideas: European Research Council Grant 339365 - ThinkBIG.

Author information

Authors and Affiliations

Intelligent Systems Laboratory, University of Bristol, Bristol, BS8 1UB, UK
Adam Sutton, Thomas Lansdall-Welfare & Nello Cristianini

Corresponding author

Correspondence to Thomas Lansdall-Welfare.

Editor information

Editors and Affiliations

Eindhoven University of Technology, Eindhoven, The Netherlands
Wouter Duivesteijn
Department of Information and Computing Sciences, University Utrecht, Utrecht, The Netherlands
Arno Siebes
University of Helsinki, Helsinki, Finland
Antti Ukkonen

About this paper

Cite this paper

Sutton, A., Lansdall-Welfare, T., Cristianini, N. (2018). Biased Embeddings from Wild Data: Measuring, Understanding and Removing. In: Duivesteijn, W., Siebes, A., Ukkonen, A. (eds) Advances in Intelligent Data Analysis XVII. IDA 2018. Lecture Notes in Computer Science, vol 11191. Springer, Cham. https://doi.org/10.1007/978-3-030-01768-2_27

DOI: https://doi.org/10.1007/978-3-030-01768-2_27
Published: 05 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01767-5
Online ISBN: 978-3-030-01768-2
eBook Packages: Computer Science, Computer Science (R0)