Using Word Embeddings to Analyze How Universities Conceptualize “Diversity” in Their Online Institutional Presence
The term diversity can be operationalized demographically (in terms of physical or external characteristics such as race, gender, ethnicity and nationality) or intellectually (in terms of mental phenomena such as viewpoints, beliefs, ideas and political opinion). This work examines the contexts in which the concept of diversity is used by 50 elite US universities in their online institutional presence. Distributional semantics theory is leveraged to quantify semantic similarity between linguistic items based on their distributional properties in a large sample of language data taken from universities' online profiles. The language modeling is carried out using Word2vec, a state-of-the-art machine learning model widely used by the natural language processing community to create vector representations of words (i.e., word embeddings). The model uses a neural network trained to reconstruct the linguistic context of words in the training corpus. As a by-product of this training objective, Word2vec embeds words into a learned vector space in which words that share common contexts, and thus, according to the distributional hypothesis, share semantic meaning, are located in close proximity to one another. A quantitative analysis of cosine similarities between word vectors derived from the corpus of text retrieved from universities' online institutional profiles shows that the diversity concept is much closer to demographic operationalizations of diversity such as race, gender, ethnicity or nationality than to intellectual ones such as viewpoints, values, beliefs or political orientation. That is, the universities studied tend to use the word diversity predominantly in its demographic sense, to refer to variety in external appearance rather than variety in mental phenomena. This is significant in light of the severe lack of ideological diversity in universities across the US, where the vast majority of faculty lean left of center.
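The comparison described above rests on cosine similarity between word vectors: the cosine of the angle between two embeddings, which is close to 1 for words used in similar contexts. The sketch below is illustrative only and assumes a simple aggregation, namely averaging the similarity of "diversity" to each term in a demographic group and in an intellectual group; the abstract does not state how the study aggregated its scores. The term lists are taken from the examples in the abstract, and the three-dimensional vectors are hypothetical toy values, not embeddings or results from the study. With a trained gensim Word2vec model, `model.wv.similarity(w1, w2)` returns the same cosine quantity for real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def mean_similarity(embeddings, target, terms):
    """Average cosine similarity between `target` and each term in `terms`.

    Assumption for illustration: terms missing from the vocabulary are skipped.
    """
    present = [t for t in terms if t in embeddings]
    if not present:
        raise ValueError("none of the terms are in the vocabulary")
    total = sum(cosine_similarity(embeddings[target], embeddings[t]) for t in present)
    return total / len(present)


# Hypothetical 3-dimensional toy vectors, NOT values learned from the study's corpus.
toy_embeddings = {
    "diversity": [0.9, 0.1, 0.2],
    "race": [0.8, 0.2, 0.1],
    "gender": [0.85, 0.05, 0.3],
    "viewpoints": [0.1, 0.9, 0.2],
    "beliefs": [0.2, 0.8, 0.1],
}

# Illustrative term groups based on the examples named in the abstract.
demographic = ["race", "gender", "ethnicity", "nationality"]
intellectual = ["viewpoints", "values", "beliefs", "political"]

demo_score = mean_similarity(toy_embeddings, "diversity", demographic)
intel_score = mean_similarity(toy_embeddings, "diversity", intellectual)
print(f"demographic: {demo_score:.3f}  intellectual: {intel_score:.3f}")
```

With these toy vectors the demographic group scores higher, which mirrors the shape of the comparison but says nothing about the study's actual numbers; real embeddings would typically have hundreds of dimensions.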
Universities' emphasis on using the term diversity to denote demographic subtypes of diversity could be indicative of a majority power structure in the academy that tries to hinder the fostering of viewpoint diversity by steering diversity efforts toward demographic interpretations of the word. At the very least, the results of this work suggest that universities, as judged from the way they use language in their own online institutional profiles, prioritize demographic types of diversity centered on variety in external appearance cues over intellectual heterogeneity.
Keywords: Diversity; Word embeddings; Word2vec; Computational content analysis; Viewpoint diversity