Abstract
We introduce a framework for the data-driven analysis of social segregation of minority groups, and challenge it on a complex scenario. The framework builds on quantitative measures of segregation, called segregation indexes, proposed in the social science literature. We introduce the segregation discovery problem, which consists of searching for sub-groups of the population and minorities for which a segregation index is above a minimum threshold. We devise a search algorithm that solves the segregation discovery problem by computing a multi-dimensional data cube that the analyst can explore. The machinery underlying the search algorithm relies on frequent itemset mining concepts and tools. The framework is challenged on a case study in the context of company networks. We analyse segregation on the grounds of sex and age for directors on the boards of Italian companies. The network includes 2.15M companies and 3.63M directors.
Notes
The geometric interpretation of the Gini index is provided in the space [0, 1] × [0, 1]. The Lorenz curve plots the cumulative fraction of minority against the cumulative fraction of majority. Formally, assume that \(p_{1}, \ldots, p_{n}\) are in descending order. The Lorenz curve \(f()\) is the piece-wise linear function such that \(f(0) = 0\), \(f(1) = 1\), and, for \(i \in [1,n]\), \(f(\hat {X}_{i}) = \hat {Y}_{i}\) where \(\hat {X}_{i}\) is the cumulative fraction of the minority group up to unit i, and \(\hat {Y}_{i}\) is the cumulative fraction of the majority group up to unit i. The diagonal represents the perfect equality of distribution of majority vs minority population. The Gini index is twice the area between the Lorenz curve and the diagonal. See (Duncan and Duncan 1955; Xu 2003) for details.
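The geometric construction in this note can be sketched in a few lines of Python. The sketch below is illustrative only and is not the implementation used in the paper. It assumes that each unit is given as a hypothetical pair (t, m) of total and minority population, that \(p_{i}\) denotes the minority proportion m/t of unit i, and that the area under the piece-wise linear curve is obtained with the trapezoid rule; the function name `gini_segregation` is our own.

```python
from typing import Sequence, Tuple


def gini_segregation(units: Sequence[Tuple[int, int]]) -> float:
    """Gini segregation index from (total, minority) counts per unit.

    Assumption: p_i = minority / total for unit i; empty units are skipped.
    """
    total_minority = sum(m for _, m in units)
    total_majority = sum(t - m for t, m in units)
    if total_minority == 0 or total_majority == 0:
        raise ValueError("both minority and majority groups must be non-empty")

    # Sort units by minority proportion p_i in descending order.
    ordered = sorted(
        (u for u in units if u[0] > 0),
        key=lambda u: u[1] / u[0],
        reverse=True,
    )

    x_prev = y_prev = 0.0   # previous point on the Lorenz curve, starting at (0, 0)
    twice_area_under = 0.0  # twice the area under the curve (trapezoid rule)
    cum_minority = cum_majority = 0
    for total, minority in ordered:
        cum_minority += minority
        cum_majority += total - minority
        x = cum_minority / total_minority  # cumulative minority fraction
        y = cum_majority / total_majority  # cumulative majority fraction
        twice_area_under += (x - x_prev) * (y + y_prev)
        x_prev, y_prev = x, y

    # Twice the area between the diagonal and the curve:
    # 2 * (1/2 - area_under) = 1 - 2 * area_under.
    return 1.0 - twice_area_under
```

Under these assumptions, two units that each hold half of the minority and half of the majority give an index of 0, while two units that separate the groups completely give an index of 1.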
Level 3 of the Nomenclature of Territorial Units for Statistics (NUTS) (International Organization for Standardization 2013).
We adopted the methods and software from (Alstott et al. 2014) for fitting heavy-tailed distributions. The distribution with the best log-likelihood ratio is selected among power laws, truncated power laws, exponentials, stretched exponentials, and log-normals.
References
Almeida, H.V., & Wolfenzon, D. (2006). A theory of pyramidal ownership and family business groups. The Journal of Finance, 61(6), 2637–2680.
Alstott, J., Bullmore, E., & Plenz, D. (2014). Powerlaw: A Python package for analysis of heavy-tailed distributions. PLoS ONE, 9(1), e85777.
Atkinson, A.B., Piketty, T., & Saez, E. (2011). Top incomes in the long run of history. Journal of Economic Literature, 49(1), 3–71.
Bakshy, E., Messing, S., & Adamic, L.A. (2015). Exposure to ideologically diverse news and opinion on Facebook. Science, 348(6239), 1130–1132.
Baroni, A., & Ruggieri, S. (2015). Segregation discovery in a social network of companies. In Advances in intelligent data analysis XIV, LNCS, vol. 9385, pp. 37–48. Springer.
Bastide, Y., Taouil, R., Pasquier, N., Stumme, G., & Lakhal, L. (2000). Mining frequent patterns with counting inference. SIGKDD Explorations, 2(2), 66–75.
Battiston, S., Bonabeau, E., & Weisbuch, G. (2003). Decision making dynamics in corporate boards. Physica A: Statistical Mechanics and its Applications, 322, 567–582.
Battiston, S., & Catanzaro, M. (2004). Statistical properties of corporate board and director networks. The European Physical Journal B, 38(2), 345–352.
Bell, W. (1954). A probability model for the measurement of ecological segregation. Social Forces, 32, 357–364.
Bettio, F., & Verashchagina, A. (2009). Gender segregation in the labour market: Root causes, implications and policy responses in the EU. Publications Office of the European Union.
Borgelt, C. (2012). Frequent item set mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(6), 437–456. http://www.borgelt.net/fpgrowth.html.
Burke, R.J. (2000). Company size, board size and numbers of women corporate directors. In Women on corporate boards of directors, pp. 157–167. Springer.
Burke, R.J., & Mattis, M.C. (2013). Women on corporate boards of directors: International challenges and opportunities, vol. 14. Springer Science & Business Media.
Clark, W.A.V. (1991). Residential preferences and neighborhood racial segregation: A test of the Schelling segregation model. Demography, 28(1), 1–19.
Cristaldi, F. (2012). Immigrazione e territorio: lo spazio con/diviso. Pàtron.
Croppenstedt, A., Goldstein, M., & Rosas, N. (2013). Gender and agriculture: Inefficiencies, segregation, and low productivity traps. World Bank Research Observer, 28, 79–109.
Das, S., & Kramer, A.D.I. (2013). Self-censorship on Facebook. In Proceedings of the international conference on weblogs and social media (ICWSM 2013). The AAAI Press.
Davis, G.F., Yoo, M., & Baker, W.E. (2003). The small world of the American corporate elite, 1982–2001. Strategic Organization, 1(3), 301–326.
Demb, A., & Neubauer, F.F. (1992). The corporate board: Confronting the paradoxes. Long Range Planning, 25(3), 9–20.
Denton, N.A., & Massey, D.S. (1988). Residential segregation of Blacks, Hispanics, and Asians by socioeconomic status and generation. Social Science Quarterly, 69(4), 797–817.
Duncan, O.D., & Duncan, B. (1955). A methodological analysis of segregation indexes. American Sociological Review, 20, 210–217.
Fischer, E. (2011). Distribution of race and ethnicity in US major cities. Published on line at http://www.flickr.com/photos/walkingsf/sets/72157624812674967/detail/ under Creative Commons licence, CC BY-SA 2.0.
Flaxman, S., Goel, S., & Rao, J.M. (2013). Ideological segregation and the effects of social media on news consumption. Available at SSRN: http://ssrn.com/abstract=2363701.
Flückiger, Y., & Silber, J. (1999). The measurement of segregation in the labor force. Berlin: Springer Science & Business Media.
Frey, J.H., & Eitzen, D.S. (1991). Sport and society. Annual Review of Sociology, 17, 503–522.
Gastwirth, J.L. (1971). A general definition of the Lorenz curve. Econometrica: Journal of the Econometric Society, 39, 1037–1039.
Gentzkow, M., & Shapiro, J.M. (2011). Ideological segregation online and offline. Quarterly Journal of Economics, 126(4), 1799–1839.
Goethals, B. (2010). Frequent itemset mining implementations repository. http://fimi.cs.helsinki.fi.
Grevet, C. (2016). Being nice on the internet: Designing for the coexistence of diverse opinions online. Ph.D. thesis: Georgia Institute of Technology.
Han, J., Cheng, H., Xin, D., & Yan, X. (2007). Frequent pattern mining: Current status and future directions. Data Mining and Knowledge Discovery, 15(1), 55–86.
Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques, 3rd edn. Burlington: Morgan Kaufmann Publishers Inc.
Hutchens, R.M. (1991). Segregation curves, Lorenz curves, and inequality in the distribution of people across occupations. Mathematical Social Sciences, 21(1), 31–51.
International Organization for Standardization (2013). ISO 3166-1:2013 International standard for country codes and codes for their subdivisions.
James, D.R., & Taeuber, K.E. (1985). Measures of segregation. Sociological Methodology, 13, 1–32.
Kaser, O., & Lemire, D. (2016). Compressed bitmap indexes: Beyond unions and intersections. Software: Practice and Experience, 46, 167–198. https://github.com/lemire/javaewah.
Kogut, B., & Walker, G. (2001). The small world of Germany and the durability of national networks. American Sociological Review, 66, 317–335.
Loy, J.W., & Elvogue, J.F. (1970). Racial segregation in American sport. International Review for the Sociology of Sport, 5(1), 5–24.
Maes, M., & Bischofberger, L. (2015). Will the personalization of online social networks foster opinion polarization? Available at SSRN: http://ssrn.com/abstract=2553436.
Massey, D.S. (2016). Segregation and the perpetuation of disadvantage. In The Oxford Handbook of the Social Science of Poverty (pp. 369–393).
Massey, D.S., & Denton, N.A. (1988). The dimensions of residential segregation. Social Forces, 67(2), 281–315.
Massey, D.S., Rothwell, J., & Domina, T. (2009). The changing bases of segregation in the United States. Annals of the American Academy of Political and Social Science, 626, 74–90.
Mitchell, T. (1997). Machine Learning. New York: The McGraw-Hill Companies, Inc.
Mizruchi, M.S. (1996). What do interlocks do? An analysis, critique, and assessment of research on interlocking directorates. Annual Review of Sociology, 22(1), 271–298.
Mora, R., & Ruiz-Castillo, J. (2011). Entropy-based segregation indices. Sociological Methodology, 41, 159–194.
Musterd, S. (2005). Social and ethnic segregation in Europe: Levels, causes, and effects. Journal of Urban Affairs, 27(3), 331–348.
Negrevergne, B., Termier, A., Rousset, M.C., & Méhaut, J.F. (2014). ParaMiner: a generic pattern mining algorithm for multi-core architectures. Data Mining and Knowledge Discovery, 28(3), 593–633.
Ooi, C.A., Hooy, C.W., & Som, A.P.M. (2015). Diversity in human and social capital: Empirical evidence from Asian tourism firms in corporate board composition. Tourism Management, 48, 139–153.
Pariser, E. (2011). The Filter Bubble: What the Internet is hiding from you. Penguin UK.
Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2nd edn. New York: Cambridge University Press.
Pearl, J. (2014). Comment: Understanding Simpson’s paradox. The American Statistician, 68(1), 8–13.
Piccardi, C., Calatroni, L., & Bertoni, F. (2010). Communities in Italian corporate networks. Physica A: Statistical Mechanics and its Applications, 389(22), 5247–5258.
Randøy, T., Thomsen, S., & Oxelheim, L. (2006). A Nordic perspective on corporate board diversity. Tech. Rep., 0, 5428.
Robins, G., & Alexander, M. (2004). Small worlds among interlocking directors: Network structure and distance in bipartite graphs. Computational & Mathematical Organization Theory, 10(1), 69–94.
Romei, A., & Ruggieri, S. (2014). A multidisciplinary survey on discrimination analysis. The Knowledge Engineering Review, 29(5), 582–638.
Romei, A., Ruggieri, S., & Turini, F. (2015). The layered structure of company share networks. In Proceedings of the IEEE international conference on data science and advanced analytics (DSAA 2015), pp. 1–10. IEEE Computer Society.
Sankowska, A., & Siudak, D. (2016). The small world phenomenon and assortative mixing in Polish corporate board and director networks. Physica A: Statistical Mechanics and its Applications, 443, 309–315.
Schelling, T.C. (1971). Dynamic models of segregation. Journal of Mathematical Sociology, 1(2), 143–186.
Smith, S.L., & Choueiti, M. (2011). Black characters in popular film: Is the key to diversifying cinematic content held in the hand of the black director? Annenberg School for Communication & Journalism. Retrieved March, 12, 2013.
Xu, K. (2003). How has the literature on Gini’s index evolved in the past 80 years? Economics working paper. Halifax: Dalhousie University. Available at SSRN: http://ssrn.com/abstract=423200.
Zhou, T., Ren, J., Medo, M., & Zhang, Y.C. (2007). Bipartite network projection and personal recommendation. Physical Review E, 76(4), 046115.
Additional information
A preliminary version of the results of this paper appeared in (Baroni and Ruggieri 2015).
Cite this article
Baroni, A., Ruggieri, S. Segregation discovery in a social network of companies. J Intell Inf Syst 51, 71–96 (2018). https://doi.org/10.1007/s10844-017-0485-0
DOI: https://doi.org/10.1007/s10844-017-0485-0