Segregation discovery in a social network of companies

Journal of Intelligent Information Systems

Abstract

We introduce a framework for the data-driven analysis of social segregation of minority groups, and test it on a complex scenario. The framework builds on quantitative measures of segregation, called segregation indexes, proposed in the social science literature. We introduce the segregation discovery problem, which consists of searching for sub-groups of the population and minorities for which a segregation index is above a minimum threshold. We devise a search algorithm that solves the segregation discovery problem by computing a multi-dimensional data cube that the analyst can explore. The machinery underlying the search algorithm relies on frequent itemset mining concepts and tools. The framework is tested on a case study in the context of company networks: we analyse segregation on the grounds of sex and age for directors on the boards of Italian companies. The network includes 2.15M companies and 3.63M directors.
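To make the problem statement concrete, the following is a minimal sketch under stated assumptions, not the algorithm of the paper. It uses the dissimilarity index of Duncan and Duncan (1955) as the segregation index, and it enumerates candidate contexts by brute force instead of using frequent itemset mining or a data cube. The record layout, the attribute names, and the functions `dissimilarity` and `discover` are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def dissimilarity(units):
    """Dissimilarity index over units given as (minority_count, total_count)."""
    minority = sum(m for m, _ in units)
    majority = sum(t - m for m, t in units)
    if minority == 0 or majority == 0:
        return None  # the index is undefined without both groups
    return 0.5 * sum(abs(m / minority - (t - m) / majority) for m, t in units)


def discover(records, context_attrs, is_minority, threshold):
    """Return (context, index) pairs whose index reaches the threshold.

    Brute-force enumeration of every conjunction attribute=value over the
    context attributes (an assumption made for brevity).
    """
    results = []
    for k in range(len(context_attrs) + 1):
        for attrs in combinations(context_attrs, k):
            # Distinct value combinations observed for the chosen attributes.
            seen = {tuple(r[a] for a in attrs) for r in records}
            for values in sorted(seen):
                counts = {}  # unit -> (minority_count, total_count)
                for r in records:
                    if all(r[a] == v for a, v in zip(attrs, values)):
                        m, t = counts.get(r["unit"], (0, 0))
                        counts[r["unit"]] = (m + int(is_minority(r)), t + 1)
                index = dissimilarity(list(counts.values()))
                if index is not None and index >= threshold:
                    results.append((dict(zip(attrs, values)), index))
    return results


# Hypothetical toy data: one record per director seat, unit = company.
records = [
    {"unit": "c1", "sex": "F", "region": "north"},
    {"unit": "c1", "sex": "F", "region": "north"},
    {"unit": "c2", "sex": "M", "region": "north"},
    {"unit": "c2", "sex": "M", "region": "north"},
    {"unit": "c3", "sex": "F", "region": "south"},
    {"unit": "c3", "sex": "M", "region": "south"},
    {"unit": "c4", "sex": "F", "region": "south"},
    {"unit": "c4", "sex": "M", "region": "south"},
]
found = discover(records, ["region"], lambda r: r["sex"] == "F", threshold=0.8)
print(found)
```

In this toy data only the context `region = north` reaches the threshold, since its two boards are entirely female and entirely male. Brute-force enumeration grows exponentially with the number of context attributes, which is the cost that an itemset-mining approach is meant to contain.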

Notes

  1. The geometric interpretation of the Gini index is provided in the space \([0, 1] \times [0, 1]\). The Lorenz curve plots the cumulative fraction of minority against the cumulative fraction of majority. Formally, assume that \(p_1, \ldots, p_n\) are in descending order. The Lorenz curve \(f\) is the piecewise linear function such that \(f(0) = 0\), \(f(1) = 1\), and, for \(i \in [1, n]\), \(f(\hat{X}_{i}) = \hat{Y}_{i}\), where \(\hat{X}_{i}\) is the cumulative fraction of the minority group up to unit \(i\), and \(\hat{Y}_{i}\) is the cumulative fraction of the majority group up to unit \(i\). The diagonal represents the perfect equality of distribution of majority vs minority population. The Gini index is twice the area between the Lorenz curve and the diagonal. See (Duncan and Duncan 1955; Xu 2003) for details.

  2. Level 3 of the Nomenclature of Territorial Units for Statistics (NUTS) (International Organization for Standardization 2013).

  3. We adopted the methods and software from (Alstott et al. 2014) for fitting heavy-tailed distributions. The distribution with the best log-likelihood ratio is selected among power laws, truncated power laws, exponentials, stretched exponentials, and log-normals.

  4. See also https://en.wikipedia.org/wiki/Gender_representation_on_corporate_boards_of_directors.
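The geometric construction of note 1 can be sketched in a few lines. The sketch assumes that \(p_i\) denotes the proportion of minority members in unit \(i\), which the note does not state explicitly, and that every unit has at least one member. The function name `gini_segregation` and the input layout, a list of (minority count, total count) pairs, are hypothetical.

```python
def gini_segregation(units):
    """Gini index as twice the area between the Lorenz curve and the diagonal.

    units: list of (minority_count, total_count) pairs, each with total > 0.
    """
    minority = sum(m for m, _ in units)
    majority = sum(t - m for m, t in units)
    if minority == 0 or majority == 0:
        raise ValueError("both minority and majority must be non-empty")

    # Assumption: p_i = m_i / t_i, sorted in descending order as in note 1.
    ordered = sorted(units, key=lambda u: u[0] / u[1], reverse=True)

    x_prev = y_prev = 0.0  # the curve starts at (0, 0)
    area_under_curve = 0.0
    for m, t in ordered:
        x = x_prev + m / minority        # cumulative fraction of minority
        y = y_prev + (t - m) / majority  # cumulative fraction of majority
        # Trapezoid under the linear piece from (x_prev, y_prev) to (x, y).
        area_under_curve += (x - x_prev) * (y + y_prev) / 2.0
        x_prev, y_prev = x, y

    # The area under the diagonal is 1/2, so twice the gap is 1 - 2 * area.
    return 1.0 - 2.0 * area_under_curve


print(gini_segregation([(2, 2), (0, 2)]))  # complete segregation
print(gini_segregation([(1, 2), (1, 2)]))  # identical composition in all units
```

With the descending order, units rich in minority members come first, so the curve lies on or below the diagonal and the index falls in \([0, 1]\).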

References

  • Almeida, H.V., & Wolfenzon, D. (2006). A theory of pyramidal ownership and family business groups. The Journal of Finance, 61(6), 2637–2680.

  • Alstott, J., Bullmore, E., & Plenz, D. (2014). Powerlaw: A Python package for analysis of heavy-tailed distributions. PLoS ONE, 9(1), e85777.

  • Atkinson, A.B., Piketty, T., & Saez, E. (2011). Top incomes in the long run of history. Journal of Economic Literature, 49(1), 3–71.

  • Bakshy, E., Messing, S., & Adamic, L.A. (2015). Exposure to ideologically diverse news and opinion on Facebook. Science, 348(6239), 1130–1132.

  • Baroni, A., & Ruggieri, S. (2015). Segregation discovery in a social network of companies. In Advances in intelligent data analysis XIV, LNCS, vol. 9385, pp. 37–48. Springer.

  • Bastide, Y., Taouil, R., Pasquier, N., Stumme, G., & Lakhal, L. (2000). Mining frequent patterns with counting inference. SIGKDD Explorations, 2(2), 66–75.

  • Battiston, S., Bonabeau, E., & Weisbuch, G. (2003). Decision making dynamics in corporate boards. Physica A: Statistical Mechanics and its Applications, 322, 567–582.

  • Battiston, S., & Catanzaro, M. (2004). Statistical properties of corporate board and director networks. The European Physical Journal B, 38(2), 345–352.

  • Bell, W. (1954). A probability model for the measurement of ecological segregation. Social Forces, 32, 357–364.

  • Bettio, F., & Verashchagina, A. (2009). Gender segregation in the labour market: Root causes, implications and policy responses in the EU. Publications Office of the European Union.

  • Borgelt, C. (2012). Frequent item set mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(6), 437–456. http://www.borgelt.net/fpgrowth.html.

  • Burke, R.J. (2000). Company size, board size and numbers of women corporate directors. In Women on corporate boards of directors, pp. 157–167. Springer.

  • Burke, R.J., & Mattis, M.C. (2013). Women on corporate boards of directors: International challenges and opportunities, vol. 14. Springer Science & Business Media.

  • Clark, W.A.V. (1991). Residential preferences and neighborhood racial segregation: A test of the Schelling segregation model. Demography, 28(1), 1–19.

  • Cristaldi, F. (2012). Immigrazione e territorio: lo spazio con/diviso. Pàtron.

  • Croppenstedt, A., Goldstein, M., & Rosas, N. (2013). Gender and agriculture: Inefficiencies, segregation, and low productivity traps. World Bank Research Observer, 28, 79–109.

  • Das, S., & Kramer, A.D.I. (2013). Self-censorship on Facebook. In Proceedings of the international conference on weblogs and social media (ICWSM 2013). The AAAI Press.

  • Davis, G.F., Yoo, M., & Baker, W.E. (2003). The small world of the American corporate elite, 1982–2001. Strategic Organization, 1(3), 301–326.

  • Demb, A., & Neubauer, F.F. (1992). The corporate board: Confronting the paradoxes. Long Range Planning, 25(3), 9–20.

  • Denton, N.A., & Massey, D.S. (1988). Residential segregation of Blacks, Hispanics, and Asians by socioeconomic status and generation. Social Science Quarterly, 69(4), 797–817.

  • Duncan, O.D., & Duncan, B. (1955). A methodological analysis of segregation indexes. American Sociological Review, 20, 210–217.

  • Fischer, E. (2011). Distribution of race and ethnicity in US major cities. Published online at http://www.flickr.com/photos/walkingsf/sets/72157624812674967/detail/ under Creative Commons licence, CC BY-SA 2.0.

  • Flaxman, S., Goel, S., & Rao, J.M. (2013). Ideological segregation and the effects of social media on news consumption. Available at SSRN: http://ssrn.com/abstract=2363701.

  • Flückiger, Y., & Silber, J. (1999). The measurement of segregation in the labor force. Berlin: Springer Science & Business Media.

  • Frey, J.H., & Eitzen, D.S. (1991). Sport and society. Annual Review of Sociology, 17, 503–522.

  • Gastwirth, J.L. (1971). A general definition of the Lorenz curve. Econometrica: Journal of the Econometric Society, 39, 1037–1039.

  • Gentzkow, M., & Shapiro, J.M. (2011). Ideological segregation online and offline. Quarterly Journal of Economics, 126(4), 1799–1839.

  • Goethals, B. (2010). Frequent itemset mining implementations repository. http://fimi.cs.helsinki.fi.

  • Grevet, C. (2016). Being nice on the internet: Designing for the coexistence of diverse opinions online. Ph.D. thesis: Georgia Institute of Technology.

  • Han, J., Cheng, H., Xin, D., & Yan, X. (2007). Frequent pattern mining: Current status and future directions. Data Mining and Knowledge Discovery, 15(1), 55–86.

  • Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques, 3rd edn. Burlington: Morgan Kaufmann Publishers Inc.

  • Hutchens, R.M. (1991). Segregation curves, Lorenz curves, and inequality in the distribution of people across occupations. Mathematical Social Sciences, 21(1), 31–51.

  • International Organization for Standardization (2013). ISO 3166-1:2013 International standard for country codes and codes for their subdivisions.

  • James, D.R., & Taeuber, K.E. (1985). Measures of segregation. Sociological Methodology, 13, 1–32.

  • Kaser, O., & Lemire, D. (2016). Compressed bitmap indexes: Beyond unions and intersections. Software: Practice and Experience, 46, 167–198. https://github.com/lemire/javaewah.

  • Kogut, B., & Walker, G. (2001). The small world of Germany and the durability of national networks. American Sociological Review, 66, 317–335.

  • Loy, J.W., & Elvogue, J.F. (1970). Racial segregation in American sport. International Review for the Sociology of Sport, 5(1), 5–24.

  • Maes, M., & Bischofberger, L. (2015). Will the personalization of online social networks foster opinion polarization? Available at SSRN: http://ssrn.com/abstract=2553436.

  • Massey, D.S. (2016). Segregation and the perpetuation of disadvantage. In The Oxford Handbook of the Social Science of Poverty (pp. 369–393).

  • Massey, D.S., & Denton, N.A. (1988). The dimensions of residential segregation. Social Forces, 67(2), 281–315.

  • Massey, D.S., Rothwell, J., & Domina, T. (2009). The changing bases of segregation in the United States. Annals of the American Academy of Political and Social Science, 626, 74–90.

  • Mitchell, T. (1997). Machine Learning. New York: The McGraw-Hill Companies, Inc.

  • Mizruchi, M.S. (1996). What do interlocks do? An analysis, critique, and assessment of research on interlocking directorates. Annual Review of Sociology, 22(1), 271–298.

  • Mora, R., & Ruiz-Castillo, J. (2011). Entropy-based segregation indices. Sociological Methodology, 41, 159–194.

  • Musterd, S. (2005). Social and ethnic segregation in Europe: Levels, causes, and effects. Journal of Urban Affairs, 27(3), 331–348.

  • Negrevergne, B., Termier, A., Rousset, M.C., & Méhaut, J.F. (2014). ParaMiner: a generic pattern mining algorithm for multi-core architectures. Data Mining and Knowledge Discovery, 28(3), 593–633.

  • Ooi, C.A., Hooy, C.W., & Som, A.P.M. (2015). Diversity in human and social capital: Empirical evidence from Asian tourism firms in corporate board composition. Tourism Management, 48, 139–153.

  • Pariser, E. (2011). The Filter Bubble: What the Internet is hiding from you. Penguin UK.

  • Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2nd edn. New York: Cambridge University Press.

  • Pearl, J. (2014). Comment: Understanding Simpson’s paradox. The American Statistician, 68(1), 8–13.

  • Piccardi, C., Calatroni, L., & Bertoni, F. (2010). Communities in Italian corporate networks. Physica A: Statistical Mechanics and its Applications, 389(22), 5247–5258.

  • Randøy, T., Thomsen, S., & Oxelheim, L. (2006). A Nordic perspective on corporate board diversity. Tech. Rep., 0, 5428.

  • Robins, G., & Alexander, M. (2004). Small worlds among interlocking directors: Network structure and distance in bipartite graphs. Computational & Mathematical Organization Theory, 10(1), 69–94.

  • Romei, A., & Ruggieri, S. (2014). A multidisciplinary survey on discrimination analysis. The Knowledge Engineering Review, 29(5), 582–638.

  • Romei, A., Ruggieri, S., & Turini, F. (2015). The layered structure of company share networks. In Proceedings of the IEEE international conference on data science and advanced analytics (DSAA 2015), pp. 1–10. IEEE Computer Society.

  • Sankowska, A., & Siudak, D. (2016). The small world phenomenon and assortative mixing in Polish corporate board and director networks. Physica A: Statistical Mechanics and its Applications, 443, 309–315.

  • Schelling, T.C. (1971). Dynamic models of segregation. Journal of Mathematical Sociology, 1(2), 143–186.

  • Smith, S.L., & Choueiti, M. (2011). Black characters in popular film: Is the key to diversifying cinematic content held in the hand of the black director? Annenberg School for Communication & Journalism. Retrieved March, 12, 2013.

  • Xu, K. (2003). How has the literature on Gini’s index evolved in the past 80 years? Economics working paper. Halifax: Dalhousie University. Available at SSRN: http://ssrn.com/abstract=423200.

  • Zhou, T., Ren, J., Medo, M., & Zhang, Y.C. (2007). Bipartite network projection and personal recommendation. Physical Review E, 76(4), 046115.

Author information

Corresponding author

Correspondence to Alessandro Baroni.

Additional information

A preliminary version of the results of this paper appeared in (Baroni and Ruggieri 2015).

About this article

Cite this article

Baroni, A., Ruggieri, S. Segregation discovery in a social network of companies. J Intell Inf Syst 51, 71–96 (2018). https://doi.org/10.1007/s10844-017-0485-0

  • DOI: https://doi.org/10.1007/s10844-017-0485-0
