Organization Mining Using Online Social Networks

Networks and Spatial Economics

Abstract

Complementing the formal organizational structure of a business are the informal connections among employees. These relationships help identify knowledge hubs, working groups, and shortcuts through the organizational structure, and they carry valuable information on how a company functions de facto. In the past, eliciting the informal social networks within an organization was challenging; today they are reflected in friendship relationships on online social networks. In this paper we analyze several commercial organizations by mining data that their employees have exposed on Facebook, LinkedIn, and other publicly available sources. Using a web crawler designed for this purpose, we extract a network of informal social relationships among employees of the targeted organizations. Our results show that it is possible to identify leadership roles within an organization solely by applying centrality analysis and machine learning techniques to the structure of the informal relationship network. Valuable non-trivial insights can also be gained by clustering an organization’s social network and gathering publicly available information on the employees within each cluster. Knowledge of the network of informal relationships may be a major asset to the underlying organization, or a significant threat to it.

Figs. 1–8

Notes

  1. http://www.facebook.com

  2. http://www.linkedin.com

  3. http://www.twitter.com

  4. http://www.flickr.com

  5. http://www.youtube.com

  6. http://www.google.com

  7. http://pipl.com

  8. http://www.peekyou.com

  9. Although in theory a breadth-first search (BFS) crawler can return the target organization’s complete social network, in practice it is not usable for crawling organizational social networks, for two reasons: its precision in identifying relevant profiles is low, and many social network providers limit the number of page requests (Twitter 2013). Moreover, because of the way BFS proceeds, the crawler may reach employee profiles with several thousand Facebook friends; even though most of these friends are not employees of the targeted organization, the crawler must still collect each of their profile pages before it can move on to more relevant profiles. Nevertheless, when the organization is relatively small, with a few dozen employees, a BFS crawler may be sufficient to crawl the entire organization.

  10. All the organizations’ graphs presented throughout this paper are embedded as Scalable Vector Graphics (SVG) images, which enable the reader to zoom in and view each node in each graph.

  11. Baker and Faulkner (1993) used the employees’ titles and the company’s organization charts to divide the company’s employees into three management categories (top executives, middle managers, or junior managers). In contrast, we divided each organization’s employees into only two dichotomous groups, managers and non-managers, where, to the best of our judgment, employees in the managers group held some type of management role in the organization, ranging from sales manager to CEO.

  12. Across all six organizations, our methods discovered more than 17,000 employees. Manually determining whether each of them held a management role was therefore impractical, so we evaluated our algorithms on 4,650 manually classified employee profiles.

  13. In this study, we considered position information to be partial if the description field in the employee’s Facebook profile was not empty.

  14. The centrality measures were calculated using the NetworkX Python package (Hagberg et al. 2008).

  15. When calculating precision at T-10, T-20, and T-50, we took into account only the employees we succeeded in classifying manually. For example, if among the 10 employees in S2 with the highest HITS scores we could manually classify only 6, of whom 4 held management positions, then S2’s precision at T-10 would be \(\frac{4}{6} \approx 0.67\).

  16. The diameter of a graph G is defined as its maximum eccentricity, where the eccentricity of a node v is the maximum shortest-path distance from v to any other node in G.
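
The crawling trade-off described in note 9 can be illustrated with a small simulation. The following is a minimal sketch, not the authors’ crawler: it runs a plain BFS over a hypothetical in-memory friendship graph (`friends`), counts one page request per fetched profile, stops when a hypothetical request budget (`max_requests`) is exhausted, and reports precision as the fraction of fetched profiles for which a stand-in relevance check (`is_employee`) returns true.

```python
from collections import deque


def bfs_crawl(friends, seed, is_employee, max_requests):
    """Breadth-first crawl from `seed` that fetches at most `max_requests` profiles.

    `friends` maps a profile id to its list of friends and stands in for
    fetching a profile page; every fetch costs one request. Returns the
    fetched profiles in visit order and the crawl precision, i.e. the share
    of fetched profiles that belong to the target organization.
    """
    visited = {seed}
    queue = deque([seed])
    fetched = []
    while queue and len(fetched) < max_requests:
        profile = queue.popleft()
        fetched.append(profile)  # one page request per profile
        for friend in friends.get(profile, []):
            if friend not in visited:
                visited.add(friend)
                queue.append(friend)
    hits = sum(1 for p in fetched if is_employee(p))
    precision = hits / len(fetched) if fetched else 0.0
    return fetched, precision


# Hypothetical toy graph: employee "e0" has 100 outside friends listed before
# a co-worker, so BFS spends its budget on irrelevant profiles first.
friends = {
    "e0": [f"x{i}" for i in range(100)] + ["e1"],
    "e1": ["e0", "e2"],
    "e2": ["e1"],
}
fetched, precision = bfs_crawl(friends, "e0", lambda p: p.startswith("e"), max_requests=20)
print(len(fetched), precision)  # 20 0.05
```

With a budget of 20 requests, the crawler spends 19 of them on the seed employee’s outside friends and never reaches `e1`; with a generous budget it eventually covers the whole toy organization, which mirrors the small-organization case in note 9.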
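
Note 14 states that the centrality measures were computed with NetworkX. As a dependency-free illustration of what such measures compute, the sketch below implements degree centrality (degree divided by n - 1) and closeness centrality ((n - 1) divided by the sum of shortest-path distances) on a hypothetical toy graph. For connected, undirected graphs these match the definitions behind NetworkX’s `degree_centrality` and `closeness_centrality`; the function names here are illustrative, and the specific measures the study used are described in the full text rather than in these notes.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj):
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """(n - 1) divided by the sum of distances to all nodes; assumes a connected graph."""
    n = len(adj)
    result = {}
    for v in adj:
        total = sum(bfs_distances(adj, v).values())
        result[v] = (n - 1) / total if total > 0 else 0.0
    return result


# Hypothetical undirected toy graph: "boss" knows everyone; "a" and "b" are also friends.
adj = {
    "boss": ["a", "b", "c"],
    "a": ["boss", "b"],
    "b": ["boss", "a"],
    "c": ["boss"],
}
print(degree_centrality(adj)["boss"])  # 1.0
print(closeness_centrality(adj)["boss"])  # 1.0
```

Here `boss` reaches everyone in one hop, so it scores highest on both measures, which is the intuition behind using centrality to surface likely managers.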
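
The precision calculation in note 15 can be written as a short function. This is a sketch assuming that manual labels are stored as `True` (manager), `False` (non-manager), or `None` (not classified); `precision_at_k`, `ranked`, and `labels` are hypothetical names. Unclassified employees are skipped, as the note describes, so the denominator is the number of classified employees in the top k rather than k itself.

```python
def precision_at_k(ranked, labels, k):
    """Precision among the top-k ranked employees, skipping unclassified ones.

    `ranked` lists employee ids from highest to lowest centrality score.
    `labels` maps an id to True (manager), False (non-manager) or None
    (not classified manually); ids missing from `labels` count as None.
    """
    top_labels = [labels.get(e) for e in ranked[:k]]
    classified = [lab for lab in top_labels if lab is not None]
    if not classified:
        return None  # undefined when nobody in the top k was classified
    return sum(classified) / len(classified)  # True counts as 1, False as 0


# The example from note 15: 6 of the top 10 were classified, 4 of them managers.
ranked = [f"emp{i}" for i in range(10)]
labels = {"emp0": True, "emp1": True, "emp2": False, "emp3": True,
          "emp4": None, "emp5": False, "emp6": True, "emp7": None}
print(round(precision_at_k(ranked, labels, 10), 2))  # 0.67
```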
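
Note 16’s definitions translate directly into code: the eccentricity of a node is its largest BFS distance to any other node, and the diameter is the largest eccentricity. The sketch below assumes an unweighted, connected graph stored as a hypothetical adjacency dictionary; NetworkX offers the same quantities as `eccentricity` and `diameter`, which likewise raise an error on disconnected graphs.

```python
from collections import deque


def eccentricity(adj, v):
    """Largest shortest-path (hop) distance from v to any node; requires a connected graph."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) < len(adj):
        raise ValueError("graph is disconnected; eccentricity is infinite")
    return max(dist.values())


def diameter(adj):
    """Maximum eccentricity over all nodes."""
    return max(eccentricity(adj, v) for v in adj)


# Hypothetical path graph a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print({v: eccentricity(adj, v) for v in adj})  # {'a': 3, 'b': 2, 'c': 2, 'd': 3}
print(diameter(adj))  # 3
```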

References

  • Acquisti A, Gross R (2006) Imagined communities: Awareness, information sharing, and privacy on the Facebook. In: Privacy enhancing technologies. Springer, pp 36–58

  • Allen T, Cohen S (1969) Information flow in research and development laboratories, Administrative Science Quarterly

  • Baker WE, Faulkner RR (1993) The social organization of conspiracy: Illegal networks in the heavy electrical equipment industry. American Sociological Review, pp 837–860

  • Boshmaf Y, Muslukhov I, Beznosov K, Ripeanu M (2011) The socialbot network: when bots socialize for fame and money. In: Proceedings of the 27th Annual Computer Security Applications Conference. ACM, pp 93–102

  • Burt R (1995) Structural holes: the social structure of competition. Harvard University Press

  • Campbell C, Maglio P, Cozzi A, Dom B (2003) Expertise identification using email communications. In: Proceedings of the twelfth international conference on Information and knowledge management. ACM, pp 528–531

  • Cats O, Jenelius E (2014) Dynamic vulnerability analysis of public transport networks: mitigation effects of real-time information. Netw Spatial Economics 14(3):435–463

  • Chesney T, Fire M (2014) Diffusion through networks of heterogeneous nodes in a population characterized by homophily. Nottingham University Business School Research Paper, pp 2014–05

  • Clauset A, Newman M, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6):066111

  • Constine J (2013) Facebook’s growth since IPO in 12 big numbers. TechCrunch

  • Diehl C P, Namata G, Getoor L (2007) Relationship identification for social network discovery. AAAI 22:546–552

  • Diesner J, Frantz T L, Carley K M (2005) Communication networks from the Enron email corpus “It’s always about the people. Enron is no different”. Comput Math Org Theory 11(3):201–228

  • Ducruet C, Beauguitte L (2014) Spatial Science and Network Science: Review and Outcomes of a Complex Relationship. Netw Spatial Economics 14(3):297–316

  • Dwyer C, Hiltz S, Passerini K (2007) Trust and privacy concern within social networking sites: A comparison of Facebook and MySpace. In: Proceedings of AMCIS. Citeseer, pp 1–12

  • Elyashar A, Fire M, Kagan D, Elovici Y (2012) Organizational intrusion, ASE Cyber Security Conference (CyberSecurity)

  • Elyashar A, Fire M, Kagan D, Elovici Y (2013) Homing socialbots: intrusion on a specific organization’s employee using socialbots. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. ACM, pp 1358–1365

  • Estrada E, Rodriguez-Velazquez J (2005) Subgraph centrality in complex networks. Phys Rev E 71(5):056103

  • Facebook (2014) Company info. [last accessed on July 27th, 2014]. http://newsroom.fb.com/company-info/

  • Fire M, Tenenboim-Chekina L, Puzis R, Lesser O, Rokach L, Elovici Y (2013) Computationally efficient link prediction in a variety of social networks. ACM Trans Intell Syst and Technol (TIST) 5(1):10

  • Freeman L (1977) A set of measures of centrality based on betweenness. Sociometry 40(1):35–41

  • Gjoka M, Butts C, Kurant M, Markopoulou A (2011) Multigraph sampling of online social networks. IEEE J Sel Areas Commun 29(9):1893–1905

  • Hagberg A A, Schult D A, Swart P J (2008) Exploring network structure, dynamics, and function using NetworkX. In: Proceedings of the 7th Python in Science Conference (SciPy2008)

  • Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I H (2009) The weka data mining software: an update. SIGKDD Explor Newsl 11:10–18. http://doi.acm.org/10.1145/1656274.1656278

  • Illenberger J, Nagel K, Flötteröd G (2013) The Role of Spatial Interaction in Social Networks. Netw Spatial Economics 13(3):255–282

  • Jacobson E, Seashore S (1951) Communication practices in complex organizations. J Soc Issues 7(3):28–40

  • Kilduff M, Brass D (2010) Organizational social network research: Core ideas and key debates. Acad Manag Ann 4(1):317–357

  • Kilduff M, Tsai W (2003) Social networks and organizations. Sage Publications Ltd

  • Kleinberg J (1999) Authoritative sources in a hyperlinked environment. J ACM (JACM) 46(5):604–632

  • Krackhardt D, Hanson J R (1993) Informal networks: the company behind the chart. Harv Bus Rev 71(4):104–111

  • Krebs V (2002) Mapping networks of terrorist cells. Connections 24(3):43–52

  • Lesser O, Tenenboim-Chekina L, Rokach L, Elovici Y (2013) Intruder or welcome friend: inferring group membership in online social networks. In: Social Computing, Behavioral-Cultural Modeling and Prediction. Springer, pp 368–376

  • Lind P G, González M C, Herrmann H J (2005) Cycles and clustering in bipartite networks. Phys Rev E 72(5):056127

  • Lindsay G (2013) Engineering serendipity. New York Times

  • McCallum A, Corrada-Emmanuel A, Wang X (2005) Topic and role discovery in social networks. Computer Science Department Faculty Publication Series, p 3

  • McPherson M, Smith-Lovin L, Cook J (2001) Birds of a feather: Homophily in social networks. Annual Review of Sociology

  • Mislove A, Marcon M, Gummadi K P, Druschel P, Bhattacharjee B (2007) Measurement and Analysis of Online Social Networks. In: Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07), San Diego

  • Naddafa Y, Mutyalab S (2010) Social network analysis and community mining in organizations based on email records

  • Newman M (2008) The mathematics of networks, The New Palgrave Encyclopedia of Economics

  • Newman M (2001) Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev E 64(1):016132

  • Newman M E (2006) Modularity and community structure in networks. Proc Natl Acad Sci 103(23):8577–8582

  • Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the web

  • Paradise A, Puzis R, Shabtai A (2014) Anti-reconnaissance tools: Detecting targeted socialbots. Internet Computing IEEE PP(99):1–1. doi:10.1109/MIC.2014.81

  • Provan K, Fish A, Sydow J (2007) Interorganizational networks at the network level: A review of the empirical literature on whole networks. J Manag 33(3):479–516

  • Pugh D, Hickson D, Hinings C, Turner C (1968) Dimensions of organization structure. Administrative Science Quarterly, pp 65–105

  • Rooksby J, Kahn A, Keen J, Sommerville I (2009) Social networking and the workplace. The UK Large Scale Complex IT Systems Initiative, pp 1–39

  • Saramäki J, Kivelä M, Onnela J P, Kaski K, Kertesz J (2007) Generalizations of the clustering coefficient to weighted complex networks. Phys Rev E 75(2):027105

  • Shannon P, Markiel A, Ozier O, Baliga N, Wang J, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13(11):2498–2504

  • Shetty J, Adibi J (2004) The Enron email dataset database schema and brief statistical report. Information sciences institute technical report. University of Southern California, p 4

  • Shetty J, Adibi J (2005) Discovering important nodes through graph entropy the case of Enron email database. In: Proceedings of the 3rd international workshop on Link discovery. ACM, pp 74–81

  • Sparrow M (1991) The application of network analysis to criminal intelligence: An assessment of the prospects. Soc Networks 13(3):251–274

  • Steinfield C, DiMicco J, Ellison N, Lampe C (2009) Bowling online: Social networking and social capital within the organization. In: Proceedings of the fourth international conference on Communities and technologies. ACM, pp 245–254

  • Tichy N, Tushman M, Fombrun C (1979) Social network analysis for organizations, Academy of Management Review, pp 507–519

  • Twitter (2013) REST API rate limiting in v1.1. [last accessed on August 3rd, 2014]. https://dev.twitter.com/docs/rate-limiting/1.1

  • Tyler J, Wilkinson D, Huberman B (2005) E-mail as spectroscopy: Automated discovery of community structure within organizations. Inf Soc 21(2):143–153

  • Wilkinson D, Huberman B (2004) A method for finding communities of related genes. Proc Natl Acad Sci USA 101(Suppl 1):5241–5248

  • Wilson G, Banzhaf W (2009) Discovery of email communication networks from the Enron corpus with a genetic algorithm using social network analysis. In: IEEE Congress on Evolutionary Computation, 2009. CEC’09. IEEE, pp 3256–3263

  • Zhan X, Ukkusuri S V, Zhu F (2014) Inferring urban land use using large-scale social media check-in data. Netw Spatial Economics 14(3):647–667

Author information

Corresponding author

Correspondence to Michael Fire.

About this article

Cite this article

Fire, M., Puzis, R. Organization Mining Using Online Social Networks. Netw Spat Econ 16, 545–578 (2016). https://doi.org/10.1007/s11067-015-9288-4

  • DOI: https://doi.org/10.1007/s11067-015-9288-4
