Synonyms

Communities in online social networks; Computational social science; Groups and communities discovery; Polarization in online social networks; Social media analysis and mining

Glossary

Computer science (CS):

Discipline based on a scientific and practical approach to computation and its applications.

Computational social science (CSS):

New discipline based on interdisciplinary investigation of the social universe on many scales, ranging from individual actors to the largest groupings, through the medium of computation (Cioffi-Revilla 2014).

Dunbar number:

Value of the cognitive limit to the number of people with whom a person can maintain stable social relationships (150).

Echo chamber:

“Enclosed” system in which information, ideas, or beliefs are amplified or reinforced by internal transmission and repetition.

Ego network:

Focal node (“ego”) and the nodes to which the ego is directly connected (friends or alters) plus the ties, if any, among the alters.

Group or community:

Set of two or more people who interact with one another, share similar traits, and collectively have a sense of belonging.

Machine learning (ML):

Subfield of computer science, which evolved from the study of pattern recognition and computational learning theory in artificial intelligence (AI).

Online social network (OSN):

Platform to build social relations among people who share similar personal and career interests, activities, backgrounds, or real-life connections (Buettner 2016). Alternatively, they are called social network sites (SNS).

Polarizing subgroup:

Set of people sharing similar points of view about a specific discussed topic.

Social group:

Bond-based group characterized by personal social relations among members.

Social network:

Structure made up of a set of actors, sets of dyadic ties, and interactions between individuals.

Social sciences (SS):

Set of academic disciplines concerning society and the relationships among individuals within it.

Topical group:

Identity-based group whose members share a common interest (topic).

Virtual world:

Computer-based simulated environment populated by users who simultaneously and independently explore the setting, participate in its activities, and interact with others. An online social network is an example of a virtual world, as is the web itself.

Definition

The analysis of communities in online social networks (OSNs) refers to the task of investigating, from both microscopic and macroscopic points of view, the organization of individuals into groups, their relationships, and interactions with other groups, and the behavior and the development of such groups within the whole OSN. In this work, we analyze topical communities, with a focus on polarizing topics, and we discuss a general analytical framework to study online communities along the social, spatial, and temporal dimensions.

Introduction

The understanding of user communities and their interactions in OSNs is a crucial task in many application fields, such as sociology, psychology, computer science, and business. In the past, sociological studies were mainly conducted through a modest number of surveys and questionnaires. Nowadays, OSNs allow scientists to investigate large volumes of very detailed data about millions of users. Indeed, the analysis of the social interactions in OSNs is interesting to shed light on human behavior within and beyond such virtual environments.

We present different approaches in studying groups and communities in OSNs, and we give an overview of the results achieved so far. We describe the characterizing features of OSNs such as topical groups, social drivers, and homophily. Furthermore, we look at polarizing communities and how users’ opinions shape the structure of these communities. Finally, we focus on the analytical dimensions to be taken into consideration when describing communities in OSNs, and we propose a framework to blend together the social, spatial, and temporal dimensions.

Need for Aggregation and Interaction

Man is a social animal. (Aristotle, 384–322 BC)

The Greek philosopher Aristotle, more than 2300 years ago, highlighted that the social nature of human beings urges us to self-organize into groups of different scales: family, tribe, and society. Our lives constantly depend on other people.

Human activities and social behaviors have always been a top area of investigation for anthropologists, sociologists, and psychologists. Animals in general exhibit social behaviors, embedded for instance in the concepts of territory and dominance. However, some social traits are exclusive to the human species, whose social organization has no analogue in the animal realm: chiefly, the prevalence of rationality-based decision making and the use of a complex communication language (Barrett et al. 2007).

In fact, most human desires are based on social life. In developed countries, people have fulfilled the physiological and safety needs classified by Maslow (1943), such as food, water, sleep, and security. Beyond these needs, and possibly with an even stronger desire, humans look for a sense of belonging and love, esteem, and finally self-actualization. These all involve social interactions and they can hardly be obtained in isolation. Humans then, by their nature, organize themselves into social structures.

From the “Real World” to the “Virtual World”

Social Networks

Groups and social networks have been studied for decades. The small-world effect is one of the main findings (de Sola Pool and Kochen 1979), with several implications: for instance, Milgram’s famous experiment estimated the typical distance between any two people in the social graph at about six hops (Milgram 1967). Moreover, by looking at interactions among people in social groups, researchers have pointed out the presence of strong and weak ties, which structure the network into tightly clustered communities, with different roles in information spreading (Granovetter 1973).

Recently, the focus of social studies has extended to the digital world, resulting in a “marriage” between social sciences and computer science. Novel network analysis techniques and large-scale computational approaches have been developed to analyze the behavior of individuals and communities in massive virtual contexts (Wasserman and Faust 1994).

Online Social Networks

The birth of OSNs in the late 1990s and their increasing popularity in the early 2000s have answered the human need for belonging even in the virtual world.

The success of OSNs was anticipated by the diffusion of virtual environments and the development of the web. In particular, virtual games have been precursors of OSNs. Messinger et al. (2008) described the historical progression of virtual worlds starting from arcade games, which started in 1972 with the Pong game by Atari Interactive. After that, the path toward OSNs was marked by the introduction of console systems (1986), followed by LAN games, which created the concept of digital communities through internet connectivity. Gaming environments progressively integrated additional social features with unstructured games and player-generated content (e.g., The Sims). Social networking sites are a further evolution in the development of open virtual worlds, with properties that make them equivalent, or at least comparable, to real-world environments.

In OSNs, an individual creates their own profile, publishes content, and interacts with other users through discussions and actions (resharing content, liking, disliking). Users can also set friendship or subscription (following) links with other users.

The world population is 7.4 billion people; among them, 3.4 billion (46%) are internet users and 2.3 billion (31%) are active social media users (Global Web Index data, January 2016). These statistics suggest that there is a large interest in joining OSNs. Oh et al. (2014) have shown positive associations among the number of friends in OSNs, supportive interactions, affect, perceived social support, sense of community, and life satisfaction. On average, the time spent by users in these networking platforms is significant: almost 2 h per day according to the Global Web Index data.

To researchers willing to investigate human behaviors in social environments, such a huge volume of information is a gold mine, with no equivalent in the “real world.”

Structure of Online Social Networks

Today, OSNs represent a significant portion of web traffic, and the pervasive use of these platforms, together with the possibility of tracking any user action, has attracted scientists interested in investigating OSNs’ properties.

The first studies on OSNs explored the topology and the structure of these large networks. From a topological point of view, an OSN can be considered as a graph where the nodes are users and the edges are connections (friendships or subscription/follower relations). Many works have analyzed OSNs from a structural point of view, showing again a small-world effect (Buyukkokten et al. 2005), i.e., a high clustering coefficient and a short average path length (average degree of separation from 4 to 5) in different OSNs: Flickr, LiveJournal, Orkut, YouTube (Mislove et al. 2007), Twitter (Kwak et al. 2010), Facebook (Backstrom et al. 2012; Ugander et al. 2011; Wilson et al. 2012), and Google+ (Magno et al. 2012). Other studies have evaluated the node in/out degree distribution (which typically is a power law) (Ugander et al. 2011; Wilson et al. 2012) and the degree correlation, thus detecting the presence of a large, strongly connected component (Kumar et al. 2010). Finally, interesting works have investigated the evolution of social graphs over time (Wilson et al. 2012).
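The two small-world indicators mentioned above, the clustering coefficient and the average path length, can be illustrated on a toy graph. The following is a minimal sketch, assuming an undirected graph stored as a dictionary of neighbor sets; the function names and the toy data are hypothetical and are not taken from the cited studies.

```python
from collections import deque


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    neigh = list(adj[node])
    k = len(neigh)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if neigh[j] in adj[neigh[i]])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy undirected friendship graph (hypothetical data).
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(average_clustering(toy_graph), average_path_length(toy_graph))
```

A small-world network is one where the first value stays high while the second stays short as the graph grows; on real OSN graphs with millions of nodes, sampling or a dedicated graph library would likely replace this exhaustive search.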

Moreover, online social microblogging platforms and social networks have proven to be a rich source of information to track and monitor the behavior of users over time. Interactions in OSNs have been studied by weighting the social graph through quantitative considerations on the strength of social ties. These graphs, called interaction graphs, differ from social graphs since they include quantifying mechanisms about the intensity of the connections. The interaction strength in a social network is a mix of the amount of time spent together, intimacy, emotional intensity, and reciprocal services (Granovetter 1973), but in most cases it has been quantified in real OSN applications only in terms of the duration and frequency of contacts (e.g., in Wilson et al. 2012), even though there have been theoretical studies, starting with Marsden and Campbell (1984), which have tried to translate qualities such as intensity and intimacy into quantity values. Interaction graphs in OSNs have been studied, showing microproperties related to ego networks (looking, for instance, at close friends, inactive relationships, homophily, turnover of friendships) and macroproperties related to the whole network (diameters, degree distributions, clustering coefficients), which are generally more stable (e.g., in Twitter (Arnaboldi et al. 2013) and Facebook (Wilson et al. 2012)).
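As an illustration of how a social graph can be turned into an interaction graph, the sketch below weights each friendship link by the frequency of contacts between its endpoints and drops links below a threshold. It is only a simplified example under stated assumptions: the function name, the `min_contacts` parameter, and the data are hypothetical, and frequency is used as the sole proxy for tie strength.

```python
from collections import Counter


def interaction_graph(friend_edges, interactions, min_contacts=1):
    """Weight each friendship edge by the number of observed contacts.

    `interactions` is a list of (sender, receiver) events; direction is
    ignored here, so a->b and b->a both count toward the same tie.
    Edges with fewer than `min_contacts` contacts are dropped.
    """
    counts = Counter(frozenset(pair) for pair in interactions)
    weighted = {}
    for u, v in friend_edges:
        weight = counts.get(frozenset((u, v)), 0)
        if weight >= min_contacts:
            weighted[(u, v)] = weight
    return weighted


# Hypothetical friendship links and contact log.
friends = [("a", "b"), ("a", "c"), ("b", "c")]
contacts = [("a", "b"), ("b", "a"), ("a", "b"), ("b", "c")]
print(interaction_graph(friends, contacts))
```

In this toy example the declared link between "a" and "c" disappears because no contact was observed, which mirrors the difference between a social graph and an interaction graph described above.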

The following sections present the main results achieved in analyzing communities in OSNs and, in particular, they focus on community definition, considering both topical and social groups. A further layer of analysis is added by looking at user interactions. We then present some results aimed at understanding opinions and tracking polarized communities over time. Finally, we look at the main dimensions that have to be taken into consideration in the analysis of groups in the digital world, and we discuss how these dimensions can be used in a unified approach.

Key Points

We analyze communities in OSNs by looking at them through different perspectives, ranging from a computer science point of view to a sociological point of view. We discriminate between social and topical groups to focus in particular on the latter, and we show how to further characterize subcommunities on the basis of user opinions. We look at polarization of users when discussing around a topic and how opinion-based groups can be tracked over time in accordance with the topic’s evolution. Finally, we give an overview of some analytical dimensions to be taken into consideration in order to characterize groups in OSNs.

In particular:

  • We propose an additional topical group classification based on polarization.

  • We describe a method to track polarizing communities and topics for OSNs.

  • We analyze topical communities and, in particular, we focus on alternative and isolated groups; we show how deviant community analysis can be extended to take into consideration the relationship with the whole social network.

  • We propose an analytical framework to describe communities in OSNs by looking at different dimensions: temporal, social, and spatial.

Historical Background

In social sciences, a community, or a group, is defined as a set of two or more people who interact with one another, share similar traits, and collectively have a sense of belonging. This definition implies three main concepts, which have been extensively debated: interdependence, homophily, and social identity. These characteristics shift the definition of community beyond the simplistic idea of a group as an aggregation of individuals and entail a degree of subjectivity, which makes the task of identifying communities hard.

Indeed, interdependence and homophily can be measured, and they have been studied in a quantitative way (Aiello et al. 2012; Bisgin et al. 2010). On the other hand, the concept of social identity, which has been extensively studied—first, by Tajfel (1982)—is hard to frame and has been an object of investigation. The subjectivity related to this concept is hardly treatable within the computer science framework. The concept of group membership as a matter of shared self-definition is predominant (Turner 1981), but it is hardly captured by computational studies that base their findings on cohesive interpersonal relationships by looking at interaction patterns. The matching between sociological findings and computational approaches to quantify them is still a challenging research area.

In the following sections, we discuss the computer science perspective in defining and analyzing communities, and we explore how social sciences and computational approaches can be matched.

Computer Science and Social Sciences

Social Sciences

The analysis of social networks is an interdisciplinary academic field, which emerged from social psychology, sociology, statistics, and graph theory. Social structures such as groups and dyadic ties are analyzed to study human behavior and social interactions. Social studies have successfully defined several theoretical models able to explain the patterns observed in these structures (Wasserman and Faust 1994). In fact, social network and community analysis is currently one of the major cores in contemporary sociology and is also employed in a number of other formal sciences.

Computer Science

In computer science, the term community is more frequently used than the term group, which has been widely adopted in social sciences. In this work, we use both terms without distinction. According to computer science terminology, the discovery of communities is related to the task of clustering the nodes of the graph used to represent the social network. People are mapped to nodes of a graph, and edges are created according to their interactions. Borrowing tools from clustering and theoretical graph analysis, a number of techniques have thus been used to detect communities in social networks (e.g., the Girvan–Newman method (Girvan and Newman 2002) and the modularity-based method (Newman and Girvan 2004)).
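Modularity-based methods score a candidate partition by comparing the number of edges inside each community with the number expected if edges were placed at random while preserving node degrees. The sketch below computes this score for a given partition; it does not search for the best partition, and the function name and toy graph are illustrative assumptions.

```python
def modularity(adj, communities):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2,
    where m is the total number of edges.
    """
    m = sum(len(neigh) for neigh in adj.values()) / 2.0
    q = 0.0
    for community in communities:
        members = set(community)
        # Each internal edge is seen from both endpoints, hence the division by 2.
        internal = sum(1 for u in members for v in adj[u] if v in members) / 2.0
        degree_sum = sum(len(adj[u]) for u in members)
        q += internal / m - (degree_sum / (2.0 * m)) ** 2
    return q


# Two triangles joined by a single edge (hypothetical data).
two_triangles = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
print(modularity(two_triangles, [{1, 2, 3}, {4, 5, 6}]))
print(modularity(two_triangles, [{1, 2, 3, 4, 5, 6}]))
```

Splitting the graph into its two triangles yields a clearly positive score, whereas putting all nodes in one community yields zero, which is why partitions with higher modularity are preferred.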

Such “algorithmic communities” emerge from a data-driven approach, which is now considered a “paradigm shift” in the machine learning (ML) field. A paradigm shift was described by Kuhn (1962) as a phenomenon in which an abrupt shift in the values, goals, and methods of the scientific community occurs (Cristianini 2014). Some success stories (from spelling correction to face recognition, including question answering, machine translation, and information retrieval) have shown how the data-driven approach centered on machine learning technologies is the winning one in many applications (Cristianini 2014).

Community detection techniques have been largely employed in recent years to describe the structure of complex social systems. However, these algorithmic communities are totally defined on the basis of some graph properties (e.g., density) and discard the subjective concept of the identity of the community members. This leads to detection of groups of users who are not always aware of being members of them. Groups detected algorithmically (detected groups) do not correspond to user-generated groups (declared groups), which are considered in social sciences. Attempts to evaluate this mismatch have been made (Aiello 2015).

Computational Social Science

Recently, a new discipline based on quantitative understanding of complex social systems has emerged (Cioffi-Revilla 2010). Computational social science (CSS) is the bridge between social sciences (SS) and computer science (CS): it studies questions proper to the social sciences through computational techniques and approaches, often developed in the computer science community. CSS can benefit from the presence of huge volumes of data on society’s everyday behavior due to the significant integration of technology into people’s lives (Conte et al. 2012).

Social–Spatiotemporal Analysis of Topical and Polarized Communities in Online Social Networks

Users spend a considerable amount of time in OSNs, creating original content (posting), sharing multimedia content from other users (sharing), discussing (messages and comments), and reinforcing external content (liking). Communities emerge around different topics of interaction, and analysis of the social aggregations in a virtual context is interesting to shed light on human behavior.

In the following sections, after reviewing some basic concepts related to the communities in OSNs, we discuss recent research results along three important analytical dimensions: social, spatial, and temporal. Furthermore, we include a novel orthogonal dimension: polarization.

Communities in Online Social Networks

Homophily and Diversity

Homophily is a main driver that characterizes communities in both real and digital contexts. Homophily induces similarity between members of communities: “birds of a feather flock together” (McPherson et al. 2001). This is due to two underlying principles: (i) selection mechanisms (preferences are connected to similar users’ traits); and (ii) social contagion (how much linked people influence each other) (Leenders 1997). Homophily has been widely studied in OSNs, showing correlations between friendships and interests (Aiello et al. 2012) or between profile information and communication patterns (Leskovec and Horvitz 2008). Local proximity and age are other examples of homophily factors in OSNs (Kumar et al. 2005).

On the other hand, it has been shown that diversity in the discussed topics or in the shared content favors the stability of a community as group members continue being stimulated by new input (Ludford et al. 2004). Models of growth and longevity of groups in digital contexts have been also investigated (Backstrom et al. 2006).

Size, Membership, and Barriers

The group size affects the dynamics of interactions. The phenomenon has been studied in depth in real-world social networks by Robin Dunbar (1992). He correlated the volume of the neocortex in primates with the number of stable social relationships they have. He adapted the same theory to humans (Dunbar 1993), concluding that the number of people with whom a person can maintain stable social relationships is about 150. Similar results have been found in the Facebook friendship network, showing similarities between ego network structures in OSNs and in real life (Arnaboldi et al. 2012). Ego networks are graphs where the central node is the studied user and all of the other nodes connected to him/her represent his/her friends. Similarly, Goncalves et al. (2011) performed comparable experiments on Twitter, measuring the average interaction strength.
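An ego network can be extracted from a larger friendship graph by keeping the ego, its alters, and the ties among the alters. The following is a minimal sketch, assuming the graph is stored as a dictionary of neighbor sets; the function name and the data are hypothetical.

```python
def ego_network(adj, ego):
    """Return the ego, its alters, and the ties among them as an adjacency dict."""
    nodes = set(adj[ego]) | {ego}
    # Keep only the ties whose two endpoints both belong to the ego network.
    return {n: set(adj[n]) & nodes for n in nodes}


# Hypothetical friendship graph; "x" is a friend of a friend, not an alter.
friends = {
    "ego": {"a", "b", "c"},
    "a": {"ego", "b", "x"},
    "b": {"ego", "a"},
    "c": {"ego"},
    "x": {"a"},
}
net = ego_network(friends, "ego")
print(len(net) - 1)  # number of alters in the ego network
```

The size of such a network, counted over the alters with whom the ego actually interacts, is the quantity that studies of this kind compare against Dunbar's estimate.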

Topical and Social Groups

Two main processes can be identified in the development of communities in social networks: users create ties based on common interests or based on personal social relationships. The resulting kinds of group have been referred to as common identity groups and common bond groups, respectively (Prentice et al. 1994). We also adopt the lexicon proposed by Martin-Borregon et al. (2014) and refer to those kinds of group as topical and social.

Members of topical groups discuss a specific topic or a specific area of interest, and they do not usually have personal relationships with each other. Conversely, members of social groups tend to be reciprocal in their interactions with other members, and discussions are focused on multiple topics. One implication is that social groups are vulnerable to turnover, since personal relationships are present and they can influence user departure. Topical groups, on the other hand, are robust to departures and they are open to accepting new members (Aiello 2015).

To distinguish between the different kinds of group, Aiello suggested quantifying the reciprocity of interactions and the topical width of the discussions (Aiello 2015). Typically, greater reciprocity indicates a higher probability that the group is social, while a small topical width indicates topical groups. These variables integrate social and content-based aspects. In practice, groups can be both topical and social.
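The two variables can be approximated with simple statistics. The sketch below measures reciprocity as the share of directed interactions that are returned, and topical width as the Shannon entropy of the topic labels attached to a group's messages. Both operationalizations are assumptions made for illustration and are not necessarily the exact definitions used in the cited work; all names and data are hypothetical.

```python
import math
from collections import Counter


def reciprocity(directed_edges):
    """Share of directed interaction links (u, v) for which (v, u) also exists."""
    edges = set(directed_edges)
    if not edges:
        return 0.0
    mutual = sum(1 for (u, v) in edges if (v, u) in edges)
    return mutual / len(edges)


def topical_width(topic_labels):
    """Shannon entropy (in bits) of the topic labels: 0 for a single topic,
    higher values when discussions spread over many topics."""
    counts = Counter(topic_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical group data.
links = [("a", "b"), ("b", "a"), ("a", "c")]
narrow = ["sport", "sport", "sport", "sport"]
wide = ["sport", "music", "travel", "food"]
print(reciprocity(links), topical_width(narrow), topical_width(wide))
```

Under these assumptions, a group with high reciprocity and high entropy would look social, while a group with low reciprocity and entropy close to zero would look topical.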

From a computer science point of view, instead, when we refer to topics, we mean a multinomial distribution of words that represents a coherent concept in a set of text documents. To extract the most important topics from a piece of text (a topic selection task), different techniques have been developed: the most frequently applied method is the unsupervised latent Dirichlet allocation (LDA) (Blei et al. 2003). Novel methods still based on LDA have been proposed recently (Blei and Lafferty 2007; McAuliffe and Blei 2008; Wang et al. 2009).

Moreover, researchers have explored the relationship between diffusion of a topic and network structure (Barbieri et al. 2013), focusing on the structural and dynamic properties of specific topical communities such as groups supporting political parties (Conover et al. 2011) or groups discussing various conspiracy theories (Bessi et al. 2015), rumors and hoaxes (Ratkiewicz et al. 2010), deviant behaviors (Coletto et al. 2016a), or more ordinary topics such as fashion or sport. In the section “Social Dimension and Polarization” we describe in more detail how we can study a topical group in an OSN. In particular, we discuss the case of deviant communities, which are highly topical.

Topical Groups and Polarization

An interesting fine-grained investigation can be obtained by focusing on user opinion. When users discuss a specific topic, they cluster into subgroups. If the topic is controversial, such groups are strongly polarized. The relation between the topic and polarization is dynamic: users generally start discussing a topic and, around that, different opinions emerge.

Understanding opinion and polarization is a challenging task, and it has recently received a lot of attention in the information retrieval and data mining communities. Analysis of polarization is useful to investigate the evolution of groups, and sometimes it can be used to predict the behavior of users or their activities, e.g., predicting vote intention among Twitter users (Coletto et al. 2015) or understanding product preferences for marketing aims (Leskovec et al. 2007).

In Coletto et al. (2016c), we proposed an iterative procedure to detect polarized users and to monitor topic evolution over time. We focused on the frequent scenario where users interact and produce content according to a set of polarization classes. By polarization classes, we mean subjects that require the user to side exclusively with one of them. Political parties are typical examples of these classes: users discuss several parties, and their opinion changes over time, but they can eventually vote for only one. Other examples include brand analysis, product comparison, and opinion mining in general. Topic detection and tracking (TDT) (Allan 2012) has been widely explored within the scope of news stream analysis (Walls et al. 1999). We are interested in content and user tracking for polarized users. This notion is connected with the concept of controversy in social media, which has been studied, mostly in political contexts, using data coming from different sources (Adamic and Glance 2005; Coletto et al. 2017b; Garimella et al. 2016; Gysel et al. 2015; Makazhanov et al. 2014).

In these scenarios, the polarization classes are known, and some limited information may also be available, e.g., a set of relevant keywords. By leveraging such limited knowledge, several challenging tasks can be tackled:

  1. How to identify the users being polarized (or not) according to those classes

  2. How to identify the most relevant subtopics being discussed among such users

  3. How to monitor the evolution of such user communities and their online discussions over time

Those tasks are all very challenging, as the available knowledge may be approximate or insufficient, and it may also become obsolete over time. Therefore, the classification into polarization classes should be able to self-update continuously by catching upcoming relevant users and discussion topics.

We have described PTR (Polarization TRacker) (Coletto et al. 2016c) for the discovery of polarized users in a Twitter stream. While there exist several works about community detection and trending topic tracking, we have proposed a novel setting where the number of communities is known, but very little information is provided (a keyword per class only) and those communities are competing with each other.

The PTR algorithm is illustrated in Algorithm 1 (PTR Algorithm). As input, the algorithm receives an initial set of polarized keywords \( \left\{ H_c^0 \right\} \) (initial seed) for each polarized class c. For instance, in Twitter the initial seeds can be selected by analyzing the most frequent hashtags and manually selecting a few per class. One of the benefits of PTR is that after only a few iterations the results are less dependent on the size of the original seed, since new relevant keywords are continuously discovered.

The algorithm iterates the following classification steps:

  • TweetClassify: a message/post is said to be polarized to class c if it mentions keywords of class c and does not mention any keyword from classes other than c; the per-class keyword sets are denoted with \( \left\{ H_c \right\}_{c \in \mathscr{C}} \).

  • UserClassify: a user is polarized to one class c only if his/her polarized tweets of class c are at least twice the number of the polarized tweets of any other class.

  • HashTagsClassify: a keyword h is assigned to one class c if \( S_c(h) > \beta \cdot S_{c'}(h), \ \forall c' \neq c \), where \( S_c(h) \) is the joint probability of observing h in messages polarized to class c, and not observing h in messages polarized to other classes.

The procedure iterates classifying tweets, users, and keywords (e.g., Twitter hashtags) until convergence. Note that the whole content stream is taken into consideration at each iteration, including those users/posts/keywords that were not previously labeled with a polarization class. In Coletto et al. (2016c), the method outperforms a k-means baseline in terms of F-measure by 7% to 71%, depending on the dataset considered.
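The three classification steps and the iteration can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the reference implementation of PTR: tweets are (user, hashtag set) pairs, a tweet is labeled only when it matches keywords of exactly one class, the score is computed as P(h in tweets of c) times (1 - P(h in tweets of other classes)), and convergence is declared when the keyword sets stop changing. All function names, the default value of beta, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def tweet_classify(hashtags, keywords):
    """Return class c if the tweet mentions keywords of c only, else None."""
    matched = [c for c, kws in keywords.items() if hashtags & kws]
    return matched[0] if len(matched) == 1 else None


def user_classify(tweets, labels):
    """Map a user to class c if c has at least twice the polarized tweets
    of any other class; other users stay unlabeled."""
    counts = defaultdict(Counter)
    for (user, _), label in zip(tweets, labels):
        if label is not None:
            counts[user][label] += 1
    polarized = {}
    for user, per_class in counts.items():
        best, n_best = per_class.most_common(1)[0]
        if all(n_best >= 2 * n for c, n in per_class.items() if c != best):
            polarized[user] = best
    return polarized


def hashtags_classify(tweets, labels, classes, beta):
    """Assign h to c when S_c(h) > beta * S_c'(h) for every c' != c.
    Assumed score: S_c(h) = P(h | tweets of c) * (1 - P(h | other tweets))."""
    by_class = {c: [tags for (_, tags), lab in zip(tweets, labels) if lab == c]
                for c in classes}

    def score(c, h):
        inside = by_class[c]
        outside = [t for o in classes if o != c for t in by_class[o]]
        p_in = sum(h in t for t in inside) / len(inside) if inside else 0.0
        p_out = sum(h in t for t in outside) / len(outside) if outside else 0.0
        return p_in * (1.0 - p_out)

    keywords = {c: set() for c in classes}
    vocabulary = {h for group in by_class.values() for tags in group for h in tags}
    for h in vocabulary:
        for c in classes:
            if all(score(c, h) > beta * score(o, h) for o in classes if o != c):
                keywords[c].add(h)
                break
    return keywords


def ptr(tweets, seeds, beta=2.0, max_iter=10):
    """Iterate the three steps until the keyword sets stop changing."""
    keywords = {c: set(kws) for c, kws in seeds.items()}
    users = {}
    for _ in range(max_iter):
        labels = [tweet_classify(tags, keywords) for _, tags in tweets]
        users = user_classify(tweets, labels)
        updated = hashtags_classify(tweets, labels, list(keywords), beta)
        if updated == keywords:
            break
        keywords = updated
    return keywords, users


# Hypothetical stream with one seed hashtag per class.
toy_seeds = {"A": {"#voteA"}, "B": {"#voteB"}}
toy_tweets = [
    ("u1", {"#voteA", "#green"}), ("u1", {"#voteA", "#green"}),
    ("u2", {"#voteA", "#green"}),
    ("u3", {"#voteB", "#taxcut"}), ("u3", {"#voteB", "#taxcut"}),
    ("u4", {"#voteB"}), ("u4", {"#voteA", "#voteB"}),
    ("u5", {"#green"}), ("u5", {"#election"}),
]
final_keywords, final_users = ptr(toy_tweets, toy_seeds)
print(final_keywords, final_users)
```

On this toy stream, the loop should extend the seed of class A with "#green" and the seed of class B with "#taxcut", which in turn lets it label user "u5", who never used a seed hashtag. This is the self-updating behavior that makes the results less dependent on the initial seed.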

The PTR algorithm can be used to perform polarization or sentiment analysis, to discover polarized communities, and to study their structural evolution over time in different contexts. In Coletto et al. (2016b), for example, it was used to study the opinions of users about the social phenomenon of migration and the refugee crisis.

A temporal version of the proposed algorithm, TPTR (temporal PTR), is also described in the previously mentioned paper (Coletto et al. 2016c). TPTR is able to track users and topics over time.

A similar approach, aimed at dynamically tracking polarization in OSNs, was proposed by Lu et al. (2015). They presented an efficient optimization-based opinion bias propagation method over the social/information network.

Analytical Dimensions

So far we have pointed out different ingredients that compose the concept of communities in the digital context. Moreover, to analyze a community in an OSN, we might look at different dimensions: temporal, spatial, and social (Martin-Borregon et al. 2014). We extend the three-dimensional characterization of groups proposed by Martin-Borregon et al. (2014) with the concept of polarization of users around a topic.

Temporal Dimension

The temporal dimension is crucial in studying dynamics of groups in a social network. Aiello (2015) has proposed to classify groups into three classes:

  • Short-lived: The groups in this category show a low level of activity after being created, soon becoming inactive.

  • Evergreen: The evergreen cluster is characterized by groups created at a certain point in the past, which have been growing in number of users and content produced.

  • Bursty: These are groups with the lowest skewness and high burstiness, especially in the number of users joining. Usually the highest activity is registered at the beginning of their life, and from time to time they experience renewed content production. Some of these groups are related to recurring (e.g., yearly) events that regularly attract the attention of users.

Moreover, we have already described the importance of time in dynamically tracking polarized communities (through TPTR) (Coletto et al. 2016c). To develop realistic predictive models, for instance, we have to take into consideration in a proper way dynamical changes in the network (Tatemura 2000).

From a microscopic point of view, temporal analyses of interactions among users in OSNs are fundamental. Miritello et al. have worked extensively on temporal patterns of human communications and their influence on the spreading of information in social networks (Miritello et al. 2011; Miritello 2013a, b). In OSNs, the evolution of interactions over time has been studied in recent research works; for instance, Viswanath et al. (2009) presented a dynamical study for the case of Facebook. Many studies have led to insights on how an interaction graph is structurally different from the social network itself, and the temporal component is important to track the strength of the relationships. Over time, social links can grow stronger or weaker, and this knowledge is crucial to characterize communities and their evolution.

Furthermore, understanding dynamical phenomena such as cascades and flames in conversations or linking communication patterns with events through time or content is equally important. The task, however, is challenging, since the variables to consider are innumerable and the data processing is not trivial. In addition, integrating the dynamical characteristics of the interactions with external data related to events is a very complex task.

Spatial Dimension

Interest in collecting and analyzing geo-located data from OSNs has increased significantly. Typically, OSNs provide different location information for users and for actions. Usually there are two classes of geographical information: the locations of the users (GPS, user description) and the places mentioned in the interactions (Coletto et al. 2016b).

Several works have studied different aspects of the geographical dimension of OSNs: a broad study on this topic was reported in Scellato et al. (2010). The authors proposed a framework to compare social networks based on two new measures: one capturing the geographical closeness of a node to its network neighborhood, and the other a clustering coefficient weighted by the geographical distance between nodes.
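The two kinds of measure described above can be illustrated with a small sketch. These are simplified proxies and not the exact definitions of Scellato et al. (2010): closeness is approximated by the mean great-circle distance between a node and its neighbors, and the weighted clustering coefficient discounts each closed triangle by an exponential function of its longest side. The function names, the `scale_km` parameter, and the adjacency and coordinate structures are assumptions made for illustration.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def mean_neighbor_distance(node, adj, coords):
    """Mean distance from a node to its neighbors (a simple closeness proxy)."""
    neighbors = adj[node]
    if not neighbors:
        return None
    total = sum(haversine_km(coords[node], coords[n]) for n in neighbors)
    return total / len(neighbors)


def geo_clustering(node, adj, coords, scale_km=1000.0):
    """Local clustering coefficient with distance-discounted triangles.

    Each pair of linked neighbors contributes exp(-longest_side / scale_km)
    instead of 1, so geographically tight triangles count more. As scale_km
    grows, the value tends to the ordinary local clustering coefficient.
    """
    pairs = list(combinations(sorted(adj[node]), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for u, v in pairs:
        if v in adj[u]:
            longest = max(haversine_km(coords[node], coords[u]),
                          haversine_km(coords[node], coords[v]),
                          haversine_km(coords[u], coords[v]))
            total += math.exp(-longest / scale_km)
    return total / len(pairs)


# Hypothetical undirected graph (dict of neighbor sets) with node coordinates.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
coords = {"a": (43.72, 10.40), "b": (43.77, 11.25), "c": (41.90, 12.50)}
print(mean_neighbor_distance("a", adj, coords))
print(geo_clustering("a", adj, coords))
```

Comparing these values across networks gives a rough sense of how strongly ties and triangles are confined by geography, which is the kind of comparison the framework above is meant to support.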

Liben-Nowell et al. (Kumar et al. 2005) have found a strong correlation between location and friendship. Twitter geo-located posts have been studied by Takhteyev et al. (2012) to understand how Twitter social ties are affected by distance. Linked users are identified as “egos” and “alters” and the distance between them is analyzed by considering the correlation with the air travel connection distance and with national borders and languages. An analogous objective was the focus of Kulshrestha et al. (2012), who inferred the locations of 12 million Twitter users in a worldwide dataset. In contrast to the previous paper, they studied the correlation between the Twitter population and the socioeconomic status of a country, suggesting that highly developed countries are characterized by greater Twitter usage.

Finally, the geographical properties of an OSN have also been shown to be useful to study migration phenomena (Coletto et al. 2016b; Hawelka et al. 2014; Zagheni et al. 2014).

Social Dimension and Polarization

Even though it is computationally hard to model the concept of social identity, this is the base driver for human association. Groups give us a sense of belonging to the social world, which is a source of pride and self-esteem. In order to increase our self-image, we enhance the status of the group to which we belong, or we discriminate and hold prejudices against the groups to which we do not belong (Turner 1981). The notion of social closeness and sense of belonging is tightly linked to cultural proximity and sharing opinions and values. Communities emerge in digital contexts, as in real life, following specific drivers classified as common identity and common bond (Prentice et al. 1994).

A further level of differentiation is the polarization process, which creates subcommunities based on sharing values and opinions around a specific topic. Polarization can be studied by looking at the interactions among these different subcommunities and by evaluating the controversy in the conversations among users belonging to different subgroups. On the web, and in particular in the context of OSNs, controversy has been studied from different perspectives (Coletto et al. 2017b; Dori-Hacohen 2015; Dori-Hacohen and Allan 2013; Garimella et al. 2016).
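As a rough illustration of looking at interactions among subcommunities, the sketch below computes the share of interactions that cross subgroup boundaries and the external-internal (E-I) index, (E − I)/(E + I). This is a generic structural indicator and not one of the controversy measures proposed in the works cited above; the edge list and group labels are hypothetical and assumed to be already available.

```python
def cross_group_share(edges, group_of):
    """Fraction of interactions whose endpoints belong to different subgroups."""
    if not edges:
        raise ValueError("no interactions given")
    external = sum(1 for u, v in edges if group_of[u] != group_of[v])
    return external / len(edges)


def ei_index(edges, group_of):
    """E-I index: (external - internal) / (external + internal).

    -1 means all interactions stay inside subgroups (strong separation),
    +1 means all interactions cross subgroup boundaries.
    """
    if not edges:
        raise ValueError("no interactions given")
    external = sum(1 for u, v in edges if group_of[u] != group_of[v])
    internal = len(edges) - external
    return (external - internal) / (external + internal)


# Hypothetical interactions (e.g., replies) among users in two polarized subgroups.
group_of = {"ann": "pro", "bob": "pro", "cat": "con", "dan": "con"}
edges = [("ann", "bob"), ("cat", "dan"), ("ann", "cat")]
print(cross_group_share(edges, group_of))
print(ei_index(edges, group_of))
```

A strongly negative index suggests that subgroups mostly talk among themselves; it says nothing about the tone of the cross-group exchanges, which is where content-based controversy analysis comes in.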

Moreover, it is interesting to associate polarization with topics. The problem has been partially studied (Jo and Oh 2011), proposing some models that associate topic modeling with sentiment analysis: the Topic Sentiment Mixture (TSM) model (Mei et al. 2007), Multi-Aspect Sentiment (MAS) model (Titov and McDonald 2008), and Joint Sentiment/Topic (JST) model (Lin and He 2009). Briefly, in TSM, each word is connected with a specific topic and independently with a sentiment; in MAS, the set of aspects to be evaluated is fixed and sentiment is modeled as a probability distribution for each aspect; in JST, sentiment is integrated with a topic in a single language model (Jo and Oh 2011). Other recent approaches proposed in this field are a semisupervised model based on conditional random fields (Marcheggiani et al. 2014) and a hierarchical text classification method (Esuli et al. 2008). These models are not yet flexible enough to capture the evolution of topics and opinions in OSN conversations; this remains an open issue.

Deviant Behaviors

Online social media are also favorable ecosystems for the formation of topical communities centered on matters that are not commonly taken up by the general public because of the embarrassment, discomfort, or shock they may cause (Coletto et al. 2016a).

Those are communities that depict or discuss what are usually referred to as deviant behaviors (Clinard and Meier 2015), that is, conduct commonly considered inappropriate because it violates society’s norms or the moral standards shared by the majority of its members. Pornography consumption, drug use, excessive drinking, illegal hunting, eating disorders, or any self-harming or addictive practice are all examples of deviant behaviors. Many of them are represented, to different extents, on social media (De Choudhury 2015; Haas et al. 2010; Morgan et al. 2010). However, since all of these topics touch upon different societal taboos, the commonsense assumption is that they are embodied either in niche, isolated social groups, or in communities that might be quite numerous but whose activity runs separately from the mainstream social media life. In line with this belief, researchers have mostly considered those groups in isolation, focusing predominantly on the communication patterns among community members (Tyson et al. 2015) or, from a sociological perspective, on the motivations of their members and on the impact of the groups’ activities on their lives and perceptions (Attwood 2005).

In reality, people who are involved in deviant practices are not segregated outcasts but part of the fabric of the global society. As such, they can be members of multiple communities and interact with very diverse sets of people, possibly exposing their deviant behavior to the public. In Coletto et al. (2016a), we aimed to go beyond previous studies that looked at deviant groups in isolation by observing them in context. In particular, we wanted to shed light on a matter that is relevant to both network science and social sciences: how much deviant groups are structurally secluded from the rest of the social network, and what the characteristics of their subgroups who build ties with the external world are (Coletto et al. 2016a).

In Coletto et al. (2016a), we focused on the behavior of adult content consumption. Public depiction of pornographic material is considered inappropriate in most cultures, yet the number of consumers is strikingly high (Sabina et al. 2008). Despite that, we were not aware of any study of how online communities producing that type of content interface with the rest of the social network. We studied this phenomenon in a large dataset from Tumblr, considering a large sample of the follow and reblog networks for a total of more than 130 million nodes and almost 7 billion dyadic interactions. To spot the community that generated adult content, we also relied on a large sample of 146 million queries from a 7-month query log from a very popular search engine, out of which we built an extensive dictionary of terms related to adult content that we made publicly available.

The results showed that:

  • The deviant network is a tightly connected community structured into subgroups, but it is linked with the rest of the network with a very large number of ties.

  • The largest share of the information originating in the deviant network is produced by a very small core of nodes but spreads widely across the whole social graph, potentially reaching a large audience of people who might unwillingly see that type of content.

Echo Chambers

Different polarized clusters usually interact by exchanging points of view, but echo chambers often arise: situations in which information, ideas, and opinions are amplified or reinforced by transmission and repetition inside an “isolated” system, where alternative views are censored or underrepresented.

This phenomenon has been studied in particular in the context of misinformation, by analyzing interactions between scientific communities and groups focused on conspiracy theories. Bessi et al. (2015) studied the scientific community and the conspiracy community on Facebook, and the results showed that polarized communities emerge around distinct types of content and that habitual consumers of conspiracy news become more self-contained and focused on their specific content.

Key Applications

Toward a Unified Approach

Analytical Framework

The social, spatial, and temporal dimensions, as well as the topic and polarization dimensions, which we have discussed so far, can be used to analyze communities in OSNs in a general and comprehensive way.

In Coletto et al. (2017a), we proposed an analytical framework to investigate social trends in large tweet collections by extracting and crossing information about the following three dimensions: time, location, and polarization. The methodology described how to: (1) extract relevant spatial information; (2) enrich data with the sentiment of the message and of the user (retrieved in an automatic iterative way through machine learning); and (3) perform multidimensional analyses considering content and locations in time.
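The third step, crossing the dimensions, can be sketched as a simple aggregation. This is only an illustration under stated assumptions and not the implementation of the cited framework: it assumes that steps (1) and (2) have already produced, for each message, a resolved location and a polarity label, and it groups by calendar month. The record layout, the labels, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def cross_dimensions(records):
    """Count messages per (month, location) cell, broken down by polarity class.

    records: iterable of (day, location, polarity) tuples, where day is a
    datetime.date and the other two fields are strings. Assumption: location
    extraction and polarity classification were performed upstream.
    """
    counts = defaultdict(Counter)
    for day, location, polarity in records:
        month = (day.year, day.month)
        counts[(month, location)][polarity] += 1
    return counts


def polarity_share(cell, label):
    """Share of messages in one (month, location) cell carrying a given label."""
    total = sum(cell.values())
    return cell[label] / total if total else 0.0


# Hypothetical, already enriched messages.
records = [
    (date(2016, 1, 5), "IT", "pro"),
    (date(2016, 1, 20), "IT", "con"),
    (date(2016, 1, 21), "IT", "pro"),
    (date(2016, 2, 1), "DE", "con"),
]
table = cross_dimensions(records)
print(polarity_share(table[((2016, 1), "IT")], "pro"))
```

Keeping the aggregation keyed by (time bucket, location) makes it easy to follow how the balance between polarized classes shifts over time in each place, or to compare places within the same period.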

The approach is general and can be easily adapted to any topic of interest involving multiple dimensions.

Micromarriage and Macromarriage

To study social processes, it is furthermore useful to merge dyadic interaction analysis with the study of multiuser interactions.

Social networks are complex systems and the relationships between atomic components create emergent behaviors that can hardly be modeled directly from the composition of the individual parts (Aiello 2015).

Group analysis then is very important to understand the properties of the collectivity in a macroperspective. The challenge is to merge the analysis of dyadic interactions and one-to-one communication patterns (microanalysis) with more general comprehension of multiuser aggregations (macroanalysis) to fully understand social behaviors.

Research on communities in OSNs through the abovementioned dimensions could create the bridge between individual characterization of the users and their interactions and the understanding of collective behaviors that are detected on a higher level in the social network.

Isolation and Spreading

Finally, it is important to study not only the community and its properties but also its relation to the rest of the network, as was done, for instance, in Coletto et al. (2016a). The authors studied a community in an OSN in relation to the entire network. Usually this is not done, either because some communities are assumed to be isolated or because broader data are lacking.

Future Directions

We have analyzed communities in online social networks from different angles. We have discussed:

  • Why people organize into groups

  • Some reasons for the success of online social networks and the digital environment

  • How people interact in online social networks

  • How communities are detected in online social networks

  • The computer science and social sciences perspectives in defining groups

  • The success of computational social science and the data-driven approach

  • The typology of communities (topical and social)

  • The structure of communities in polarized subgroups

  • How to track polarization in time and topic evolution

  • Some examples of topical communities (in particular, we proposed the study of a deviant community and its relationships with the whole network)

  • The importance of time, place, social links, and polarization in analyzing communities in online social networks

This work summarizes the recent advances in this topic, but many challenges remain open. We have contributed to exploring this area by proposing new methods and analyses, but the research community must still take many steps to create a robust framework for comprehensive studies of interactions among groups in online social networks and of their evolution.

In fact, this research area may contribute to creating bridges between the computer science and social science communities in order to develop new meaningful research through computational tools by merging the skills and the knowledge of the two domains.

The study of people and how they interact, both in the real-world case and in the digital environment, is a complicated task because humans are heterogeneous systems and, through their interactions, each system is linked to the others in an even more complex relation. We believe that a successful approach should foster the integration of the different tools and methodologies made available by the research conducted so far in the fields of both social and computational sciences, thus supporting multidimensional analyses as discussed before.

The presence of massive datasets and the continuous increase of new social traces on the one side, and the evolution of methods and technology on the other side, will provide more keys to better deal with the complexity that is the foundational pillar of social phenomena.

Cross-References