Glossary
- Computer science (CS): Discipline based on a scientific and practical approach to computation and its applications.
- Computational social science (CSS): New discipline based on interdisciplinary investigation of the social universe on many scales, ranging from individual actors to the largest groupings, through the medium of computation (Cioffi-Revilla 2014).
- Dunbar number: Value of the cognitive limit to the number of people with whom a person can maintain stable social relationships (≈150).
- Echo chamber: “Enclosed” system in which information, ideas, or beliefs are amplified or reinforced by internal transmission and repetition.
- Ego network: Focal node (“ego”) and the nodes to which the ego is directly connected (friends or alters) plus the ties, if any, among the alters.
- Group or community: Set of two or more people who interact with one another, share similar traits, and collectively have a sense of belonging.
- Machine learning (ML): Subfield of computer science, which evolved from the study of pattern recognition and computational learning theory in artificial intelligence (AI).
- Online social network (OSN): Platform to build social relations among people who share similar personal and career interests, activities, backgrounds, or real-life connections (Buettner 2016). Alternatively, they are called social network sites (SNS).
- Polarizing subgroup: Set of people sharing similar points of view about a specific discussed topic.
- Social group: Bond-based group characterized by personal social relations among members.
- Social network: Structure made up of a set of actors, sets of dyadic ties, and interactions between individuals.
- Social sciences (SS): Set of academic disciplines concerning society and the relationships among individuals within it.
- Topical group: Identity-based group whose members share a common interest (topic).
- Virtual world: Computer-based simulated environment populated by users who simultaneously and independently explore the setting, participate in its activities, and interact with others. An online social network is an example of a virtual world, as is the web itself.
Definition
The analysis of communities in online social networks (OSNs) refers to the task of investigating, from both microscopic and macroscopic points of view, the organization of individuals into groups, their relationships, and interactions with other groups, and the behavior and the development of such groups within the whole OSN. In this work, we analyze topical communities, with a focus on polarizing topics, and we discuss a general analytical framework to study online communities along the social, spatial, and temporal dimensions.
Introduction
The understanding of user communities and their interactions in OSNs is a crucial task in many application fields, such as sociology, psychology, computer science, and business. In the past, sociological studies were mainly conducted through a modest number of surveys and questionnaires. Nowadays, OSNs allow scientists to investigate large volumes of very detailed data about millions of users. Indeed, the analysis of the social interactions in OSNs is interesting to shed light on human behavior within and beyond such virtual environments.
We present different approaches in studying groups and communities in OSNs, and we give an overview of the results achieved so far. We describe the characterizing features of OSNs such as topical groups, social drivers, and homophily. Furthermore, we look at polarizing communities and how users’ opinions shape the structure of these communities. Finally, we focus on the analytical dimensions to be taken into consideration when describing communities in OSNs, and we propose a framework to blend together the social, spatial, and temporal dimensions.
Need for Aggregation and Interaction
Man is a social animal. (Aristotle, 384–322 BC)
The Greek philosopher Aristotle, more than 2300 years ago, highlighted that the social nature of human beings urges us to self-organize into groups of different scales: family, tribe, and society. Our lives constantly depend on other people.
Human activities and social behaviors have always been a top area of investigation for anthropologists, sociologists, and psychologists. Animals in general exhibit social behaviors, embedded for instance in the concepts of territory and dominance. However, some social traits are exclusive to the human species, whose organization into a social network has no analogue in the animal realm; chief among these traits are the prevalence of rationality-based decision making and the use of a complex communication language (Barrett et al. 2007).
In fact, most human desires are based on social life. In developed countries, people have fulfilled the physiological and safety needs classified by Maslow (1943), such as food, water, sleep, and security. Beyond these needs, and possibly with an even stronger desire, humans look for a sense of belonging and love, esteem, and finally self-actualization. These all involve social interactions and can hardly be obtained in isolation. Humans, then, by their nature organize themselves into social structures.
From the “Real World” to the “Virtual World”
Social Networks
Groups and social networks have been studied for decades. The small-world effect is one of the main findings (de Sola Pool and Kochen 1979), with several implications. For instance, in his famous experiment Milgram estimated the typical distance between any two people in the social graph at about six hops (Milgram 1967). Moreover, by looking at interactions among people in social groups, researchers have pointed out the presence of strong and weak ties, which structure the network into tightly clustered communities and play different roles in information spreading (Granovetter 1973).
Recently, the focus of social studies has extended to the digital world, resulting in a “marriage” between social sciences and computer science. Novel network analysis techniques and large-scale computational approaches have been developed to analyze the behavior of individuals and communities in massive virtual contexts (Wasserman and Faust 1994).
Online Social Networks
The birth of OSNs in the late 1990s and their increasing popularity in the early 2000s have answered the human need for belonging even in the virtual world.
The success of OSNs was anticipated by the diffusion of virtual environments and the development of the web. In particular, virtual games have been precursors of OSNs. Messinger et al. (2008) described the historical progression of virtual worlds starting from arcade games, which started in 1972 with the Pong game by Atari Interactive. After that, the path toward OSNs was marked by the introduction of console systems (1986), followed by LAN games, which created the concept of digital communities through internet connectivity. Gaming environments progressively integrated additional social features with unstructured games and player-generated content (e.g., The Sims). Social networking sites are a further evolution in the development of open virtual worlds, with properties that make them equivalent, or at least comparable, to real-world environments.
In OSNs, an individual creates his own profile, publishes content, and interacts with other users through discussions and actions (resharing content, liking, disliking). Users can also set friendship or subscription (following) links with other users.
The world population is ≈7.4 billion people; among them, ≈3.4 billion (46%) are internet users and ≈2.3 billion (31%) are active social media users (Global Web Index data, January 2016). These statistics suggest that there is a large interest in joining OSNs. Oh et al. (2014) have shown positive associations among the number of friends in OSNs, supportive interactions, affect, perceived social support, sense of community, and life satisfaction. On average, the time spent by users in these networking platforms is significant: almost 2 h per day according to the Global Web Index data.
To researchers willing to investigate human behaviors in social environments, such a huge volume of information is a gold mine, with no equivalent in the “real world.”
Structure of Online Social Networks
Today, OSNs represent a significant portion of web traffic, and the pervasive use of these platforms, together with the possibility of tracking any user action, has attracted scientists interested in investigating OSNs’ properties.
The first studies on OSNs explored the topology and the structure of these large networks. From a topological point of view, an OSN can be considered as a graph where the nodes are users and the edges are connections (friendships or subscription/follower relations). Many works have analyzed OSNs from a structural point of view, showing again a small-world effect (Buyukkokten et al. 2005), i.e., a high clustering coefficient and a short average path length (average degree of separation from 4 to 5) in different OSNs: Flickr, LiveJournal, Orkut, YouTube (Mislove et al. 2007), Twitter (Kwak et al. 2010), Facebook (Backstrom et al. 2012; Ugander et al. 2011; Wilson et al. 2012), and Google+ (Magno et al. 2012). Other studies have evaluated the node in/out degree distribution (which typically is a power law) (Ugander et al. 2011; Wilson et al. 2012) and the degree correlation, thus detecting the presence of a large, strongly connected component (Kumar et al. 2010). Finally, interesting works have investigated the evolution of social graphs over time (Wilson et al. 2012).
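The two quantities that characterize the small-world effect, the clustering coefficient and the average path length, can be computed directly from the graph representation described above. The following is a minimal sketch in plain Python on a toy undirected graph; the function names and the example edge list are illustrative assumptions, and the cited studies rely on sampling or approximation to handle networks with millions of nodes.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy friendship graph (hypothetical users): a triangle plus a short chain.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
print(average_clustering(adj), average_path_length(adj))
```

A network is usually called small-world when the first value is much higher, and the second comparable to, what a random graph with the same number of nodes and edges would give.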
Moreover, online social microblogging platforms and social networks have proven to be a rich source of information to track and monitor the behavior of users over time. Interactions in OSNs have been studied by weighting the social graph through quantitative considerations on the strength of social ties. These graphs, called interaction graphs, differ from social graphs since they include quantifying mechanisms about the intensity of the connections. The interaction strength in a social network is a mix of the amount of time spent together, intimacy, emotional intensity, and reciprocal services (Granovetter 1973), but in most cases it has been quantified in real OSN applications only in terms of the duration and frequency of contacts (e.g., in Wilson et al. 2012), even though there have been theoretical studies, starting with Marsden and Campbell (1984), which have tried to translate qualities such as intensity and intimacy into quantity values. Interaction graphs in OSNs have been studied, showing microproperties related to ego networks (looking, for instance, at close friends, inactive relationships, homophily, turnover of friendships) and macroproperties related to the whole network (diameters, degree distributions, clustering coefficients), which are generally more stable (e.g., in Twitter (Arnaboldi et al. 2013) and Facebook (Wilson et al. 2012)).
The following sections present the main results achieved in analyzing communities in OSNs and, in particular, they focus on community definition, considering both topical and social groups. A further layer of analysis is added by looking at user interactions. We then present some results aimed at understanding opinions and tracking polarized communities over time. Finally, we look at the main dimensions that have to be taken into consideration in the analysis of groups in the digital world, and we discuss how these dimensions can be used in a unified approach.
Key Points
We analyze communities in OSNs by looking at them through different perspectives, ranging from a computer science point of view to a sociological point of view. We discriminate between social and topical groups to focus in particular on the latter, and we show how to further characterize subcommunities on the basis of user opinions. We look at polarization of users when discussing around a topic and how opinion-based groups can be tracked over time in accordance with the topic’s evolution. Finally, we give an overview of some analytical dimensions to be taken into consideration in order to characterize groups in OSNs.
In particular:
- We propose an additional topical group classification based on polarization.
- We describe a method to track polarizing communities and topics for OSNs.
- We analyze topical communities and, in particular, we focus on alternative and isolated groups; we show how deviant community analysis can be extended to take into consideration the relationship with the whole social network.
- We propose an analytical framework to describe communities in OSNs by looking at different dimensions: temporal, social, and spatial.
Historical Background
In social sciences, a community, or a group, is defined as a set of two or more people who interact with one another, share similar traits, and collectively have a sense of belonging. This definition implies three main concepts, which have been extensively debated: interdependence, homophily, and social identity. These characteristics shift the definition of community beyond the simplistic idea of a group as an aggregation of individuals and entail a degree of subjectivity, which makes the task of identifying communities hard.
Indeed, interdependence and homophily can be measured, and they have been studied quantitatively (Aiello et al. 2012; Bisgin et al. 2010). On the other hand, the concept of social identity, first studied extensively by Tajfel (1982), is hard to frame and remains an object of investigation. The subjectivity inherent in this concept is hard to handle within the computer science framework. The view of group membership as a matter of shared self-definition is predominant (Turner 1981), but it is hardly captured by computational studies, which base their findings on cohesive interpersonal relationships inferred from interaction patterns. Matching sociological findings with computational approaches able to quantify them is still a challenging research area.
In the following sections, we discuss the computer science perspective in defining and analyzing communities, and we explore how social sciences and computational approaches can be matched.
Computer Science and Social Sciences
Social Sciences
The analysis of social networks is an interdisciplinary academic field, which emerged from social psychology, sociology, statistics, and graph theory. Social structures such as groups and dyadic ties are analyzed to study human behavior and social interactions. Social studies have successfully defined several theoretical models able to explain the patterns observed in these structures (Wasserman and Faust 1994). In fact, social network and community analysis is currently one of the major cores in contemporary sociology and is also employed in a number of other formal sciences.
Computer Science
In computer science, the term community is more frequently used than the term group, which has been widely adopted in social sciences. In this work, we use both terms without distinction. According to computer science terminology, the discovery of communities is related to the task of clustering the nodes of the graph used to represent the social network. People are mapped to nodes of a graph, and edges are created according to their interactions. Borrowing tools from clustering and theoretical graph analysis, a number of techniques have thus been used to detect communities in social networks (e.g., the Girvan–Newman method (Girvan and Newman 2002) and the modularity-based method (Newman and Girvan 2004)).
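Modularity-based methods score a candidate partition of the graph by comparing the number of edges that fall inside communities with the number expected if edges were placed at random while preserving node degrees. For an undirected graph with \( m \) edges, the standard form is \( Q = \sum_{c} \left[ L_c / m - \left( d_c / 2m \right)^2 \right] \), where \( L_c \) is the number of edges internal to community \( c \) and \( d_c \) is the sum of the degrees of its nodes. The sketch below only evaluates \( Q \) for a given partition; it does not search for the best partition, and the function name and toy graph are illustrative assumptions.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of a simple undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    internal = {}    # community -> number of internal edges (L_c)
    degree_sum = {}  # community -> sum of node degrees (d_c)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {n: "A" for n in range(1, 7)}
print(modularity(edges, two_groups))  # positive: the split matches the structure
print(modularity(edges, one_group))   # zero: a single community carries no signal
```

The score depends only on graph structure, which is why communities found this way may not coincide with groups that users themselves declare.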
Such “algorithmic communities” emerge from a data-driven approach, which is now considered a “paradigm shift” in the machine learning (ML) field. A paradigm shift was described by Kuhn (1962) as a phenomenon in which an abrupt change in the values, goals, and methods of the scientific community occurs (Cristianini 2014). Some success stories (from spelling correction to face recognition, including question answering, machine translation, and information retrieval) have shown that the data-driven approach centered on machine learning technologies is the winning one in many applications (Cristianini 2014).
Community detection techniques have been largely employed in recent years to describe the structure of complex social systems. However, these algorithmic communities are totally defined on the basis of some graph properties (e.g., density) and discard the subjective concept of the identity of the community members. This leads to detection of groups of users who are not always aware of being members of them. Groups detected algorithmically (detected groups) do not correspond to user-generated groups (declared groups), which are considered in social sciences. Attempts to evaluate this mismatch have been made (Aiello 2015).
Computational Social Science
Recently, a new discipline based on the quantitative understanding of complex social systems has emerged (Cioffi-Revilla 2010). Computational social science (CSS) is the bridge between social sciences (SS) and computer science (CS): it studies the subject matter of the social sciences through computational techniques and approaches, often developed in the computer science community. CSS benefits from the huge volumes of data on society’s everyday behavior made available by the significant integration of technology into people’s lives (Conte et al. 2012).
Social–Spatiotemporal Analysis of Topical and Polarized Communities in Online Social Networks
Users spend a considerable amount of time in OSNs, creating original content (posting), sharing multimedia content from other users (sharing), discussing (messages and comments), and reinforcing external content (liking). Communities emerge around different topics of interaction, and analysis of the social aggregations in a virtual context is interesting to shed light on human behavior.
In the following sections, after reviewing some basic concepts related to the communities in OSNs, we discuss recent research results along three important analytical dimensions: social, spatial, and temporal. Furthermore, we include a novel orthogonal dimension: polarization.
Communities in Online Social Networks
Homophily and Diversity
Homophily is a main driver that characterizes communities in both real and digital contexts. Homophily induces similarity between members of communities: “birds of a feather flock together” (McPherson et al. 2001). This is due to two confounding principles: (i) selection mechanisms (users prefer to connect to others with similar traits); and (ii) social contagion (linked people influence each other) (Leenders 1997). Homophily has been widely studied in OSNs, showing correlation between friendships and interests (Aiello et al. 2012) or between profile information and communication patterns (Leskovec and Horvitz 2008). Local proximity and age are further examples of homophily factors in OSNs (Kumar et al. 2005).
On the other hand, it has been shown that diversity in the discussed topics or in the shared content favors the stability of a community as group members continue being stimulated by new input (Ludford et al. 2004). Models of growth and longevity of groups in digital contexts have been also investigated (Backstrom et al. 2006).
Size, Membership, and Barriers
The group size affects the dynamics of interactions. This phenomenon has been studied in depth in real-world social networks by Robin Dunbar (1992), who correlated the volume of the neocortex in primates with the number of stable social relationships they maintain. He extended the same theory to humans (Dunbar 1993), concluding that the number of people with whom a person can maintain stable social relationships is about 150. Similar results have been found in the Facebook friendship network, showing similarities between ego network structures in OSNs and in real life (Arnaboldi et al. 2012). Ego networks are graphs where the central node is the studied user and all of the other nodes connected to him/her represent his/her friends. Goncalves et al. (2011) performed comparable experiments on Twitter, measuring the average interaction strength.
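An ego network can be extracted from a friendship edge list with a single pass to find the alters and a second pass to keep the ties among them. The sketch below follows the glossary definition (ego, alters, and ties among the alters); the function name and the toy data are illustrative assumptions.

```python
def ego_network(edges, ego):
    """Return (alters, ego_ties, alter_ties) for `ego` in an undirected graph.

    edges: list of (u, v) friendship pairs.
    """
    alters = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        if u == ego:
            alters.add(v)
        elif v == ego:
            alters.add(u)
    ego_ties = [(ego, a) for a in sorted(alters)]
    alter_ties = [(u, v) for u, v in edges if u in alters and v in alters]
    return alters, ego_ties, alter_ties


edges = [("ego", "a"), ("ego", "b"), ("a", "b"), ("b", "c"), ("c", "d")]
alters, ego_ties, alter_ties = ego_network(edges, "ego")
print(len(alters), alter_ties)
```

The size of `alters` is the quantity that studies of this kind compare with the Dunbar number, usually after filtering out relationships with no recent interaction.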
Topical and Social Groups
Two main processes can be identified in the development of communities in social networks: users create ties based on common interests or based on personal social relationships. The resulting kinds of group have been referred to as common identity groups or common bond groups (Prentice et al. 1994). We also adopt the lexicon proposed by Martin-Borregon et al. (2014) and refer to those kinds of group as topical and social.
Members of topical groups discuss a specific topic or a specific area of interest, and they do not usually have personal relationships with each other. Conversely, members of social groups tend to be reciprocal in their interactions with other members, and discussions are focused on multiple topics. One implication is that social groups are vulnerable to turnover, since personal relationships are present and they can influence user departure. Topical groups, on the other hand, are robust to departures and they are open to accepting new members (Aiello 2015).
To distinguish between the different kinds of group, Aiello suggested quantifying the reciprocity of interactions and the topical width of the discussions (Aiello 2015). Typically, greater reciprocity indicates a higher probability that the group is social, while a small topical width indicates topical groups. These variables integrate social and content-based aspects. In practice, groups can be both topical and social.
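As a rough illustration of these two variables, reciprocity can be measured as the share of directed interactions that are returned, and topical width as the entropy of the topics discussed in the group. Both operationalizations below are assumptions made for the sake of the example and are not necessarily the exact measures used by Aiello (2015).

```python
import math
from collections import Counter


def reciprocity(interactions):
    """Fraction of directed links (u -> v) answered by a link (v -> u).

    interactions: iterable of (sender, receiver) pairs within one group.
    """
    links = {(u, v) for u, v in interactions if u != v}
    if not links:
        return 0.0
    mutual = sum(1 for u, v in links if (v, u) in links)
    return mutual / len(links)


def topical_width(topic_labels):
    """Shannon entropy (in bits) of the topics attached to a group's messages.

    Low values mean the discussion concentrates on few topics.
    """
    counts = Counter(topic_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical group: three directed interactions and four labeled messages.
interactions = [("a", "b"), ("b", "a"), ("a", "c")]
print(reciprocity(interactions))
print(topical_width(["sport", "sport", "music", "music"]))
print(topical_width(["sport", "sport", "sport", "sport"]))
```

Under these assumptions, a group with high reciprocity and high entropy would look social, while one with low reciprocity and low entropy would look topical.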
From a computer science point of view, instead, when we refer to topics, we mean a multinomial distribution of words that represents a coherent concept in a set of text documents. To extract the most important topics from a piece of text (a topic selection task), different techniques have been developed: the most frequently applied method is the unsupervised latent Dirichlet allocation (LDA) (Blei et al. 2003). Novel methods still based on LDA have been proposed recently (Blei and Lafferty 2007; McAuliffe and Blei 2008; Wang et al. 2009).
Moreover, researchers have explored the relationship between diffusion of a topic and network structure (Barbieri et al. 2013), focusing on the structural and dynamic properties of specific topical communities such as groups supporting political parties (Conover et al. 2011) or groups discussing various conspiracy theories (Bessi et al. 2015), rumors and hoaxes (Ratkiewicz et al. 2010), deviant behaviors (Coletto et al. 2016a), or more ordinary topics such as fashion or sport. In the section “Social Dimension and Polarization” we describe in more detail how we can study a topical group in an OSN. In particular, we discuss the case of deviant communities, which are highly topical.
Topical Groups and Polarization
An interesting fine-grained investigation can be obtained by focusing on user opinion. When users discuss a specific topic, they cluster into subgroups. If the topic is controversial, such groups are strongly polarized. The relation between the topic and polarization is dynamic: users generally start discussing a topic and, around that, different opinions emerge.
Understanding opinion and polarization is a challenging task, and it has recently received a lot of attention in the information retrieval and data mining communities. Analysis of polarization is useful to investigate the evolution of groups, and sometimes it can be used to predict the behavior of users or their activities, e.g., predicting vote intention among Twitter users (Coletto et al. 2015) or understanding product preferences for marketing aims (Leskovec et al. 2007).
In Coletto et al. (2016c), we proposed an iterative procedure to detect polarized users and to monitor topic evolution over time. We focused on the frequent scenario where users interact and produce content according to a set of polarization classes. By polarization classes, we mean subjects that require the user to side exclusively with one part. Political parties are typical examples of these classes: users discuss several parties, and their opinion changes over time, but they can eventually vote for only one. Other examples include brand analysis, product comparison, and opinion mining in general. Topic detection and tracking (TDT) (Allan 2012) has been widely explored within the scope of news stream analysis (Walls et al. 1999). We are interested in content and user tracking for polarized users. This notion is connected with the concept of controversy in social media, which has been studied, mostly in political contexts, using data coming from different sources (Adamic and Glance 2005; Coletto et al. 2017b; Garimella et al. 2016; Gysel et al. 2015; Makazhanov et al. 2014).
In these scenarios, the polarization classes are known, and some limited information may also be available, e.g., a set of relevant keywords. By leveraging such limited knowledge, several challenging tasks can be tackled:
1. How to identify the users being polarized (or not) according to those classes
2. How to identify the most relevant subtopics being discussed among such users
3. How to monitor the evolution of such user communities and their online discussions over time
Those tasks are all very challenging, as the available knowledge may be approximate or insufficient, and it may also become obsolete over time. Therefore, the classification into polarization classes should be able to self-update continuously by catching upcoming relevant users and discussion topics.
We have described PTR (Polarization TRacker) (Coletto et al. 2016c) for the discovery of polarized users in a Twitter stream. While there exist several works about community detection and trending topic tracking, we have proposed a novel setting where the number of communities is known, but very little information is provided (a keyword per class only) and those communities are competing with each other.
The PTR algorithm is illustrated in Algorithm 1. As input the algorithm receives an initial set of polarized keywords \( \left\{{H}_c^0\right\} \) (initial seed) for each polarized class c. For instance, in Twitter the initial seeds can be selected by analyzing the most frequent hashtags and manually selecting a few per class. One of the benefits of PTR is that after only a few iterations the results are less dependent on the size of the original seed, since new relevant keywords are continuously discovered.
The algorithm iterates the following classification steps:
- TweetClassify: a message/post is said to be polarized to class c if it mentions keywords of class c and does not mention any keyword from classes other than c, where the keyword sets are denoted with \(\left\{ {H_c } \right\}_{c\, \in \, \mathscr C}\).
- UserClassify: a user is polarized to one class c only if his/her polarized tweets of class c are at least twice the number of the polarized tweets of any other class.
- HashTagsClassify: a keyword h is assigned to one class c if \( S_c(h) > \beta \cdot S_{c'}(h), \ \forall c' \ne c \), where \( S_c(h) \) is the joint probability of observing h in messages polarized to class c and not observing h in messages polarized to other classes.
The procedure iterates, classifying tweets, users, and keywords (e.g., Twitter hashtags), until convergence. Note that the whole content stream is taken into consideration at each iteration, including those users/posts/keywords that were not previously labeled with a polarization class. In Coletto et al. (2016c), the method outperforms a k-means baseline in terms of F-measure by a margin ranging from 7% to 71%, depending on the dataset considered.
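The three classification steps can be sketched as a small fixed-point loop. This is a simplified reading of the description above, not the published implementation: the data layout (a list of `(user, set_of_hashtags)` pairs), the function names, the default value of β, and in particular the way \( S_c(h) \) is estimated (frequency of h among tweets polarized to c, multiplied by one minus its frequency among tweets polarized to each other class) are all assumptions.

```python
from collections import Counter, defaultdict


def classify_tweet(hashtags, seeds):
    """Return the single class whose keywords appear in the tweet, else None."""
    hit = [c for c, keys in seeds.items() if keys & hashtags]
    return hit[0] if len(hit) == 1 else None


def classify_users(tweets, labels):
    """Assign a user to a class with at least 2x the tweets of any other class."""
    per_user = defaultdict(Counter)
    for (user, _), label in zip(tweets, labels):
        if label is not None:
            per_user[user][label] += 1
    result = {}
    for user, counts in per_user.items():
        (best, n_best), *rest = counts.most_common()
        if all(n_best >= 2 * n for _, n in rest):
            result[user] = best
    return result


def classify_hashtags(tweets, labels, classes, beta):
    """Assign h to c when S_c(h) > beta * S_c'(h) for every other class c'."""
    n_tweets = Counter(label for label in labels if label is not None)
    n_with_h = defaultdict(Counter)  # class -> hashtag -> tweet count
    for (_, hashtags), label in zip(tweets, labels):
        if label is not None:
            for h in hashtags:
                n_with_h[label][h] += 1

    def freq(h, c):
        return n_with_h[c][h] / n_tweets[c] if n_tweets[c] else 0.0

    def score(h, c):  # assumed estimator of S_c(h)
        s = freq(h, c)
        for other in classes:
            if other != c:
                s *= 1.0 - freq(h, other)
        return s

    new_seeds = {c: set() for c in classes}
    for h in {h for _, tags in tweets for h in tags}:
        for c in classes:
            if all(score(h, c) > beta * score(h, o) for o in classes if o != c):
                new_seeds[c].add(h)
    return new_seeds


def ptr(tweets, initial_seeds, beta=2.0, max_iter=20):
    """Iterate the three steps until the keyword sets stop changing."""
    seeds = {c: set(keys) for c, keys in initial_seeds.items()}
    classes = list(seeds)
    users = {}
    for _ in range(max_iter):
        labels = [classify_tweet(tags, seeds) for _, tags in tweets]
        users = classify_users(tweets, labels)
        new_seeds = classify_hashtags(tweets, labels, classes, beta)
        if new_seeds == seeds:
            break
        seeds = new_seeds
    return seeds, users


# Hypothetical stream with one seed keyword per class.
tweets = [
    ("u1", {"#red", "#rally"}), ("u1", {"#red"}),
    ("u2", {"#blue", "#debate"}), ("u2", {"#blue"}),
    ("u3", {"#rally"}), ("u3", {"#red", "#blue"}),
]
seeds, users = ptr(tweets, {"red": {"#red"}, "blue": {"#blue"}})
print(seeds, users)
```

In this toy run the hashtag `#rally`, which co-occurs only with the "red" seed, is absorbed into that class after the first pass, so the tweet by `u3` that contains only `#rally` becomes classifiable in the second pass. This illustrates how the procedure reduces its dependence on the initial seed.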
The PTR algorithm can be used to perform polarization or sentiment analysis, to discover polarized communities, and to study their structural evolution over time in different contexts. In Coletto et al. (2016b), for example, it was used to study the opinions of users about the social phenomenon of migration and the refugee crisis.
A temporal version of the proposed algorithm, TPTR (temporal PTR), is also described in the previously mentioned paper (Coletto et al. 2016c). TPTR is able to track users and topics over time.
A similar approach, aimed at dynamically tracking polarization in OSNs, was proposed by Lu et al. (2015). They presented an efficient optimization-based opinion bias propagation method over the social/information network.
Analytical Dimensions
So far we have pointed out different ingredients that compose the concept of communities in the digital context. Moreover, to analyze a community in an OSN, we might look at different dimensions: temporal, spatial, and social (Martin-Borregon et al. 2014). We extend the three-dimensional characterization of groups proposed by Martin-Borregon et al. (2014) with the concept of polarization of users around a topic.
Temporal Dimension
The temporal dimension is crucial in studying dynamics of groups in a social network. Aiello (2015) has proposed to classify groups into three classes:
- Short-lived: The groups in this category show a low level of activity after being created, soon becoming inactive.
- Evergreen: The evergreen cluster is characterized by groups created at a certain point in the past, which have been growing in number of users and content produced.
- Bursty: These are groups with the lowest skewness and big burstiness, especially in the number of users joining. Usually the highest activity is registered at the beginning of their life and from time to time they experience content production. Some of these groups are related to recurring (e.g., yearly) events that regularly attract the attention of users.
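Skewness and burstiness can both be computed from the timestamps of a group's activity. The sketch below uses two common definitions as assumptions: the burstiness coefficient \( B = (\sigma - \mu) / (\sigma + \mu) \) of the inter-event times, which ranges from −1 (perfectly regular) to values approaching 1 (highly bursty), and the moment-based skewness of a series of values. The source does not state which exact formulas were used for the classification above.

```python
import statistics


def burstiness(event_times):
    """Burstiness coefficient of inter-event times; needs sorted timestamps."""
    gaps = [b - a for a, b in zip(event_times, event_times[1:])]
    if not gaps:
        return 0.0
    mu = statistics.fmean(gaps)
    sigma = statistics.pstdev(gaps)
    return (sigma - mu) / (sigma + mu) if (sigma + mu) else 0.0


def skewness(values):
    """Moment-based skewness: third central moment over variance^(3/2)."""
    mu = statistics.fmean(values)
    m2 = statistics.fmean([(x - mu) ** 2 for x in values])
    m3 = statistics.fmean([(x - mu) ** 3 for x in values])
    return m3 / m2 ** 1.5 if m2 else 0.0


# Hypothetical join times (in days) for two groups.
regular = [0, 1, 2, 3, 4]
bursty = [0, 1, 2, 3, 100]
print(burstiness(regular), burstiness(bursty))
print(skewness([1, 2, 3]), skewness([1, 1, 1, 10]))
```

Any thresholds used to turn these scores into the three classes would have to be chosen empirically for the platform under study.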
Moreover, we have already described the importance of time in dynamically tracking polarized communities (through TPTR) (Coletto et al. 2016c). To develop realistic predictive models, for instance, we have to take into consideration in a proper way dynamical changes in the network (Tatemura 2000).
From a microscopic point of view, temporal analyses of interactions among users in OSNs are fundamental. Miritello et al. have worked extensively on temporal patterns of human communications and their influence on the spreading of information in social networks (Miritello et al. 2011; Miritello 2013a, b). In OSNs, the evolution of interactions over time has been examined in recent research works; for instance, Viswanath et al. (2009) presented a dynamical study of Facebook. Many studies have led to insights on how an interaction graph is structurally different from the social network itself, and the temporal component is important to track the strength of the relationships. Over time, social links can grow stronger or weaker, and this knowledge is crucial to characterize communities and their evolution.
Furthermore, understanding dynamical phenomena such as cascades and flames in conversations, or linking communication patterns with events through time or content, is equally important. The task, however, is challenging, since the variables to consider are innumerable and the data processing is not trivial. In addition, integrating the dynamical characteristics of the interactions with external data related to events is a very complex task.
Spatial Dimension
There is a significant increase of interest in collecting and analyzing geo-located data from OSNs. Typically, OSNs expose different kinds of location information for users and for their actions. Usually there are two classes of geographical information: the locations of the users (GPS, user description) and the places mentioned in the interactions (Coletto et al. 2016b).
Several works have studied different aspects of the geographical dimension of OSNs: a broad study on this subject was reported in Scellato et al. (2010). The authors proposed a framework to compare social networks based on two new measures: one capturing the geographical closeness of a node to its network neighborhood, and the other a clustering coefficient weighted by the geographical distance between nodes.
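Measures of this kind start from the geographic distance between connected users. The sketch below computes the great-circle (haversine) distance between two coordinates and the mean distance between a node and its neighbors. It is a deliberately simplified stand-in, not the measures defined by Scellato et al. (2010); the function names, the Earth radius constant, and the toy coordinates are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def mean_neighbor_distance(node, adj, location):
    """Mean distance (km) from `node` to its geo-located neighbors, or None."""
    nbrs = [n for n in adj.get(node, ()) if n in location]
    if node not in location or not nbrs:
        return None
    return sum(haversine_km(location[node], location[n]) for n in nbrs) / len(nbrs)


# Hypothetical users: "b" shares the location of "a", "c" is at the antipode.
adj = {"a": {"b", "c"}}
location = {"a": (0.0, 0.0), "b": (0.0, 0.0), "c": (0.0, 180.0)}
print(mean_neighbor_distance("a", adj, location))
```

Users without a known location are skipped, which mirrors the fact that only part of an OSN's population is usually geo-located.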
Kumar et al. (2005) found a strong correlation between location and friendship. Twitter geo-located posts have been studied by Takhteyev et al. (2012) to understand how Twitter social ties are affected by distance. Linked users are identified as “egos” and “alters,” and the distance between them is analyzed by considering its correlation with air travel connections, national borders, and languages. Kulshrestha et al. (2012) pursued an analogous objective, inferring the locations of 12 million Twitter users in a worldwide dataset. In contrast to the previous paper, they studied the correlation between the Twitter population and the socioeconomic status of a country, suggesting that highly developed countries are characterized by greater Twitter usage.
Finally, the geographical properties of an OSN have also been shown to be useful to study migration phenomena (Coletto et al. 2016b; Hawelka et al. 2014; Zagheni et al. 2014).
Social Dimension and Polarization
Even though it is computationally hard to model the concept of social identity, this is the base driver for human association. Groups give us a sense of belonging to the social world, which is a source of pride and self-esteem. In order to increase our self-image, we enhance the status of the group to which we belong, or we discriminate and hold prejudices against the groups to which we do not belong (Turner 1981). The notion of social closeness and sense of belonging is tightly linked to cultural proximity and sharing opinions and values. Communities emerge in digital contexts, as in real life, following specific drivers classified as common identity and common bond (Prentice et al. 1994).
A further level of differentiation is the polarization process, which creates subcommunities based on sharing values and opinions around a specific topic. Polarization can be studied by looking at the interactions among these different subcommunities and by evaluating the controversy in the conversations among users belonging to different subgroups. On the web, and in particular in the context of OSNs, controversy has been studied from different perspectives (Coletto et al. 2017b; Dori-Hacohen 2015; Dori-Hacohen and Allan 2013; Garimella et al. 2016).
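A simple way to look at interactions among subcommunities is to count how many interactions fall within and across groups. The sketch below is only an illustration under assumed inputs (a list of user-to-user interactions and a known group label per user); it is not one of the controversy measures proposed in the works cited above.

```python
from collections import Counter


def group_mixing(interactions, group_of):
    """Count interactions for each unordered pair of group labels.

    interactions: iterable of (source_user, target_user) pairs
    group_of:     {user: group_label} (assumed to be known in advance)
    """
    counts = Counter()
    for source, target in interactions:
        pair = tuple(sorted((group_of[source], group_of[target])))
        counts[pair] += 1
    return counts


def cross_group_fraction(counts):
    """Fraction of all interactions that connect users of different groups."""
    total = sum(counts.values())
    cross = sum(n for (g1, g2), n in counts.items() if g1 != g2)
    return cross / total if total else 0.0


group_of = {"u1": "pro", "u2": "pro", "u3": "con", "u4": "con"}
interactions = [("u1", "u2"), ("u2", "u1"), ("u3", "u4"), ("u1", "u3")]
mixing = group_mixing(interactions, group_of)
print(dict(mixing))
print(cross_group_fraction(mixing))
```

A low cross-group fraction indicates that the subgroups mostly talk among themselves; on its own it says nothing about the tone of the exchanges, which is why content-based signals are typically combined with structural ones.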
Moreover, it is interesting to associate polarization with topics. The problem has been partially studied (Jo and Oh 2011), with several models proposed that combine topic modeling with sentiment analysis: the Topic Sentiment Mixture (TSM) model (Mei et al. 2007), the Multi-Aspect Sentiment (MAS) model (Titov and McDonald 2008), and the Joint Sentiment/Topic (JST) model (Lin and He 2009). Briefly, in TSM, each word is connected with a specific topic and independently with a sentiment; in MAS, the set of aspects to be evaluated is fixed and sentiment is modeled as a probability distribution for each aspect; in JST, sentiment is integrated with a topic in a single language model (Jo and Oh 2011). Other recent approaches proposed in this field are a semisupervised model based on conditional random fields (Marcheggiani et al. 2014) and a hierarchical text classification method (Esuli et al. 2008). These models are still not flexible enough to capture the evolution of topics and opinions in OSN conversations, which remains an open issue.
Deviant Behaviors
Online social media are also favorable ecosystems for the formation of topical communities centered on matters that are not commonly taken up by the general public because of the embarrassment, discomfort, or shock they may cause (Coletto et al. 2016a).
Those are communities that depict or discuss what are usually referred to as deviant behaviors (Clinard and Meier 2015): conduct that is commonly considered inappropriate because it violates the norms or moral standards shared by the majority of the members of society. Pornography consumption, drug use, excessive drinking, illegal hunting, eating disorders, or any self-harming or addictive practice are all examples of deviant behaviors. Many of them are represented, to different extents, on social media (De Choudhury 2015; Haas et al. 2010; Morgan et al. 2010). However, since all of these topics touch upon different societal taboos, the commonsense assumption is that they are embodied either in niche, isolated social groups, or in communities that might be quite numerous but whose activity runs separately from the mainstream social media life. In line with this belief, researchers have mostly considered those groups in isolation, focusing predominantly on the communication patterns among community members (Tyson et al. 2015) or, from a sociological perspective, on the motivations of their members and on the impact of the groups’ activities on their lives and perceptions (Attwood 2005).
In reality, people who are involved in deviant practices are not segregated outcasts but part of the fabric of the global society. As such, they can be members of multiple communities and interact with very diverse sets of people, possibly exposing their deviant behavior to the public. In Coletto et al. (2016a), we aimed to go beyond previous studies that looked at deviant groups in isolation by observing them in context. In particular, we wanted to shed light on a matter that is relevant to both network science and social sciences: how much deviant groups are structurally secluded from the rest of the social network, and what the characteristics of their subgroups who build ties with the external world are (Coletto et al. 2016a).
In Coletto et al. (2016a), we focused on the behavior of adult content consumption. Public depiction of pornographic material is considered inappropriate in most cultures, yet the number of consumers is strikingly high (Sabina et al. 2008). Despite that, we were not aware of any study of how online communities that produce that type of content interface with the rest of the social network. We studied this phenomenon in a large dataset from Tumblr, considering a large sample of the follow and reblog networks for a total of more than 130 million nodes and almost 7 billion dyadic interactions. To spot the community that generated adult content, we also used a large sample of 146 million queries from a 7-month query log of a very popular search engine, from which we built an extensive dictionary of terms related to adult content that we made publicly available.
The results showed that:
-
The deviant network is a tightly connected community structured into subgroups, but it is linked to the rest of the network by a very large number of ties.
-
The largest share of the information originating in the deviant network is produced by a very small core of nodes but spreads widely across the whole social graph, potentially reaching a large audience of people who might unwillingly see that type of content.
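The two quantities behind these findings, how concentrated content production is and how many outside users are exposed to a community, can be sketched as follows. This is a toy illustration with assumed inputs and hypothetical function names, not the analysis pipeline of Coletto et al. (2016a).

```python
import math


def top_producer_share(items_per_node, top_fraction):
    """Share of all items produced by the most active `top_fraction` of nodes.

    items_per_node: {node: number_of_items_produced} (hypothetical layout)
    """
    counts = sorted(items_per_node.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    k = max(1, math.ceil(len(counts) * top_fraction))  # at least one node
    return sum(counts[:k]) / total


def external_audience(follow_edges, community):
    """Users outside `community` who follow at least one of its members.

    follow_edges: iterable of (follower, followed) pairs
    """
    return {
        follower
        for follower, followed in follow_edges
        if followed in community and follower not in community
    }


items = {"a": 90, "b": 5, "c": 3, "d": 2}
community = set(items)
edges = [("x", "a"), ("y", "a"), ("b", "a"), ("x", "b"), ("z", "q")]
print(top_producer_share(items, top_fraction=0.25))
print(sorted(external_audience(edges, community)))
```

In this toy example a single node produces most of the items, while two users outside the community follow its members and may therefore be exposed to its content.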
Echo Chambers
Different polarized clusters usually interact by exchanging points of view, but they often turn into echo chambers: situations in which information, ideas, and opinions are amplified or reinforced by transmission and repetition inside an “isolated” system, where alternative views are censored or underrepresented.
This phenomenon has been studied in particular in the context of misinformation, by analyzing interactions between scientific communities and groups focused on conspiracy theories. Bessi et al. (2015) studied the scientific community and the conspiracy community on Facebook. Their results showed that polarized communities emerge around distinct types of content and that habitual consumers of conspiracy news become more self-contained and focused on their specific content.
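One simple way to quantify how self-contained a user's content consumption is would be to measure the share of the user's interactions that fall on a single content category. The sketch below is an illustration under assumed inputs; the category labels and the function name are hypothetical, and it should not be read as the exact procedure of Bessi et al. (2015).

```python
def content_polarization(interaction_counts):
    """Return (dominant_category, share_of_interactions) for one user.

    interaction_counts: {category: number_of_interactions}, for example the
    number of likes a user gave to posts of each category (assumed input).
    """
    total = sum(interaction_counts.values())
    if total == 0:
        return None, 0.0
    dominant = max(interaction_counts, key=interaction_counts.get)
    return dominant, interaction_counts[dominant] / total


users = {
    "u1": {"science": 1, "conspiracy": 19},
    "u2": {"science": 6, "conspiracy": 4},
}
for user, counts in users.items():
    print(user, content_polarization(counts))
```

A share close to 1 indicates a user who interacts almost exclusively with one type of content; the threshold above which a user counts as polarized is a modeling choice.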
Key Applications
Toward a Unified Approach
Analytical Framework
The social, spatial, and temporal dimensions, as well as the topic and polarization dimensions, which we have discussed so far, can be used to analyze communities in OSNs in a general and comprehensive way.
In Coletto et al. (2017a), we proposed an analytical framework to investigate social trends in large tweet collections by extracting and crossing information about the following three dimensions: time, location, and polarization. The methodology described how to: (1) extract relevant spatial information; (2) enrich data with the sentiment of the message and of the user (retrieved in an automatic iterative way through machine learning); and (3) perform multidimensional analyses considering content and locations in time.
The approach is general and can be easily adapted to any topic of interest involving multiple dimensions.
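The multidimensional analysis in step (3) can be pictured as a small data cube indexed by time, location, and polarity. The sketch below assumes that steps (1) and (2) have already attached a location and a polarity label to each message; the field names and label values are hypothetical and do not reproduce the implementation of Coletto et al. (2017a).

```python
from collections import Counter
from datetime import datetime


def build_cube(messages):
    """Count messages per (day, location, polarity) cell.

    messages: iterable of dicts with the hypothetical keys "timestamp"
    (ISO 8601 string), "location", and "polarity".
    """
    cube = Counter()
    for message in messages:
        day = datetime.fromisoformat(message["timestamp"]).date().isoformat()
        cube[(day, message["location"], message["polarity"])] += 1
    return cube


def slice_cube(cube, day=None, location=None, polarity=None):
    """Sum the cells that match the given values; None means any value."""
    wanted = (day, location, polarity)
    return sum(
        count
        for key, count in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )


messages = [
    {"timestamp": "2015-09-01T10:00:00", "location": "IT", "polarity": "positive"},
    {"timestamp": "2015-09-01T12:30:00", "location": "IT", "polarity": "negative"},
    {"timestamp": "2015-09-01T18:00:00", "location": "DE", "polarity": "positive"},
    {"timestamp": "2015-09-02T09:00:00", "location": "IT", "polarity": "positive"},
]
cube = build_cube(messages)
print(slice_cube(cube, location="IT"))
print(slice_cube(cube, day="2015-09-01", polarity="positive"))
```

Fixing one or two coordinates and summing over the others gives, for example, the daily volume of positive messages or the polarity split within one country.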
Micromarriage and Macromarriage
To study social processes, it is furthermore useful to merge dyadic interaction analysis with the study of multiuser interactions.
Social networks are complex systems and the relationships between atomic components create emergent behaviors that can hardly be modeled directly from the composition of the individual parts (Aiello 2015).
Group analysis then is very important to understand the properties of the collectivity in a macroperspective. The challenge is to merge the analysis of dyadic interactions and one-to-one communication patterns (microanalysis) with more general comprehension of multiuser aggregations (macroanalysis) to fully understand social behaviors.
Research on communities in OSNs through the abovementioned dimensions could create the bridge between individual characterization of the users and their interactions and the understanding of collective behaviors that are detected on a higher level in the social network.
Isolation and Spreading
Finally, it is important to study not only a community and its properties but also its relation to the rest of the network, as was done, for instance, in Coletto et al. (2016a), where a community in an OSN was studied in relation to the whole network. This is rarely done, either because some communities are assumed to be isolated or because broader data are lacking.
Future Directions
We have analyzed communities in online social networks from different angles. We have discussed:
-
Why people organize into groups
-
Some reasons for the success of online social networks and the digital environment
-
How people interact in online social networks
-
How communities are detected in online social networks
-
The computer science and social sciences perspectives in defining groups
-
The success of computational social science and the data-driven approach
-
The typology of communities (topical and social)
-
The structure of communities in polarized subgroups
-
How to track polarization in time and topic evolution
-
Some examples of topical communities (in particular, we proposed the study of a deviant community and its relationships with the whole network)
-
The importance of time, place, social links, and polarization in analyzing communities in online social networks
This work summarizes the recent advances in this topic, but many challenges remain open. We have contributed to exploring this area by proposing new methods and analyses, but the research community must still take many steps to create a robust framework for comprehensive studies of interactions among groups in online social networks and their evolution.
In fact, this research area may contribute to creating bridges between the computer science and social science communities in order to develop new meaningful research through computational tools by merging the skills and the knowledge of the two domains.
The study of people and how they interact, both in the real-world case and in the digital environment, is a complicated task because humans are heterogeneous systems and, through their interactions, each system is linked to the others in an even more complex relation. We believe that a successful approach should foster the integration of the different tools and methodologies made available by the research conducted so far in the fields of both social and computational sciences, thus supporting multidimensional analyses as discussed before.
The presence of massive datasets and the continuous increase of new social traces on the one side, and the evolution of methods and technology on the other side, will provide more keys to better deal with the complexity that is the foundational pillar of social phenomena.
References
Adamic LA, Glance N (2005) The political blogosphere and the 2004 US election: divided they blog. In: Proceedings of the 3rd international workshop on link discovery. ACM, New York, NY, USA, pp 36–43
Aiello LM (2015) Group types in social media. In: Paliouras G, Papadopoulos S, Vogiatzis D, Kompatsiaris Y (eds) User community discovery, Human-computer interaction series. Springer International Publishing, Switzerland, pp 97–134
Aiello LM, Barrat A, Schifanella R, Cattuto C, Markines B, Menczer F (2012) Friendship prediction and homophily in social media. ACM Trans Web (TWEB) 6(2):9
Allan J (2012) Topic detection and tracking: event-based information organization, vol 12. Springer Science & Business Media
Arnaboldi V, Conti M, Passarella A, Pezzoni F (2012) Analysis of ego network structure in online social networks. In: Privacy, security, risk and trust (PASSAT), 2012 international conference on and 2012 international conference on social computing (SocialCom), pp 31–40. IEEE
Arnaboldi V, Conti M, Passarella A, Dunbar R (2013) Dynamics of personal social relationships in online social networks: a study on Twitter. In: Proceedings of the first ACM conference on online social networks, COSN ’13. ACM, New York, pp 15–26
Attwood F (2005) What do people do with porn? Qualitative research into the consumption, use, and experience of pornography and other sexually explicit media. Sex Cult 9(2)
Backstrom L, Huttenlocher D, Kleinberg J, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, Ithaca, NY, pp 44–54
Backstrom L, Boldi P, Rosa M, Ugander J, Vigna S (2012) Four degrees of separation. In: Proceedings of the 3rd annual ACM web science conference. ACM, Ithaca, NY, pp 33–42
Barbieri N, Bonchi F, Manco G (2013) Cascade-based community detection. In: WSDM. ACM, New York, NY, USA
Barrett L, Henzi P, Rendall D (2007) Social brains, simple minds: does social complexity really require cognitive complexity? Philos Trans R Soc Lond B: Biol Sci 362(1480):561–575
Bessi A, Coletto M, Davidescu GA, Scala A, Caldarelli G, Quattrociocchi W (2015) Science vs conspiracy: collective narratives in the age of misinformation. PLoS One 10(2)
Bisgin H, Agarwal N, Xu X (2010) Investigating homophily in online social networks. In: Web intelligence and intelligent agent technology (WI-IAT), 2010 IEEE/WIC/ACM international conference on, vol 1. IEEE, Toronto, ON, Canada, pp 533–536
Blei DM, Lafferty JD (2007) A correlated topic model of science. Ann Appl Stat:17–35
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Buettner R (2016) Getting a job via career-oriented social networking sites: the weakness of ties. In: 2016 49th Hawaii international conference on system sciences (HICSS). IEEE, Koloa, HI, USA, pp 2156–2165
Buyukkokten O, Adar E, Adamic L (2005) A social network caught in the web. First Monday 8(6):15–40
Cioffi-Revilla C (2010) Computational social science. Wiley Interdiscip Rev: Comput Stat 2(3):259–271
Cioffi-Revilla C (2014) Introduction to computational social science: principles and applications. Springer, Berlin/New York
Clinard M, Meier R (2015) Sociology of deviant behavior. Cengage Learning, Wadsworth
Coletto M, Lucchese C, Orlando S, Perego R (2015) Electoral predictions with Twitter: a machine-learning approach. IIR
Coletto M, Aiello LM, Lucchese C, Silvestri F (2016a) On the behaviour of deviant communities in online social networks. In: Tenth international AAAI conference on web and social media (ICWSM), pp 72–81
Coletto M, Lucchese C, Muntean CI, Nardini FM, Esuli A, Renso C, Perego R (2016b) Sentiment-enhanced multidimensional analysis of online social networks: perception of the Mediterranean refugees crisis. In: Advances in social networks analysis and mining (ASONAM), 2016 IEEE/ACM international conference on, pp 1270–1277. IEEE
Coletto M, Lucchese C, Orlando S, Perego R (2016c) Polarized user and topic tracking in Twitter. In: Proceedings of the 39th international ACM SIGIR conference on research and development in information retrieval. ACM, Pisa, Italy, pp 945–948
Coletto M, Esuli A, Lucchese C, Muntean CI, Nardini FM, Perego R, Renso C (2017a) Perception of social phenomena through the multidimensional analysis of online social networks. Online Soc Netw Media 1:14–32
Coletto M, Garimella K, Gionis A, Lucchese C (2017b) A motif-based approach for identifying controversy. In: Proceedings of the eleventh international conference on web and social media, ICWSM 2017, Montréal, 15–18 May 2017, pp 496–499. https://aaai.org/ocs/index.php/ICWSM/ICWSM17/paper/view/15653
Conover M, Ratkiewicz J, Francisco M, Gonçalves B, Menczer F, Flammini A (2011) Political polarization on Twitter. In: ICWSM, vol 133, pp 89–96
Conte R, Gilbert N, Bonelli G, Cioffi-Revilla C, Deffuant G, Kertesz J, Loreto V, Moat S, Nadal JP, Sanchez A et al (2012) Manifesto of computational social science. Eur Phys J Spec Top 214(1):325–346
Cristianini N (2014) On the current paradigm in artificial intelligence. AI Commun 27(1):37–43
De Choudhury M (2015) Anorexia on Tumblr: a characterization study. In: Digital health. ACM, Florence, Italy
de Sola Pool I, Kochen M (1979) Contacts and influence. Soc Netw 1(1):5–51
Dori-Hacohen S (2015) Controversy detection and stance analysis. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval. ACM, New York, NY, pp 1057–1057
Dori-Hacohen S, Allan J (2013) Detecting controversy on the web. In: Proceedings of the 22nd ACM international conference on information & knowledge management. ACM, New York, NY, pp 1845–1848
Dunbar RI (1992) Neocortex size as a constraint on group size in primates. J Hum Evol 22(6):469–493
Dunbar RI (1993) Coevolution of neocortical size, group size and language in humans. Behav Brain Sci 16(04):681–694
Esuli A, Fagni T, Sebastiani F (2008) Boosting multi-label hierarchical text categorization. Inf Retr 11(4):287–313
Garimella K, De Francisci Morales G, Gionis A, Mathioudakis M (2016) Quantifying controversy in social media. In: Proceedings of the ninth ACM international conference on web search and data mining (WSDM). ACM, pp 33–42
Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci 99(12):7821–7826
Gonçalves B, Perra N, Vespignani A (2011) Modeling users’ activity on Twitter networks: validation of Dunbar’s number. PLoS One 6(8):e22656
Granovetter MS (1973) The strength of weak ties. Am J Sociol:1360–1380
Haas SM, Irr ME, Jennings NA, Wagner LM (2010) Online negative enabling support groups. New Media Soc
Hawelka B, Sitko I, Beinat E, Sobolevsky S, Kazakopoulos P, Ratti C (2014) Geo-located Twitter as proxy for global mobility patterns. Cartogr Geogr Inf Sci 41(3):260–271
Jo Y, Oh AH (2011) Aspect and sentiment unification model for online review analysis. In: Proceedings of the fourth ACM international conference on web search and data mining. ACM, New York, NY, pp 815–824
Kuhn TS (1962) The structure of scientific revolutions. University of Chicago Press
Kulshrestha J, Kooti F, Nikravesh A, Gummadi KP (2012) Geographic dissection of the Twitter network. In: Proceedings of the sixth international AAAI conference on weblogs and social media (ICWSM)
Kumar R, Liben-Nowell D, Novak J, Raghavan P, Tomkins A (2005) Theoretical analysis of geographic routing in social networks. CSAIL Technical Reports, MIT Massachusetts, USA
Kumar R, Novak J, Tomkins A (2010) Structure and evolution of online social networks. In: Link mining: models, algorithms, and applications. Springer, New York, pp 337–357
Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th international conference on world wide web. ACM, New York, NY, USA, pp 591–600
Leenders R (1997) Longitudinal behavior of network structure and actor attributes: modeling interdependence of contagion and selection. Evolution of social networks 1:165–184
Leskovec J, Horvitz E (2008) Planetary-scale views on a large instant-messaging network. In: Proceedings of the 17th international conference on World Wide Web. ACM, New York, NY, USA, pp 915–924
Leskovec J, Adamic LA, Huberman BA (2007) The dynamics of viral marketing. ACM Trans Web (TWEB) 1(1):5
Lin C, He Y (2009) Joint sentiment/topic model for sentiment analysis. In: Proceedings of the 18th ACM conference on information and knowledge management. ACM, New York, NY, USA, pp 375–384
Lu H, Caverlee J, Niu W (2015) Biaswatch: a lightweight system for discovering and tracking topic-sensitive opinion bias in social media. In: Proceedings of the 24th ACM international on conference on information and knowledge management. ACM, New York, NY, USA, pp 213–222
Ludford PJ, Cosley D, Frankowski D, Terveen L (2004) Think different: increasing online community participation using uniqueness and group dissimilarity. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, New York, NY, USA, pp 631–638
Magno G, Comarela G, Saez-Trumper D, Cha M, Almeida V (2012) New kid on the block: exploring the Google+ social graph. In: Proceedings of the 2012 ACM conference on internet measurement conference. ACM, New York, NY, USA, pp 159–170
Makazhanov A, Rafiei D, Waqar M (2014) Predicting political preference of Twitter users. Soc Netw Anal Min 4(1):1–15
Marcheggiani D, Täckström O, Esuli A, Sebastiani F (2014) Hierarchical multi-label conditional random fields for aspect-oriented opinion mining. In: Advances in information retrieval. Springer, pp 273–285
Marsden PV, Campbell KE (1984) Measuring tie strength. Soc Forces 63(2):482–501
Martin-Borregon D, Aiello LM, Grabowicz P, Jaimes A, Baeza-Yates R (2014) Characterization of online groups along space, time, and social dimensions. EPJ Data Sci 3(1):8
Maslow AH (1943) A theory of human motivation. Psychol Rev 50(4):370
McAuliffe JD, Blei DM (2008) Supervised topic models. In: Advances in neural information processing systems, pp 121–128
McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Annu Rev Sociol 27:415–444
Mei Q, Ling X, Wondra M, Su H, Zhai C (2007) Topic sentiment mixture: modeling facets and opinions in weblogs. In: Proceedings of the 16th international conference on World Wide Web. ACM, New York, NY, USA, pp 171–180
Messinger PR, Stroulia E, Lyons K (2008) A typology of virtual worlds: historical overview and future directions. J Virtual Worlds Res 1(1)
Milgram S (1967) The small world problem. Psychol Today 2(1):60–67
Miritello G (2013a) Information spreading on communication networks. In: Temporal patterns of communication in social networks. Springer, Switzerland, pp 107–130
Miritello G (2013b) Temporal patterns of communication in social networks. Springer
Miritello G, Moro E, Lara R (2011) Dynamical strength of social ties in information spreading. Phys Rev E 83(4):045102
Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the 7th ACM SIGCOMM conference on internet measurement. ACM, New York, NY, USA, pp 29–42
Morgan EM, Snelson C, Elison-Bowers P (2010) Image and video disclosure of substance use on social media websites. Comput Hum Behav 26(6):1405–1411
Newman ME, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113
Oh HJ, Ozkaya E, LaRose R (2014) How does online social networking enhance life satisfaction? The relationships among online supportive interaction, affect, perceived social support, sense of community, and life satisfaction. Comput Hum Behav 30:69–78
Prentice DA, Miller DT, Lightdale JR (1994) Asymmetries in attachments to groups and to their members: distinguishing between common-identity and common-bond groups. Key Readings Soc Psychol 20(5):484–493
Ratkiewicz J, Conover M, Meiss M, Gonçalves B, Patil S, Flammini A, Menczer F (2010) Detecting and tracking the spread of astroturf memes in microblog streams, Palo Alto, California. arXiv preprint arXiv:1011.3768
Sabina C, Wolak J, Finkelhor D (2008) The nature and dynamics of internet pornography exposure for youth. CyberPsychol Behav 11(6)
Scellato S, Mascolo C, Musolesi M, Latora V (2010) Distance matters: geo-social metrics for online social networks. In: Conference on online social networks, WOSN’10
Tajfel H (1982) Social psychology of intergroup relations. Annu Rev Psychol 33(1):1–39
Takhteyev Y, Gruzd A, Wellman B (2012) Geography of Twitter networks. Soc Netw 34(1):73–81
Tatemura J (2000) Virtual reviewers for collaborative exploration of movie reviews. In: Proceedings of the 5th international conference on intelligent user interfaces. ACM, New York, NY, USA, pp 272–275
Titov I, McDonald RT (2008) A joint model of text and aspect ratings for sentiment summarization. In: ACL, vol 8. Citeseer, pp 308–316
Turner JC (1981) Towards a cognitive redefinition of the social group. Cahiers de Psychologie Cognitive/Current Psychol Cognition, pp 15–40
Tyson G, Elkhatib Y, Sastry N, Uhlig S (2015) Are people really social in porn 2.0? In: ICWSM. http://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10511
Ugander J, Karrer B, Backstrom L, Marlow C (2011) The anatomy of the Facebook social graph. arXiv preprint arXiv:1111.4503
Van Gysel C, Goethals B, de Rijke M (2015) Determining the presence of political parties in social circles. In: ICWSM, pp 690–693
Viswanath B, Mislove A, Cha M, Gummadi KP (2009) On the evolution of user interaction in Facebook. In: Proceedings of the 2nd ACM workshop on online social networks. ACM, New York, NY, USA, pp 37–42
Walls F, Jin H, Sista S, Schwartz R (1999) Topic detection in broadcast news. In: Proceedings of the DARPA broadcast news workshop, Morgan Kaufmann Publishers, Inc., pp 193–198
Wang Y, Bai H, Stanton M, Chen WY, Chang EY (2009) Plda: Parallel latent Dirichlet allocation for large-scale applications. In: Algorithmic aspects in information and management. Springer, pp 301–314
Wasserman S, Faust K (1994) Social network analysis: methods and applications, vol 8. Cambridge University Press, Cambridge
Wilson C, Sala A, Puttaswamy KP, Zhao BY (2012) Beyond social graphs: user interactions in online social networks and their implications. ACM Trans Web (TWEB) 6(4):17
Zagheni E, Garimella VRK, Weber I, State B (2014) Inferring international and internal migration patterns from Twitter data. In: WWW conference, WWW’14 Companion, April 7–11, 2014, Seoul, Korea.
Copyright information
© 2017 Springer Science+Business Media LLC
About this entry
Cite this entry
Coletto, M., Lucchese, C. (2017). Social–Spatiotemporal Analysis of Topical and Polarized Communities in Online Social Networks. In: Alhajj, R., Rokne, J. (eds) Encyclopedia of Social Network Analysis and Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7163-9_110182-1
DOI: https://doi.org/10.1007/978-1-4614-7163-9_110182-1
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-7163-9
Online ISBN: 978-1-4614-7163-9
eBook Packages: Springer Reference Computer Sciences; Reference Module Computer Science and Engineering