1 Introduction

Examples of social media being used successfully to help handle extreme situations are growing in number. Platforms can serve as a medium for the self-organization of people during critical events and for the dissemination of calls for assistance [3, 12, 18]. Though social media on their own provide functionality to improve cooperation, it is the social response of the community that lies at the base of resilience in a crisis. However, the communicative and cooperative behavior of users is hard to predict, especially considering that it is influenced not only by the nature of a critical event but also by the way the event is reported by emergency managers.

People living in large settlements such as cities constantly experience extreme situations at different scales. They react to these situations by organizing themselves into communities (or groups) dedicated to information sharing, situational awareness and volunteering. In social media, the resulting communities become a natural way to interact with users in emergencies and to enact social responses of appropriate scale. Social response, as a component of the self-organization effort, is interpreted here as a heterogeneous communicative reaction to a critical situation that is discussed through social media. The distinctive feature of such a reaction is its focus on establishing cooperation.

One of the most typical examples of such emergency response communities is a group called “ДТП и ЧП” (that is, “road accidents and emergency situations”, found at https://vk.com/spb_today) on the Russian social network “Vkontakte” (https://vk.com). This group serves as a public hub in Saint Petersburg for situational awareness and information dissemination about extreme situations in the city, a place where volunteers can be found and calls for help can be published. The group has gained significant influence and attention among the residents of Saint Petersburg and has been replicated under similar names in other cities of Russia (for example, Moscow and the regions).

In this group, social response takes several main forms: (a) information dissemination via reposting, which is the most accessible and most important way of increasing situational awareness and leveraging social response; (b) offers of help (volunteering); (c) discussion and advice, which help to draw attention to the problem and, in some cases, may even help to find a solution.

However, different appeals made to the community via posts receive social responses of different scales, and thus attract either amplified or limited attention. Examples of the two outcomes are, respectively: (a) the explosion in Saint Petersburg on 3 April 2017; (b) frequent requests for help with a broken car.

To obtain a social response of an appropriate scale, it is important to understand what it depends on. Thus, the goal of this research is to address the following questions: (a) which characteristics of the community and of the messages shared within it predict the social response? (b) how can the scale of such a response be assessed during an extreme situation?

To achieve this goal, we employ a random forest ensemble classifier that allows us to highlight the importance of communication features in social media communities. Some of the features proved to be more significant than others. For instance, message topics that resemble a direct or indirect request for help are shown to be more strongly associated with the social response than other topics and textual features. Answering the above questions allows us to propose a unified solution that can be used as a basis for decision support in the monitoring of online social networks. This framework gives a decision maker the ability to assess the potential social response before interacting publicly with a certain community, depending on the textual representation of a given event and the desired format of the message as a device for effective communication.

The following are the contributions of the research outlined herein:

  1. A framework designed as a module for a decision-support system, to estimate the social response to a message being published.

  2. An experimental study of the approach and framework.

2 Related Works

At its emergence, the use of social media in critical events was a bottom-up process [1]. Prior to the first accounts of users engaging in coordination and self-organization via online social networks, new media (discussion, networking and blogging platforms) were mostly viewed as a means to archive and disseminate evidence of critical events in parallel with the more traditional centralized sources of information. Slowly, yet steadily, the application of social media by emergency management practitioners is becoming more widespread [4]. As stated in the research literature, social media provide opportunities for decision makers to cope effectively with a number of issues [1] that are characteristic of virtually every step of the disaster management cycle. However, the results of a large study of emergency services suggest that practitioners still see social media more as a broadcasting tool than as a means of receiving information [5].

Emergent volunteering in critical events is a long-standing research issue in the field of emergency management [6]. Previous research suggests [7] that, though spontaneous volunteering is beneficial and, in certain circumstances, essential for coping with critical situations, there are apparent issues that may complicate the engagement of so-called “unaffiliated disaster volunteers” [8]. Apart from legal gaps, apparent safety risks, organizational barriers and recognition problems [9], these include difficulties in coordination with official structures and the practical absence of control over the activities carried out.

Unaffiliated volunteering can take different forms. The more traditional ones result from the spatial convergence of people in relative proximity to the area affected by the critical event. These include on-site transportation, manual labor, situational awareness and other efforts that require physical presence. Newer forms of online volunteering (often referred to as digital volunteerism [9]) overcome the spatial and temporal limitations of localized ones. Users willing to help can participate in information aggregation and sharing, manual analysis of data [10], translation [11] and other activities.

There is an increasing body of work that looks into the characteristics of the communication of digital volunteers. For instance, the authors of [12] analyze the use of ad hoc language and special codes to simplify the online coordination of crowdsourcing activities aimed at providing timely resilience for disaster-struck communities. Reuter et al. [13] analyze the shift in the content of the messages shared on Twitter throughout the lifecycle of a particular disaster (i.e. an increasing number of external links and a decreasing number of reposts) and the roles that online volunteers undertake in emergencies (including on-site volunteers, information generators and spreaders). More importantly, the authors outline the design of a prototype system for bridging and coordinating spontaneous digital volunteers and those who engage in regular volunteering. Ukkusuri et al. [14] analyze the posts of potential digital volunteers and affected households, with a greater emphasis on the topics discussed by volunteers and the characteristic sentiment of messages. They classify the content of the disaster-related messages and track the dynamics of the emergent categories relevant to the critical event. Purohit et al. [15] extend the study of social media activities to distinguish between the information posted by users who request help and by those who provide it, in three major cases spanning from 2010 to 2012.

It is worth highlighting that, to our knowledge, our work constitutes the first attempt to predict social response by analyzing the characteristics of user-generated posts that contain requests for help in critical events. This issue has received attention in other subdomains [16], but has been overlooked in digital volunteering studies, primarily because researchers have focused on the detailed study of individual cases. Here we aim at merging these two areas of knowledge: predictive analysis of social response to social media messages and critical informatics. Moreover, we employ a perspective that aims at a broader application of the analysis than situational awareness or coordination.

3 Approach

To measure the potential scale of social response and turn it into a support tool for decision makers, it is necessary to build a framework that can operate in the environment where the communities mentioned above are located. The key components of such a framework are a classifier and a machine learning module used to train it. In addition, the framework must contain monitoring and data preprocessing tools.

As mentioned above, social response is a collective reaction of community members to a certain event or to new information about it. Thus, estimation of the scale of social response should be based on accounting for individual users’ reactions to published posts. The most important reaction is reposting, and the rest of the paper focuses on predicting the repost count; however, the approach can be extended to other types of reactions, such as offers of help.

The task can be formalized as a multi-class classification problem where each class corresponds to a certain level of reaction (the classes can be found in Sect. 4). In most situations it is enough to know the scale rather than the exact number of reposts. It should be noted that, besides this set of classes, there is also a small number of posts that reach extremely high levels of reposting and may be seen as anomalies.
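The mapping from a repost count to a response class can be sketched as follows. This is a minimal illustration that assumes the class boundaries listed in Sect. 4 (below 10, between 10 and 100, above 100). How posts with exactly 10 or exactly 100 reposts are assigned is not specified there, so the inclusive bounds used here are an assumption, as is the function name `response_class`.

```python
def response_class(reposts: int) -> int:
    """Map a repost count to a response-scale class (0, 1 or 2)."""
    if reposts < 0:
        raise ValueError("repost count cannot be negative")
    if reposts < 10:
        return 0  # low response
    if reposts <= 100:  # assumption: both 10 and 100 fall into class 1
        return 1  # medium response
    return 2  # high response


# Label a few example repost counts.
labels = [response_class(n) for n in (0, 9, 10, 100, 101, 5000)]
print(labels)
```

Working with such coarse labels instead of raw counts reflects the point made above: for decision support, the scale of the response matters more than the exact number of reposts.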

The communicative model of a post introduced in our framework allows information from structured and unstructured sources to be combined. It consists of the following groups of features:

  • Primary topic – usually there is only one topic that characterizes the type of the event itself, such as a missing person, a fire, a car hijacking, etc.

  • Secondary topics – topics that characterize the conditions under which the event is happening. These may include topics about police engagement in the event or the presence of casualties. Primary topics are accompanied by these secondary topics, and the same secondary topic may occur in pairs with different primary topics.

  • Presence of special phrases or keywords – this includes explicit calls for help that can be detected by exact matching.

  • External attributes – text length, presence of a link to a coordinator or of a phone number.
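The last two feature groups can be computed directly from the text of a post. The sketch below is only an illustration: the phrase list `HELP_PHRASES`, the pattern `PHONE_RE` and the function name `external_features` are hypothetical, since the paper does not list the phrases or matching rules that were actually used, and the posts in the dataset are in Russian rather than English.

```python
import re

# Hypothetical phrase list; the phrases matched in the study are not published.
HELP_PHRASES = ("help", "please repost", "looking for witnesses")

# Loose pattern for phone-like sequences of at least seven digits
# (an assumption for illustration, not the authors' rule).
PHONE_RE = re.compile(r"(?:\+?\d[\s\-()]*){7,}")


def external_features(text: str) -> dict:
    """Extract keyword and external attributes from the text of one post."""
    lowered = text.lower()
    return {
        "text_length": len(text),
        "has_phone": bool(PHONE_RE.search(text)),
        "has_help_phrase": any(phrase in lowered for phrase in HELP_PHRASES),
    }


post = "Please repost! A child is missing, call +7 (812) 555-01-23"
print(external_features(post))
```

Exact substring matching is the simplest option and mirrors the "exact matching" mentioned in the list above; implicit pleas would need a richer detector.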

Using this model, the framework can perform the text mining and topic modelling needed to measure social response. The general architecture of the framework is presented in Fig. 1.

Fig. 1. General framework architecture

The framework consists of the following modules:

Crawler.

The crawler module is responsible for collecting data about published messages and the social response elicited by these messages. The module periodically monitors the community and sends updates to the data preprocessing module and the estimator module. The estimator module can use these data for assimilation and for correcting its own predictions based on the dynamics of reposting for a certain message. The crawler was developed in previous work, and its detailed description can be found in [20].

Data Preprocessing Module.

The data preprocessing module is responsible for extracting features from unstructured and semi-structured data to be used to train the classifier. An important part of this module is the topic modelling procedure, which uses the BigARTM library [17] to mine topics from the text of posts. Topics may represent different aspects of situations, such as types of participants, typical actions and locations, and they implicitly carry information about which part of the population may be affected by a message and thus who is more likely to repost it.

Offline Learning Module.

The offline learning module is responsible for retraining the classifier as new data arrive and for updating it in the estimator. Currently, a random forest is used to perform classification. This algorithm was chosen because it is less sensitive to the presence of unbalanced classes in the dataset. The module may also perform a grid search to ensure that optimal parameters of the classifier have been selected. The need for retraining is explained by the dynamic nature of the community: it grows and develops its own standards for publishing messages, and users get used to these standards and develop stable reactions to standard message types.

Social Response Estimator Module.

The social response estimator module is responsible for using the classifier to answer external requests for response scale estimation. This module serves as an interface for communicating with the user or with other components of a decision support system.

Data assimilation and on-the-fly prediction of the scale can also be used to decide whether the crawler should increase the frequency of its monitoring in order to collect comments as quickly as they appear. The latter may be important for detecting emerging deviant behavior of users. Some users may comment using offensive language to deliberately harm others participating in the discussion, or to troll. Such actions may distract the community from the problem and reduce the social response.
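One simple way to couple the predicted scale to the monitoring frequency is a lookup from the predicted class to a polling interval. The sketch below is purely illustrative: the base interval `BASE_INTERVAL_S`, the factors in `SPEEDUP` and the function name `polling_interval` are hypothetical values chosen for the example and are not taken from the framework described here.

```python
BASE_INTERVAL_S = 600  # hypothetical default polling interval, in seconds

# Hypothetical speed-up factors for predicted response classes 0, 1 and 2.
SPEEDUP = {0: 1, 1: 4, 2: 20}


def polling_interval(predicted_class: int) -> float:
    """Return how often (in seconds) the crawler should revisit a post."""
    return BASE_INTERVAL_S / SPEEDUP[predicted_class]


for cls in (0, 1, 2):
    print(cls, polling_interval(cls))
```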

4 Experimental Study

In our experimental study we use data collected from a public page in the Russian social network “Vkontakte” intended for users’ reports of road accidents, emergencies, and searches for missing people. The whole dataset for the year 2017 amounts to 12457 records.

Our set of classification features consists of the topic distribution of a post’s text across 40 topics extracted with BigARTM [17], and two textual features: text length, and whether the text of the post contains phone numbers of possible emergency coordinators.

The number of topics was established through multiple trials in which we investigated its impact on the classifier’s accuracy, as well as the patterns in the distributions of main topics for each class, which we discuss in more detail at the end of this section. To obtain more meaningful topics, we use the following set of model regularizers with the following values (Table 1):

Table 1. Values of ARTM model regularizers

These model parameters allow us to sharpen important topics and smooth background topics, which yields higher classification accuracy. To assess the relative importance of the classification features, we trained an ensemble of extremely randomized trees [18]. The importance of each feature is shown in Fig. 2:

Fig. 2. Feature importance

It is evident that the most important features are text length and topics 23, 20, 14, 2, and 36. The importance of the other features, apart from the phone number, is distributed more or less uniformly. We can assume that the presence of a phone number in the text does not play a vital role in social response prediction. The most important topics are presented in Table 2:

Table 2. Description of important topics

We treat the prediction of social response as a classification problem. To do so, we have divided posts with different repost volumes into the following categories:

  • Repost volume below 10 (class 0) – 9819 posts

  • Repost volume between 10 and 100 (class 1) – 2257 posts

  • Repost volume higher than 100 (class 2) – 381 posts.

These categories were used as labels for the classifier.
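The class sizes above add up to the 12457 records of the dataset and are strongly imbalanced. The short check below computes the share of each class from the reported counts; only the variable names are introduced here.

```python
# Class sizes as reported in the list above.
class_counts = {0: 9819, 1: 2257, 2: 381}

total = sum(class_counts.values())
shares = {label: count / total for label, count in class_counts.items()}

print(total)
for label, share in shares.items():
    print(f"class {label}: {share:.1%}")
```

Roughly 79% of the posts fall into class 0, about 18% into class 1 and about 3% into class 2. A trivial classifier that always predicts class 0 would therefore already reach about 79% accuracy, which is why per-class measures are more informative than overall accuracy for this dataset.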

To test our ability to predict levels of social response, we trained a random forest ensemble [19] using a one-vs-rest multiclass strategy. We employed 5-fold cross-validated grid search to establish the model parameters that correspond to the highest classification accuracy. The values of these parameters are shown in Table 3:

Table 3. Parameters of random forest ensemble
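The training procedure described above (a one-vs-rest random forest tuned by 5-fold grid search on accuracy) can be sketched with scikit-learn. This is an assumption-laden illustration rather than the original implementation: the paper does not name the library it used, the feature matrix below is random synthetic data with the same layout (40 topic weights, text length, phone flag), and the parameter grid is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the real features: 40 topic weights per post
# plus text length and a phone-number flag (random values, not real data).
n_posts, n_topics = 300, 40
topics = rng.dirichlet(np.ones(n_topics), size=n_posts)
text_length = rng.integers(20, 2000, size=n_posts)
has_phone = rng.integers(0, 2, size=n_posts)
X = np.column_stack([topics, text_length, has_phone])

# Imbalanced labels that loosely mimic the class proportions of the dataset.
y = rng.permutation(np.array([0] * 240 + [1] * 45 + [2] * 15))

# Hypothetical grid; it is not the grid or the values used in the study.
param_grid = {
    "estimator__n_estimators": [25, 50],
    "estimator__max_depth": [4, None],
}

search = GridSearchCV(
    OneVsRestClassifier(RandomForestClassifier(random_state=0)),
    param_grid,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

print(search.best_params_)
print(search.predict_proba(X[:3]).shape)  # one probability column per class
```

Stratified folds are used in the sketch so that the rare high-response class appears in every fold; whether the original experiment stratified its folds is not stated.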

The results of our classification experiment are presented in Fig. 3:

Fig. 3. Receiver operating characteristic for each class of reposting volume (with grid-search)
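A per-class receiver operating characteristic treats each class in turn as the positive one and all other classes as negative. The area under such a curve can be computed without plotting, as the probability that a randomly chosen positive post receives a higher score than a randomly chosen negative one. The sketch below uses made-up labels and scores purely for illustration; the function names are not from the paper.

```python
def binary_auc(labels, scores):
    """AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, proba):
    """Compute one AUC per class from per-class predicted probabilities."""
    n_classes = len(proba[0])
    return {
        k: binary_auc([int(y == k) for y in y_true], [row[k] for row in proba])
        for k in range(n_classes)
    }


# Made-up true classes and predicted class probabilities for six posts.
y_true = [0, 0, 1, 2, 1, 0]
proba = [
    [0.80, 0.15, 0.05],
    [0.60, 0.30, 0.10],
    [0.30, 0.60, 0.10],
    [0.20, 0.20, 0.60],
    [0.65, 0.25, 0.10],
    [0.70, 0.20, 0.10],
]
auc = one_vs_rest_auc(y_true, proba)
print(auc)
```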

We further investigate the distinctive features of each reposting category by plotting the distribution of the corresponding main topics (Figs. 4, 5 and 6).

Fig. 4. Distribution of main topics for posts with repost volume under 10

The most prevalent topic for posts with a repost volume under 10 is topic 14, which corresponds to basic reports of road accidents (Fig. 4).

Fig. 5. Distribution of main topics for posts with repost volume between 10 and 100

For a repost volume between 10 and 100, the dominant topic is topic 23, which corresponds to reports of car theft (Fig. 5).

Fig. 6. Distribution of main topics for posts with repost volume above 100

Posts with more than 100 reposts are mostly described by topics 20, 36, and 2 (Fig. 6). These topics correspond to reports of missing people and pets, as well as generic messages involving police notifications.
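The per-class distributions of main topics discussed above can be obtained by taking, for each post, the topic with the largest weight and counting these topics within each response class. The sketch below assumes that the main topic is the highest-weight topic of a post, which the text does not state explicitly, and uses a toy three-topic example rather than the 40-topic model.

```python
from collections import Counter, defaultdict


def main_topic(topic_weights):
    """Index of the highest-weight topic in a post's topic distribution."""
    return max(range(len(topic_weights)), key=topic_weights.__getitem__)


def main_topic_distribution(posts):
    """Count main topics per class; posts are (response_class, weights) pairs."""
    counts = defaultdict(Counter)
    for response_class, topic_weights in posts:
        counts[response_class][main_topic(topic_weights)] += 1
    return counts


# Toy posts: (response class, weights over three hypothetical topics).
toy_posts = [
    (0, [0.7, 0.2, 0.1]),
    (0, [0.6, 0.3, 0.1]),
    (0, [0.2, 0.5, 0.3]),
    (2, [0.1, 0.2, 0.7]),
    (2, [0.3, 0.1, 0.6]),
]
distribution = main_topic_distribution(toy_posts)
for response_class, counter in sorted(distribution.items()):
    print(response_class, counter.most_common())
```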

We see that the prevailing topic of a post is indeed a good indicator of potential social response. A prominent example is the prevalence of the topic corresponding to generic road accident reports among posts with low reposting activity, while posts with a repost volume higher than 10 are mostly described by topics related to car hijacking, reports of missing persons, and other important information.

However, the greatest research interest lies in the posts with the highest reposting activity. Due to their low occurrence (less than 5% of the whole dataset), these posts should be treated as outliers. To investigate which features contribute to such a high level of social response, we selected topic 36, which is present both in posts with a very large number of reposts and in ordinary ones. A side-by-side comparison of posts with the same main topic from both categories showed that, unlike ordinary posts, posts with high reposting activity often contain explicit or implicit pleas and calls for action that naturally require information to be spread as widely as possible. Posts with low repost volume, on the other hand, are mostly informative in nature. Such informative posts are rare among posts with high repost volume, but when they occur, they often contain information that might be useful to users in the future, such as important phone numbers and warnings. These distinctive features need to be taken into account in order to identify posts with a very high potential repost volume at an early stage of their spread. One possible way to do so is to specifically mine such calls for action or other textual forms that require active participation.

5 Conclusion and Future Work

In this paper we presented a framework for social response estimation, which can be used to predict the possible level of reposting activity from the textual features of a post and the topics that characterize its semantics. The results of our experimental study show that our framework is able to predict the scale of response with high precision and that the response is mostly driven by topics. However, the investigation of rare posts with a very high reposting level showed that such a level of social response is achieved not only due to the main topics of the messages, but rather through the interplay between topics and additional features such as pleas, calls for action, and other subtle textual or external attributes. One prospect for future work is to use these implicit features to predict which posts are about to be widely propagated.