Keyword-Based Approach for Detecting Civil Unrest Events from Social Media

Iyda, J. Joslin; Geetha, P.

doi:10.1007/978-3-030-19562-5_29


Part of the book series: EAI/Springer Innovations in Communication and Computing (EAISICC)

Abstract

In recent years, online social media platforms such as Twitter, Facebook, and Google+ have become very popular, and this popularity has led protesters to use social media actively during civil unrest to express their opinions about the protest, communicate their plans, and organize future events. This activity yields an impressive amount of data that researchers have used to predict protest activity in the near future. Effective detection of such potentially dangerous content can help ensure public safety with minimum disruption. We identified a correlation between tweets promoting a protest and imminent protest activity. We therefore propose a keyword-based approach for analyzing the behavior of a civil unrest event and build a probabilistic model for classifying civil unrest events. Extensive experimental evaluations on Twitter datasets from the #Jallikattu, #BusFareHike, and #SaveFisherMen civil unrest events demonstrate the effectiveness and efficiency of the proposed approach.


1 Introduction

Nowadays, online social media plays a vital role in our daily lives and is the major way through which individuals interact on the Internet. Social networking sites such as Facebook, Twitter, LinkedIn, and MySpace enable users to communicate with other users or to find people with interests similar to their own. Users can also create online profiles on which they post daily updates about their lives in the form of pictures, videos, and related content. Facebook and Twitter have enormous user bases that grow every day, and social media is now used by everyone from ordinary people to celebrities, politicians, and media houses. These platforms have become a prominent news source and can disseminate information much faster than traditional news media. Many real-world examples have shown how effectively and promptly Twitter reports information during disasters and social movements. Representative examples include the terrorist attacks in Mumbai in November 2008 [1], the flooding of the Red River Valley in the United States and Canada in March and April 2009, the US Airways plane that ditched in the Hudson River in January 2009, the devastating earthquake in Haiti in 2010, the demonstrations following the Iranian presidential election in 2009, and the "Arab Spring" in the Middle East and North Africa region.

As social media use has become increasingly intertwined with everyday events, individuals and organizations have found ways to abuse these platforms to spread false information [2], to attack and defame others [3], or to mislead and manipulate. Users with malicious intentions may use them to spread rumors, issue threats, misdirect their followers, and share their plans with their community [4, 5]. Criminal gangs and terrorist organizations such as ISIS use social networking for propaganda and recruitment [6]. Fraudulent activity and social bots have been used to coordinate protest campaigns and to manipulate political elections and stock markets [7]. The lack of effective content-verification systems, and of technical solutions that detect and prevent misuse in a timely manner on many of these platforms, including Twitter and Facebook, raises concerns when younger users are exposed to cyber-bullying, harassment, or hate speech, which can lead to depression and suicide. Moreover, highly influential social media are often used to announce people's intentions before they commit acts of violence and to coordinate criminal activities [8]. The ability to automatically detect harmful material benefits the managers of websites that allow users to post content, and it can also serve as part of an early-warning system that alerts authorities to possible threats to public safety [9]. The automatic detection of potentially dangerous words can thus help ensure public safety with minimum disruption. Similarly, monitoring social media posts and discussions to determine how participants are reacting to a brand or event can benefit businesses [10,11,12].

Extracting useful information from social media is more challenging than classic information extraction, that is, extraction from trusted sources such as traditional news media and well-formed, grammatical texts. The real challenge lies in accessing that data and transforming it into something usable and actionable. Social media text [13] is typically very short and noisy, the reliability of the information it conveys is far more uncertain than that of conventional news media, and many social media platforms support multiple languages.

In this paper, we propose a keyword-based approach for detecting civil unrest events from a Twitter dataset. The system automatically learns keywords from the dataset and filters the dataset using the identified keywords. Clustering analysis is then performed to detect tweets promoting civil unrest and to analyze the impact of the protest on the public. Finally, an extensive experimental evaluation and performance analysis are carried out.

2 Related Work

In recent years, online social network mining has received much attention because of the enormous volume of uncensored data posted by people. Research in this area covers social recommendation, opinion mining, sentiment analysis, topic detection and tracking, community detection, event detection, and forecasting. This section presents related work in the following areas: (1) spatiotemporal mining of social media; (2) event detection and forecasting; (3) early detection of suspicious behaviors in social media; and (4) civil unrest event forecasting from social media.

2.1 Spatiotemporal Mining of Social Media

Considerable research has studied spatiotemporal events, which are mainly associated with tweets posted within a certain geographical neighborhood. Forecasting such events therefore requires examining spatial features and their correlations in addition to the temporal dimension. Ting Hua [14] reviewed several methods of spatiotemporal event detection and event forecasting. Judith Gelernter proposed a method for identifying locations and associating them with people by mining social media text conversations. Bo Hu [15] developed a probabilistic model for location recommendation that captures the spatiotemporal aspects of user check-ins. Andrade [16] adopted a temporal approach to analyze the cross-correlation between rain-gauge data and rainfall-related Twitter messages in terms of temporal units and their lag time.

2.2 Event Detection and Forecasting

Most prior event detection research has focused on keywords present in the text, or has relied on templates, dictionaries, or the presence of specific patterns in the text. Wei Wang [17] used multiple instance learning to extract key sentences promoting civil unrest, which contain fields such as participants, purpose, location, and time. Yiming Yang [18] adapted traditional hierarchical and non-hierarchical clustering techniques for online event detection based on the semantic and temporal properties of events. Fang Jin [19] detected civil unrest events by representing the spatiotemporal structure of user activity on Twitter as graph wavelets. Minglai Shao [20] proposed a method that indicates forthcoming or ongoing events in dynamic multivariate networks by measuring the significance of evolving subgraphs and subsets of attributes.

2.3 Early Detection of Suspicious Behaviors in Social Media

Considerable research has been carried out in the area of social media analysis. However, relatively little work has addressed the early detection of suspicious behaviors targeting civil unrest by observing users' text-based conversations. Some of the significant works are presented in this section. Myriam Munezero [21] developed a framework that searches for linguistic features pertaining to antisocial behaviors (ASBs) and uses those features to automatically identify suspicious activities in texts. Dongjin Choi [22] proposed a method that uses word similarity based on the WordNet hierarchy and n-gram frequency data to distinguish articles about terrorism. Burnap [23] built models that predicted the size and survival of information flows on Twitter following the 2013 terrorist attack in Woolwich, London. Emilio Ferrara [24] proposed a method to identify criminal networks from communication media, such as mobile phones and online social networks, that leave digital traces in the form of metadata.

2.4 Civil Unrest Event Forecasting from Social Media

Many events in which a large number of people gather to support a common cause are not civil unrest events [25]; rather, civil unrest is typically defined by law enforcement as a gathering of three or more people, in reaction to an event, with the intention of causing a public disturbance in violation of the law. Ryan Compton and Jiejun Xu [26] proposed a strategy for the early detection of civil unrest from social media that simply applies a series of filters, such as a keyword filter, a future-dates filter, and a location filter. Congyu Wu [27] sought to locate the predictive power of social media in its function as a protest advertisement and organization mechanism, using the Global Database of Events, Location, and Tone (GDELT).

3 System Framework for Civil Unrest Detection

Social network analysis (SNA) has long been used to identify social groups and to determine the relationships among their members. Figure 29.1 depicts the overall architecture of the civil unrest detection system, which consists of the following steps. First, all tweets posted between two dates are collected and preprocessed; basic preprocessing steps clean the tweets and make them suitable for further processing. Second, keywords are learned automatically on the basis of the highest term frequencies, and significant keywords representing a particular protest are identified. Third, the preprocessed tweets are filtered using this set of keywords, and the features used for detecting civil unrest are extracted from the resulting tweets. Fourth, clustering analysis is performed to detect the essence of the unrest content in those tweets, in order to understand the influence of the protest on society.

3.1 Preprocessing

The extracted tweets contain many unwanted words, symbols, white spaces, acronyms, etc. Such unwanted elements must be eliminated so that the tweets can be processed easily in later stages and yield results with maximum accuracy. The raw tweets were therefore cleaned and preprocessed to remove stop words, punctuation, and unwanted symbols. Tweets written in languages other than English were translated into English using Google Translate so that all tweets could be processed uniformly.
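The cleaning step can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation (which used R): the small `STOP_WORDS` set, the regular expressions, and the helper names `clean_tweet` and `preprocess` are hypothetical. Hashtags are kept because later steps match on them, the tweets are assumed to be already translated into English (the Google Translate step is not shown), and "repeated tweets" are taken to mean tweets that are identical after cleaning.

```python
import re

# Illustrative stop-word list (assumption); "rt" drops the retweet marker.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in",
              "on", "for", "by", "this", "that", "it", "we", "our", "with", "rt"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9#\s]")  # keep '#' so hashtags survive


def clean_tweet(text):
    """Lower-case a tweet and strip URLs, @mentions, punctuation and stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def preprocess(tweets):
    """Clean every tweet, then drop empty results and repeated tweets."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        tokens = clean_tweet(tweet)
        key = " ".join(tokens)
        if tokens and key not in seen:
            seen.add(key)
            cleaned.append(tokens)
    return cleaned
```

Deduplicating after cleaning means that a retweet such as "RT @user: ..." collapses onto the original tweet it copies.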

3.2 Keyword Learning and Filtering

Next, the average term frequency-inverse document frequency (TF-IDF) score of each word was calculated, and the words were listed in decreasing order of this score. The 100 top-ranked words were selected; they were closely related to the cause of the protest, the place of the protest, and its key actors. Keyword matching was then applied to the complete dataset using these protest-related terms, in order to measure the number of tweets containing information about an upcoming protest; specifically, we measured the volume of tweets containing protest-related keywords and future-oriented words. Since the tweets were extracted during the BusFareHike protest, we started with basic keywords related to that protest, such as #BusFareHike and #TNBusStrike, which were its most popular hashtags. The tweets containing these keywords were selected and aggregated by day, yielding a large volume of tweets covering a period of 8 days for each protest.
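A minimal sketch of keyword learning and filtering, assuming the tweets are already tokenised (for example by a cleaning step like the one in Section 3.1) and assuming that "average TF-IDF" means a term's TF-IDF averaged over all tweets, with tweets that lack the term contributing zero. The TF and IDF parts follow Eqs. 29.1 and 29.2 given in Section 3.3, with the natural logarithm as an assumption. The function names are hypothetical; the paper's choice of the top 100 words corresponds to `k=100`.

```python
import math
from collections import Counter


def average_tfidf(docs):
    """Average TF-IDF (Eqs. 29.1-29.3, natural log) of each term over all tweets.

    `docs` is a list of non-empty token lists; a tweet that lacks a term
    contributes zero to that term's average.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                      # document frequency
    totals = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)                # Eq. 29.1
            idf = math.log(n_docs / df[term])    # Eq. 29.2
            totals[term] += tf * idf             # Eq. 29.3
    return {term: total / n_docs for term, total in totals.items()}


def top_keywords(docs, k=100):
    """Return the k terms with the highest average TF-IDF (ties broken alphabetically)."""
    scores = average_tfidf(docs)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def filter_by_keywords(docs, keywords):
    """Keep only the tweets that contain at least one keyword."""
    wanted = set(keywords)
    return [doc for doc in docs if wanted.intersection(doc)]
```

A term that occurs in every tweet gets an IDF of log(1) = 0, so very common words fall to the bottom of the ranking rather than becoming keywords.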

3.3 Clustering Model for Civil Unrest Detection

Unsupervised learning is highly useful in social media monitoring, because clustering techniques make it possible to obtain an overview of public opinion about an event. Clustering groups similar items into the same cluster. Tweets containing information about the same event express collective behavior, which can be exploited to form separate clusters whose keywords represent different civil unrest events, such as #SaveFisherMen, #BusFareHike, and #Jallikattu. A simple TF-IDF weighting is used to represent the tweets for clustering.

Algorithm

Civil unrest detection based on keyword extraction is performed in four general steps:

Input: A document containing tweets.

Output: A number of clusters, each representing a different protest event.

Step 1: Remove stop words and repeated tweets from the posts.

Step 2: Extract keywords from the user tweets using the TF-IDF method.

The TF-IDF value is composed of two components, the TF value and the IDF value. The rationale for the TF value is that words occurring more frequently in a document are more important than less frequent ones. The TF value of a term in a document is the number of times the term appears in that document, normalized by the total number of term occurrences in the document (Eq. 29.1). The IDF value measures how important, or discriminative, a term is across the collection: it is obtained by dividing the total number of documents by the number of documents containing the term and taking the logarithm of that quotient (Eq. 29.2).

$$ tf\left(i,j\right)=\frac{n\left(i,j\right)}{\sum \limits_kn\left(k,j\right)} $$

(29.1)

n(i, j): The number of occurrences of term t_i in document d_j

$ \sum_k n(k,j) $: The total number of occurrences of all terms in document d_j

$$ idf(i)=\log \left(\frac{\left|D\right|}{\left|\left\{d_j : t_i\in d_j\right\}\right|}\right) $$

(29.2)

|D|: The total number of documents in the corpus

|{d_j : t_i ∈ d_j}|: The number of documents in which the term t_i appears

$$ tfidf\left(i,j\right)= tf\left(i,j\right)\times idf(i) $$

(29.3)

Step 3: Compute the cosine similarity between each pair of tweets:

$$ \cos \theta =\frac{x\cdot y}{\lVert x\rVert \, \lVert y\rVert} $$

(29.4)

where x and y are the term frequency-inverse document frequency (TF-IDF) vectors of the two tweets being compared.

Step 4: Cluster the tweets using the K-means clustering algorithm.
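Steps 2 to 4 can be sketched in plain Python as below. This is an illustrative sketch, not the authors' R code: it builds TF-IDF vectors directly from Eqs. 29.1 to 29.3 (natural logarithm assumed), computes the cosine similarity of Eq. 29.4, and runs Lloyd's K-means. Two choices are assumptions because the paper does not specify them: the vectors are scaled to unit length before clustering, so that for non-zero vectors the squared Euclidean distance equals 2 - 2cos θ and K-means effectively groups tweets by cosine similarity, and the initial centroids are picked deterministically by farthest-first traversal. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Dense TF-IDF vectors (Eqs. 29.1-29.3, natural log) over a sorted vocabulary.

    Assumes every tweet in `docs` is a non-empty list of tokens.
    """
    vocab = sorted({term for doc in docs for term in doc})
    df = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] / len(doc) * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def cosine(x, y):
    """Cosine similarity of Eq. 29.4; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm else list(v)


def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's K-means on unit-length TF-IDF vectors; returns (labels, centroids)."""
    points = [normalize(v) for v in vectors]
    # Farthest-first seeding: start from the first tweet, then repeatedly add
    # the tweet farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every tweet.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

On four toy tweets, two about a bus fare hike and two about fishermen, `kmeans(tfidf_vectors(docs)[1], 2)` separates the two topics. A library implementation (for example with random restarts) would normally replace this loop on a real dataset.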

4 Results and Discussion

The implementation process starts with data collection. The Twitter API provides users with separate login and access credentials for extracting the information they need; these credentials are used to authenticate the R tool with the API. The tweets were extracted using the Twitter API and R. Tweets for the #TNBusFareHike protest were collected from 22/01/2018 to 29/01/2018. We retrieved about 35,000 tweets, which contain people's opinions against the Tamil Nadu government for suddenly increasing bus fares. Similarly, the datasets for the #SaveFisherMen and #HydroCarbon protests were collected during the days of each protest and aggregated by day. In this way, a large volume of tweets was collected for the different protests.
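The daily aggregation described above can be sketched as follows, assuming each tweet has already been reduced to a hypothetical `(created_at, text)` pair (the paper does not describe the record format returned by its R client). The function `daily_volume`, also hypothetical, counts keyword-matching tweets per calendar day inside an inclusive date window; the window from 22/01/2018 to 29/01/2018 gives the 8 days mentioned in Section 3.2.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_volume(tweets, start, end, keywords):
    """Count keyword-matching tweets per day in the inclusive window [start, end].

    `tweets` is an iterable of hypothetical (created_at: datetime, text: str) pairs.
    Days with no matching tweets are reported as 0.
    """
    needles = [kw.lower() for kw in keywords]
    counts = Counter()
    for created_at, text in tweets:
        day = created_at.date()
        lowered = text.lower()
        # Case-insensitive substring match against any protest keyword.
        if start <= day <= end and any(n in lowered for n in needles):
            counts[day] += 1
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    return {day: counts[day] for day in days}


if __name__ == "__main__":
    sample = [(datetime(2018, 1, 22, 9, 30), "Fares doubled overnight #BusFareHike")]
    print(daily_volume(sample, date(2018, 1, 22), date(2018, 1, 29), ["#BusFareHike"]))
```

Matching is by case-insensitive substring, so `#busfarehike` also matches any longer hashtag that contains it; matching whole tokens would be stricter.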

Figure 29.2 shows the word cloud formed from the protest-related keywords identified in the tweets; words that appear more frequently in the tweets are drawn in a larger size. TF-IDF is the product of TF and IDF, so a term obtains a high TF-IDF score when its term frequency is high and its document frequency is low (that is, its IDF is high). TF-IDF weighting, like many clustering techniques, works well when applied to a large dataset. A bar chart of the frequent words appearing in tweets that promote the protest was also prepared; it clearly compares the frequency counts of these words.
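As a worked illustration of Eqs. 29.1 to 29.3, using made-up numbers (not taken from the paper's datasets) and assuming the natural logarithm: suppose a hashtag occurs 2 times in a 10-word tweet and appears in 10 of 1,000 tweets, while a common word occurs equally often in the same tweet but appears in 500 of the 1,000 tweets. Then

$$ tf = \frac{2}{10} = 0.2, \qquad tfidf_{\text{rare}} = 0.2 \times \log\frac{1000}{10} = 0.2 \times \log 100 \approx 0.2 \times 4.605 \approx 0.921 $$

$$ tfidf_{\text{common}} = 0.2 \times \log\frac{1000}{500} = 0.2 \times \log 2 \approx 0.2 \times 0.693 \approx 0.139 $$

so the rarer term scores about 6.6 times higher even though both terms have the same term frequency.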

The top-ranked frequent words in each document containing the tweets of a particular event were selected; they are shown in Fig. 29.3. Based on this ranking, the most frequent words are treated as the keywords used to cluster the tweets of a particular protest. Table 29.1 lists the keywords extracted from sample data for the different civil unrest events.

Table 29.1 Keywords extracted to identify tweets of different protests


The clustering analysis concludes with cluster validation, which evaluates the clustering algorithm used. Because the dataset has no prior (ground-truth) labels, this study uses internal validation, which relies only on the information contained in the data. A user study was conducted on 150 real-time tweets to validate the clusters. Several internal-validation indices can be used to determine the optimal number of clusters; one of them is the Sum of Squared Error (SSE). When the clusters are well separated, the "goodness" of the resulting clusters can be evaluated using SSE, which measures the compactness of each cluster. SSE is calculated as

$$ \mathrm{SSE}=\sum \limits_{i=1}^n{\left\lVert {x}_i-\hat{x}_i\right\rVert}^2 $$

(29.5)

where x_i is the i-th observation (a tweet's TF-IDF vector), $ \hat{x}_i $ is the centroid of the cluster to which x_i is assigned, and n is the number of observations.

Comparing the SSE obtained for different numbers of clusters is one way to determine the appropriate number of clusters. SSE is defined as the sum of the squared distances between each member of a cluster and its cluster centroid. The plot of SSE against the number of clusters k in Fig. 29.4 shows that SSE decreases as k increases, since the clusters become smaller. In Fig. 29.4, the first elbow occurs at k = 3; thus, the optimum number of clusters for the dataset is 3. To make detection and probability estimation feasible, we repeated the experiment on various datasets, and the results improved.
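The SSE of Eq. 29.5 and the choice of k can be sketched as follows. The `sse` function takes x̂_i to be the centroid of the cluster that x_i belongs to, matching the definition above. The paper reads the elbow off Fig. 29.4; `elbow_k` instead applies one simple automatic heuristic, the largest discrete second difference of the SSE curve, which is an assumption rather than the authors' procedure. Both function names are hypothetical.

```python
def sse(points, labels, centroids):
    """Eq. 29.5: sum of squared distances from each point to its cluster centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )


def elbow_k(sse_by_k):
    """Pick k at the sharpest bend of an SSE curve.

    `sse_by_k[i]` is the SSE obtained with k = i + 1 clusters. The bend at k is
    measured by the discrete second difference SSE(k-1) - 2*SSE(k) + SSE(k+1);
    the endpoints have no second difference, so fewer than 3 values give None.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(sse_by_k) - 1):
        bend = sse_by_k[i - 1] - 2 * sse_by_k[i] + sse_by_k[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k
```

For an invented SSE curve of [100, 70, 20, 17, 15, 14] for k = 1 to 6, `elbow_k` returns 3, the point after which the curve stops falling steeply. Visual inspection of the plot remains a reasonable check, because second-difference heuristics can be sensitive to noise in the SSE values.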

5 Conclusion

In this paper, we investigated existing text-mining methods for detecting civil unrest content in order to help prevent upcoming protests. Specifically, we proposed a keyword-based approach to detect civil unrest from social media before it occurs. We learned civil unrest keywords, used them with a clustering algorithm to group real-time tweets, and thereby tackled the problem of detecting civil unrest events. We integrated our ideas into a modular framework and experimentally demonstrated the validity and scalability of the method. The performance of the system can be improved by (1) including a location extraction method, for example by applying a more advanced geotagging scheme, using GPS signals, or using information about the Twitter graph to estimate the location of a tweet from the locations of related Twitter users, and (2) applying multilingual text analysis to improve clustering accuracy.

References

[1] http://www.chinapost.com.tw/taiwan/national/national-news/2015/02/26/429715/Cabinet-on.html
[2] S. Wen, J. Jiang, X. Yang, S. Yu, To shut them up or to clarify: Restraining the spread of rumors in online social networks. IEEE Trans. Parallel Distributed Syst. 25(12), 3306–3316 (2014)
[3] E. Ferrara, Manipulation and abuse on social media. SIGWEB Newsletter (Spring), 4 (2015)
[4] J. Joslin Iyda, S. Visalaxi, G. Anitha, Discovering criminal communities from e-mails: a graph-based approach. J. Chem. Pharm. Sci. Spec. Iss. 9, 44–49 (2016)
[5] M. Alzaabi, K. Taha, T.A. Martin, CISRI: A crime investigation system using the relative importance of information spreaders in networks depicting criminals communications. IEEE Trans. Inf. Forensics Secur. 10(10), 2196–2212 (2015)
[6] C.C. Aggarwal, Counterterrorism, social network analysis, in Encyclopedia of Social Network Analysis and Mining (Springer, 2014), pp. 285–289
[7] H. Paquette, Social media as a marketing tool: a literature review. Major Papers by Master of Science Students, Paper 2 (2013)
[8] X. Zhou, J. Yang, Z. Lin, J. Zhang, ITEPE: A source tracing algorithm for the microblog. PLoS ONE 9, e111380 (2014)
[9] M.M. Ali, K.M. Mohammed, L. Rajamani, Framework for surveillance of instant messages in instant messengers and social networking sites using data mining and ontology, in Proceedings of the 2014 IEEE Students' Technology Symposium, IIT Kharagpur (2014)
[10] Z. Wang, W. Zhu, P. Cui, L. Sun, S. Yang, Social media recommendation, in Social Media Retrieval, Computer Communications and Networks, ed. by N. Ramzan et al. (Springer-Verlag, London, 2013). https://doi.org/10.1007/978-1-4471-4555-4_2
[11] Y. Altshuler, W. Pan, A. Pentland, Trends prediction using social diffusion models, in Intl. Conf. on Social Computing, Behavioral-Cultural Modeling, and Prediction (2012)
[12] I. Guy, N. Zwerdling, I. Ronen, D. Carmel, E. Uziel, Social media recommendation based on people and tags, in Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Geneva, Switzerland, 19–23 July 2010
[13] M. McGlohon, L. Akoglu, C. Faloutsos, Statistical properties of social networks, in Social Network Data Analytics, ed. by C.C. Aggarwal (Springer Science+Business Media, LLC, 2011). https://doi.org/10.1007/978-1-4419-8462-3_2
[14] T. Hua, Z. Liang, F. Chen, C.-T. Lu, N. Ramakrishnan, How events unfold: Spatiotemporal mining in social media. ACM SIGSPATIAL Newsletter 7(3), 19–25 (2015)
[15] B. Hu, M. Jamali, M. Ester, Spatio-temporal topic modeling in mobile social media for location recommendation, in IEEE 13th International Conference on Data Mining (ICDM), USA (2013), ISSN: 1550-4786
[16] S.C. de Andrade, C. Restrepo-Estrada, A.C.B. Delbem, E.M. Mendiondo, J.P. de Albuquerque, Mining rainfall spatio-temporal patterns in Twitter: a temporal approach, in Societal Geo-innovation, 20th AGILE Conference on Geographic Information Science (Springer, 2017)
[17] W. Wang, Y. Ning, H. Rangwala, N. Ramakrishnan, A multiple instance learning framework for identifying key sentences and detecting events, in CIKM'16 (ACM, 24–28 October 2016)
[18] Y. Yang, T. Pierce, J. Carbonell, A study on retrospective and online event detection, in ACM SIGIR'98, Melbourne (1998)
[19] F. Jin, F. Chen, R. Khandpur, C.-T. Lu, N. Ramakrishnan, Absenteeism detection in social media, in Proceedings of the SIAM International Conference on Data Mining (SDM'17), Houston, TX (April 2017)
[20] M. Shao, J. Li, F. Chen, H. Huang, S. Zhang, X. Chen, An efficient approach to event detection and forecasting in dynamic multivariate social media networks, in ACM Conference WWW 2017, Australia (2017)
[21] M. Munezero, C.S. Montero, T. Kakkonen, E. Sutinen, Automatic detection of antisocial behaviour in texts. J. Informatica 38, 3–10 (2014)
[22] D. Choi, B. Ko, H. Kim, P. Kim, Text analysis for detecting terrorism-related articles on the web. J. Netw. Comput. Appl. 38, 16–21 (2014)
[23] P. Burnap, M.L. Williams, L. Sloan, Tweeting the terror: modeling the social media reaction to the Woolwich terrorist attack. J. Soc. Netw. Anal. Mining 4, 206 (2014)
[24] E. Ferrara, P. De Meo, S. Catanese, G. Fiumara, Detecting criminal organizations in mobile phone networks. Int. J. Expert Syst. Appl. 41(13), 5733–5750 (2014)
[25] A. Hoegh, S. Leman, P. Saraf, N. Ramakrishnan, Bayesian model fusion for forecasting civil unrest. J. Technometrics 57, 332–340 (2015)
[26] J. Xu, T.C. Lu, R. Compton, D. Allen, Civil unrest prediction: a tumblr-based exploration, in Social Computing, Behavioral-Cultural Modeling and Prediction, SBP 2014, ed. by W.G. Kennedy, N. Agarwal, S.J. Yang. Lecture Notes in Computer Science, vol. 8393 (Springer, Cham, 2014)
[27] C. Wu, M.S. Gerber, Forecasting civil unrest using social media and protest participation theory. IEEE Trans. Comput. Soc. Syst. 5(1), 82–94


Author information

Authors and Affiliations

Anna University, Chennai, Tamil Nadu, India
J. Joslin Iyda
Rajalakshmi Engineering College, Chennai, Tamil Nadu, India
J. Joslin Iyda
Department of Information Science and Technology, CEG, Anna University, Chennai, Tamil Nadu, India
P. Geetha


Corresponding author

Correspondence to J. Joslin Iyda .

Editor information

Editors and Affiliations

Department of Computer Science & Engineering, Sri Eshwar College of Engineering, Coimbatore, Tamil Nadu, India
Anandakumar Haldorai
Department of Computer Science & Engineering, Presidency University, Bengaluru, India
Arulmurugan Ramu
Sri Eshwar College of Engineering, Coimbatore, Tamil Nadu, India
Sudha Mohanram
Department of Electrical Engineering, Faculty of Engineering, University of Malaysia, Kuala Lumpur, Kuala Lumpur, Malaysia
Chow Chee Onn


About this paper

Cite this paper

Iyda, J.J., Geetha, P. (2020). Keyword-Based Approach for Detecting Civil Unrest Events from Social Media. In: Haldorai, A., Ramu, A., Mohanram, S., Onn, C. (eds) EAI International Conference on Big Data Innovation for Sustainable Cognitive Computing. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-19562-5_29


DOI: https://doi.org/10.1007/978-3-030-19562-5_29
Published: 19 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19561-8
Online ISBN: 978-3-030-19562-5
eBook Packages: Engineering, Engineering (R0)
