Determining the Veracity of Rumours on Twitter

  • Conference paper
  • In: Social Informatics (SocInfo 2016)
  • Part of the book series: Lecture Notes in Computer Science, volume 10046

Abstract

While social networks can provide an ideal platform for up-to-date information from individuals across the world, they have also proved to be places where rumours fester and accidental or deliberate misinformation often emerges. In this article, we aim to support the task of making sense of social media data, and specifically seek to build an autonomous message-classifier that filters relevant and trustworthy information from Twitter. For our work, we collected about 100 million public tweets, including users’ past tweets, from which we identified 72 rumours (41 true, 31 false). We considered over 80 trustworthiness measures, including the authors’ profiles and past behaviour, the social network connections (graphs), and the content of the tweets themselves. We ran modern machine-learning classifiers over those measures to produce trustworthiness scores at various time windows from the outbreak of the rumour. Such time windows were key, as they allowed useful insight into the progression of the rumours. Our model proved significantly more accurate than those of similar studies in the literature. We also identified the critical attributes of the data that give rise to the trustworthiness scores assigned. Finally, we developed a software demonstration that provides a visual user interface allowing the user to examine the analysis.


Notes

  1. http://scikit-learn.org/.

  2. https://pythonhosted.org/neurolab/.

  3. http://www.nltk.org/.

  4. https://networkx.github.io/.

  5. We abandoned neural networks at an early stage because the library implementation we used was very slow and the results underperformed.

  6. A phenomenon known as the “wisdom of the crowd”.

  7. We compute the average over the first eight models because this is the range in which the classifiers’ performance peaks. As we argue in Sect. 3, all plots indicate that classifier performance decreases once more than eight features are added.

References

  1. Cambridge Advanced Learner’s Dictionary and Thesaurus. Cambridge University Press. http://dictionary.cambridge.org/dictionary/english/rumour

  2. Bishop, C.M.: Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York (2006)

  3. Castillo, C., Mendoza, M., Poblete, B.: Information credibility on Twitter. In: Proceedings of the 20th International Conference on World Wide Web, pp. 675–684. ACM (2011)

  4. Castillo, C., Mendoza, M., Poblete, B.: Predicting information credibility in time-sensitive social media. Internet Res. 23(5), 560–588 (2013)

  5. Pennebaker, J.W., Booth, R.J., Boyd, R.L., Francis, M.E.: Linguistic Inquiry and Word Count: LIWC 2015. Pennebaker Conglomerates, Austin (2015). www.LIWC.net

  6. Finn, S., Metaxas, T.P., Mustafaraj, E.: Investigating rumor propagation with TwitterTrails. arXiv preprint arXiv:1411.3550 (2014)

  7. Fox, J.: Applied Regression Analysis, Linear Models, and Related Methods. Sage Publications, London (1997)

  8. Gil, Y., Artz, D.: Towards content trust of web resources. Web Semant. Sci. Serv. Agents World Wide Web 5(4), 227–239 (2007)

  9. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)

  10. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2009)

  11. Kelton, K., Fleischmann, K., Wallace, W.: Trust in digital information. J. Am. Soc. Inf. Sci. Technol. 59(3), 363–374 (2008)

  12. Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. The MIT Press, Cambridge (2009)

  13. Kwon, S., Cha, M., Jung, K., Chen, W., Wang, Y.: Prominent features of rumor propagation in online social media. In: 2013 IEEE 13th International Conference on Data Mining, pp. 1103–1108. IEEE (2013)

  14. Lomax, G.R., Hahs-Vaughn, D.L.: An Introduction to Statistical Concepts. Routledge, New York (2012)

  15. Lukyanenko, R., Parsons, J.: Information quality research challenge: adapting information quality principles to user-generated content. J. Data Inf. Qual. (JDIQ) 6(1), 3 (2015)

  16. Mai, J.: The quality and qualities of information. J. Am. Soc. Inf. Sci. Technol. 64(4), 675–688 (2013)

  17. Mendoza, M., Poblete, B., Castillo, C.: Twitter under crisis: can we trust what we RT? In: Proceedings of the First Workshop on Social Media Analytics, pp. 71–79. ACM, New York (2010)

  18. Nurse, J.R.C., Agrafiotis, I., Goldsmith, M., Creese, S., Lamberts, K.: Two sides of the coin: measuring and communicating the trustworthiness of online information. J. Trust Manag. 1(5), 1–20 (2014). doi:10.1186/2196-064X-1-5

  19. Nurse, J.R.C., Creese, S., Goldsmith, M., Rahman, S.S.: Supporting human decision-making online using information-trustworthiness metrics. In: Marinos, L., Askoxylakis, I. (eds.) HAS 2013. LNCS, vol. 8030, pp. 316–325. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39345-7_33

  20. Nurse, J.R.C., Rahman, S.S., Creese, S., Goldsmith, M., Lamberts, K.: Information quality and trustworthiness: a topical state-of-the-art review. In: Proceedings of the International Conference on Computer Applications and Network Security (ICCANS) (2011)

  21. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  22. Pew Research Center: The evolving role of news on Twitter and Facebook (2015). http://www.journalism.org/2015/07/14/the-evolving-role-of-news-on-twitter-and-facebook

  23. Powers, D.M.W.: Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2(1), 37–63 (2011)

  24. Reuters Institute for the Study of Journalism: Digital news report 2015: tracking the future of news (2015). http://www.digitalnewsreport.org/survey/2015/social-networks-and-their-role-in-news-2015/

  25. Seo, E., Mohapatra, P., Abdelzaher, T.: Identifying rumors and their sources in social networks. In: SPIE Defense, Security, and Sensing, p. 83891I. International Society for Optics and Photonics (2012)

  26. Smola, A.J., Scholkopf, B.: A tutorial on support vector regression. Stat. Comput. 14, 199–222 (2004)

  27. The Guardian: How riot rumours spread on Twitter (2011). http://www.theguardian.com/uk/interactive/2011/dec/07/london-riots-twitter

  28. Verleysen, M., François, D.: The curse of dimensionality in data mining and time series prediction. In: Cabestany, J., Prieto, A., Sandoval, F. (eds.) IWANN 2005. LNCS, vol. 3512, pp. 758–770. Springer, Heidelberg (2005). doi:10.1007/11494669_93

  29. Vosoughi, S.: Automatic detection and verification of rumors on Twitter. Ph.D. thesis, MIT (2015)

  30. Wang, R.Y., Strong, D.M.: Beyond accuracy: what data quality means to data consumers. J. Manag. Inf. Syst. 12(4), 5–33 (1996)

  31. Zubiaga, A., Liakata, M., Procter, R., Bontcheva, K., Tolmie, P.: Towards detecting rumours in social media. arXiv preprint arXiv:1504.04712 (2015)

Acknowledgements

This work was partly supported by UK Defence Science and Technology Labs under Centre for Defence Enterprise grant CDE42008. We thank Andrew Middleton for his helpful comments during the project. We would also like to thank Nathaniel Charlton and Matthew Edgington for their assistance in collecting and preprocessing part of the data.

Author information

Correspondence to Georgios Giasemidis.

Appendices

A Data Collection Process

Our data collection process consists of four main steps:

  1. Collection of tweets with a specific keyword, e.g. “#ParisAttacks” or “Brussels”. The Twitter API only allows the collection of such tweets within a ten-day window, so this step must start as soon as an event happens or a rumour begins.

     (a) Manual analysis of tweets and search for rumours. In this step we filter out all irrelevant tweets. For example, if we collected tweets containing the keyword “Brussels” (due to the unfortunate Brussels attacks), we ignore tweets about holidays in Brussels.

     (b) Collection of further tweets relevant to the story, using keywords missed at the beginning of Step 1 (this step is optional). For example, while searching for rumours we might come across tweets discussing another rumour; we then add the keyword describing this new rumour to our collection.

     (c) Categorisation of tweets into rumours: all tweets referring to the same rumour are grouped together.

     (d) Identification of all the unique users involved in a rumour. This set of users is used in Steps 2 to 4.

  2. Collection of each user’s 400 most recent tweets posted before the start of the rumour. This step is required because we aim to examine users’ past behaviour and sentiment, e.g. whether a user’s writing style or sentiment changes during the rumour, and whether these features are significant for the model. To the best of our knowledge, this set of features is considered for the first time in the academic literature for building a rumour classifier.

  3. Collection of each user’s followees (friends). This data is essential for building the propagation graph; see Sect. 2 and Appendix B.

  4. Collection of users’ profile information, including registration date and time, description, whether the account is verified, etc.
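
The paper does not specify the collection tooling. As a hedged illustration only, the four steps could be scripted against the Twitter REST API with the tweepy library (v3.x endpoint names); the credentials, keyword and tweet limits below are placeholders, and restricting past tweets to those posted before the rumour’s start is omitted:

    import tweepy

    # Placeholder credentials -- substitute real application keys.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth, wait_on_rate_limit=True)

    # Step 1: collect tweets containing a keyword (the Search API only
    # reaches back about ten days, hence collection must start promptly).
    rumour_tweets = list(tweepy.Cursor(api.search, q="#ParisAttacks").items(5000))

    # Step 1(d): the unique users involved in the rumour.
    user_ids = {t.user.id for t in rumour_tweets}

    past_tweets, followees, profiles = {}, {}, {}
    for uid in user_ids:
        # Step 2: up to 400 of the user's most recent tweets (the timeline
        # endpoint returns at most 200 per page, so we page with a Cursor).
        past_tweets[uid] = list(tweepy.Cursor(api.user_timeline, user_id=uid).items(400))
        # Step 3: followee ids ("friends"), needed for the propagation graph.
        followees[uid] = set(api.friends_ids(user_id=uid))
        # Step 4: profile information (registration date, description,
        # verified flag, etc.).
        profiles[uid] = api.get_user(user_id=uid)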

A.1 Rumours Summary Statistics

We provide summary statistics of the 72 collected rumours in Table 2. The table shows the total number, mean, median, etc., of the distributions of the number of tweets, the percentage of supporting tweets, etc., over the 72 rumours, as well as statistics for four example rumours. We collected about 100 million tweets, including users’ past tweets. Of these, about 327.5 thousand tweets are part of rumours; these tweets contributed to the message-based features of the classification methods. The users’ past tweets contributed only to the features capturing a user’s past behaviour.

Table 2. Summary statistics of the collected rumours. Examples 1 and 2 correspond to the rumours with the largest and second-largest numbers of tweets, respectively; Examples 3 and 4 correspond to the rumours with the second-smallest and smallest numbers of tweets, respectively.

B Making the Propagation Graph

Nodes in the propagation tree correspond to unique users. Edges are drawn between users who retweet messages. However, the retweet relationship cannot be inferred directly from the Twitter data. Consider a scenario with three users, A, B and C. User A posts an original tweet. User B sees the tweet from user A and retweets it; the Twitter API returns an edge between user A and user B. If user C then sees the tweet from user B and retweets it, the Twitter API still returns an edge between the original user A and user C, even though user A is not a friend (followee) of user C, so user C could not have seen the tweet from user A directly. To overcome this, we collected the users’ followees. In our scenario, user B is therefore connected to user C only if the retweet timestamp of user C is later than that of user B and user B is in the followees list of user C.
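
A minimal sketch of this rewiring with networkx follows. The tie-breaking rule, attaching a retweeter to the most recent earlier participant they follow and falling back to the original author, is our assumption; the paper only states the timestamp and followee conditions.

    import networkx as nx

    def build_propagation_tree(participants, followees):
        """Rewire Twitter's flat retweet edges into a plausible
        propagation tree for one original tweet.

        participants -- list of (user_id, timestamp) pairs: the original
                        author plus everyone who retweeted.
        followees    -- dict mapping user_id -> set of followee ids,
                        collected in Step 3 of Appendix A.
        """
        ordered = sorted(participants, key=lambda p: p[1])  # earliest first
        root = ordered[0][0]  # the original author
        tree = nx.DiGraph()
        tree.add_node(root)
        for i, (user, _) in enumerate(ordered[1:], start=1):
            # Attach the retweeter to the most recent earlier participant
            # whom they follow; if none qualifies, fall back to the original
            # author (the edge the Twitter API reports anyway).
            parent = root
            for earlier, _ in reversed(ordered[:i]):
                if earlier in followees.get(user, set()):
                    parent = earlier
                    break
            tree.add_edge(parent, user)
        return tree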

C A Practical Example for Using Formula (1)

Here we elaborate on formula (1) and present a practical example. For simplicity and to avoid confusion, we define the support, \(S^{(i)}\), neutral, \(N^{(i)}\), and against, \(A^{(i)}\), terms in formula (1) following the example attributes given in Sect. 2.2; the generalisations are straightforward. If the attribute of the tweet is a binary indicator, for example whether or not a tweet contains a URL link, we define

$$\begin{aligned} S^{(i)} &= \frac{\text {number of tweets with a URL that support the rumour}}{\text {total number of tweets that support the rumour}}, \\ N^{(i)} &= \frac{\text {number of tweets with a URL that are neutral to the rumour}}{\text {total number of tweets that are neutral to the rumour}}, \\ A^{(i)} &= \frac{\text {number of tweets with a URL that deny the rumour}}{\text {total number of tweets that deny the rumour}}. \end{aligned}$$

If the attribute of the tweet is continuous, for example, the number of words in a tweet, we then define

$$\begin{aligned} S^{(i)} &= \text {average number of words in tweets that support the rumour}, \\ N^{(i)} &= \text {average number of words in tweets that are neutral to the rumour}, \\ A^{(i)} &= \text {average number of words in tweets that deny the rumour}. \end{aligned}$$

These expressions are then combined through formula (1) to give the relevant feature of the rumour.
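
A small, self-contained sketch of computing these terms in Python; the 'stance' key and the attribute names are hypothetical, chosen only for illustration:

    def sna_terms(tweets, attribute, binary=True):
        """Compute the support (S), neutral (N) and against (A) terms
        that feed into formula (1) for one tweet attribute.

        tweets    -- list of dicts, each with a 'stance' key taking a value
                     in {'support', 'neutral', 'against'} plus the attribute.
        attribute -- attribute name, e.g. 'has_url' (binary) or 'n_words'
                     (continuous).
        """
        terms = {}
        for stance in ('support', 'neutral', 'against'):
            group = [t[attribute] for t in tweets if t['stance'] == stance]
            if not group:
                terms[stance] = 0.0  # assumption: an empty stance group contributes 0
            elif binary:
                # fraction of tweets in the stance group with the attribute set
                terms[stance] = sum(1 for v in group if v) / len(group)
            else:
                # average of the continuous attribute over the stance group
                terms[stance] = sum(group) / len(group)
        return terms['support'], terms['neutral'], terms['against']

    # Example: the URL indicator of Sect. 2.2.
    tweets = [{'stance': 'support', 'has_url': True},
              {'stance': 'support', 'has_url': False},
              {'stance': 'against', 'has_url': True}]
    S, N, A = sna_terms(tweets, 'has_url', binary=True)  # S = 0.5, N = 0.0, A = 1.0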

D Feature Reduction Methods

Since our dataset consists of 72 rumours, theoretical and experimental arguments lead us to expect the number of relevant features to be about 10, and models with as many as 20 features to begin to show a decrease in performance. For this reason we set the upper bound on the number of features to 30 and examine models with an increasing number of features, from 1 to 30. Had this bound proved too low we would have reconsidered the choice; however, as becomes evident in Sect. 3, it is satisfactory.

In this study we use four methods, which are combinations of those described so far. For filtering we use the ANOVA F-test [14].

  Method 1. A combination of a filter method, a random wrapper and a deterministic wrapper:

    (a) Use the ANOVA F-statistic for filtering; keep the 30 best-scoring features.

    (b) From those 30, apply the classifier to 100,000 different combinations of 3 features to find the combination of 3 that maximises the \(F_1\)-score.

    (c) Add the remaining 27 features one by one, applying the classifier and keeping the feature with the best \(F_1\)-score in each round.

  Method 2. A forward-selection deterministic wrapper:

    (a) Apply the classifier to each feature individually and select the one that maximises the \(F_1\)-score (over all available features; no pre-filtering is required).

    (b) Scan (by applying the classifier) all remaining features to find the combination of two (one being the feature from step a) that maximises the \(F_1\)-score.

    (c) Continue adding, one by one, the features that maximise the \(F_1\)-score, until the number of features reaches 30.

  Method 3. A combination of a filter method and forward selection:

    (a) Use the ANOVA F-statistic for filtering and keep the 30 best-scoring features.

    (b) Apply the classifier and find the best-scoring feature, i.e. the one with the maximum \(F_1\)-score, among the 30 selected in step a.

    (c) Continue adding, one by one, the features that maximise the classification \(F_1\)-score.

  Method 4. A feature-transformation method:

    (a) Apply a feature transformation, namely principal component analysis, and keep the 30 best components.

    (b) Start with the best principal component among the 30 selected in step a.

    (c) Add the remaining components one after the other.

We apply each method to each classifier separately, using scikit-learn’s default parameters, and assess it using k-fold cross-validation. We abandoned the Neural Network method for two reasons: first, its performance was poor compared to the other methods; second, it required long computation times, which considerably slowed down the analysis of the results. We plot the \(F_1\)-score as a function of the number of features for the remaining classifiers and each feature reduction method; see Fig. 4.
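
As a concrete illustration of Method 2, the forward-selection wrapper, a minimal sketch with scikit-learn follows; the decision tree stands in for any of the classifiers, and the data arrays are placeholders. The filtering step of Methods 1 and 3 could analogously use sklearn.feature_selection.SelectKBest with f_classif.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    def forward_selection(X, y, max_features=30, cv=5):
        """Method 2: greedily add the feature that maximises the
        cross-validated F1-score, up to max_features features."""
        selected = []
        remaining = list(range(X.shape[1]))
        history = []  # best F1-score at each model size
        while remaining and len(selected) < max_features:
            best_f1, best_j = -1.0, None
            for j in remaining:
                cols = selected + [j]
                clf = DecisionTreeClassifier()  # default parameters, as in the paper
                f1 = cross_val_score(clf, X[:, cols], y, cv=cv, scoring="f1").mean()
                if f1 > best_f1:
                    best_f1, best_j = f1, j
            selected.append(best_j)
            remaining.remove(best_j)
            history.append(best_f1)
        return selected, history

    # Placeholder data: 72 rumours, 80 candidate features, binary veracity labels.
    X = np.random.rand(72, 80)
    y = np.random.randint(0, 2, size=72)
    features, f1_curve = forward_selection(X, y)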

Fig. 4. \(F_1\)-score for the Decision Tree versus the number of features/components selected by the four feature reduction methods. (Color figure online)

We observe that the second method (red line in Fig. 4) outperforms all the other techniques in almost all cases. Similar plots are produced, and the same conclusion is reached, for the other classifiers. Therefore we can safely conclude that the forward-selection deterministic wrapper is consistently the best-performing feature reduction method across all classifiers.

E Further Results on Classifier Selection

In Sect. 3 we present the results of running several classifiers over thirty models, each model having an increasing number of features, from one to thirty. Here we present further results that support our choice of feature selection method.

Fig. 5. Mean \(F_1\)-score of all 30 models (blue) and of the first 8 models (red). (Color figure online)

Fig. 6. The visualisation tool: inside a rumour. (Color figure online)

In Fig. 5 we plot the average \(F_1\)-score for each method. This is a two-column plot: the first column (blue) corresponds to the average \(F_1\)-score over all 30 models, while the second column (red) is the average \(F_1\)-score over the first eight models (those with 1 to 8 features; see footnote 7).
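
In terms of the underlying arrays, the two columns amount to row means over different slices. A minimal sketch, assuming a hypothetical array f1_scores of shape (number of classifiers, 30) holding each classifier’s \(F_1\)-score per model size:

    import numpy as np

    # Placeholder: f1_scores[c, m] is the F1-score of classifier c on the
    # model with m + 1 features (m = 0, ..., 29).
    f1_scores = np.random.rand(5, 30)

    mean_all_30 = f1_scores.mean(axis=1)          # blue column: all 30 models
    mean_first_8 = f1_scores[:, :8].mean(axis=1)  # red column: models 1 to 8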

F Visualisation Tool

As a by-product of our modelling, we also developed a software tool that helps the user visualise the results and gain a deeper understanding of the rumours, see Fig. 6. The tool consists of three layers. On the first layer, the user selects a topic of interest (e.g. “Paris Attacks”). This leads to the second layer, which displays all the relevant rumours with a basic summary (e.g. the rumour claim, the timestamp of the first tweet, a word cloud, the distribution of tweets that are in favour of, neutral to or against the rumour, and the modelled veracity). After selecting a rumour of interest, the user is taken to the third layer, shown in Fig. 6. There, the tool shows several figures, such as the propagation forest (supporting, neutral and denying trees are coloured green, grey and red, respectively), a histogram of the numbers of tweets in favour of the rumour, against it, and neutral to it, a plot of the classifier’s features, and the rumour veracity. A time-slider allows the user to navigate through the history of the rumour by selecting one of the available time steps. Moving the slider, the user can investigate how the rumour, its veracity and the key features evolve over time, giving the flexibility to explore the key factors that affect the rumour’s veracity.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Giasemidis, G. et al. (2016). Determining the Veracity of Rumours on Twitter. In: Spiro, E., Ahn, Y.-Y. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10046. Springer, Cham. https://doi.org/10.1007/978-3-319-47880-7_12

  • DOI: https://doi.org/10.1007/978-3-319-47880-7_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47879-1

  • Online ISBN: 978-3-319-47880-7
