1 Introduction

Every presidential election in the United States of America (USA) is hugely significant for the country, and for the world as well, because of the economic and trade relations the USA has with other countries. The United States presidential election of 2016 was held on Tuesday, November 8, 2016. The final two candidates were Republican Donald Trump and Democrat Hillary Clinton. Although Hillary Clinton initially had higher visibility, Donald Trump won the 2016 election. According to Statista, out of 251 million voters in the USA, around 67 million are monthly active users of Twitter. Twitter data can therefore be a significant source of information for analyzing the impact of the election. Nowadays, people can create, share and exchange their thoughts, ideas, opinions, information, videos, images and other digital content through social media platforms like Twitter without meeting physically [28]. On Twitter, anything tweeted by a user becomes available to others following the discussion.

The objective of this paper is to understand the impact of Twitter on the 2016 US presidential election. It attempts to evaluate how the sentiments and topics evolving among voters changed over the election period and how their voting preferences became polarized over that period. For our study we collected 784,153 tweets from 287,838 users on the USA election over 18 weeks, from August 13 to December 10, 2016. We applied different social media analytics methods and Newman and Sheth’s model of Voter Choice to get a better understanding of the discussions and the voting outcome.

2 Review of Literature

The first section of the review of literature highlights the importance of social media and links it to the current context. The second section highlights some of the public policies related to social media that have already been explored in the existing literature.

2.1 Importance of Social Media

Social media platforms are important for various purposes such as education [34], marketing [40], customer engagement [15], brand management [20], product and service promotion [30], recruitment [16], sales forecasting [3] and evaluation of corporate agility [37]. More and more people are joining these platforms and interacting within virtual communities in specific interest domains. By analyzing user generated content, a domain-specific understanding can be developed and market dynamics can be understood [19, 41]. Social media data (i.e. user generated content) has been used extensively for analyzing real-life problems such as predicting flu trends [1], electoral forecasting [5], engaging with voters [2], identifying social tensions [6], evaluating voting intentions [18] and measuring transitions in organizational behavior [22].

2.2 Social Media and Public Policy

The literature highlights that Twitter has been used by the Chicago Department of Public Health for campaigns on electronic cigarettes in public health policy [14]. Twitter was used by the Indonesian government in 2012 to give citizens early warning of natural hazards [9]. Sentiments relating to “climate” have also been analyzed to understand social sensitivity towards the environment [10]. The literature also indicates that posts on tobacco and its new products have been analyzed for policy purposes [29].

The evidence for, and the potential of, using Twitter to uncover unbiased information from user generated content were the drivers for choosing Twitter data for our study.

3 Research Contribution

The contribution of this study is interdisciplinary and addresses both the political science and the social media literature. To develop a better understanding of the USA elections, our objectives are framed as the following questions:

  • What is the nature of discussions surrounding US elections?

  • Which types of social discussions tend to affect the outcome of elections?

  • Are there indications of polarization of voting preferences during the entire period?

This study highlights a new mixed research methodology for developing insights from real-time events and the discussions surrounding them in social media. The study sets out four methodologies for analyzing Twitter data: descriptive, content, network and time-space analysis. We try to explain the insights derived from Twitter analytics using Newman and Sheth’s model of Voter Choice Behavior [43]. In particular, the focus on how polarization of voting choices happened on a social media platform is a unique contribution of this study.

4 Research Methodology

The methodology was divided into five phases. Phase 1 identifies the search terms used to extract data from Twitter. For this study, a list of election-related search terms such as “USA election”, “Hillary Clinton” and “Donald Trump” was identified based on their listing in Twitter trends. Phase 2 focuses on extracting data from Twitter. The unstructured data collected through the Twitter API using Python scripts was in JSON format. Phase 3 converts the unstructured data into structured data, i.e. from JSON to a structured Excel format. The steps in Phases 2 and 3 were repeated daily over the 18 weeks. Phase 4 draws insights from the data through various Twitter analysis methodologies such as descriptive, content, network and time-space analysis. Table 1 gives an indicative list of methods for Twitter analytics. Phase 5 explains the impact of the findings through the Newman and Sheth model of voter behavior, using seven concepts: issues and policies, social imagery, emotional feelings, candidate image, current events, personal events and epistemic issues. Figure 1 illustrates the flow of analysis, with Twitter analytics mapped to the voter behavior model adopted in this study.
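The JSON-to-table conversion in Phase 3 can be sketched roughly as follows. This is only an illustration, not the authors' script: the column list `FIELDS`, the nested `user.screen_name` layout and the helper names are assumptions, and CSV stands in for the Excel format mentioned in the text.

```python
import csv
import io
import json

# Hypothetical subset of tweet fields; the paper does not list the columns it kept.
FIELDS = ["id", "created_at", "user", "text"]


def flatten_tweet(raw_json):
    """Turn one JSON-encoded tweet into a flat dict with the FIELDS columns."""
    tweet = json.loads(raw_json)
    return {
        "id": tweet.get("id"),
        "created_at": tweet.get("created_at"),
        # Assumed layout: the author's handle is nested under a "user" object.
        "user": (tweet.get("user") or {}).get("screen_name"),
        "text": tweet.get("text", ""),
    }


def tweets_to_csv(raw_lines):
    """Convert an iterable of JSON strings into CSV text (header plus one row per tweet)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for line in raw_lines:
        if line.strip():  # skip blank lines between records
            writer.writerow(flatten_tweet(line))
    return buffer.getvalue()
```

Running such a conversion once per day would mirror the daily repetition of Phases 2 and 3 described above.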

Table 1. Overview of Twitter analytics methods

Fig. 1. Proposed model of analyzing voter behavior

Social media analytics can extract crucial information from user profiles, hashtags, groups, communities and search terms. Social media analysis can thus reveal useful, unbiased user preferences without intruding on the context. An indicative list of methods for Twitter analytics is given in Table 1.

Descriptive analysis focuses on descriptive statistics, such as the number of tweets and their types, the number of unique users, the hashtags, @mentions and hyperlinks included in the tweets with their frequencies, word clouds and reach metrics. Word clouds help us to visualize the popular words and topics in tweets [31]. The “reach” metric can be used to measure the reach of an influencer’s messages [12]. Similarly, the reply and retweet features of Twitter help in assessing two-way interaction and engagement [33]. A tweet can also contain hyperlinks to relevant resources [39]. Hashtags are used in tweets so that the opinion expressed can be associated with a wider community of similar interest [8]. Similarly, @mention analysis helps in identifying the influencers who have influenced users to the extent that they want to discuss the tweet topic with the influencer [36].
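As a rough illustration of the descriptive statistics listed above, the following sketch counts tweets, unique users, hashtags, @mentions, hyperlinks and retweets. The regular expressions, the `describe` helper and the "RT @" retweet convention are assumptions made for this example, not the study's actual procedure.

```python
import re
from collections import Counter

# Simplified patterns (assumptions); real tweet parsing has more edge cases.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
LINK = re.compile(r"https?://\S+")


def describe(tweets):
    """tweets: iterable of (user, text) pairs. Returns basic descriptive counts."""
    hashtags, mentions = Counter(), Counter()
    users = set()
    n_tweets = n_links = n_retweets = 0
    for user, text in tweets:
        n_tweets += 1
        users.add(user)
        # Lower-case so that #MAGA and #maga are counted together.
        hashtags.update(tag.lower() for tag in HASHTAG.findall(text))
        mentions.update(name.lower() for name in MENTION.findall(text))
        n_links += len(LINK.findall(text))
        # Assumption: retweets are marked by a leading "RT @".
        n_retweets += text.startswith("RT @")
    return {
        "tweets": n_tweets,
        "unique_users": len(users),
        "retweets": n_retweets,
        "links": n_links,
        "hashtags": hashtags,
        "mentions": mentions,
    }
```

Calling `describe(...)["hashtags"].most_common(30)` would give a top-30 hashtag list of the kind used later in the findings.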

Content analysis is used to extract semantic intelligence from text data. It leverages natural language processing (NLP) and text mining to retrieve information from large amounts of text. Sentiment analysis, for example, is the process of computationally identifying and categorizing the opinions expressed in a text [21], and it includes two types of analysis: polarity analysis and emotion analysis. For this study, the sentiment analysis of the tweets was done in R using the syuzhet, lubridate and dplyr libraries. Polarity analysis is one of the most widely used techniques for measuring user opinions in Twitter data [35]. Emotion analysis is a sentiment analysis technique in which user generated content is grouped into eight emotion categories: anger, anticipation, disgust, fear, joy, sadness, surprise and trust. Similarly, topic modelling identifies the key themes among the tweets by mining unstructured text [26]. Topic modelling was done in our study using the tm and topicmodels libraries of R.
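The idea behind lexicon-based polarity analysis can be illustrated with a very small sketch. It is written in Python for illustration and does not reproduce the syuzhet library used in the study; the `TOY_LEXICON` words and scores are invented for this example, whereas real lexicons contain thousands of scored entries.

```python
import re

# Hypothetical toy lexicon for illustration only (not the lexicon used in the study).
TOY_LEXICON = {"great": 1, "win": 1, "trust": 1, "crooked": -1, "corrupt": -1, "fear": -1}


def polarity_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of the words in the text; unknown words score 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def polarity_label(text, lexicon=TOY_LEXICON):
    """Map the summed score to a positive, negative or neutral label."""
    score = polarity_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Emotion analysis follows the same pattern, except that each lexicon word is associated with one or more of the eight emotion categories instead of a single signed score.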

5 Findings and Interpretation

A descriptive overview of the Twitter activity of Clinton and Trump is presented in Table 2, which illustrates the degree of interaction both candidates had with the voters.

Table 2. Descriptive statistics of activity and engagement

We also tried to assess the possible polarization of voters in terms of their preferences. To this end, the election period was divided into two phases of 60 days each. Phase 1 ran from August 13, 2016 to October 11, 2016 and Phase 2 from October 12, 2016 to December 10, 2016. For both phases, the tweets were segregated according to whether they concerned Clinton or Trump. Sentiment analysis was then applied to the tweets to identify their polarity. Table 3 shows the count of users for whom a sentiment transition occurred during the election period, for Trump and Clinton respectively. For Trump there was around 48.18% polarization, whereas for Clinton there was around 49.66%.

Table 3. Impact assessment of polarization of preferences among voters
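One possible way to count sentiment transitions between the two phases is sketched below. The paper does not spell out its exact rule, so this example assumes that a transition is a change in the sign of a user's mean polarity between Phase 1 and Phase 2, and that only users active in both phases are counted; `transition_share` and its input layout are hypothetical.

```python
from collections import defaultdict
from datetime import date

PHASE_2_START = date(2016, 10, 12)  # from the text: Phase 2 begins on October 12, 2016


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def transition_share(scored_tweets):
    """scored_tweets: iterable of (user, day, polarity) for one candidate.

    Returns the percentage of users, among those active in both phases,
    whose mean polarity changed sign between Phase 1 and Phase 2.
    """
    # user -> [[sum, count] for Phase 1, [sum, count] for Phase 2]
    totals = defaultdict(lambda: [[0.0, 0], [0.0, 0]])
    for user, day, polarity in scored_tweets:
        phase = 1 if day >= PHASE_2_START else 0
        totals[user][phase][0] += polarity
        totals[user][phase][1] += 1
    both = changed = 0
    for (s1, n1), (s2, n2) in totals.values():
        if n1 and n2:  # assumption: ignore users seen in only one phase
            both += 1
            changed += sign(s1 / n1) != sign(s2 / n2)
    return 100.0 * changed / both if both else 0.0
```

Applying the function separately to the tweets about each candidate would give one percentage per candidate, comparable in form to the figures reported above.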

In the subsequent sections, we attempt to explain the insights derived from the “USA Election Twitter data” by applying Twitter analytics methods through Newman and Sheth’s model of voter choice, which comprises seven distinct and separate cognitive domains that drive voter behavior. These factors are issues and policies, social imagery, emotional feelings, candidate image, current events, personal events and epistemic issues [43]. The insights from Twitter analytics are validated by exploring news and blog articles for confirmatory evidence.

5.1 Issues and Policies

This factor addresses the economic policy, foreign policy and social policy raised by a candidate during the election period, and the leadership characteristics possessed by the candidate. The literature highlights that issues and policies are an important component in influencing voter behavior [43]. Voters will vote for the candidate who provides them with the higher level of utility. Economic policy refers to policies focusing on reducing inflation and balancing the budget. Foreign policy includes policies such as increasing defense spending. The tweets from both presidential candidates’ Twitter accounts were extracted and classified into four areas, namely economy, foreign policy, social issues and leadership, with the help of content analysis. The content analysis procedure was applied to the tweets by two judges independently. There were 14,508 decision points (2,400 tweets of Hillary Clinton and 1,227 tweets of Donald Trump, each assessed against four areas). The two independent judges agreed on 13,293 decisions and disagreed on 1,215 decisions, giving a coefficient of reliability of 91.62%, which satisfies the threshold of 85% [44]. Figure 2 illustrates the counts of tweets posted by the presidential candidates regarding policies and issues.
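The arithmetic behind the decision points and the coefficient of reliability can be checked with a short sketch. It assumes that the coefficient is simple percent agreement (agreements divided by total decisions), which is consistent with the figures reported above; the function name is illustrative.

```python
def reliability(agreements, disagreements):
    """Percent agreement between two judges, as a fraction of all decisions."""
    return agreements / (agreements + disagreements)


# Figures taken from the text: tweets per candidate, each judged against four areas.
decision_points = (2400 + 1227) * 4
coefficient = reliability(13293, 1215)

print(decision_points)        # total number of decisions
print(f"{coefficient:.4f}")   # agreement as a fraction, roughly 0.916
```

Under this assumption the agreement is about 91.6%, which is above the 85% threshold cited in the text.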

Fig. 2. Issues and policies raised by Clinton (left cloud) and Trump (right cloud)

Hillary Clinton posted around 167 tweets regarding policies and issues, whereas Donald Trump posted only 138. Clinton discussed various social issues concerning women and children, relating to equality, safety, empowerment, child care leave, disability, free education, career progression and mental stability. Clinton’s tweets focused more on social issues (and on Trump’s policies!), whereas Trump focused more on economic and foreign policies such as fighting terrorism and crime, immigration, creating jobs and easing business processes in the USA.

5.2 Social Imagery

This factor refers to the image of the candidate as perceived in the voter’s mind. The voter can hold positive or negative stereotypes of the candidate depending on various attributes, such as demographic, socioeconomic, cultural, ethical, political and ideological dimensions. Figure 3 shows the top 30 popular hashtags in the election period, through which the social image of the candidates can be highlighted. Interestingly, WikiLeaks had released around 20,000 emails with almost 8,000 attachments from the Democratic National Committee, which indicated the possibility of corruption in campaigns led by Clinton. Such discussions are indicated by hashtags like #podestaemails, #wikileaks and #crookedhillary. However, #iamwithher was also among the dominant hashtags, which indicates a large amount of support for Clinton.

Fig. 3. Top 30 hashtags in election discussions and their dominant imagery in Tweets

The green box marks hashtags with positive imagery of Clinton, the red box marks hashtags with negative imagery of Clinton, and the blue box marks positive imagery of Trump; no negative imagery of Trump appears among the top 30 hashtags.

5.3 Emotional Feelings

Emotional feelings refer to the personal feelings the voter holds towards the candidate. A comparative emotion analysis of all discussions surrounding the two candidates was conducted, as illustrated in Fig. 4. Higher visibility and presence in social discussions are likely to help win an election through possible polarization [24]. In sheer volume, discussions centered on Clinton surpassed discussions surrounding Trump across all sentiments. This is also visible in the emotion comparison, where the difference is most pronounced for emotions like trust, anger, anticipation, fear and disgust. For surprise, however, the counts of tweets surrounding the two candidates were roughly comparable.

Fig. 4. Emotion analysis around Clinton and Trump

5.4 Candidate Image

This factor refers to the salient personality traits that make up the candidate’s image. In terms of the percentage of tweets, polarity is broadly similar for the two candidates. Given the difference in the number of tweets, however, discussions surrounding Clinton contain more negative as well as more positive tweets than those surrounding Trump.

Voters form their voting opinion on the basis of “candidate image” rather than by reference to election campaign issues, which results in interaction and engagement. Figure 5 illustrates the top 30 @mentions along with their frequency over the 18 weeks. Among the 784,153 tweets, 32,568 mentioned @realdonaldtrump (around 4.15%) and 20,515 mentioned @hillaryclinton (around 2.61%). The third most popular mention was @wikileaks, around which there was a lot of debate about corruption in Clinton’s administration. This indicates that the role of WikiLeaks may have been significant in deciding the final result. Other dominant mentions were news and journalism sources (cnn, nytimes, reuters, foxnews). The role of opinion leaders like Linda Suhler and Mike Cernovich, who actively supported Trump, is also highlighted and may likewise have influenced the outcome.

Fig. 5. Polarity analysis and top @mentions in USA election discussions

5.5 Current Events

This factor takes into account all events that happened during the election campaign, including both domestic and international situations that could cause voters to switch their voting preference. Since topic modeling is computationally intensive, tweets were analyzed only for the selected days on which user sentiment on Twitter fluctuated significantly (i.e. beyond mean tweet polarity ± 2 × standard deviation). The topics identified for these 18 days were then used to create a word cloud showing the main concerns during the periods that saw enhanced user activity and major fluctuations of sentiment. For topic modelling, the top 15 topics were identified for each day. Figure 6 illustrates the word cloud created from the popularity of the 15 topics for each of the 18 days, to visually present the dominance among the emerging topics. Trump has 17.6 million followers on Twitter with 34,160 tweets, whereas Hillary Clinton has 11.7 million followers with 9,838 tweets. From these statistics it can be said that Donald Trump had more reach than Hillary Clinton. However, Fig. 6 still indicates that Twitter users discussed Clinton more frequently than Trump. WikiLeaks appears to have played an important role in the discussions surrounding Clinton. Despite such popularity, the final outcome may be attributed to the nature of the popularity in such discussions, which may have polarized the citizens of the USA.
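The day-selection rule described above (mean tweet polarity ± 2 × standard deviation) can be sketched as follows. The input is assumed to be one mean polarity value per day, and the population standard deviation is used; the paper does not state which variant of the standard deviation was applied, and the function name is illustrative.

```python
from statistics import mean, pstdev


def flag_days(daily_polarity, k=2.0):
    """daily_polarity: dict mapping a day label to that day's mean tweet polarity.

    Returns the days whose polarity lies more than k standard deviations
    away from the mean across all days (k = 2 in the text).
    """
    values = list(daily_polarity.values())
    centre = mean(values)
    spread = pstdev(values)  # assumption: population standard deviation
    return [day for day, value in daily_polarity.items()
            if abs(value - centre) > k * spread]
```

Only the flagged days would then be passed on to the more expensive topic modelling step.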

Fig. 6. Polarity analysis of USA election over 18 weeks

5.6 Personal Events

This factor refers to events in the candidate’s past that can cause voters to switch their voting preference. Personal events surrounding Hillary Clinton that may have influenced voter behavior negatively include the deletion of emails using BleachBit, WikiLeaks releasing information regarding the governance of Hillary Clinton, the FBI releasing detailed interview notes from its investigation of Hillary Clinton’s email practices, and USA WTFM declaring Hillary Clinton an insider. Trump, in contrast, had not held a government post before winning the election, so comparable material for social discussions was not available.

5.7 Epistemic Issues

This factor refers to the issues raised by candidates to change the pace of the time and bring something new and different. Issues that raise the curiosity of voters also come under this factor. In Fig. 3, the analysis highlighted that the hashtag #maga, which relates to the nationalist campaign “Make America Great Again”, has the highest frequency among all hashtags. Other campaigns driven by Donald Trump, such as “Big League Truth” and “Drain The Swamp”, were also popular. In contrast, #strongertogether, launched by Hillary Clinton to motivate citizens towards unity and the fight against social issues, had much less popularity among followers. While Fig. 5 illustrates that Trump’s campaign received social support, Clinton’s campaign did not receive much social support in terms of Twitter retweets and mentions.

6 Concluding Discussion

Our study highlights some of the reasons that may have affected the outcome of the USA elections. It helps us understand the possible reasons for the polarization of voters among Twitter users during the election, and to identify the popular hashtags, @mentions and the various domains influencing voter behavior on Twitter. The analysis of tweets suggests that the election outcome may have been strongly polarized by the presence of debates and opinion leaders. The study also helps us examine the reactions of users to news evolving over the election period. Although Clinton had much more visibility in terms of interaction, the outcome of the election was affected by the nature of that visibility and the resonance voters had with her content. It appears that Clinton’s campaigns failed to gain popularity, whereas Trump’s campaign gathered significant support, in terms of their presence in the descriptive analytics of hashtags, @mentions and the word cloud built from the identified topics. Clinton also appeared to speak more about her competitor than about her own campaigns and their outcomes, in strong contrast to Trump, who focused more on his policies and their outcomes. Further, as highlighted in Fig. 6, massive fluctuations in social activity happened when voters predominantly talked about Clinton rather than any other topic.

However, this study is still descriptive and may be extended to explore the dynamics of verified and non-verified information, which may have been discussed at length during the USA elections and may have polarized the outcome. The study also suggests that, in today’s world, a candidate’s Twitter handle can play a significant role in electoral success. One limitation of the study is that if a Twitter user is influenced by external events rather than by Twitter discussions, that influence cannot be mapped. Further, a discussion can be analyzed only if users contribute to the hashtag or topic directly.