Overview

This chapter discusses the potential role of social media in preventing and fighting infectious diseases. It summarizes the advantages and limitations of social media and Internet-based data for public health surveillance in general, and for infectious diseases in particular, and identifies gaps that still require further research and improvement.

1 Introduction

Major public health threats have been on the rise over the past 50 years. Examples among infectious diseases include the AIDS epidemic in the late 1970s, followed by Severe Acute Respiratory Syndrome (SARS-CoV) in Asia (2002–2003), pandemic H1N1 worldwide in 2009, Middle East Respiratory Syndrome (MERS-CoV) in Saudi Arabia (2012), the re-emergence of Ebola virus in Africa (2014), and the Zika virus in 2016. All of these are infectious diseases that are difficult to control, as they require real-time reporting to alert the relevant surveillance systems. Social media and Internet-based data have been shown to play a role in sending early warning signals to public health authorities, enabling them to take an informed course of action to prevent and control the spread of such diseases.

This noticeable increase in emerging infectious diseases has created several challenges for public health officials operating at local, national and global levels, who must take the right action at the right time. Addressing such challenges in the era of Information Technology (IT) calls for effective use of emerging technologies and approaches for detecting, tracking, reporting and forecasting disease, and for improving early warning systems and response (Milinovich et al. 2014c). Nevertheless, most infectious diseases are still monitored by passive and reactive public health surveillance systems, which are officially maintained by national public health authorities. These traditional surveillance systems depend, to a large degree, upon data submitted to public health authorities by health professionals such as hospital physicians, laboratories, public health practitioners, and other health-care providers.

Time and resource constraints, together with a lack of operational knowledge of traditional surveillance reporting systems, negatively affect the timeliness of event reporting. Incomplete reporting leads to missed or late detection of public health threats, including infectious diseases. Moreover, substantial lags between the occurrence of an event and its official notification are common among public surveillance systems. Several factors that govern the performance of traditional surveillance systems contribute to these lags, including the rigid hierarchical structure and the verification process for receiving and notifying infectious and communicable diseases. These factors, among others, lead to late or failed reporting of events to those responsible for taking action to contain and control the spread of infectious diseases.

Social media and Internet-based data appear to play a pivotal role in improving real-time reporting and in informing both the public and governments about possible public health threats from infectious diseases (Velasco et al. 2014; Milinovich et al. 2014a, c, 2015). Traditional surveillance systems have been reported to take, on average, at least 2 weeks from receipt of an infectious disease event to dissemination of the data (Cheng et al. 2009), whereas data available on the Internet and social media are viewed as accelerating the process of informing the public and governments about any looming public health threat, especially one relating to infectious diseases. We anticipate that these technological improvements could provide a new platform to help improve the quality of detection and reporting of infectious disease threats, drawing on the sensitivity and timeliness of digital surveillance systems.

2 Basic Terms and Concepts

Social media has been defined by Kaplan and Haenlein as “a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0 (e.g., Twitter, Facebook, YouTube) that allow the creation and exchange of User Generated Content” (Kaplan and Haenlein 2010). That is, social media comprises computer-mediated platforms, built on mobile and web-based technologies, which allow people to create, share or exchange information, ideas, and pictures or videos in virtual communities and networks.

Infectious diseases are communicable diseases caused by pathogenic microorganisms, such as bacteria, viruses, parasites or fungi (WHO 2015). These diseases can spread, directly or indirectly, from one person to another. One currently emerging source of infectious diseases is zoonoses, infections of animals that can cause disease when transmitted to humans.

Disease surveillance systems have been defined as the ongoing systematic collection and analysis of data and the provision of information which leads to measures being taken in order to prevent and control the spread of the disease (MedicineNet 2015). Disease surveillance systems primarily aim to detect, prevent, control and eradicate sporadic cases and outbreaks, whether endemic, epidemic or pandemic, as well as other public health threats related to biological agents (viruses, bacteria, parasites, and their toxins) and chemical agents (Bernardo et al. 2013).

Digital surveillance is Internet-based surveillance which attempts to provide real-time knowledge of public health issues by analyzing information stored digitally (Milinovich et al. 2014b, c). Several digital systems are now in use that draw on non-structured, event-based, digital data to enhance disease detection and public health responses (Anema et al. 2014; Brownstein and Freifeld 2007; Brownstein et al. 2009; Freifeld et al. 2008; Olson et al. 2015; Sturtevant et al. 2007).

2.1 Social Media-Based Surveillance and Infectious Diseases

Infectious disease surveillance is an epidemiological practice by which the incidence, prevalence and spread of infectious diseases are monitored in order to establish patterns of progression and activate measures of management and control. The main role of infectious disease surveillance is to predict, observe, and minimize the harm caused by outbreaks, epidemics, and pandemics, as well as to increase the knowledge of both practitioners and the public about the factors that contribute to such circumstances (Choffnes et al. 2007). The reporting of disease outbreaks has been transformed from manual record keeping to instant worldwide Internet communication (Brownstein et al. 2009).

Timely identification of infectious disease outbreaks is critical, both for effective initiation of public health interventions and control measures and for the timely alerting of government agencies and the general public. Surveillance capacity for such detection can be expensive, and many countries lack the public health infrastructure to identify outbreaks at their earliest stages. The Internet is revolutionizing how epidemic intelligence is gathered, and it offers solutions to some of these challenges. Social media and freely available web-based sources of information may allow us to detect disease outbreaks earlier, at reduced cost and with increased reporting transparency (Wilson and Brownstein 2009). Furthermore, the search for and exchange of health information on the Internet and social media has been viewed as an opportunity to improve public health surveillance (Velasco et al. 2014; Kass-Hout and Alhinnawi 2013), and to monitor and predict emerging infectious diseases (Milinovich et al. 2014a, c). Figure 1 summarizes the key advantages and characteristics of social media-based surveillance of infectious diseases.

figure 1

Source: Adapted from Bernardo et al. (2013)

3 Social Media and Internet-Based Data

Over the past two decades, the Internet has become an integral component of traditional public health surveillance. Systems using informal electronic information have been credited with reducing the time to recognition of an outbreak, preventing governments from suppressing outbreak information, and facilitating public health responses to outbreaks and emerging diseases. A huge amount of real-time data and information about infectious disease outbreaks is found in various forms of web-based data streams (Brownstein et al. 2008; Freifeld et al. 2008; Brownstein and Freifeld 2007). These web-based data sources exist outside traditional reporting channels; therefore, they are invaluable to public health agencies that depend on timely information flow across national and subnational borders. Such information sources, which can be identified through Internet-based tools and social media, are often capable of detecting the first evidence of an outbreak, especially in areas with a limited capacity for public health surveillance (Yang et al. 2013). For example, the World Health Organization’s Global Outbreak Alert and Response Network relies on such reporting data for day-to-day surveillance activities (M’Ikanatha et al. 2004, 2006). Figure 2 shows the percentage of the population who use the Internet, by country.

figure 2

Source: Lancet Infect Dis 2014; 14: Page 156

The Internet and the use of social media are becoming a critical medium for clinicians, public health practitioners, and people seeking health information (Bhatti 2015; Eke 2011; Hartley 2014). Data regarding diseases and outbreaks are disseminated not only through online announcements issued by government agencies, but also through informal channels, including press reports, blogs, chat rooms and analyses of web searches. Collectively, these sources provide a view of global health that is fundamentally different from that yielded by disease reporting done via the traditional public health infrastructure (Brownstein et al. 2008).

The Internet provides a platform to develop efficient, sustainable online resources for patients to research their medical questions, communicate with one another, and support each other, such that patients assume more responsibility for their care and decrease the burden on the health care system. A number of online communities have been developed by patient organizations, providers, and nonprofit organizations. Such online communities are virtual forums where patients can discuss their health concerns and exchange information. Participation in online communities heightens levels of emotional well-being, perceived control over disease, overall personal empowerment, and level of public awareness and medical knowledge (Wicks et al. 2010).

The large online population creates a vast network composed of individuals reporting on their activities, their social interactions, and the events around them. These very large volumes of data stream in real time, and are often annotated with context including GPS location, relationships, and images. Extensive data analytics and data mining of social media have been suggested for many applications, such as marketing and financial prediction (Kautz 2013). Recently, researchers have begun to leverage this network as a sensor for public health: preventing, detecting and fighting infectious diseases. Researchers have shown, for example, that Twitter postings can be used to track and predict influenza (Krieck et al. 2011; Sadilek et al. 2012). Such work provides evidence that social media can supply data that help identify early warning signals of public health threats.

Some researchers have explored augmenting the traditional notification channels about a disease outbreak with data extracted from Twitter. By manually examining a large number of tweets, they showed that self-reported symptoms are the most reliable signal for determining whether a tweet is relevant to an outbreak. Researchers have also tried to capture the overall trend of a particular disease outbreak, typically influenza, by monitoring social media (Culotta 2010). Other researchers have focused on more detailed modeling of the language of tweets and its relevance to public health in general (Paul and Dredze 2011) and to influenza forecasting in particular (Paul et al. 2014; Broniatowski et al. 2013).
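The idea of treating self-reported symptoms as a relevance signal can be illustrated with a minimal sketch. This is not the method used in the studies cited above, which relied on manual examination and statistical language models; it is only a keyword heuristic. The symptom list, the first-person pattern and the function name `is_self_reported_symptom` are all assumptions introduced here for illustration.

```python
import re

# Illustrative assumption: a tiny, hand-picked symptom vocabulary.
SYMPTOMS = ("fever", "cough", "sore throat", "chills", "headache")

# Illustrative assumption: first-person words suggest a self-report.
FIRST_PERSON = re.compile(r"\b(i|i'm|i've|im|my|me)\b", re.IGNORECASE)
SYMPTOM = re.compile(
    r"\b(" + "|".join(re.escape(s) for s in SYMPTOMS) + r")\b",
    re.IGNORECASE,
)


def is_self_reported_symptom(text):
    """Return True if the text mentions a symptom in a first-person context."""
    return bool(FIRST_PERSON.search(text)) and bool(SYMPTOM.search(text))


# Made-up example messages, not real tweets.
examples = [
    "I have a fever and a bad cough",
    "Fever outbreak reported in the region",
]
flags = [is_self_reported_symptom(t) for t in examples]
```

A heuristic of this kind would misclassify many messages (negation, sarcasm, reports about other people), which is one reason the cited work moves to more detailed language modeling.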

Extensive data analytics and data mining are concerned with finding models and patterns in the available data. They include predictive algorithms, which result in models that can be used for prediction and classification, and descriptive algorithms for finding interesting patterns in the data, such as associations, clusters and subgroups (Lavrac et al. 2007; Corley et al. 2010; Yang et al. 2013). The results of data analytics and data mining need data visualization to enhance their presentation and to communicate information clearly and efficiently to users via statistical graphics, plots, information graphics and charts. Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable and usable. Data visualization is both an art and a science. Processing, analyzing and communicating the vast amounts of social media data for public health applications present a variety of ethical and analytical challenges for data visualization (Friendly and Denis 2001).

4 Examples of Using Social Media and Internet-Based Data on Infectious Diseases

4.1 Google Trends

As infectious diseases emerge worldwide and the Internet is used in a growing number of ways, a number of tools have been developed for the surveillance and discovery of new diseases. Traditional surveillance systems are not growing proportionately; with the rise in the use of social media, efficient methods are therefore being built for Internet-based monitoring of diseases such as dengue, influenza, Ebola and others.

Google Trends (GT) is one such tool. It aims to provide up-to-date data by analyzing queries entered into Google, including web, news, image and YouTube searches. GT estimates the proportion of searches performed on Google that contain a given keyword (Nuti et al. 2014). It provides relative search volume (RSV), which is defined as “the query share of a particular term for a given location and time period, normalized by the highest query share of that term over the time-series” (Nuti et al. 2014). Studies have also shown that GT data correlate highly with the incidence of particular infectious diseases, which supports the reliability of GT (Althouse et al. 2011; Ginsberg et al. 2009).
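The quoted definition of relative search volume can be made concrete with a small sketch. It assumes that, for each time period, we know the number of queries containing the term and the total number of queries for the location; Google does not publish these raw counts, so the inputs and the function name `relative_search_volume` are hypothetical. Scaling the peak to 100 is a presentation choice assumed here.

```python
def relative_search_volume(term_counts, total_counts):
    """Query share per period, normalized so the highest share equals 100.

    term_counts[i]  -- hypothetical number of queries containing the term in period i
    total_counts[i] -- hypothetical number of all queries in period i
    """
    if len(term_counts) != len(total_counts):
        raise ValueError("both series must cover the same periods")

    # Query share: fraction of all queries in a period that contain the term.
    shares = [t / n if n else 0.0 for t, n in zip(term_counts, total_counts)]

    # Normalize by the highest share observed over the time series.
    peak = max(shares, default=0.0)
    if peak == 0:
        return [0.0 for _ in shares]
    return [100.0 * s / peak for s in shares]


# Made-up weekly counts for illustration only.
rsv = relative_search_volume([10, 40, 20], [1000, 2000, 1000])
```

Because the values are normalized by the peak of each series, RSV describes the shape of interest over time rather than absolute search counts, so values for different terms or locations are not directly comparable unless they are normalized together.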

The primary advantage of online surveillance systems is their speed in identifying early warning signals (Althouse et al. 2011; Chan et al. 2010; 2011; Chunara et al. 2012).

The idea behind using Google Trends for surveillance is that, during an epidemic of an infectious disease, many people search for that disease on the Internet. Google Trends data show a relationship between how many people search for disease-related terms and how many people are suffering from that disease. Comparisons with traditional surveillance systems have found that people search more about a particular disease during an epidemic of that disease. Counting how often people search for terms related to a particular infectious disease therefore makes it possible to estimate the burden of that disease in different countries.
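The relationship between search activity and disease burden is usually assessed by correlating a search-volume series with officially reported case counts. The following is a minimal sketch using the Pearson correlation coefficient, with an optional lag so that searches in one week are compared with cases reported in a later week. The weekly numbers are invented for illustration and the function names are assumptions; the studies cited in this chapter use their own data and, in some cases, more elaborate models.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(search, cases, lag):
    """Correlate search[t] with cases[t + lag], for a non-negative lag in periods."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(search, cases)
    return pearson(search[:-lag], cases[lag:])


# Made-up weekly series: reported cases trail search interest by about one week.
weekly_search = [5, 10, 40, 80, 60, 30, 10, 5]
weekly_cases = [2, 4, 9, 38, 82, 61, 29, 11]

same_week = lagged_correlation(weekly_search, weekly_cases, 0)
one_week_lead = lagged_correlation(weekly_search, weekly_cases, 1)
```

In this invented example the correlation is higher when searches are compared with the following week's cases, which is the pattern one would expect if search activity gives an early signal. A high correlation alone does not show that searches reflect illness rather than, for example, media coverage.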

5 Advantages of Social Media and Internet-Based Data

Utilizing social media in the prevention and control of infectious diseases can be cost-effective. Social media has the advantage of being rapid, and it can be updated in a timely manner. If properly and scientifically utilized, social media can provide a sensitive and user-friendly tool to monitor the distribution and determinants of epidemics both locally and globally. For example, systems like GT present a novel and free tool that allows users to search for information on the Internet with ease. This tool can also provide useful clues for understanding the characteristics of a disease and the health-related behavior that influences its occurrence (Nuti et al. 2014). The use of social media in infectious disease epidemiology is still an emerging method of surveillance. With advances in its use, however, it has the potential to provide an accurate and rapid estimate of the progression of diseases within communities. In addition, social media can be a valuable tool across distinct climatic and socio-economic contexts (Gluskin et al. 2014).

Most developing countries do not have a regularly maintained and updated surveillance system for infectious diseases. Countries that do have a functioning surveillance system suffer from delays in reporting, and many sentinel sites fail to report cases annually or periodically. Lack of resources in most developing countries also hinders communication, training of staff and provision of proper equipment (Madoff et al. 2011). Thus systems like GT, social media and other Internet sources can provide a rapid method of surveillance that estimates the real-time burden of disease and hence can guide preventive and curative strategies. Systems like GT could also complement the traditional surveillance system in countries that have already established one.

6 Limitations of Social Media and Internet-Based Data

Understanding the limitations of social media and Internet-based data in infectious disease surveillance is crucial to their proper use. First, most of the information provided on social media and the Internet is not moderated by professionals before it is disseminated online. The reliability of the data is also questionable, as the information may come from trusted health specialists or from unofficial sources. The lack of standardization in the frequency of updates exacerbates the problem, as too much information of questionable authenticity becomes available. Furthermore, algorithms and proper statistical techniques are not usually applied before the information is made publicly available. Proper use of information retrieved from social media therefore requires robust monitoring and evaluation of data quality (Velasco et al. 2014).

7 Gaps and Future Research

A few challenges and gaps still exist and need prime attention. One is the need for collaboration among statisticians, Internet and media experts, and computer experts so that they can work on the different components collectively. Furthermore, training epidemiologists to monitor the spread of infectious disease through social media would help ensure the reliability of the evidence extracted (Velasco et al. 2014). Moreover, strategies and tools should be formulated to compare traditional surveillance systems with online surveillance systems, as some diseases can vary drastically (Keller et al. 2009). Public health authorities should consider the protection and privacy of data before using them. This is becoming more important, particularly when the surveillance tools that process Internet or social media data are within governmental institutions (Thompson et al. 2011). Future research should focus on factors that hinder the use of social media and Internet-based data by health agencies. Social media, weblogs, scientific forums and other electronic communications also have unforeseen social aspects that need to be studied, as they can affect human behavior. This in turn influences the information generated by social media and the Internet in general (Velasco et al. 2014).

8 Chapter Key Points

This chapter reports on the importance of using social media and the Internet in the fight against infectious diseases. The advantages and disadvantages of data gathered from social media and the Internet for public health use are also discussed. Tools such as GT are explored as examples, along with their opportunities and challenges. Current gaps and future challenges are also highlighted so that strategies can be formulated to improve contemporary surveillance systems.