- Information Provenance
Sources of a piece of information.
- Social Computing
An area of computer science that is concerned with the intersection of social behavior and computational systems (Social Computing).
- Social Media
A group of Internet-based applications that build on the ideological and technological foundations of Web 2.0 and that allow the creation and exchange of user-generated content (Kaplan and Haenlein 2010).
- Data Mining
The computational process of discovering patterns in large datasets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems (Data Mining).
- Social Network
A social network is a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors (Social Network).
- Misinformation
Misinformation is false or incorrect information that is spread intentionally or unintentionally (without realizing it is untrue) (Misinformation).
- Disinformation
Disinformation is intentionally false or misleading information that is spread in a calculated way to deceive target audiences (Disinformation).
- Provenance Paths
Paths of information propagation from sources to terminals.
An information propagation network can be represented as a directed graph G = (V, E), where V is the node set and E is the edge set. Each node in the graph represents an entity that publishes a piece of information on social media. The entity may be an individual user or a webpage. A directed edge between two nodes represents the direction of information flow. For a given piece of information propagating through social media, social provenance informs a user about the sources of that information. Sources are the nodes that first publish the message in question.
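The graph representation above can be illustrated with a minimal Python sketch. The node names, the timestamps, and the criterion used for a source (a node with no incoming neighbor that published earlier) are assumptions made for illustration only; the entry itself does not prescribe them. The function names `build_graph`, `find_sources`, and `provenance_paths` are hypothetical.

```python
from collections import defaultdict

def build_graph(edges):
    """Return successor and predecessor adjacency maps for a directed graph."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    return succ, pred

def find_sources(nodes, pred, post_time):
    """Nodes with no predecessor that published earlier (assumed criterion)."""
    return sorted(
        v for v in nodes
        if not any(post_time[u] < post_time[v] for u in pred[v])
    )

def provenance_paths(succ, post_time, source, terminal):
    """All simple paths from source to terminal that move forward in time."""
    paths, stack = [], [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == terminal:
            paths.append(path)
            continue
        for nxt in sorted(succ[node]):
            if nxt not in path and post_time[nxt] > post_time[node]:
                stack.append((nxt, path + [nxt]))
    return sorted(paths)

# Hypothetical observations: first publication time per entity, and flow edges.
post_time = {"a": 1, "b": 2, "c": 3, "d": 1, "e": 4}
edges = [("a", "b"), ("b", "c"), ("d", "c"), ("c", "e"), ("c", "a")]

succ, pred = build_graph(edges)
sources = find_sources(post_time, pred, post_time)
paths_from_a = provenance_paths(succ, post_time, "a", "e")
```

In this toy network, node `a` has an incoming edge from `c` but still counts as a source because `c` published later, so a source need not be a node without incoming links.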
Provenance has been studied in the data management field, where it describes the creator of the data and how the data has been modified and transferred. Provenance information is used to determine the authenticity and trustworthiness of information, and it is key to solving the data conflict problem (Moreau 2009). Unlike in social media, data propagation can be captured within data management systems. Social provenance was introduced in the book by Barbier et al. (2013) and has received some attention in recent years (Gundecha et al. 2013a; Ranganath et al. 2013; Feng et al. 2013; Gundecha et al. 2013b; Wu et al. 2016). Shah and Zaman (2011) proposed a centrality-based method to determine the single information source among all known recipients on an undirected network. The method assumes that information spreads on a network according to the susceptible-infected (SI) model. Because it requires knowledge of all recipients, it is not practical for social provenance. In addition, the source computed by this method is biased toward higher-degree nodes.
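The centrality-based idea can be made concrete with a small sketch for tree-shaped networks. The score used below, R(v) = n! / ∏ T_u, where T_u is the size of the subtree rooted at u when the tree is rooted at v, is the rumor-centrality formula commonly attributed to Shah and Zaman. It is not stated in this entry, so it should be read as an assumption, and the function names are hypothetical. The score counts the orders in which an SI spread starting at v could have reached every node.

```python
from math import factorial, prod

def subtree_sizes(adj, root):
    """Sizes of all subtrees when the undirected tree is rooted at `root`."""
    parent, order, stack = {root: None}, [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for nb in adj[node]:
            if nb != parent[node]:
                parent[nb] = node
                stack.append(nb)
    size = {v: 1 for v in order}
    # Children appear after their parent in `order`, so walk it backwards.
    for node in reversed(order):
        if parent[node] is not None:
            size[parent[node]] += size[node]
    return size

def rumor_centrality(adj, root):
    """n! divided by the product of subtree sizes (assumed formula)."""
    sizes = subtree_sizes(adj, root)
    return factorial(len(adj)) // prod(sizes.values())

def estimate_source(adj):
    """Node with the highest score; ties go to the smallest label."""
    return max(sorted(adj), key=lambda v: rumor_centrality(adj, v))

# `adj` must list every recipient, which is the practical limitation noted above.
path = {1: [2], 2: [1, 3], 3: [2]}
star = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}

path_estimate = estimate_source(path)
star_scores = {v: rumor_centrality(star, v) for v in star}
```

On the star network the hub scores higher than any leaf, which is consistent with the bias toward higher-degree nodes mentioned above.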
Barbier, in his dissertation (Barbier 2012), proposed a method to collect metadata about received information. Such metadata is referred to as provenance attributes. Provenance attributes can play a vital role in obtaining social provenance. As shown in the dissertation (Barbier 2012), some attribute values are easier to obtain than others, and some may be more valuable to a recipient than others. For example, a political statement published by a political candidate might be assessed with some bias if the recipient knows information about the candidate, such as political party affiliation or special interest associations. An interesting example of the value of provenance attributes would be to reveal the political affiliation and special interests of an unfamiliar social media user propagating political statements, which may help explain latent motivations for propagating a statement in social media. Fake news and its propagation on social media sites were widely reported during the recent US election (http://www.nytimes.com/2016/11/18/technology/fake-news-on-facebook-in-foreign-elections-thats-not-new.html?_r=0, http://www.vox.com/new-money/2016/11/16/13659840/facebook-fake-news-chart). Provenance attributes of the nodes involved in the propagation of news articles, including political affiliations, special interest group associations, education, occupation, and demographic attributes, could have helped recipients quickly distinguish fake news from real news. Barbier et al. (2013) review the current research on social provenance and explore exciting research opportunities to address pressing needs. The papers (Gundecha et al. 2013a, b; Ranganath et al. 2013; Feng et al. 2013) show how data mining can enable a social media user to make informed judgments about statements published in social media. The chapter (Wu et al. 2016) proposes a few benchmark datasets and evaluation metrics to study the social information problem further.
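As a rough illustration of provenance attributes as metadata, the sketch below models them as a record in which any field may be unknown, reflecting the point that some values are harder to obtain than others. The class name, the field names, and the sample values are hypothetical and are not taken from Barbier's dissertation.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class ProvenanceAttributes:
    """Hypothetical record; every field may be unknown (None)."""
    name: Optional[str] = None
    political_affiliation: Optional[str] = None
    special_interests: Optional[str] = None
    education: Optional[str] = None
    occupation: Optional[str] = None

    def known(self):
        """Names of the attributes collected so far, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]

# Only two attributes have been found for this made-up user.
user = ProvenanceAttributes(name="example_user", occupation="journalist")
collected = user.known()
```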
Key Research Issues in Social Provenance
Social media can help solve the social provenance problem thanks to its unique features: user-generated content (e.g., tweets, blog posts, and news articles), user profiles, user interactions (e.g., links between friends and hyperlinks in blog posts or news articles), and spatial or temporal information. These features can help reconstruct the information propagation network of a given message, and this network is essential for social provenance.
What are the characteristics of sources such that we can identify a source when we encounter one? It is a challenging task because source nodes are not necessarily those without incoming links in social media networks.
How can we use different parts of social media data for inferring provenance paths? Content, user profiles, and interaction patterns can play complementary roles in backtracking information propagation. As a popular source can lead to a shallow cascade (Leskovec et al. 2009), the study of node centrality measures can be of help.
How can we infer missing links in reconstructing a provenance path with partial information? By the nature of social media, most information is informal and partial. Links can expand the network (i.e., new nodes can be added), and data associated with a node provides more information, though still partial.
How can we limit the search space in the vast land of social media? It is imperative to develop a scalable solution for the social provenance problem.
What are effective and objective ways of verifying and comparing different approaches to social provenance and provenance path problems? Lack of ground truth constitutes one of the foremost difficulties.
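The questions about backtracking and limiting the search space can be illustrated with a simple baseline: a breadth-first search over incoming links that stops after a fixed number of hops. This is only a sketch under assumed inputs (a hypothetical `pred` map from each node to the nodes it received the information from); it is not a method proposed in the cited work.

```python
from collections import deque

def backtrack(pred, recipient, max_hops):
    """Breadth-first search over incoming links, at most `max_hops` steps back."""
    hops = {recipient: 0}
    queue = deque([recipient])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # hop budget exhausted; do not expand further
        for up in pred.get(node, ()):
            if up not in hops:
                hops[up] = hops[node] + 1
                queue.append(up)
    return hops

# Hypothetical incoming links: "e" received the message from "c", and so on.
pred = {"e": ["c"], "c": ["b", "d"], "b": ["a"]}
candidates = backtrack(pred, "e", max_hops=2)
```

With a budget of two hops, node `a` is never visited, which shows how a hop limit trades completeness for a smaller search space.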
Illustrative Examples and Impact
One important application of social provenance is finding rumormongers or misinformation centers in social media (Wu et al. 2016). As several recent news reports have noted, misinformation has helped unnecessary fears and conspiracy theories spread through social media. One example relates to the Ebola outbreak (http://time.com/3479254/ebola-social-media/). When potential cases were reported in Miami and Washington, DC, some tweets sounded as if Ebola were rampant, and some users kept tweeting even after the government issued a statement to dispel the rumor. The “Assam Exodus” is another example that illustrates the importance of social provenance. Assam is a large state in the North-East of India, where a series of riots broke out in July and August 2012. Following the riots, virulent messages along with misinformation were spread in other parts of India via social media. Bulk text messages (short message service, SMS) and social media sites were used extensively to spread information aimed at inciting parts of the Indian population against the North-East Indian population. For example, a Wall Street journalist reported that a Twitter user had passed off a gory video clip of riots in Indonesia as footage of the Assam riots (Twitter 2012). Violent messages inciting hatred and vengeance against the North-East Indian population were also spread on Facebook (Facebook). The misinformation and virulent messages caused deep fear among the North-East Indian population, which ultimately led to their exodus from major metropolitan cities across India, including Bangalore, Mumbai, Hyderabad, Chennai, and Pune (Wikipedia 2012). In all of these cases, social provenance might help find the rumormongers or misinformation sources early and stop the viral spread of misinformation.
Knowing the social provenance of a piece of information published in social media – how the piece of information was modified as it was propagated through social media and how an owner of the piece of information is connected to the transmission of the statement – provides additional context to the piece of information. A social media user can use this context to help assess how much value, trust, and validity should be placed on the information.
In early 2010, it was rumored that the Chief Justice of the US Supreme Court was going to retire for medical reasons. In fact, the Chief Justice had no plans to retire. The statement originated in a Georgetown University Law School class and was meant only as a teaching point. However, before the law professor revealed the falsehood, students in the class had transmitted the statement over the Internet, and it was subsequently published on a news blog (http://www.npr.org/templates/story/story.php?storyId=124371570, http://nymag.com/daily/intelligencer/2010/03/heres_how_the_rumor_that_john.html). Had social provenance information been available, recipients might not have considered the statement credible. In another case, a US Department of Agriculture employee was erroneously fired after information about her that appeared in social media was published out of context (https://en.wikipedia.org/wiki/Firing_of_Shirley_Sherrod). Had social provenance information been available, sought out, or examined, it might have prevented an injustice to the employee and embarrassment for the Department of Agriculture. Fake news and its impact on the recent US election have been widely reported (http://www.nytimes.com/2016/11/18/technology/fake-news-on-facebook-in-foreign-elections-thats-not-new.html?_r=0, http://www.vox.com/new-money/2016/11/16/13659840/facebook-fake-news-chart). Social provenance, if available, could have informed users about the credibility of such news.
The social provenance problem presents an unprecedented challenge, and research progress on it can pave the way for work on many equally challenging and important issues such as source trustworthiness, information reliability, and user credibility.
- Barbier G (2012) Finding provenance data in social media. Doctoral dissertation
- Data Mining. https://en.wikipedia.org/wiki/Data_mining
- Disinformation. https://en.wikipedia.org/wiki/Disinformation
- Facebook. https://www.facebook.com/photo.php?fbid=268506716591158&set=a.247241168 71771349889.247222755386221&type=3&theater. Accessed 17 Dec 2012
- Feng Z, Gundecha P, Liu H (2013) Recovering information recipients in social media via provenance. Short paper, the IEEE/ACM international conference on advances in social networks analysis and mining
- Gundecha P, Feng Z, Liu H (2013a) Seeking provenance of information in social media. Short paper, the 22nd ACM international conference on information and knowledge management
- Gundecha P, Ranganath S, Feng Z, Liu H (2013b) A tool for collecting provenance data in social media. Demonstration paper, the 19th ACM SIGKDD international conference on knowledge discovery and data mining
- Leskovec J, Backstrom L, Kleinberg J (2009) Meme-tracking and the dynamics of the news cycle. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 497–506
- Misinformation. https://en.wikipedia.org/wiki/Misinformation
- Ranganath S, Gundecha P, Liu H (2013) A tool for assisting provenance search in social media. Demonstration paper, the 22nd ACM international conference on information and knowledge management
- Social Computing. https://en.wikipedia.org/wiki/Social_computing
- Social Network. https://en.wikipedia.org/wiki/Social_network
- Twitter (2012) https://twitter.com/dhume01/status/236321660184178688. Accessed 17 Dec 2012
- Wikipedia (2012) http://en.wikipedia.org/wiki/2012_Assam_violence#Attacks_on_people_from_North_East_Exodus. Accessed 17 Dec 2012
- Wu L, Morstatter F, Hu X, Liu H (2016) Mining misinformation in social media. In: Big data in complex and social networks. CRC Press, pp 123–152
- http://time.com/3479254/ebola-social-media/. Accessed Oct 2014
- http://www.vox.com/new-money/2016/11/16/13659840/facebook-fake-news-chart. Accessed Dec 2016
- http://www.npr.org/templates/story/story.php?storyId=124371570. Accessed Dec 2016
- https://en.wikipedia.org/wiki/Firing_of_Shirley_Sherrod. Accessed Dec 2016