1 Introduction

Assessing information quality is a challenging task. Assuming a minimal definition of information as ‘data + semantics’, assessing its quality means establishing the fitness for purpose of a given piece of information. Given the huge number of possible purposes, and to make computation feasible, information quality is often broken down into ‘dimensions’ [13] such as accuracy, precision, and completeness. Despite this complexity, humans deal with quality on a daily basis, using heuristics to approximate ideal values and treating them as a proxy for deciding whether or not to trust information. Notwithstanding the possibility of being deceived by our heuristics, a formalization of such strategies is a useful tool for understanding and prediction. We provide here a framework that mimics such strategies and yields a relative reference system of sources. When an oracle or fact-checking service is available, this reference system can be turned into an absolute one, i.e., one determining which sources are veracious and which are not. Otherwise, our result still provides a relative ranking of the importance of sources. This task relies on providing appropriate understandings of trust and trustworthiness.

Among the many definitions in the literature, for our purpose trust in contents can be minimally identified with the result of a consistency assessment: a piece of information consistent with the agent’s current set of beliefs or knowledge base is trusted when it allows the agent to preserve other information considered truthful. This approach requires a methodology for dealing with inconsistent information, and it raises the problem of assessing source trustworthiness. The logic \(\mathtt {(un)SecureND}\) [20] provides a mechanism to deal with this aspect through separate protocols for failing consistency. An agent A reading a piece of information \(\phi \) from an agent B, where \(\phi \) is inconsistent with A’s knowledge base, has two possibilities: (1) distrust: reject \(\phi \) and preserve \(\lnot \phi \) and its consequences; or (2) mistrust: remove \(\lnot \phi \) from her profile and accept \(\phi \). \(\mathtt {(un)SecureND}\) does not have a selection mechanism for either form of negated trust. In real-world scenarios, the choice between distrust and mistrust is determined by evaluating the source. While trust is the mechanism for establishing admissible consistent information, we call trustworthiness the quality assessment of sources. We introduce an ordering function and several decision strategies that provide computational mechanisms to mimic the subjective quality assessment process underlying trustworthiness. Through any of these mechanisms, A can decide whether the estimated trustworthiness of B is high enough to trust the new information \(\phi \). Consider a simplified scenario with a finite set of sources sharing information on a common topic and referencing each other (to a lesser or greater degree): some of them will be in conflict and some will be consistent with one another. We identify three dimensions:

  • Knowledgeability: the number of sources to which a source B refers. This value is used as an indicator of B’s knowledge of other views;

  • Popularity: the number of sources referring to B. This counts the number of inbound links, and it does not involve their polarity. Citing a source, even to attack it, is seen as an indication of the popularity of the latter;

  • Reputation: the proportion between positive and negative evaluations of B.

These dimensions are used to assess the trustworthiness of B, to allow a receiver to compare contradictory sources, and to formulate decision strategies.

The paper continues as follows. Section 2 describes formal preliminaries, Sect. 3 describes the different strategies available to resolve the presence of contradictory contents, Sect. 4 translates these strategies into implementable rule-based protocols, and Sects. 5 and 6 present and discuss a use case implementation of the proposed logic. Section 7 surveys related work, and Sect. 8 concludes.

2 Formal Preliminaries

Consider a set of sources \(\mathcal {S}\) and a (possibly partial) order relation \(\le _{t}\) over sources \(\mathcal {S}\times \mathcal {S}\) expressing source trustworthiness; once defined, this is used as a proxy to establish trust in contents in the rule-based semantics presented in Sect. 4. We define the trustworthiness order \(\le _{t}\) as a function over three dimensions: reputation, popularity, and knowledgeability.

Reputation is an order relation \(\le _{R}\) over sources \(\mathcal {S}\times \mathcal {S}\): intuitively, \(S\le _{R} S'\) means that source \(S\in \mathcal {S}\) has at least the same reputation as \(S'\in \mathcal {S}\). For simplicity, reputation is evaluated on the following criteria:

  • we denote with \(w(S)_{S'}\) a fixed weight of S received by \(S'\);

  • \(w\in \{1,-1\}\), for a positive and a negative assessment, respectively;

  • we denote each \(w(S)_{S'}=1\) as pos and each \(w(S)_{S'}=-1\) as neg;

  • for any source \(S\in \mathcal {S}\), a reputation assessment r(S) by other sources in \(\mathcal {S}\) is

    $$r(S)=\frac{|pos|+1}{|pos|+|neg|+2}$$

We note that instead of computing the simple ratio of positive assessments over the total number of assessments, we add a smoothing factor as in Subjective Logic [15]. This allows us to represent assessment as performed in a ‘semi-closed world’: we base ourselves on the evidence at our disposal, but our sample is limited. The smaller our sample, the closer the resulting reputation will be to the neutral prior 0.5, since no prior knowledge is available to believe the source is fully trustworthy or untrustworthy. The larger our sample, the more the sample ratio will weigh on the reputation estimate. On the basis of the reputation assessment, we establish the corresponding order on \(\mathcal {S}\):
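For illustration, the following Python sketch (function and variable names are ours) shows how the smoothed estimate behaves on samples of different sizes:

```python
def reputation(pos: int, neg: int) -> float:
    """Smoothed reputation r(S) = (|pos| + 1) / (|pos| + |neg| + 2)."""
    return (pos + 1) / (pos + neg + 2)

# With little evidence the estimate stays close to the neutral prior 0.5;
# with more evidence it approaches the raw ratio of positive assessments.
print(reputation(1, 0))    # 0.667 -- one positive assessment
print(reputation(10, 0))   # 0.917 -- ten positive assessments
print(reputation(50, 50))  # 0.5   -- balanced evidence
```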

Definition 1

(Reputation). For any \(S, S'\in \mathcal {S}, S\le _{R} S' \leftrightarrow r(S)\ge r(S')\).

We define a second order relation \(\le _{P}\) over sources \(\mathcal {S}\times \mathcal {S}\): intuitively, \(S\le _{P} S'\) means that source S has at least the same popularity as \(S'\), where popularity reflects the number of sources which refer to S. We denote the referenced sources as \(outbound\_links\) and the referencing sources as \(inbound\_links\); non-referenced or non-referencing sources are counted as \(missing\_links\). Note that \(\forall S,S'\), if \(S\in outbound\_links(S')\) and \(S'\in outbound\_links(S)\), we can assume both sources have explicit knowledge of each other’s information. We assume this fact and express that \(S'\) reads from S (or, alternatively, that S writes to \(S'\)) as \(S'\in outbound\_links(S)\). Note that in the calculus presented in Fig. 1 these access operations are explicit. By our definition of reputation, we can assume that for every source S referenced by \(S'\), the weight \(w(S)_{S'}\) contributes to r(S). Hence, the popularity of S is

$$p(S)=\frac{|inbound\_links|+1}{|inbound\_links|+|missing\_links|+2}$$

On its basis, we establish the corresponding order on \(\mathcal {S}\):

Definition 2

(Popularity). For any \(S, S'\in \mathcal {S}, S\le _{P} S' \leftrightarrow p(S)\ge p(S')\).

Finally, we define a third order relation \(\le _{K}\) over sources \(\mathcal {S}\times \mathcal {S}\): intuitively, \(S\le _{K} S'\) means that source S has at least the same knowledgeability as \(S'\), where knowledgeability reflects the number of sources to which S refers. For simplicity, given the definition of p(S) based on r(S), knowledgeability k(S) is the dual of p(S), computed over outbound rather than inbound links:

$$k(S)=\frac{|outbound\_links|+1}{|outbound\_links|+|missing\_links|+2}$$

On its basis, we establish the corresponding order on \(\mathcal {S}\):

Definition 3

(Knowledgeability). For any \(S, S'\in \mathcal {S}, S\le _{K} S' \leftrightarrow k(S)\ge k(S')\).
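As an illustration, the following sketch computes p(S) and k(S) from a toy reference graph. The graph and the assumption that \(missing\_links\) counts the remaining sources in the sample are ours; the paper leaves the universe of candidate links implicit.

```python
# Hypothetical reference graph: source -> set of sources it refers to.
links = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": set(),
}

def popularity(s: str) -> float:
    """p(S): smoothed ratio of inbound links, per Definition 2."""
    inbound = sum(1 for outs in links.values() if s in outs)
    missing = len(links) - inbound  # assumption: missing = remaining sources
    return (inbound + 1) / (inbound + missing + 2)

def knowledgeability(s: str) -> float:
    """k(S): smoothed ratio of outbound links, per Definition 3."""
    outbound = len(links[s])
    missing = len(links) - outbound
    return (outbound + 1) / (outbound + missing + 2)

print(popularity("A"), knowledgeability("A"))  # 0.4 0.6
```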

The highest value of knowledgeability corresponds to the totality of the available sources. For simplicity, we include in this count the source itself:

Definition 4

(Source Completeness). A source S satisfies source completeness if \(|outbound\_links|=|\mathcal {S}|\).

The three dimensions of reputation, popularity, and knowledgeability establish a generic computable metric on the trustworthiness of a source S:

Definition 5

(Source Trustworthiness). Source trustworthiness is computed as

$$t(S)=\varPhi (\phi (r(S)),\psi (p(S)),\xi (k(S)))$$

with \(\varPhi \) a given function and \(\phi ,\psi ,\xi \) appropriate weights on the parameters.

The choice of \(\phi ,\psi ,\xi \) is essentially contextual, as it determines the role that each parameter plays in the computed value of t(S), e.g. to stress knowledgeability as more important than popularity, or reputation as more relevant than knowledgeability. Fixing these parameters to 1 provides the basic evaluation, with all values weighted equally. \(\varPhi \) can be interpreted, e.g., as the sum \(\sum X\), the mean \(\overline{X}\), or the maximum max(X): again, this choice can be contextually determined.
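A minimal sketch of Definition 5, with the weights and a few candidate aggregators passed as parameters (the defaults and names are our assumptions):

```python
# t(S) = Phi(phi * r(S), psi * p(S), xi * k(S)); weights default to 1.
def trustworthiness(r, p, k, phi=1.0, psi=1.0, xi=1.0, agg=sum):
    return agg([phi * r, psi * p, xi * k])

r, p, k = 0.7, 0.4, 0.6
print(trustworthiness(r, p, k))                                   # Phi = sum
print(trustworthiness(r, p, k, agg=lambda xs: sum(xs) / len(xs))) # Phi = mean
print(trustworthiness(r, p, k, agg=max))                          # Phi = max
```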

To distinguish between different semantic strategies for information conflict resolution, we first weight the notion of source trustworthiness with respect to source order and calculate an average value.

Definition 6

(Sources with Higher Trustworthiness). Let \(\mathcal {S}^{\sim }_{<_{t}S}\) denote the set of sources with trustworthiness higher, in the order \(<_{t}\), than that of a given source \(S\in \mathcal {S}\).

We now partition this set as follows: we denote with \(\mathcal {T}\) the subset of \(\mathcal {S}^{\sim }_{<_{t}S}\) such that \(\forall S'\in \mathcal {T}\), \(S'\) trusts information \(\phi \); we denote with \(\mathcal {T_{\bot }}\) the complement of \(\mathcal {T}\).

Definition 7

(Weighted Trustworthiness). Average trustworthiness of \(\mathcal {T}\) is

$$t(\mathcal {T})=\frac{\sum _{S'\in \mathcal {T}}t(S')}{|\mathcal {T}|}$$

Let \(t(\mathcal {T}_{\bot })\) denote the average trustworthiness for the complement partition. If \(t(\mathcal {T})>t(\mathcal {T}_{\bot })\), then S trusts \(\phi \), else S trusts \(\lnot \phi \).

Weighted trustworthiness admits a parity outcome: in that case, either a different strategy can be selected (e.g., the simpler majority trustworthiness) or a random assignment can be made. Finally, on the basis of the trustworthiness assessment, we establish the corresponding order on \(\mathcal {S}\):
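A sketch of the resulting decision rule, assuming the trustworthiness values of the two partitions have already been computed (names are ours):

```python
# S trusts phi iff the partition trusting phi has the higher average
# trustworthiness (Definition 7); parity is left to the caller, which can
# fall back on majority trustworthiness or a random assignment.
def weighted_decision(t_pro: list[float], t_con: list[float]) -> bool:
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return avg(t_pro) > avg(t_con)

# Two highly trusted sources back phi against two weaker dissenters:
print(weighted_decision([0.9, 0.8], [0.6, 0.5]))  # True: 0.85 > 0.55
```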

Definition 8

(Trustworthiness). For any \(S,S'\in \mathcal {S}, S\le _{t} S' \leftrightarrow t(S)\ge t(S')\).

Note that the general definition allows for a partial order, as the trustworthiness values of two distinct sources may be equivalent or incomparable. The following resolution strategies assume that a strict order has been obtained.

3 Trustworthiness Selection Strategies

We define several strategies to implement negative trust based on the Trustworthiness relation defined in Sect. 2. Recall that distrust requires an agent to reject incoming contradictory information in favor of currently held data. In this context, we establish such a choice on the basis of higher trustworthiness.

Definition 9

(Distrust). Assume \(S<_{t}S'\) and \(S\in outbound\_links(S')\). If \(S'\) trusts \(\phi \) and \(\phi \) is inconsistent with the profile of S, then S distrusts \(\phi \) and trusts \(\lnot \phi \).

With this protocol in place, a source with higher trustworthiness will always reject incoming contradictory information from a lower-ranked source. It is also fair to assume that where \(t(S)=t(S')\), a conservative source S will not change its current information. The process of modifying currently held information to accommodate newly incoming information (mistrust) therefore starts from the assumption that the source of the incoming information has a higher trustworthiness degree than the receiver. On this basis, implementing a mistrust strategy has a complex dynamic: the receiver can be more or less inclined towards a belief change, and may require more or less evidence for it to happen. Therefore, different strategies can be designed. One strategy requires that a majority of agents with higher trustworthiness agree on the new incoming data. A stronger strategy requires that the totality of agents with higher trustworthiness agree. Reaching the desired number of agents to implement a mistrust strategy might be a dynamic process resulting from a temporally extended analysis of the set of sources. We design the different strategies assuming, per Definition 6, that the subset \(\mathcal {S}^{\sim }_{<_{t}S}\) of sources with higher trustworthiness contains the sources which the receiver S has to consider.

The weakest strategy is that of an agent who allows a mistrust operation based on the presence of at least one source with higher trustworthiness that contradicts her current belief state:

Definition 10

(Weak Trustworthiness). If \(\exists S'\in \mathcal {S}^{\sim }_{<_{t}S}\) such that \(S'\) trusts information \(\phi \), then S trusts \(\phi \).

To accommodate a contradicting \(\phi \), the source S has to modify its current set of beliefs, \(\varGamma \), to some subset \(\varGamma '\) which can be consistently extended with \(\phi \), i.e., removing any formula implying \(\lnot \phi \). A stronger strategy is for the agent to accept the content on which the majority of sources with higher trustworthiness agree:

Definition 11

(Majority Trustworthiness). Assume \(\mathcal {T}\subseteq \mathcal {S}^{\sim }_{<_{t}S}\) such that \(\forall S'\in \mathcal {T}\), \(S'\) trusts information \(\phi \). We denote with \(\mathcal {T_{\bot }}\) the complement of \(\mathcal {T}\). If \(|\mathcal {T}|>|\mathcal {T_{\bot }}|\), then S trusts \(\phi \), else S trusts \(\lnot \phi \).

In the case of a parity outcome, either the selection of a different strategy or a random assignment is possible. Note that the above strategy does not account for the order within the subset \(\mathcal {S}^{\sim }_{<_{t}S}\): it only partitions it according to the truth value of a formula and then selects the partition with the higher cardinality. A more refined majority strategy weights each member \(S'\) of \(\mathcal {T}\) and \(\mathcal {T_{\bot }}\) by its trustworthiness value \(t(S')\); an average value is then assigned to the corresponding partition, and the strategy selects the formula held by the partition with the higher value. If the cardinality of the partition also has to be considered, the sum of the trustworthiness values of the sources can be assigned to each partition instead. The strongest strategy requires the agent to change her mind only if all other agents with higher trustworthiness agree:

Definition 12

(Complete Trustworthiness). If \(\forall S'\in \mathcal {S}^{\sim }_{<_{t}S}\), \(S'\) trusts information \(\phi \), then S trusts \(\phi \).
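The three strategies can be sketched as predicates over the verdicts of the sources in \(\mathcal {S}^{\sim }_{<_{t}S}\). The dictionary of verdicts is a hypothetical input; the weighted variant was sketched after Definition 7.

```python
# verdicts maps each source with higher trustworthiness than S to whether
# it trusts phi; each strategy decides whether S should trust phi.
def weak(verdicts: dict[str, bool]) -> bool:
    return any(verdicts.values())          # Definition 10

def majority(verdicts: dict[str, bool]) -> bool:
    pro = sum(verdicts.values())           # Definition 11; parity handled
    return pro > len(verdicts) - pro       # by a fallback strategy

def complete(verdicts: dict[str, bool]) -> bool:
    return all(verdicts.values())          # Definition 12

higher = {"B": True, "D": False, "E": True}  # hypothetical sources
print(weak(higher), majority(higher), complete(higher))  # True True False
```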

The Majority and Complete Trustworthiness strategies above have a strong effect on knowledge diffusion in the presence of full communication. The Consensus rule below holds even if the content from the most trustworthy source is not initially held by the majority of agents.

Proposition 1 (Consensus)

Assume \(S'\in outbound\_links(S)\) for all \(S, S'\in \mathcal {S}^{\sim }\) with \(S<_{t}S'\). Then \(\mathcal {S}\) converges towards consensus on the information trusted by the most trustworthy source.

4 Rule-Based Semantics for the Strategies

The natural deduction calculus \(\mathtt {(un)SecureND}\) [20] defines trust, mistrust and distrust protocols according to the informal semantics described in Sect. 1. It formalizes a derivability relation on formulas from sets of assumptions (contexts) as accessibility on resources issued by sources. In this section, we provide an extension of the calculus with a rule-based implementation of the trustworthiness selection strategies from Sect. 3.

Definition 13

(Syntax of \(\mathtt {(un)SecureND}\)).

$$\begin{aligned} \begin{array}{l} \mathcal {S}^{\sim }:= \{A<_{t} B<_{t} \dots <_{t} N\}\\ BF^{S}:= a^{S}\mid \phi ^{S}_{1}\rightarrow \phi ^{S}_{2}\mid \phi ^{S}_{1}\wedge \phi ^{S}_{2}\mid \phi ^{S}_{1}\vee \phi ^{S}_{2} \mid \bot \\ mode:= Read(BF^{S})\mid Write(BF^{S})\mid Trust(BF^{S})\\ RES^{S}:= BF^{S}\mid mode\mid \lnot RES^{S}\\ \varGamma ^{S}:=\{\phi ^{S}_{1}, \dots , \phi ^{S}_{n}\}\\ \end{array} \end{aligned}$$

Every \(S\in \mathcal {S}\) is a content producer which has a trustworthiness value based on its interactions with any other \(S'\in \mathcal {S}\). Any \(S\in \mathcal {S}\) is ordered with respect to the others by the trustworthiness order. Formulas in the set \(BF^{S}\) express content produced by source S and are closed under logical connectives. Functions on contents in the set mode refer to reading, writing, and trusting formulas. Every source S is identified by the set of contents it produces, denoted by \(\varGamma ^{S}\) and called the profile of S. A formula expresses access from a source S to content issued by another source \(S'\) (metavariables \(S,S'\) are substituted by variables A, B):

Definition 14

An \(\mathtt {(un)SecureND}\)-formula \(\varGamma ^{A} \vdash RES^{B}\) says that under the content expressed by source A, some content from source B is validly accessed.

The rule-based semantics of the calculus is given in Fig. 1. Atom establishes derivability of formulas from well-formed contexts and under consistency-preserving extensions. We use the judgement \(\varGamma \!:\!profile\) for a profile consistently construed by induction from the empty set. For brevity, we skip the introduction and elimination rules for the logical connectives (see [20]) and focus only on the access rules. Differently from other versions of the same calculus, we drop negation-completeness here: a source without access to a content item from another source will not assume access to its negation, i.e. uncertainty is admissible. \( read \) says that from any well-formed source profile A, formulas from a profile B can be read. \( trust \) says that if a content item is read and it preserves consistency when added to the reading profile, then it can be trusted. \( write \) says that a readable and trustable content can be written. By distrust, source A distrusts content \(\phi ^B\) if it induces a contradiction when read from \(\varGamma ^{A}\) and A has higher trustworthiness than B. Its elimination uses \(\rightarrow \)-introduction to induce write from the receiver profile for any content that follows a distrust operation; this allows \(Write(\lnot \phi ^{B})\) when \(\lnot Trust(\phi ^{B})\) holds. Each of the mistrust rules applies a different strategy from Sect. 3 to a content item \(\phi ^B\) inducing a contradiction when read from \(\varGamma ^{A}\), where A has lower trustworthiness than B. By \(weak\ mistrust\), A accepts \(\phi \) (and removes any conflicting information from its own profile) on the simple presence of B in the set of sources with higher trustworthiness than A; this formulation is general enough to accommodate the substitution of B in this condition by any other source that A considers absolutely essential (appeal to authority). \(majority\ mistrust\) requires computing the partitions of the set of sources with higher trustworthiness than A and comparing their cardinality: any content \(\phi \) held by the larger partition will be kept by A (even when this reduces to an application of a distrust rule). In \(weighted\ majority\), the condition is expressed by the higher average trustworthiness of the partition. By \(complete\ mistrust\), source A requires that every element in the set of sources with higher trustworthiness agrees on \(\phi \). By the rule write, every trusted content can be written.

Fig. 1. The system \(\mathtt {(un)SecureND}\): access rules. [figure omitted]
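To convey the intended operational reading of these rules, here is a highly simplified sketch of the receiver-side protocol. It treats profiles as sets of literals and checks only direct negation, whereas the calculus removes any formula implying \(\lnot \phi \); all names are ours.

```python
# A reads phi from B: trust if consistent; otherwise distrust when A
# outranks B, or apply the chosen mistrust strategy when B outranks A.
def receive(profile: set[str], phi: str, t_a: float, t_b: float,
            mistrust_strategy) -> set[str]:
    neg = phi[1:] if phi.startswith("~") else "~" + phi
    if neg not in profile:           # consistency preserved: trust phi
        return profile | {phi}
    if t_a >= t_b:                   # distrust: keep the current profile
        return profile
    if mistrust_strategy(phi):       # mistrust: revise, then accept phi
        return (profile - {neg}) | {phi}
    return profile

print(receive({"~vaccines"}, "vaccines", 0.4, 0.7, lambda _: True))
# {'vaccines'}
```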

5 Evaluation

5.1 Use Case Description

In 2015, a measles outbreak took place in Disneyland, California. The event received much attention online, and a strongly polarised discussion followed the news. Public authorities and pro-vaccination sources pointed out the importance of vaccination, and some of them blamed the low vaccination rate as the main reason for the outbreak. On the other hand, the anti-vaccination movement accused the government agencies and the pro-vaccination movement of misinforming the public, since the children involved in the outbreak were vaccinated. Two main factions are thus at work: pro-vaccination and anti-vaccination. While sources do not always identify themselves as part of one or the other, for many of them either their stance is clear (e.g., when they explicitly ‘attack’ each other), or we can make safe assumptions based on background knowledge (e.g., by assuming that authorities are pro-vaccination). We have at our disposal a set of assessments of these articles collected by means of user studies involving experts [6]. These assessments cover quality dimensions like accuracy and precision, and include an overall quality score that is equivalent to the trustworthiness score defined here.

5.2 Data Preprocessing

We select a subset of 10 articles on this debate from a corpus of documents regarding the Disneyland measles outbreak. The selection gives a small but diverse set of views on the topic in terms of stance (pro- or anti-vaccination) and type of document (news article, official document, blog post, etc.). Since they all discuss the selected event, a clear network of references emerges. However, the network is rather sparse, since a large majority of these sources do not cite each other. As we are interested in capturing their polarity in order to compute the three trustworthiness dimensions, we reconstruct the network as follows: (1) a source criticizing another source is considered a negative piece of evidence regarding the reputation of the source mentioned; and (2) a source citing data from another source, even in neutral terms, is considered a piece of evidence regarding the popularity of the source cited. The resulting network of references is represented in Fig. 2. It illustrates only the relations emerging from the corpus considered and represents a partial view of the real scenario, because we derive a source’s trustworthiness using one or more documents published by it as a proxy; the more documents we observe from a source, the better we can assess its trustworthiness value. For example, we estimate source knowledgeability from the number of citations of other sources, but some sources might be cited by the source under consideration only in articles outside our sample. Also, we derive a source’s trustworthiness based on the references it receives from the other sources considered, but the set of sources is limited, and the scenario might change when considering other sources (e.g., the number of citations of currently poorly cited sources could rise). Given these considerations, the smoothing factor added to Definitions 1, 2, and 3 helps to cope with the resulting uncertainty.

Fig. 2. Network of references resulting from the preprocessing of our corpus. Directed arrows indicate positive (continuous line) or negative (dotted line) references. [figure omitted]
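The two preprocessing rules can be sketched on a hypothetical citation list (all source names below are invented for illustration):

```python
# (citing source, cited source, polarity): a critical citation (-1) is
# negative evidence for the cited source's reputation; any citation
# counts towards the cited source's popularity.
citations = [
    ("AntiVaxBlog", "HealthAgency", -1),
    ("NewsSite", "HealthAgency", +1),
    ("HealthAgency", "NewsSite", +1),
]

sources = {x for c in citations for x in c[:2]}
pos = {s: 0 for s in sources}
neg = {s: 0 for s in sources}
inbound = {s: 0 for s in sources}
for citing, cited, w in citations:
    inbound[cited] += 1                      # popularity evidence
    (pos if w > 0 else neg)[cited] += 1      # reputation evidence

print(pos["HealthAgency"], neg["HealthAgency"], inbound["HealthAgency"])
# 1 1 2
```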

5.3 Sources Ordering

Based on the network depicted in Fig. 2, and using the formulas presented in Sect. 2, we compute the trustworthiness score for each of the sources in our sample. The trustworthiness score is computed by averaging the reputation, the knowledgeability, and the popularity of the sources, resulting in the scores reported in Table 1. Figure 3 shows a graphical representation of the resulting hierarchy of sources. Since the trustworthiness thus obtained shows only a weak correlation (0.2) with the overall scores provided by the users in the user study, we explore alternative ways to aggregate the scores.

Table 1. Trustworthiness scores of the sources considered for our use case. The score is computed by means of a simple average, where each component has the same weight. [table omitted]
Fig. 3. Hierarchical ordering of the sources derived from the scores shown in Table 1. [figure omitted]

Weighted Trustworthiness. Applying weights to the trustworthiness parameters can yield a different hierarchy. Instead of applying an arbitrary weighting to the scores, we apply linear regression on the parameters, targeting the overall quality scores provided by the users in the study. Once we have learned the weights for the parameters, we compute the trustworthiness scores. The resulting scores show a 0.6 correlation with those provided by the users. Moreover, we also run a 3-fold cross-validation (we split the dataset into 3 parts and, in turn, use two parts as a training set for linear regression and one for validation). For one item only, our model is unable to make a prediction. Excluding this item, the resulting average correlation between predicted and user-provided overall quality is −0.87 (Pearson) and −0.76 (Spearman). We consider these promising results.
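The weighting step can be sketched as follows, assuming scikit-learn; the feature matrix X holds per-source [r, p, k] triples and y the expert overall-quality scores. All values below are placeholders, not the study data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Placeholder per-source scores [r, p, k] and expert overall-quality targets.
X = np.array([[0.7, 0.4, 0.6], [0.5, 0.5, 0.5], [0.8, 0.6, 0.3],
              [0.4, 0.7, 0.5], [0.6, 0.3, 0.7], [0.5, 0.6, 0.4],
              [0.9, 0.5, 0.6], [0.3, 0.4, 0.4], [0.6, 0.6, 0.6]])
y = np.array([4.0, 3.0, 4.5, 3.4, 3.8, 3.2, 4.6, 2.5, 3.9])

print(LinearRegression().fit(X, y).coef_)  # learned weights for r, p, k

# 3-fold cross-validation: train on two folds, validate on the third.
for train, test in KFold(n_splits=3).split(X):
    model = LinearRegression().fit(X[train], y[train])
    pred = model.predict(X[test])
    print(np.corrcoef(pred, y[test])[0, 1])  # per-fold Pearson correlation
```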

5.4 Applying Trustworthiness Selection Strategies

Here we illustrate how users could apply the selection strategies described in Sect. 3. Figure 4 shows the scenario in which the trustworthiness selection strategies are applied. The sources analyzed in the previous step are now shown in white if they have a positive stance with respect to vaccinations, and in grey otherwise. C is a new source with an unclear stance that joins the scenario. The stance of C (i.e., whether C trusts vaccines or not) will be determined by comparison with the other sources. Assume that the trustworthiness of C is higher than that of Heavy.com, but lower than that of all the other sources.

Fig. 4. Use case scenario, adopting the same hierarchy as in Fig. 3. Sources in white trust vaccinations; sources in grey do not. C denotes an additional source which takes part in the scenario and does not yet have a clear stance. [figure omitted]

Distrust. When C is confronted with Heavy.com and its lower trustworthiness score, following the distrust rule C will distrust vaccines.

Weak Trustworthiness. Let us follow up on the previous scenario, in which C distrusts vaccines. When encountering all the other sources, if the \(\mathtt {weak\ mistrust}\) strategy is applied, C will revise its profile: C now trusts vaccines because several sources with trustworthiness higher than C’s trust \(\phi \). Note that \(\mathtt {weak\ mistrust}\) requires only one such source trusting \(\phi \) for C to follow suit.

Majority Trustworthiness. In an alternative scenario, when encountering the other sources, C can evaluate whether or not to trust \(\phi \) based on whether the majority of the sources trust vaccines. We partition the sources according to whether they trust vaccines or \(\lnot vaccines\). With any strategy for determining the majority (partition cardinality, average trustworthiness of the sources in the two partitions, or sum of the trustworthiness values in the two partitions), trust in vaccines prevails.

Complete Trustworthiness. When complete trustworthiness is applied, C needs all the sources to agree on vaccines in order to add it to its profile. Since three sources disagree, by applying this rule we obtain that C distrusts vaccines.
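Reusing the strategy predicates sketched in Sect. 3, the three outcomes for C can be reproduced as follows (the source names are hypothetical stand-ins for the corpus sources ranked above C; True means the source trusts vaccines):

```python
# Stances of the sources ranked above C: four trust vaccines, three do not.
higher_ranked = {
    "SourceA": True, "SourceB": True, "SourceC": True, "SourceD": True,
    "SourceE": False, "SourceF": False, "SourceG": False,
}
pro = sum(higher_ranked.values())
print(any(higher_ranked.values()))           # weak mistrust: trust vaccines
print(pro > len(higher_ranked) - pro)        # majority: trust vaccines
print(all(higher_ranked.values()))           # complete: distrust vaccines
```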

6 Discussion

The goal of our model is to provide a means to mimic human thinking and a tool to systematically reason about sources. The result of such reasoning is a relative reference system of sources. When oracles, fact-checkers, and similar services are available, this reference system can be turned into an absolute one: if the user knows that a given set of statements is true or false, she can reason about the trustworthiness of the sources by incorporating this additional information into the networks. When oracles are not available, the reference system still provides the user with a basis to coherently reason about the sources she observes.

Frameworks like PageRank and its successors can be considered more evolved and successful alternatives to the present proposal. However, while PageRank can be applied to one or more networks to rank their sources, our system considers three distinct networks, aggregates them, and can either be extended with other networks or be used as reasoning support as it is. Hence we consider the present proposal a viable complement to existing approaches.

While assessing the veracity of information is not the focal point of our system, the multidimensional approach we take shows promising robustness against possible attacks. Suppose that, in an echo chamber, sources cite each other positively in order to increase their own reputation and popularity. If their citations are limited to the sources in the echo chamber, their knowledgeability (and thus their trustworthiness) will necessarily be low. If, to remedy this, sources start citing others outside the echo chamber, their knowledgeability will rise, but they will also contribute to the popularity of these external sources. Still, the knowledgeability score remains vulnerable in sufficiently large echo chambers. Future developments will tackle this aspect more explicitly.

7 Related Work

Assessing the quality of information sources is a long-standing problem largely addressed in the humanities, where specific guidelines and checklists have been proposed to address the issue of “source criticism” [3]. Such work has also been extended to Web sources in [6, 7], where a combination of crowdsourcing and machine learning is adopted. Those works are complementary to the present contribution, since they do not directly compare the references among sources. Counting links for a source, as employed in this paper, aims at mimicking the evaluation of the bibliography mentioned in the source-criticism checklist. Another framework based on crowdsourcing is presented in [17].

Using fitness for purpose to assess information quality is a widely adopted strategy; see [12, 13]. In the present work, we start from the assumption that where it is unclear or impossible for an agent to distinguish between contradictory data, source assessment based on trustworthiness is a valuable strategy. We show how such a protocol can be implemented through different selection strategies. A related topic is that of fake news, tackled for instance in [4, 25].

Research on trust in computational domains has been extensive in the last decades. Crucial aspects of the behavior of trust concern properties like propagation and blocking [8, 10, 14, 16], and solutions to these problems are various [2, 9, 11]. In the present work, we evaluate trust in information sources not on an absolute scale, but rather with varying degrees. A related approach is presented in [19], where a trust measure on agents is combined with the use of argumentation for reasoning about beliefs. Similarly, we propose a trust evaluation of sources to decide which information to maintain. The logic used in this work originates from a model designed to capture trust in resource access control scenarios and to block trust transitivity by design [21, 23]. The logic has been applied to the Minimally Trusted Install Problem in software management [5] and to its negative counterpart [22], and it has been tested to investigate optimal strategies to minimize false information diffusion [24]. For other accounts of negative trust, see [1, 18].

8 Conclusion

In this paper, we presented an extension of \(\mathtt {(un)SecureND}\), a logic modeling trust in information, with strategies for assessing the trustworthiness of sources as a (possibly weighted) function, average or otherwise, of their knowledgeability, popularity, and reputation. We evaluated this extension on a real-life case study on the trustworthiness of Web sources and applied the selection strategies to the resulting source hierarchy. We showed that a linear combination of these parameters shows a decent correlation with user-provided assessments.

We plan to extend this work in several directions. First, we will work on the automation of the preprocessing phase. We expect to use natural language processing for this, in particular author attribution to systematically identify references among the sources, and textual entailment to capture the perspectives taken by the different sources. Second, we will improve the parameters considered for assessing trustworthiness. For instance, knowledgeability will have to be assessed based on the estimated truthfulness of the statements made by the source. We plan to run an exhaustive user study to guide the design of source trustworthiness assessment and selection. Lastly, we will experiment with network centrality measures as alternative indicators for these parameters.