
1 Introduction

The security of web communication via the Transport Layer Security (TLS) protocol relies on safe distribution of public keys in the form of X.509 certificates. Certificate authorities (CAs) are trusted third parties that endorse the public keys of subjects by performing checks and issuing certificates. Web browsers can accept certificates from hundreds of CAs, and relying parties are unable to determine whether certificates were issued at the request of the subject or fraudulently issued by the CAs, whether by mistake or due to compromise.

In recent years there have been several high-profile cases of misissued certificates being used to spoof legitimate websites. For example, in 2011 an intruder managed to issue itself a valid certificate for the domain google.com and its subdomains from the prominent Dutch certificate authority DigiNotar [11]. This certificate was issued in July 2011 and may have been used maliciously for weeks before large-scale man-in-the-middle (MITM) attacks on multiple users in Iran were detected on August 28, 2011. In another instance, the Comodo Group suffered an attack that resulted in the issuance of nine fraudulent certificates for domains owned by Google, Yahoo!, Skype, and others [5].

Certificate Transparency (CT) [17, 18] is an experimental protocol originally proposed by Google and standardized by the Internet Engineering Task Force (IETF) Public Notary Transparency working group to mitigate the threat of fraudulently issued certificates by publicly logging certificates. CT provides an open auditing and monitoring system which allows domain owners to verify that no fraudulent certificates have been issued for their domains. The end goal of Certificate Transparency is that web clients should only accept certificates that are publicly logged and that it should be impossible for a CA to issue a certificate for a domain without it being publicly visible. Recent incidents have demonstrated the effectiveness of CT logs: Google employees detected unrequested certificates for two of their subdomains issued by the Symantec sub-CA Thawte [30]. The certificates were issued on September 14, 2015 and detected by September 17, 2015; they were revoked immediately, limiting the exposure to just three days. In another case, the Facebook security team discovered the issuance of two certificates covering multiple subdomains in violation of Facebook's internal security policies [14]. The incident was investigated and both certificates were revoked within hours, even before they were deployed to production systems.

1.1 The Web PKI and Certificate Transparency

The basic web public key infrastructure (PKI) includes several types of entities which perform different tasks: web servers, certificate authorities, browser vendors and web browsers. The Certificate Transparency framework adds several new entities which help maintain and monitor public logs:

  • Loggers or log servers maintain publicly accessible append-only logs of certificates, which they receive from submitters. As a new entry might not be published immediately for operational reasons, the logger provides each submitter with a promise to log the certificate within a certain amount of time; this promise is called a signed certificate timestamp (SCT).

  • Submitters submit certificates (or partially completed pre-certificates) to a log server and receive a signed certificate timestamp from the log.

  • Monitors are public or private services that watch for misbehaving logs or suspicious certificates by periodically contacting and downloading information from log servers. They inspect every new entry in a log, keep copies of the entire log, and verify the consistency between published revisions of the log.

  • Auditors verify the correct behaviour of a log, checking that certificates that a logger has promised to include are present in the log. Auditors may be standalone entities or integrated into monitors or web clients.

In CT, the original entities from the web PKI also have some additional tasks:

  • CAs should act as submitters, as described above.

  • Web servers should include their SCT along with the certificate when communicating with clients. Web servers may choose to submit their certificate to a log server if their CA does not do so for them.

  • Web clients, upon receiving an SCT from a web server, may choose to verify that the log named in the SCT actually has publicly logged the certificate (thereby taking on the role of an auditor as above).

  • Browser vendors may push updates that remove CAs or revoke certificates based on claims from monitors and web servers about misbehaving CAs.

Fig. 1. Overview of the interactions between entities in Certificate Transparency; see Sect. 1.1 for details. Solid-line interactions and solid-line, orange entities are captured by the model in our work, while dashed-line interactions and dashed-line, gray entities are not. Dotted line–connected entities (monitors and auditors, or auditors and web clients) might be the same physical entity. (Color figure online)

Figure 1 provides an overview of the involved parties and their interactions in CT. Upon submission of a new certificate entry (step A), the logger returns a signed certificate timestamp (SCT) (step B), which is a promise to include the entry in the log. Every log has a published parameter called the maximum merge delay (MMD), which indicates the maximum period between issuing a timestamp and including the certificate in the log.

In CT, the logger stores the entries of the log in an append-only Merkle hash tree [24, 25], a form of tamper-evident history tree [6, 7]. Recall that in a Merkle tree, data is placed at the leaves of a binary tree and each intermediate node is the hash of its two child nodes; the root of the tree acts as a fingerprint of all included data. In CT, the root of the tree is signed and published by the logger and is called the signed tree head (STH). The observed fingerprints are exchanged by all parties in the system through a so-called “gossiping” protocol [27].

Gossiping allows monitors, auditors, and web clients to share information they receive from log servers, with the goal of collectively detecting misbehaviour of log servers while limiting the damage to user privacy. Parties who hold the same fingerprint for a log are (cryptographically) assured that they have the same view of the log at the point in time represented by that fingerprint. Gossiping can be implemented through SCT feedback (where web clients send SCTs through HTTPS servers), STH pollination (where web clients and CT auditors/monitors use HTTPS servers as STH pools), and trusted auditor streams (where web clients communicate directly with trusted CT auditors/monitors).

To convince other parties that promised certificates are included in a log, and that subsequent published fingerprints are consistent, the logger employs two types of cryptographic proofs: audit proofs and consistency proofs.

An audit proof allows an auditor to verify that a particular certificate/SCT that a logger has promised to include is actually included in the log represented by a fingerprint, shown in steps C and D. In CT, an audit proof is essentially an authentication path in the Merkle tree from the leaf containing the certificate in question to the root hash/fingerprint contained in the signed tree head.

A consistency proof allows an auditor or monitor to verify that the log is append-only, in particular that the log represented by a fingerprint at one point in time \(t_0\) is a prefix of the log represented by a fingerprint at a later point in time \(t_1 > t_0\), shown in steps G and H. In CT, a consistency proof is a subset of intermediate nodes in the Merkle tree needed to connect the two root hashes.

Monitors can also request that a logger provides them with the full set of entries represented by a fingerprint (steps E and F). In CT, this can be verified by recomputing the Merkle tree hash of the entries.

As a starting point for a threat model, the informational IETF draft “Attack Model for Certificate Transparency” [15] describes potential attack scenarios when Certificate Transparency is used in the context of web public-key infrastructure.

1.2 Our Contribution

Given the practical significance of Certificate Transparency, it is important to have a formal understanding of the security goals of CT and to analyse whether CT achieves those goals. The objective of our work is to define the security goals of logging schemes using the formalism of provable security, and to attempt to prove that CT satisfies these security goals under suitable cryptographic assumptions. Our model of logging schemes does not assume a PKI context: we do not require that log entries have a particular syntax, and thus leave threats involving the validity or syntax of log entries to existing analyses of certificate validity. Similarly, we omit consideration of threats in which an entity fails to act.

As noted above, we will focus on two particular threats in the CT threat model: whether a misbehaving log server can present different views of the log and whether a misbehaving monitor can frame an honest log server for bad behaviour. Thus, our model will focus on two entities: the logger and the monitor/auditor.

Definition of Logging Schemes. In Sect. 3 we formally define logging schemes, naming operations that each entity can perform. This model does not attach any semantic meaning to the entries being logged; in particular, we do not assume that log entries are certificates. Subsequently, we describe the operations of Certificate Transparency as a specific instantiation of the logging scheme framework.

Security Definitions. Next, we introduce cryptographic security properties for logging schemes in Sect. 4 that are inspired by the CT threat model but reflect the corresponding ideas in general terms. More specifically, we treat two types of properties. First, we define security notions which concern a malicious logger:

  • \(\mathsf {entry\text {-}{}coll}\): can a malicious logger present two different sets of entries corresponding to the same fingerprint?

  • \(\mathsf {proof\text {-}{}coll}\): can a malicious logger present an audit proof that claims a single fingerprint represents both a particular entry as well as a set of entries such that the particular entry is not actually in the list of entries?

  • \(\mathsf {entry\text {-}{}cons}\): can a malicious logger present two fingerprints connected by a valid consistency proof and two sets of entries such that the entries corresponding to the first fingerprint are not a prefix of the entries corresponding to the second fingerprint?

Second, we define a security notion concerning a malicious monitor:

  • \(\mathsf {promise\text {-}{}incl}\): can a malicious monitor frame an honest logger for not including a promised entry when it actually has?

Security of Certificate Transparency. Finally, we analyze the security of Certificate Transparency in Sect. 5 and show that CT both prevents logger misbehaviour (i.e., CT satisfies the \(\mathsf {entry\text {-}{}coll}\), \(\mathsf {proof\text {-}{}coll}\), and \(\mathsf {entry\text {-}{}cons}\) security properties) and protects honest loggers from framing by misbehaving monitors (i.e., CT satisfies the \(\mathsf {promise\text {-}{}incl}\) property). All of these proofs are based on properties of Merkle tree hashing and audit/consistency proofs, which ultimately derive from the collision resistance of the hash function. The last property, \(\mathsf {promise\text {-}{}incl}\), also depends on the unforgeability of the signature scheme used by loggers.

Generality of Definitions. Our definition of a logging scheme and its security properties are not specific to CT and have the potential to be applied to other constructions. In Sect. 3.3, we discuss the applicability of our definitions to CONIKS [23], a logging scheme aimed at transparency of user keys: our logging scheme definitions capture some aspects of CONIKS, but also highlight important differences between the functionality and goals of CT and CONIKS.

1.3 Related Work

New PKI Technologies. Recent certificate mis-issuances and security breaches at CAs have motivated research into alternatives to having a trusted third party vouch for the binding between a domain name and its public key. Public key pinning [10] and DANE [13] are proposals that allow domain owners to proactively and directly state the trusted public keys for their domains. Certificate Transparency takes a reactive rather than a proactive approach: instead of preventing mis-issuance in the first place, it aims to detect mis-issuance by making certificates visible through a public authenticated log.

History Trees. The data structures in CT are similar to the history trees of Crosby and Wallach [6, 7]. Two of their results [6] connect with our security notions: their Corollary 1 shows that equal “reconstructed hashes” imply that the entry sets from which they were constructed are equal, where a “reconstructed hash” may be computed from the leaves directly (as in a full hash tree computation) or from membership proofs. Their Theorem 1 shows that, given a consistency proof between two roots and a membership proof for the same index to each root (two membership proofs total), the leaves at that index must be the same in both trees; this is similar to our \(\mathsf {entry\text {-}{}cons}\) property, though we focus on entry sets rather than membership proofs. A limitation of Crosby and Wallach's results is that they assume each root was computed from an underlying entry set, an assumption that cannot be made when the adversary generates the roots (as in CT); our definitions make no such assumption. We furthermore capture several extensions that CT makes, including delays for entry inclusion and protection of honest loggers from framing (our \(\mathsf {promise\text {-}{}incl}\) property). Finally, our presentation is notably different: Crosby and Wallach's descriptions of the history tree operations and the proofs [6, Sect. 3] are generally descriptive rather than algorithmic, whereas we state the operations fully algorithmically and provide complete algorithmic reductions for all proofs.

In recent years a few more approaches have emerged around the concept of transparency logs, including revocation transparency [19, 29] (which we omit in this work as it is not under consideration by the IETF Public Notary Transparency working group) and limitations on certificate issuance, validation, and update [1, 16]. The Electronic Frontier Foundation's Sovereign Keys Project [9] combines transparency logs with cross-signing of keys. Melara et al. [23] present CONIKS, a system focusing on key transparency in end-to-end encryption/secure messaging scenarios. CONIKS eliminates the need for global third-party monitors and aims at additional privacy properties for identity–key bindings, but does not provide a formal security model or cryptographic proofs.

Merkle Trees. Introduced by Merkle [24], Merkle trees have been used in many areas of cryptography and computer science, including in the construction of public key signatures from hash functions [25]. Most uses of Merkle trees concern a static dataset, but in CT we are concerned with a dynamic dataset, and in particular the append-only nature of the dataset.

There has been some work on authentication trees and, more generally, signatures on dynamic data sets. Bellare et al. [2, 3] introduced the notion of incremental cryptography. Naor and Nissim [26] use dynamic Merkle trees in the context of certificate revocation and updates, and Li et al. [20] apply them to authenticate index structures in outsourced databases. Villemson [31] and Ogawa et al. [28] investigated the characteristics of (generalizations of) incremental Merkle trees.

Cryptographic PKI Analyses. Maurer [22] introduced a formal model for public key infrastructures (PKIs) which subsequently was further extended [4, 21]. This line of work approaches the dynamic nature of PKI issuance through an event-based system that captures the view of potential users at a certain point in time, using a combination of events that have happened and logical rules that infer certain conclusions from events. Our work differs from this approach by following a game-based approach focusing on the interaction between the parties involved. Our approach also conceptually distinguishes between values generated by honest parties, claims by dishonest parties, and conclusions drawn from events.

2 Cryptographic Building Blocks

Notation. We denote by \(\vec {E}\) an ordered list of entries, where () denotes the empty list. Indexing is 0-based: \(\vec {E}= (e_0, \dots , e_{n-1})\), and we write \(\vec {E}[i]\) to denote \(e_i\) and \(\vec {E}[i:j]\) to denote the sublist \((e_i, \dots , e_{j-1})\). We adopt the convention that \(\vec {E}[-1] = ()\). We write \(e\in \vec {E}\) to indicate that an entry \(e\) is contained in the list \(\vec {E}\). We let \(\vec {E}\Vert \vec {E}'\) denote the concatenation of two entry lists and write \(\vec {E}\prec \vec {E}'\) if \(\vec {E}\) is a prefix of \(\vec {E}'\). If we define \(P \leftarrow (t, e, \sigma )\), then we can later access fields of P using “object-oriented” notation: P.t, P.e, \(P.\sigma \). Moreover, if \(\vec {P}\) is a list \((P_0, \dots , P_{n-1})\), then the notation \(\vec {P}.e\) means the list \((P_0.e, \dots , P_{n-1}.e)\). The expression \(k \leftarrow 2^{\lceil \log _2(n/2) \rceil }\) corresponds to setting k to be the largest power of two less than n, i.e., \(\frac{n}{2} \le k=2^i < n\).
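The following short Python snippet (purely illustrative; the function names are ours, not the paper's) restates these conventions, in particular the prefix relation \(\prec \) and the power-of-two expression for k.

```python
# Illustrative restatement of the notation above; all names are ours.
E = ["e0", "e1", "e2", "e3", "e4"]        # an ordered entry list, 0-based

assert E[1:3] == ["e1", "e2"]             # E[i:j] is the sublist (e_i, ..., e_{j-1})

def is_prefix(E0, E1):
    """E0 ≺ E1: E0 is an initial segment of E1."""
    return len(E0) <= len(E1) and E1[:len(E0)] == E0

assert is_prefix(E[0:3], E) and not is_prefix(["x"], E)

def largest_pow2_below(n):
    """k = 2^ceil(log2(n/2)): the largest power of two strictly below n (n >= 2),
    so that n/2 <= k < n."""
    return 1 << ((n - 1).bit_length() - 1)

assert [largest_pow2_below(n) for n in (2, 3, 4, 6, 8)] == [1, 2, 2, 4, 4]
```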

We rely on the standard notion of signature schemes and existential unforgeability under chosen-message attacks [12], and the corresponding advantage \(\mathrm {Adv}^{\mathsf {euf\text {-}{}cma}}_{\mathtt {SIG}}(\mathcal {A})\) of an adversary \(\mathcal {A} \) breaking this notion for a scheme \(\mathtt {SIG}\).

Definition 1

(Hash Collision Finding). Let \(\mathcal {M}\) be a set, let \(\mathrm {H}: \mathcal {M} \rightarrow \{0,1\}^{\lambda }\) be an unkeyed hash function, and let \(\mathcal {A} \) be an algorithm. We say that \(\mathcal {A} \) finds a collision in \(\mathrm {H}\) if \(\mathcal {A} \) outputs a pair \((m, m')\) such that \(m \ne m'\) and \(\mathrm {H}(m) = \mathrm {H}(m')\).

2.1 Merkle Trees

The use of hash trees for authenticating large amounts of data was first proposed by Merkle [24, 25]. Let \(\mathrm {H}: \{0,1\}^* \rightarrow \{0,1\}^{\lambda }\) be a hash function. In a Merkle hash tree for \(\vec {E}\), the values of \(\vec {E}\) are placed at the leaves of a binary tree and each intermediate node is the hash of its two child nodes; the root of the tree acts as a fingerprint of all the data contained in the tree. This root is the output of the algorithm \(\mathtt {MTH}_{\mathrm {H}}(\vec {E})\) in Fig. 3. Note that the use of the prefixes 0 and 1 in the hash function calculations provides “domain separation” between hash calculations for leaves (\(\mathrm {H}(0 \Vert \dots )\)) and intermediate nodes (\(\mathrm {H}(1 \Vert \dots )\)), preventing an attacker from gluing part of a tree into a leaf or vice versa.
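As a concrete point of reference, here is a minimal Python sketch of Merkle tree hashing with this leaf/node domain separation, instantiated with SHA-256. It mirrors the structure of \(\mathtt {MTH}_{\mathrm {H}}\) but is our own illustrative rendering (using RFC 6962 conventions such as hashing the empty string for an empty tree), not a transcription of Fig. 3.

```python
import hashlib

def h(data: bytes) -> bytes:
    """The underlying hash function H, instantiated here with SHA-256."""
    return hashlib.sha256(data).digest()

def leaf_hash(e: bytes) -> bytes:
    return h(b"\x00" + e)                 # 0-prefix: leaf domain

def node_hash(left: bytes, right: bytes) -> bytes:
    return h(b"\x01" + left + right)      # 1-prefix: inner-node domain

def split_point(n: int) -> int:
    """Largest power of two strictly less than n (n >= 2)."""
    return 1 << ((n - 1).bit_length() - 1)

def mth(entries):
    """Merkle tree hash of an ordered list of byte-string entries."""
    n = len(entries)
    if n == 0:
        return h(b"")                     # empty tree: hash of the empty string (RFC 6962)
    if n == 1:
        return leaf_hash(entries[0])
    k = split_point(n)
    return node_hash(mth(entries[:k]), mth(entries[k:]))

# The root acts as a fingerprint: changing any entry changes the root.
E = [b"cert-%d" % i for i in range(6)]
assert mth(E) != mth(E[:-1] + [b"tampered"])
```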

Fig. 2. Merkle tree consistency proof \(\vec {C}=(\vec {C}[0], \dots , \vec {C}[3])\) between roots \(H_0\) (for a tree of size 3) and \(H_1\) (for a tree of size 6).

A common technique is the use of an authentication path to demonstrate that a piece of data is in a leaf of a tree corresponding to a particular root. The authentication path generation algorithm \(\mathtt {Path}_{\mathrm {H}}(m,\vec {E})\) and verification algorithm \(\mathtt {CheckPath}_{\mathrm {H}}(e, H, n, \vec {A}, m)\) are shown in Fig. 3.

A lesser-known technique is the use of a consistency proof to demonstrate that the data corresponding to one root is a subset (prefix) of the data corresponding to another root, used, for example, in the context of tamper-evident history trees [6, 7]. In Fig. 2, the consistency proof \(\vec {C}\) shows that the data corresponding to root \(H_0\) is a prefix of the data corresponding to root \(H_1\). Consistency proofs reconstruct each of the two roots from relevant parts of the proof and compare them against the actual roots; the size of the two trees is essential in verifying a consistency proof. Consistency proofs may be viewed as an authentication path from the inner node immediately above the last leaf node in the first tree (i.e., an authentication path from \(\mathrm {H}(e_2) = \vec {C}[0]\) to root \(H_1\) in the right side of Fig. 2). The consistency proof generation algorithm \(\mathtt {ConsProof}_{\mathrm {H}}(m,n,\vec {E})\) and verification algorithm \(\mathtt {CheckConsProof}_{\mathrm {H}}(n_0, H_0, n_1, H_1, \vec {C})\) are shown in Fig. 3. We have reformulated these from how they appear in the RFC [17]: ours use a top-down recursive approach, whereas the RFC versions are bottom-up looping algorithms; the two are equivalent, but our versions are more helpful in proving our theorems.
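To make the two proof mechanisms concrete, the following sketch gives top-down recursive versions of authentication path and consistency proof generation and verification, reusing the helpers from the Merkle tree sketch above. It is our own reconstruction in the recursive style described here (under RFC 6962 conventions), not a transcription of the algorithms in Fig. 3.

```python
# Builds on h, leaf_hash, node_hash, split_point, mth from the previous sketch.

def path(m, entries):
    """Authentication path for the leaf at index m in a tree over `entries`."""
    n = len(entries)
    if n == 1:
        return []
    k = split_point(n)
    if m < k:
        return path(m, entries[:k]) + [mth(entries[k:])]
    return path(m - k, entries[k:]) + [mth(entries[:k])]

def _root_from_path(m, n, leaf, proof):
    """Recompute the root of a size-n tree from the leaf hash at index m and its path."""
    if n == 1:
        if proof:
            raise ValueError("path too long")
        return leaf
    k = split_point(n)
    sibling, rest = proof[-1], proof[:-1]
    if m < k:
        return node_hash(_root_from_path(m, k, leaf, rest), sibling)
    return node_hash(sibling, _root_from_path(m - k, n - k, leaf, rest))

def check_path(e, root, n, proof, m):
    """Does `proof` place entry e at index m of the size-n tree with this root?"""
    try:
        return _root_from_path(m, n, leaf_hash(e), proof) == root
    except ValueError:
        return False

def cons_proof(m, entries, complete=True):
    """Consistency proof that the first m entries are a prefix of `entries`."""
    n = len(entries)
    if m == n:
        return [] if complete else [mth(entries)]
    k = split_point(n)
    if m <= k:
        return cons_proof(m, entries[:k], complete) + [mth(entries[k:])]
    return cons_proof(m - k, entries[k:], False) + [mth(entries[:k])]

def _roots_from_cons(m, n, proof, old_root, complete):
    """Reconstruct (root over first m entries, root over all n entries) from `proof`."""
    if m == n:
        if complete:
            return old_root, old_root     # the old tree is exactly this subtree
        node = proof.pop()
        return node, node
    k = split_point(n)
    sibling = proof.pop()
    if m <= k:                            # sibling is the root of the right subtree
        r0, r1 = _roots_from_cons(m, k, proof, old_root, complete)
        return r0, node_hash(r1, sibling)
    r0, r1 = _roots_from_cons(m - k, n - k, proof, old_root, False)
    return node_hash(sibling, r0), node_hash(sibling, r1)   # sibling is the left subtree root

def check_cons_proof(n0, root0, n1, root1, proof):
    """Is the size-n0 tree with root0 a prefix of the size-n1 tree with root1?"""
    rest = list(proof)
    try:
        r0, r1 = _roots_from_cons(n0, n1, rest, root0, True)
    except IndexError:                    # malformed proof (too short)
        return False
    return not rest and r0 == root0 and r1 == root1

# The example of Fig. 2: old tree = first 3 entries, new tree = all 6.
E = [b"cert-%d" % i for i in range(6)]
assert check_path(E[2], mth(E), 6, path(2, E), 2)
assert check_cons_proof(3, mth(E[:3]), 6, mth(E), cons_proof(3, E))
assert len(cons_proof(3, E)) == 4        # matches the four values C[0..3] in Fig. 2
```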

2.2 Merkle Tree Security Properties

We now note some well-known facts about the collision resistance of Merkle tree hashing and the security of authentication paths in Merkle trees [24, 25]. For completeness, full proofs are given in the full version [8].

Fig. 3. Merkle tree algorithms.

Lemma 1

(Collision Resistance of Merkle Trees). If \(\mathrm {H}\) is collision-resistant, then Merkle-tree hashing using \(\mathrm {H}\) is also collision-resistant. More precisely, if \(\mathcal {A} \) finds a collision in \(\mathtt {MTH}_{\mathrm {H}}\), then there exists algorithm \(\mathcal {B} _1^{\mathcal {A}}\) that finds a collision in \(\mathrm {H}\). Moreover, the runtime of \(\mathcal {B} _1^\mathcal {A} \) consists of the runtime of \(\mathcal {A} \), plus at most a quadratic (in the size of the larger list) number of hash evaluations.
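The full proof appears in the full version [8]. As an illustration of the standard strategy behind such a reduction, the following sketch (ours, reusing the helpers from the Merkle tree sketch in Sect. 2.1, and not the paper's \(\mathcal {B} _1\) verbatim) walks down the two colliding trees until it finds two distinct hash inputs with equal digests, using at most a quadratic number of hash evaluations.

```python
# Builds on h, leaf_hash, node_hash, split_point, mth from the Sect. 2.1 sketch.

def find_h_collision(E1, E2):
    """Given entry lists E1 != E2 with mth(E1) == mth(E2), return two distinct
    byte strings that hash to the same value under h (a collision in H)."""
    assert E1 != E2 and mth(E1) == mth(E2)
    # Preimages of the two (equal) root hashes.
    if len(E1) == 1:
        p1 = b"\x00" + E1[0]
    else:
        k1 = split_point(len(E1))
        p1 = b"\x01" + mth(E1[:k1]) + mth(E1[k1:])
    if len(E2) == 1:
        p2 = b"\x00" + E2[0]
    else:
        k2 = split_point(len(E2))
        p2 = b"\x01" + mth(E2[:k2]) + mth(E2[k2:])
    if p1 != p2:
        return p1, p2          # distinct inputs, equal digests: a collision in H
    # Equal preimages: both roots are inner nodes with identical child hashes
    # (two unequal single-entry lists cannot give equal preimages), so some pair
    # of sublists still differs while having equal Merkle tree hashes; recurse.
    if E1[:k1] != E2[:k2]:
        return find_h_collision(E1[:k1], E2[:k2])
    return find_h_collision(E1[k1:], E2[k2:])
```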

Lemma 2

(Authentication Paths Consistency). If \(\mathrm {H}\) is collision-resistant, then no \(\mathtt {CheckPath}_{\mathrm {H}}\) authentication path \(\vec {A}\) can be generated with respect to Merkle-tree hashing \(\mathtt {MTH}_{\mathrm {H}}\) for an entry \(e\) not contained in the Merkle tree. More precisely, if \(\mathcal {A} \) outputs \((e, \vec {E}, \vec {A}, m)\) such that \(\mathtt {CheckPath}_{\mathrm {H}}(e, \mathtt {MTH}_{\mathrm {H}}(\vec {E}), |\vec {E}|, \vec {A}, m) = 1\) and \(e\notin \vec {E}\), then there exists algorithm \(\mathcal {B} _2^{\mathcal {A}}\) that finds a collision in \(\mathrm {H}\). Moreover, the runtime of \(\mathcal {B} _2^\mathcal {A} \) consists of the runtime of \(\mathcal {A} \), plus at most a quadratic (in \(|\vec {E}|\)) number of hash evaluations.

3 Logging Schemes

In this section we specify the algorithms that comprise a logging scheme and formulate CT as a logging scheme.

3.1 Definition of Logging Schemes

Our definition of a logging scheme is based on the Certificate Transparency functionality, but is designed to be potentially more general. We use non-CT-specific language (such as “fingerprint” instead of the CT-specific “signed tree head”), and our logging scheme is not restricted to certificates: any type of object can be logged.

Definition 2

(Logging Scheme). A logging scheme \(\mathrm {LS}\) consists of the following algorithms, some of which are run by a logger and some of which are run by a monitor/auditor.

The following algorithm is used by a logger to initialize its log:

  • \(\mathrm {KeyGen}() \mathop {\rightarrow }\limits ^{{\scriptscriptstyle \$}}(st, pk, sk)\): A probabilistic algorithm that returns a state \(st\) and a public key/secret key pair \((pk, sk)\).

The following algorithms are used by a logger to add entries to its log, using a two-step process of promising to add an entry to the log and then a batch update actually adding the entries:

  • \(\mathrm {PromiseEntry}(e, t, sk) \mathop {\rightarrow }\limits ^{{\scriptscriptstyle \$}}P\): A probabilistic algorithm that takes as input a log entry \(e\), a time t, and the secret key \(sk\) and outputs a promise \(P\); the promise contains the entry and time as subfields \(P.e\) and \(P.t\).

  • \(\mathrm {UpdateLog}(st, \vec {P}, t, sk) \mathop {\rightarrow }\limits ^{{\scriptscriptstyle \$}}(st', F)\): A probabilistic algorithm that takes as input a state \(st\), a potentially empty ordered list of promises \(\vec {P}\) to add to the log, a time t, and the secret key \(sk\), and returns an updated state \(st'\) and a fingerprint \(F\) (where the latter includes the indicated time, denoted \(F.t\)).

The following algorithms are used by a logger to demonstrate various properties to monitors/auditors:

  • \(\mathrm {PresentEntries}(st, F) \rightarrow \vec {E}\) or \(\bot \): A deterministic algorithm that takes as input a state \(st\) and a fingerprint \(F\) and outputs an ordered list of log entries \(\vec {E}\), or an error symbol \(\bot \).

  • \(\mathrm {ProveMembership}(st, e, F) \mathop {\rightarrow }\limits ^{{\scriptscriptstyle \$}}\vec {M}\) or \(\bot \): A probabilistic algorithm that takes as input a state \(st\), a log entry \(e\), and a fingerprint \(F\) and outputs a membership proof \(\vec {M}\), or an error symbol \(\bot \).

  • \(\mathrm {ProveConsistency}(st, F_0, F_1) \mathop {\rightarrow }\limits ^{{\scriptscriptstyle \$}}\vec {C}\) or \(\bot \): A probabilistic algorithm that takes as input a state \(st\) and two fingerprints \(F_0\) and \(F_1\) and outputs a consistency proof \(\vec {C}\), or an error symbol \(\bot \).

The following algorithms are used by monitors/auditors to check a log:

  • \(\mathrm {CheckPromise}(P, pk) \rightarrow \{0,1\}\): A deterministic algorithm that takes as input a promise \(P\) (which includes an entry \(P.e\)) and a public key \(pk\) and outputs a bit \(b \in \{0,1\}\).

  • \(\mathrm {CheckFingerprint}(F, pk) \rightarrow \{0,1\}\): A deterministic algorithm that takes as input a fingerprint \(F\) and a public key \(pk\) and outputs a bit \(b \in \{0,1\}\).

  • \(\mathrm {CheckEntries}(\vec {E}, F, pk) \rightarrow \{0,1\}\): A deterministic algorithm that takes as input an ordered list of log entries \(\vec {E}\), a fingerprint \(F\), and a public key \(pk\) and outputs a bit \(b \in \{0,1\}\).

  • \(\mathrm {CheckMembership}(F, e, \vec {M}, pk) \rightarrow \{0,1\}\): A deterministic algorithm that takes as input a fingerprint \(F\), an entry \(e\), a membership proof \(\vec {M}\), and a public key \(pk\) and outputs a bit \(b \in \{0,1\}\).

  • \(\mathrm {CheckConsistency}(F_0, F_1, \vec {C}, pk) \rightarrow \{0,1\}\): A deterministic algorithm that takes as input two fingerprints \(F_0\) and \(F_1\), a consistency proof \(\vec {C}\), and a public key \(pk\) and outputs a bit \(b \in \{0,1\}\).

Correctness of a logging scheme is defined in the natural way and is omitted due to space constraints; see the full version [8].
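For readers who prefer code to prose, the following Python interface is our illustrative rendering of Definition 2, with the error symbol \(\bot \) rendered as None and bits as integers; the method and type names are ours, not the paper's.

```python
from typing import List, Optional, Tuple

Entry = bytes          # log entries are opaque bit strings
Time = int

class LoggingScheme:
    """Illustrative rendering of Definition 2; concrete schemes such as CT
    (Sect. 3.2) implement these methods."""

    # Logger side
    def key_gen(self) -> Tuple[object, bytes, bytes]:                  # (st, pk, sk)
        raise NotImplementedError
    def promise_entry(self, e: Entry, t: Time, sk: bytes) -> object:   # promise P (with P.e, P.t)
        raise NotImplementedError
    def update_log(self, st, promises: List[object], t: Time, sk: bytes) -> Tuple[object, object]:
        raise NotImplementedError                                      # (st', fingerprint F)
    def present_entries(self, st, F) -> Optional[List[Entry]]:
        raise NotImplementedError
    def prove_membership(self, st, e: Entry, F) -> Optional[list]:     # membership proof M
        raise NotImplementedError
    def prove_consistency(self, st, F0, F1) -> Optional[list]:         # consistency proof C
        raise NotImplementedError

    # Monitor/auditor side (each returns a bit, 0 or 1)
    def check_promise(self, P, pk: bytes) -> int:
        raise NotImplementedError
    def check_fingerprint(self, F, pk: bytes) -> int:
        raise NotImplementedError
    def check_entries(self, entries: List[Entry], F, pk: bytes) -> int:
        raise NotImplementedError
    def check_membership(self, F, e: Entry, M: list, pk: bytes) -> int:
        raise NotImplementedError
    def check_consistency(self, F0, F1, C: list, pk: bytes) -> int:
        raise NotImplementedError
```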

3.2 Instantiation of Certificate Transparency as a Logging Scheme

Figures 4 and 5 formulate Certificate Transparency with hash function \(\mathrm {H}\) and signature scheme \(\mathtt {SIG}\) as a logging scheme \(\mathtt {CT}_{\mathrm {H},\mathtt {SIG}}\) (i.e., following Definition 2). A log entry in CT is a chain of X.509 certificates: the certificate (or partially completed pre-certificate) itself, and each intermediate CA's certificate leading to the root CA's certificate. We treat entries in our formalization of logging schemes as opaque bit strings: our formulation hence omits any syntactic checks on the entries it manages; adding such checks is independent of the logging properties. The promise \(P\) is called a signed certificate timestamp (SCT). The fingerprint \(F\) is called the signed tree head (STH).

Fig. 4. Certificate Transparency: algorithms run by loggers.

Fig. 5. Certificate Transparency: algorithms run by monitors/auditors.
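The concrete algorithms appear in Figs. 4 and 5 and are not repeated here. The following rough sketch is ours and makes several simplifying assumptions: it reuses the Merkle tree sketches from Sect. 2.1, uses Ed25519 (via the Python cryptography package) as a stand-in for \(\mathtt {SIG}\), and uses ad hoc byte encodings rather than those of RFC 6962 or the paper's figures. It is meant only to indicate how the pieces fit together; the names Fingerprint, Promise, and the helper functions are ours.

```python
# Builds on mth, check_path, check_cons_proof from the Sect. 2.1 sketches.
from collections import namedtuple
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

Fingerprint = namedtuple("Fingerprint", "n root t sig")   # STH: size, root hash, time, signature
Promise = namedtuple("Promise", "e t sig")                 # SCT: entry, time, signature

def _sth_msg(n, root, t):
    return b"sth" + n.to_bytes(8, "big") + root + t.to_bytes(8, "big")

def _sct_msg(e, t):
    return b"sct" + t.to_bytes(8, "big") + e

def key_gen():
    sk = Ed25519PrivateKey.generate()
    return [], sk.public_key(), sk                        # state = entry list, pk, sk

def promise_entry(e, t, sk):
    return Promise(e, t, sk.sign(_sct_msg(e, t)))

def update_log(st, promises, t, sk):
    st = st + [p.e for p in promises]                      # append-only entry list
    F = Fingerprint(len(st), mth(st), t, sk.sign(_sth_msg(len(st), mth(st), t)))
    return st, F
# Proof generation on the logger side works analogously via path() and cons_proof().

def _verify(pk, msg, sig):
    try:
        pk.verify(sig, msg)
        return 1
    except InvalidSignature:
        return 0

def check_fingerprint(F, pk):
    return _verify(pk, _sth_msg(F.n, F.root, F.t), F.sig)

def check_promise(P, pk):
    return _verify(pk, _sct_msg(P.e, P.t), P.sig)

def check_entries(entries, F, pk):
    return int(check_fingerprint(F, pk) and len(entries) == F.n and mth(entries) == F.root)

def check_membership(F, e, M, pk):                         # M = (index, authentication path)
    m, proof = M
    return int(check_fingerprint(F, pk) and check_path(e, F.root, F.n, proof, m))

def check_consistency(F0, F1, C, pk):
    return int(check_fingerprint(F0, pk) and check_fingerprint(F1, pk)
               and check_cons_proof(F0.n, F0.root, F1.n, F1.root, C))

# Example: log two batches and audit the result.
st, pk, sk = key_gen()
st, F0 = update_log(st, [promise_entry(b"cert-0", 1, sk)], 2, sk)
st, F1 = update_log(st, [promise_entry(b"cert-1", 3, sk)], 4, sk)
assert check_entries(st, F1, pk) and check_consistency(F0, F1, cons_proof(F0.n, st), pk)
```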

3.3 CONIKS as a Logging Scheme

CONIKS [23] is a recent transparency log scheme that aims to enable privacy-preserving transparency logging for end-user keys, for applications such as secure messaging. Our definition of logging scheme can capture several aspects of CONIKS’ functionality and security, but also serves to highlight some significant differences between CT and CONIKS.

CONIKS also uses a Merkle tree structure, but in contrast to CT it uses a Merkle prefix tree in which some attribute of an entry (e.g., the user's identity) determines its position. The tree root is computed from both the present entries and placeholder values for empty subtrees, allowing efficient computation over very large but mostly empty trees. It is signed and published by the logger as the signed tree root (STR). Membership proofs can be performed in the standard way using Merkle authentication paths. Signed tree roots are linked over time using a hash chain, each including the previous signed tree root. However, this does not enable consistency proofs as in CT: verifying that a key present in STR\(_i\) is also present in STR\(_j\) requires a fresh membership proof of that key's presence in STR\(_j\). Two core security properties of CONIKS are non-equivocation (a provider cannot present diverging views) and privacy-preserving consistency proofs (privacy here meaning with respect to other entries' information).

CONIKS can be mapped onto the following notions in our definition of a logging scheme. The \(\mathrm {KeyGen}\) algorithm is run by the logger. CONIKS has no separate notion of promise and log entry, combining \(\mathrm {PromiseEntry}\) and \(\mathrm {UpdateLog}\). \(\mathrm {CheckFingerprint}\) verifies a signed tree root in an analogous way. Aiming at privacy, CONIKS does not include \(\mathrm {PresentEntries}\) and \(\mathrm {CheckEntries}\). \(\mathrm {ProveMembership}\) and \(\mathrm {CheckMembership}\) are supported. \(\mathrm {ProveConsistency}\) and \(\mathrm {CheckConsistency}\) are not directly supported; as noted above, an auditor would need to use \(\mathrm {ProveMembership}\) and \(\mathrm {CheckMembership}\) for each entry.

In terms of security properties, none of ours map directly onto CONIKS' notions, primarily because ours involve \(\mathrm {CheckEntries}\). However, some notions are similar. Non-equivocation is similar to \(\mathsf {proof\text {-}{}coll}\), except that it involves two \(\mathrm {CheckMembership}\) computations rather than one \(\mathrm {CheckMembership}\) and one \(\mathrm {CheckEntries}\) computation (our \(\mathsf {entry\text {-}{}coll}\) and \(\mathsf {proof\text {-}{}coll}\) together imply this new notion). Our \(\mathsf {promise\text {-}{}incl}\) property matches with a similar change from \(\mathrm {CheckEntries}\) to \(\mathrm {CheckMembership}\), and ignoring maximum merge delays. Consistency of STRs in CONIKS is quite different from our \(\mathsf {entry\text {-}{}cons}\) property, as CONIKS' consistency involves probabilistic spot-checks using membership proofs.

4 Security Goals

For the security properties of logging schemes that can be proved cryptographically, our security definitions follow a game-based provable-security approach. We consider three properties involving security against a malicious logger, in which the experiment acts as an honest monitor/auditor that the logger is trying to fool. We also consider one security property involving security against a malicious monitor/auditor, in which the experiment acts as an honest logger that the monitor/auditor is trying to frame for bad behaviour.

Security Against a Malicious Logger. Since the fingerprint (signed tree hash in CT) is used to concisely represent the contents of the log, the first two cryptographic security properties against a malicious logger, shown in Fig. 6, concern the ability of the logger to make the fingerprint represent different, conflicting information. Collision resistance of entries, defined in the experiment \(\mathsf {entry\text {-}{}coll}\), requires that it is hard for a malicious logger to come up with a single fingerprint representing two different sets of entries. Collision resistance of proofs, formalized in the experiment \(\mathsf {proof\text {-}{}coll}\), is about the difficulty for a malicious logger to create a proof that an entry is represented by a fingerprint while simultaneously claiming that the set of entries represented by that fingerprint does not include that particular entry. A scheme that satisfies both of these ensures that a malicious logger cannot make parties who use the same fingerprint believe different things about the log entries represented by that fingerprint.

Fig. 6. Security properties of a logging scheme \(\mathrm {LS}\) against a malicious logger.

Logs are updated over time, but are meant to be append-only. However, since logs are only represented by fingerprints, consistency proofs are used to connect two fingerprints and are meant to prove that the set of entries represented by one fingerprint is a prefix of the set of entries represented by a second fingerprint; in other words, that the fingerprints are representative of an append-only log. The final security property in Fig. 6 captures the consistency of entries, i.e., the difficulty for a malicious logger to remove an entry from a log: experiment \(\mathsf {entry\text {-}{}cons}\) is concerned with two fingerprints connected by a single consistency proof. A “multi-hop” version, concerned with a chain of fingerprints connected by consistency proofs, can easily be formulated and shown to follow directly from the “single-hop” version.
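Figure 6 specifies the experiments precisely; the following summary of the three winning conditions is a plausible reading of the prose above (ours, not a transcription of the figure), where the adversary, a malicious logger, chooses the public key itself and LS is any logging scheme in the sense of the interface sketch from Sect. 3.1.

```python
# Plausible summary of the Fig. 6 winning conditions; not the figure verbatim.
# `LS` is a LoggingScheme (Sect. 3.1 sketch); each adversary is a callable
# returning exactly the values it is required to output.

def is_prefix(E0, E1):
    return len(E0) <= len(E1) and E1[:len(E0)] == E0

def exp_entry_coll(LS, adversary):
    pk, F, E0, E1 = adversary()
    return (E0 != E1
            and LS.check_entries(E0, F, pk) == 1
            and LS.check_entries(E1, F, pk) == 1)

def exp_proof_coll(LS, adversary):
    pk, F, e, M, E = adversary()
    return (e not in E
            and LS.check_membership(F, e, M, pk) == 1
            and LS.check_entries(E, F, pk) == 1)

def exp_entry_cons(LS, adversary):
    pk, F0, F1, C, E0, E1 = adversary()
    return (not is_prefix(E0, E1)
            and LS.check_entries(E0, F0, pk) == 1
            and LS.check_entries(E1, F1, pk) == 1
            and LS.check_consistency(F0, F1, C, pk) == 1)
```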

Fig. 7. Security properties of a logging scheme \(\mathrm {LS}\) against a malicious monitor/auditor framing a log for failing to include a promised entry.

Security Against a Malicious Monitor/Auditor. The security properties described above are cryptographic, meaning that (under some computational assumptions) it is not possible for a malicious logger to perform certain actions. However, some security goals of CT are not cryptographic. For example, a log could choose to omit an entry that it has promised to log, and no amount of cryptography can prevent it from doing so. Should a log issue a fingerprint after the time by which it promised to log an entry, and the corresponding log not contain that entry, this constitutes evidence of the log's misbehaviour.

However, to protect honest loggers, it should not be possible to frame an honest logger for misbehaviour that did not actually happen; this is the security guarantee formalized as inclusion of promises in experiment \(\mathsf {promise\text {-}{}incl}\) in Fig. 7. Here the experiment plays the role of an honest logger against a malicious monitor/auditor, so we allow the adversary (the malicious monitor/auditor) to interact with experiment oracles that carry out the actions of an honest log, such as adding entries or proving membership. The experiment includes a global time which advances at the adversary's command, and is parameterized by a maximum merge delay \(\mathsf {MMD}> 0\), within which an honest log is expected to include a promised entry. The list \(\vec {E}_{promised}\) tracks entries that the log has promised to include; in calls to \(\mathsf {OTick}\), the experiment (acting as the honest log) automatically adds the promised entries to the log by the end of the maximum merge delay window.

5 Security of Certificate Transparency

We are now ready to prove the security results on Certificate Transparency, namely that its instantiation \(\mathtt {CT}_{\mathrm {H},\mathtt {SIG}}\) within our logging scheme framework guarantees collision resistance of entries and proofs, consistency of entries, and inclusion of promises.

Theorems 1 and 2 below connect rather immediately with the security properties of the underlying Merkle tree hash, so we omit the arguments due to space constraints; they appear in the full version [8]. Lemmas 1 and 2 then connect the Merkle tree hash properties to finding a collision in \(\mathrm {H}\), which is infeasible if \(\mathrm {H}\) is collision-resistant. The proof of Theorem 3 is likewise deferred to the full version due to space restrictions; the proof of Theorem 4 is given in Appendix A.

Theorem 1

(Collision Resistance of Entries). If hash function \(\mathrm {H}\) is collision-resistant, then, in Certificate Transparency (with hash function \(\mathrm {H}\)), no malicious logger can present different log entries for the same fingerprint. More precisely, if \(\mathcal {A} \) wins \(\mathrm {Exp}^{\mathsf {entry\text {-}{}coll}}_{\mathtt {CT}_{\mathrm {H},\mathtt {SIG}}}\), then algorithm \(\mathcal {B} ^{\mathcal {A}}\), which runs \(\mathcal {A} \) and then returns the first two components of \(\mathcal {A} \)’s output, finds a collision in \(\mathtt {MTH}_{\mathrm {H}}\). Moreover, the runtime of \(\mathcal {B} ^{\mathcal {A}}\) is the same as that of \(\mathcal {A} \).

Theorem 2

(Collision Resistance of Proofs). If hash function \(\mathrm {H}\) is collision-resistant, then, in Certificate Transparency (with hash function \(\mathrm {H}\)), no malicious logger can present a list of log entries under some fingerprint and a membership proof under the same fingerprint for an entry not contained in this list. More precisely, if \(\mathcal {A} \) wins \(\mathrm {Exp}^{\mathsf {proof\text {-}{}coll}}_{\mathtt {CT}_{\mathrm {H},\mathtt {SIG}}}\) by outputting \((e, \vec {E}, F, \vec {M}, pk)\), then algorithm \(\mathcal {B} ^{\mathcal {A}}\), which runs \(\mathcal {A} \) and then returns \((e, \vec {E}, \vec {M}.\vec {A}, \vec {M}.m)\), breaks authentication path consistency in the sense of Lemma 2. Moreover, the runtime of \(\mathcal {B} ^{\mathcal {A}}\) is the same as that of \(\mathcal {A} \).

Theorem 3

(Consistency of Entries). If hash function \(\mathrm {H}\) is collision-resistant, then, in Certificate Transparency (with hash function \(\mathrm {H}\)), no malicious logger can present two lists of entries, two fingerprints, and a consistency proof such that each list corresponds to the fingerprint, and the fingerprints are connected via the consistency proof, but the first list of entries is not a prefix of the second list of entries. More precisely, if \(\mathcal {A} \) wins \(\mathrm {Exp}^{\mathsf {entry\text {-}{}cons}}_{\mathtt {CT}_{\mathrm {H},\mathtt {SIG}}}\), then algorithm \(\mathcal {B} _3^{\mathcal {A}}\) given in the full version [8] finds a collision in \(\mathrm {H}\). Moreover, the runtime of \(\mathcal {B} _3^\mathcal {A} \) consists of the runtime of \(\mathcal {A} \), plus at most a quadratic (in the size of the second list) number of hash evaluations.

Theorem 4

(Inclusion of Promises). If hash function \(\mathrm {H}\) is collision-resistant and signature scheme \(\mathtt {SIG}\) is existentially unforgeable under chosen-message attacks, then, in Certificate Transparency (with hash function \(\mathrm {H}\) and signature scheme \(\mathtt {SIG}\)), no malicious monitor/auditor can frame an honest logger for not including a promised entry within the maximum merge delay. More precisely, if algorithm \(\mathcal {A} \) wins \(\mathrm {Exp}^{\mathsf {promise\text {-}{}incl}}_{\mathtt {CT}_{\mathrm {H},\mathtt {SIG}}}\), then there exist algorithms \(\mathcal {B} ^{\mathcal {A}}\) and \(\mathcal {C} ^{\mathcal {A}}\), described in the proof, that find a collision in \(\mathtt {MTH}_{\mathrm {H}}\) or a forgery in \(\mathtt {SIG}\), respectively. Moreover, the runtimes of \(\mathcal {B} ^{\mathcal {A}}\) and \(\mathcal {C} ^{\mathcal {A}}\) are approximately the same as that of \(\mathcal {A} \).

6 Conclusion and Future Work

Certificate Transparency is a promising approach for providing assurances in the web PKI by using untrusted auditable public logs to detect fraudulently issued certificates. We introduced a generic model for logging schemes and captured Certificate Transparency as one specific instance of our model. Based on the security notions we formalized, we were able to analyze the cryptographic aspects of CT and show how its cryptographic mechanisms prevent both undetected misbehaviour of log servers and false accusations of honest loggers.

Although cryptography plays an essential role to establish the trust necessary in a public and auditable logging scheme like Certificate Transparency, there are other components involved that are difficult or even impossible to capture in a cryptographic model. For example, under various conditions on adversary control of the network and with various patterns of honest entity behaviour, how long does it take for the CT gossiping protocol to propagate SCTs and STHs to ensure detection of dishonest log behaviour? Once misbehaviour is detected, what organizational measures should be taken to ensure an appropriate response? Analyzing these components in general as well as their specific relevance in the CT framework is an important task for future work.