1 Introduction

Secure instant messaging (SIM) ensures privacy and security by making sure that only the person you are sending the message to can read the message, a.k.a. end-to-end encryption. With the ever-growing awareness against mass-surveillance of communications, people have become more privacy-aware and the demand for SIM has been steadily increasing. While there have been a range of SIM protocols, the Signal protocol [1] is widely regarded as the gold standard. Not only is it used by the Signal appFootnote 1, the Signal protocol is also used by WhatsApp, Skype, Facebook Messenger among many others, where the number of active users is well over 2 billions. One of the reasons for such popularity is due to the simplicity and the strong security properties it provides, such as forward secrecy and post-compromise secrecy, while simultaneously allowing for the same user experience as any (non-cryptographically secure) instant messaging app.

The Signal protocol consists of two sub-protocols: the X3DH protocol [45] and the double ratchet protocol [44]. The former protocol can be viewed as a type of key exchange protocol allowing two parties to exchange a secure initial session key. The latter protocol is executed after the X3DH protocol and it allows two parties to perform a secure back-and-forth message delivery. Below, we briefly recall the current affair of these two protocols.

The Double Ratchet Protocol. The first attempt at a full security analysis of the Signal protocol was made by Cohn-Gordon et al. [18, 19]. They considered the Signal protocol as one large protocol and analyzed the security guarantees in its entirety. Since the double ratchet protocol was understood to be the root of the complexity, many subsequent works aimed at further abstracting and formalizing (and in some cases enhancing) the security of the double ratchet protocol by viewing it as a stand-alone protocol [2, 9, 26, 36, 37, 49]. Under these works, our understanding of the double ratchet protocol has much matured. Notably, Alwen et al. [2] fully abstracted the complex Diffie-Hellman based double ratchet protocol used by Signal and provided a concrete security model along with a generic construction based on simple building blocks. Since these blocks are instantiable from versatile assumptions, including post-quantum ones, their work resulted in the first post-quantum secure double ratchet protocol. Here, we elucidate that all the aforementioned works analyze the double ratchet protocol as a stand-alone primitive, and hence, it is assumed that any two parties can securely share an initial session key, for instance, by executing a “secure” X3DH protocol.

The X3DH Protocol. In contrast, other than the white paper offered by Signal [45] and those indirectly considered by Cohn-Gordon et al. [18, 19], works focusing on the X3DH protocol seems to be limited. As far as we are aware, there is one recent work that studies the formalization [14] and a few papers that study one of the appealing security properties, known as (off-line) deniability, claimed by the X3DH protocol [51,52,53].

Brendel et al. [14] abstract the X3DH protocol and provides the first generic construction based on a new primitive they call a split key encapsulation mechanism (KEM). However, so far, instantiations of split KEMs with strong security guarantees required for the X3DH protocol are limited to Diffie-Hellman style assumptions. In fact, the recent result of Guo et al. [33] implies that it would be difficult to construct them from one of the promising post-quantum candidates: lattice-based assumptions (and presumably coded-based assumptions). On the other hand, Vatandas et al. [53] study one of the security guarantees widely assumed for the X3DH protocol called (off-line) deniability [45, Section 4.4] and showed that a strong knowledge-type assumption would be necessary to formally prove it. Unger and Goldberg [51, 52] construct several protocols that can be used as a drop-in replacement of the X3DH protocol that achieves a strong flavor of (on-line) deniability from standard assumptions, albeit by making a noticeable sacrifice in the security against key-compromise attacks: a type of attack that exploits leaked secret information of a party. For instance, while the X3DH protocol is secure against key-compromise impersonation (KCI) attacks [11],Footnote 2 the protocols of Unger and Goldberg are no longer secure against such attacks.Footnote 3

Motivation. In summary, although we have a rough understanding of what the X3DH protocol offers [18, 19, 45], the current state of affairs is unsatisfactory for the following reasons, and making progress on these issues will be the focus of this work:

  • It is difficult to formally understand the security guarantees offered by the X3DH protocol or to make a meaningful comparison among different protocols achieving the same functionality as the X3DH protocol without a clearly defined security model.

  • The X3DH protocol is so far only instantiable from Diffie-Hellman style assumptions [14] and it is unclear whether such assumptions are inherent to the Signal protocol.

  • Ideally, similarly to what Alwen et al. [2] did for the double ratchet protocol, we would like to abstract the X3DH protocol and have a generic construction based on simple building blocks that can be instantiated from versatile assumptions, including but not limited to post-quantum ones.

  • No matter how secure the double ratchet protocol is, we cannot completely secure the Signal protocol if the initial X3DH protocol is the weakest link in the chain (e.g., insecure against state-leakage and only offering security against classical adversaries).

1.1 Our Contribution

In this work, we cast the X3DH protocol (see Fig. 1) as a specific type of authenticated key exchange (AKE) protocol, which we call a Signal-conforming AKE protocol, and define its security model based on the vast prior work on AKE protocols. We then provide an efficient generic construction of a Signal-conforming AKE protocol based on standard cryptographic primitives: an (\(\mathsf {IND\text {-}CCA}\) secure) KEM, a signature scheme, and a pseudorandom function (PRF). Since all of these primitives can be based on well-established post-quantum assumptions, this results in the first post-quantum secure replacement of the X3DH protocol. Similarly to the X3DH protocol, our Signal-conforming AKE protocol offers a strong flavor of key-compromise security. Borrowing terminologies from AKE-related literature, our protocol is proven secure in the strong Canetti-Krawczyk (CK) type security models [15, 30, 39, 42], where the exchanged session key remains secure even if all the non-trivial combinations of the long-term secrets and session-specific secrets of the parties are compromised. In fact, our protocol is more secure than the X3DH protocol since it is even secure against KCI-attacks where the parties’ session-specific secrets are compromised (see Footnote 5).Footnote 4 We believe the level of security offered by our Signal-conforming AKE protocol aligns with the level of security guaranteed by the double ratchet protocol where (a specific notion of) security still holds even when such secrets are compromised. Moreover, while our Signal-conforming AKE already provides a weak form of deniability, we can strengthen its deniability by using a ring signature scheme instead of a signature scheme. Likewise to the X3DH protocol [53] although our construction seemingly offers (off-line) deniability, the formal proof relies on a strong knowledge-type assumption. However, relying on such assumptions seems unavoidable considering that all known deniable AKE protocols secure against key-compromise attacks, including the X3DH protocol, rely on them [24, 53, 57].

We implemented our (weakly deniable) Signal-conforming AKE protocol in C, building on the open source libraries PQClean and LibTomCrypt. Our implementationFootnote 5 is fully generic and can thus be instantiated with a wide range of KEMs and signature schemes. We instantiate it with several Round 3 candidates (finalists and alternates) to the NIST post-quantum standardization process, and compare the bandwidth and computation costs that result from these choices. Our protocol performs best with “balanced” schemes, for example most lattice-based schemes. The isogeny-based scheme SIKE offers good bandwidth performance, but entails a significant computation cost. Finally, schemes with large public keys (Classic McEliece, Rainbow, etc.) do not seem to be a good match for our protocol, since these keys are transferred at each run of the protocol.

1.2 Technical Overview

We now briefly recall the X3DH protocol and abstract its required properties by viewing it through the lens of AKE protocols. We then provide an overview of how to construct a Signal-conforming AKE protocol from standard assumptions.

Recap on the X3DH Protocol. At a high level, the X3DH protocol allows for an asynchronous key exchange where two parties, say Alice and Bob, exchange a session key without having to be online at the same time. Even more, the party, say Bob, that wishes to send a secure message to Alice can do so without Alice even knowing Bob. For instance, imagine the scenario where you send a friend request and a message at the same time before being accepted as a friend. At first glance, it seems what we require is a non-interactive key exchange (NIKE) since Bob needs to exchange a key with Alice who is offline, while Alice does not yet know that Bob is trying to communicate with her. Unfortunately, solutions based on NIKEs are undesirable since they either provide weaker guarantees than standard (interactive) AKE or exhibit inefficient constructions [10, 17, 29, 50].

The X3DH protocol circumvents this issue by considering an untrusted server (e.g., the Signal server) to sit in the middle between Alice and Bob to serve as a public bulletin board. That is, the parties can store and retrieve information from the server while the server is not assumed to act honestly. A simplified description of the X3DH protocol, which still satisfies our purpose, based on the classical Diffie-Hellman (DH) key exchange is provided in Fig. 1.Footnote 6 As the first step, Alice sends her DH component \(g^x \in \mathbb {G}\) to the serverFootnote 7 and then possibly goes offline. We point out that Alice does not need to know who she will be communicating with at this point. Bob, who may ad-hocly decide to communicate with Alice, then fetches Alice’s first message from the server and uploads its DH component \(g^y\) to the server. As in a typical DH key exchange, Bob computes the session key \(\mathsf {k}_\mathsf {B} \) using the long-term secret exponent \(b \in \mathbb {Z}_p\) and session-specific secret exponent \(y \in \mathbb {Z}_p\). Since Bob can compute the session key \(\mathsf {k}_\mathsf {B} \) while Alice is offline, he can begin executing the subsequent double ratchet protocol without waiting for Alice to come online. Whenever Alice comes online, she can fetch whatever message Bob sent from the server.

Fig. 1.
figure 1

Simplified description of the X3DH Protocol. Alice and Bob have the long-term key pairs \((\mathsf {lpk} _\mathsf {A}, \mathsf {lsk}_\mathsf {A})\) and \((\mathsf {lpk} _\mathsf {B}, \mathsf {lsk}_\mathsf {B})\), respectively. Alice and Bob agree on a session key \(\mathsf {k}_\mathsf {A} = \mathsf {k}_\mathsf {B} \), where \(\mathsf {KDF} \) denotes a key derivation function.

Casting the X3DH Protocol as an AKE Protocol. It is not difficult to see that the X3DH protocol can be cast as a specific type of AKE protocol. In particular, we can think of the server as an adversary that tries to mount a man-in-the-middle (MIM) attack in a standard AKE protocol. Viewing the server as a malicious adversary, rather than some semi-honest entity, has two benefits: the parties do not need to put trust in the server since the protocol is supposed to be secure even against a malicious server, while the server or the company providing the app is relieved from having to “prove” that it is behaving honestly. One distinguishing feature required by the X3DH protocol when viewed as an AKE protocol is that it needs to be a two-round protocol where the initiator message is generated independently from the receiver. That is, Alice needs to be able to store her first message to the server without knowing who she will be communicating with. In this work, we define an AKE protocol with such functionality as a Signal-conforming AKE protocol.

Regarding the security model for a Signal-conforming AKE protocol, we base it on the vast prior works on AKE protocols. Specifically, we build on the recent formalization of [20, 32] that study the tightness of efficient AKE protocols (including a slight variant of the X3DH protocol) and strengthen the model to also incorporate state leakage compromise; a model where an adversary can obtain session-specific information called session-state. Since the double ratchet protocol considers a very strong form of state leakage security, we believe it would be the most rational design choice to discuss the X3DH protocol in a security model that captures such leakage as well. Informally, we consider our Signal-conforming AKE protocol in the Canetti-Krawczyk (CK) type security model [15, 30, 39, 42], which is a strengthening of the Bellare-Rogaway security model [7] considered by [20, 32]. A detailed discussion and comparison between ours and the numerous other security models of AKE protocols are provided in Sect. 3.

Lack of Signal-Conforming AKE Protocol. The main feature of a Signal-conforming AKE protocol is that the initiator’s message does not depend on the receiver. Although this seems like a very natural feature considering DH-type AKE protocols, it turns out that they are quite unique (see Brendel et al. [14] for some discussion). For instance, as far as we are aware, the only other assumption that allows for a clean analog of the X3DH protocol is based on the gap CSIDH assumption recently introduced by De Kock et al. [22] and Kawashima et al. [38]. Considering the community is still in the process of assessing the concrete parameter selection for standard CSIDH [13, 48], it would be desirable to base the X3DH protocol on more well-established and versatile assumptions. On the other hand, when we turn our eyes to known generic construction of AKE protocols [30, 31, 34, 54, 55] that can be instantiated from versatile assumptions, including post-quantum ones, we observe that none of them is Signal-conforming. That is, they are all either non-2-round or the initiator’s message depends on the public key of the receiver.

Our Construction. To this end, in this work, we provide a new practical generic construction of a Signal-conforming AKE protocol from an (\(\mathsf {IND\text {-}CCA}\) secure) KEM and a signature scheme. We believe this may be of independent interest in other scenarios where we require an AKE protocol that has a flavor of “receiver obliviousness.”Footnote 8 The construction is simple: The construction is simple: Let us assume Alice and Bob’s long-term key consist of KEM key pairs \((\mathsf {ek}_\mathsf {A}, \mathsf {dk}_\mathsf {A})\) and \((\mathsf {ek}_\mathsf {B}, \mathsf {dk}_\mathsf {B})\) and signature key pairs \(({\mathsf{vk}}_\mathsf {A}, {\mathsf{sk}}_\mathsf {A})\) and \(({\mathsf{vk}}_\mathsf {B}, {\mathsf{sk}}_\mathsf {B})\), respectively. The Signal-conforming AKE protocol then starts by Alice (i.e., the initiator) generating a session-specific KEM key \((\mathsf {ek}_T, \mathsf {dk}_T)\) and sending \(\mathsf {ek}_T\) to Bob (i.e., the receiver).Footnote 9 Here, observe that Alice’s message does not depend on who she will be communicating with. Bob then constructs two ciphertexts: one using Alice’s long-term key \((\mathsf {K}_\mathsf {A}, \mathsf {C}_\mathsf {A}) \leftarrow \mathsf {KEM}.\mathsf {Encap}( \mathsf {ek}_\mathsf {A})\) and another using the session-specific key \((\mathsf {K}_T, \mathsf {C}_T) \leftarrow \mathsf {KEM}.\mathsf {Encap}( \mathsf {ek}_T )\). It then signs these ciphertext \(\mathsf {M}:= (\mathsf {C}_\mathsf {A}, \mathsf {C}_T)\) as \(\sigma _\mathsf {B} \leftarrow \mathsf{SIG}.\mathsf{Sign}( {\mathsf{sk}}_\mathsf {B}, \mathsf {M})\), where we include other session-specific components in \(\mathsf {M}\) in the actual construction. Since sending \(\sigma _\mathsf {B} \) in the clear may serve as public evidence that Bob communicated with Alice, Bob will hide this. To this end, he derives two keys, a session key \(\mathsf {k}_\mathsf {AKE}\) and a one-time pad key \(\mathsf {k}_\mathsf{OTP}\), by running a key derivation function on input the random KEM keys \((\mathsf {K}_\mathsf {A}, \mathsf {K}_T)\). Bob then sends \((\mathsf {C}_\mathsf {A}, \mathsf {C}_T, \mathsf {c}:= \sigma _\mathsf {B} \oplus \mathsf {k}_\mathsf{OTP} )\) to Alice and sets the session key as \(\mathsf {k}_\mathsf {AKE}\). Once Alice receives the message from Bob, she decrypts the ciphertexts \((\mathsf {C}_\mathsf {A}, \mathsf {C}_T)\), derives the two keys \((\mathsf {k}_\mathsf {AKE}, \mathsf {k}_\mathsf{OPT})\), and checks if \(\sigma := \mathsf {c}\oplus \mathsf {k}_\mathsf{OTP} \) is a valid signature of Bob’s. If so, she sets the session key as \(\mathsf {k}_\mathsf {AKE}\). At a high level, Alice (explicitly) authenticates Bob through verifying Bob’s signature and Bob (implicitly) authenticates Alice since Alice is the only party that can decrypt both ciphertexts \((\mathsf {C}_\mathsf {A}, \mathsf {C}_T)\). We turn this intuition into a formal proof and show that our scheme satisfies a strong flavor of security where the shared session key remains pseudorandom even to an adversary that can obtain any non-trivial combinations of the long-term private keys (i.e., \(\mathsf {dk}_\mathsf {A}, \mathsf {dk}_\mathsf {B}, {\mathsf{sk}}_\mathsf {A},{\mathsf{sk}}_\mathsf {B} \)) and session-specific secret keys (i.e., \(\mathsf {dk}_T\)). Notably, our protocol satisfies a stronger notion of security compared to the X3DH protocol since it prevents an adversary to impersonate Alice even if her session-specific secret key is compromised [45, Section 4.6].

Finally, our Signal-conforming AKE protocol already satisfies a limited form of deniability where the publicly exchanged messages do not directly leak the participant of the protocol. However, if Alice at a later point gets compromised or turns malicious, she can publicize the signature \(\sigma _\mathsf {B} \) sent from Bob to cryptographically prove that Bob was communicating with Alice. This is in contrast to the X3DH protocol that does not allow such a deniability attack. We, therefore, show that we can protect Bob from such attacks by replacing the signature scheme with a ring signature scheme. In particular, Alice now further sends a session-specific ring signature verification key \({\mathsf{vk}}_T\), and Bob signs to the ring \(\left\{ {\mathsf{vk}}_T, {\mathsf{vk}}_\mathsf {B} \right\} \). Effectively, when Alice outputs a signature from Bob \(\sigma _{\mathsf {B}, T}\), she cannot fully convince a third party whether it originates from Bob since she could have signed \(\sigma _{\mathsf {B}, T}\) using her signing key \({\mathsf{sk}}_T\) corresponding to \({\mathsf{vk}}_T\). Although the intuition is clear, it turns out that turning this into a formal proof is quite difficult. Similar to all previous works on AKE protocols satisfying a strong flavor of key-compromise security [24, 57] (including the X3DH protocol [53]), the proof of deniability must rely on a strong knowledge-type assumption. We leave it as future work to investigate the deniability of our Signal-conforming AKE protocols from more standard assumptions.

2 Preliminaries

The operator \(\oplus \) denotes bit-wise “XOR”, and \(\Vert \) denotes string concatenation. For \(n \in \mathbb {N}\), we write [n] to denote the set \([n] \mathrel {\mathop :}={} \left\{ 1,\ldots ,n \right\} \). For \(j \in [n]\), we write \([n \backslash j]\) to denote the set \([n \backslash j] \mathrel {\mathop :}={} \left\{ 1,\ldots ,n \right\} \setminus \left\{ j \right\} \). We denote by the sampling of an element x uniformly at random from a finite set S. \(\mathsf {PPT}\) (resp. \(\mathsf {QPT}\)) stands for probabilistic (resp. quantum) polynomial time. Due to page limitation, we refer standard definitions to the full version.

3 Security Model for Signal-Conforming AKE Protocols

In this section, we define a security model for a Signal-conforming authenticated key exchange (AKE) protocol; AKE protocols that can be used as a drop-in replacement of the X3DH protocol. We first provide in Sects. 3.1 to 3.3 a game-based security model building on the recent formalization of [20, 32] targeting general AKE protocols. We then discuss in Sect. 3.4 the modifications needed to make it Signal-conforming. A detailed comparison and discussion between ours and other various security models for AKE protocols are provided in Sect. 3.5.

3.1 Execution Environment

We consider a system of \(\mu \) parties \( P _{ 1 }, \ldots , P _{ \mu }\). Each party \( P _{ i }\) is represented by a set of \(\ell \) oracles \(\left\{ \pi ^{1}_{i}, \ldots , \pi ^{\ell }_{i} \right\} \), where each oracle corresponds to a single execution of a protocol, and \(\ell \in \mathbb {N}\) is the maximum number of protocol sessions per party. Each oracle is equipped with fixed randomness but is otherwise deterministic. Each oracle \(\pi ^{s}_{i}\) has access to the long-term key pair \((\mathsf {lpk} _{i}, \mathsf {lsk}_{i})\) of \( P _{ i }\) and the public keys of all other parties, and maintains a list of the following local variables:

  • \(\mathsf {rand}^{s}_{i}\) is the randomness hard-wired to \(\pi ^{s}_{i}\);

  • \(\mathsf {sid}{}^{s}_{i}\) (“session identifier”) stores the identity of the session as specified by the protocol;

  • \(\mathsf {Pid}{}^{s}_{i}\) (“peer id”) stores the identity of the intended communication partner;

  • \(\varPsi ^{s}_{i} \in \left\{ \bot , \mathtt {accept}, \mathtt {reject} \right\} \) indicates whether oracle \(\pi ^{s}_{i}\) has successfully completed the protocol execution and “accepted” the resulting key;

  • \(\mathsf {k}^{s}_{i}\) stores the session key computed by \(\pi ^{s}_{i}\);

  • \(\mathsf {state}^{s}_{i}\) holds the (secret) session-state values and intermediary results required by the session;

  • \(\mathsf {role}^{s}_{i} \in \left\{ \bot , \mathtt {init}, \mathtt {resp} \right\} \) indicates \(\pi ^{s}_{i}\)’s role during the protocol execution.

For each oracle \(\pi ^{s}_{i}\), these variables, except the randomness, are initialized to \(\bot \). An AKE protocol is executed interactively between two oracles. An oracle that first sends a message is called an initiator (\(\mathsf {role}= \mathtt {init}\)) and a party that first receives a message is called a responder (\(\mathsf {role}= \mathtt {resp}\)). The computed session key is assigned to the variable \(\mathsf {k}^{s}_{i}\) if and only if \(\pi ^{s}_{i}\) reaches the \(\mathtt {accept}\) state, that is, \(\mathsf {k}^{s}_{i} \ne \bot \Longleftrightarrow \varPsi ^{s}_{i} = \mathtt {accept}\).

Partnering. To exclude trivial attacks in the security model, we need to define a notion of “partnering” of two oracles. Intuitively, this dictates which oracles can be corrupted without trivializing the security game. We define the notion of partnering via session-identifiers following the work of [15, 23]. Discussions on other possible choices of the definition for partnering is provide in Sect. 3.5.

Definition 3.1

(Partner Oracles). For any \((i, j, s, t) \in [\mu ]^2 \times [\ell ]^2\) with \(i \ne j\), we say that oracles \(\pi ^{s}_{i}\) and \(\pi ^{t}_{j}\) are partners if (1) \(\mathsf {Pid}^{s}_{i} = j\) and \(\mathsf {Pid}^{t}_{j} = i\); (2) \(\mathsf {role}^{s}_{i} \ne \mathsf {role}^{t}_{j}\); and (3) \(\mathsf {sid}^{s}_{i} = \mathsf {sid}^{t}_{j}\).

For correctness, we require that two oracles executing the AKE protocol faithfully (i.e., without adversarial interaction) derive identical session-identifiers. We also require that two such oracles reach the \(\mathtt {accept}\) state and derive identical session keys except with all but a negligible probability. We call a set \(S \subseteq ([\mu ] \times [\ell ])^{2}\) to have a valid pairing if the following properties hold:

  • For all \(((i, s), (j, t)) \in S\), \(i \le j\).

  • For all \((i, s) \in [\mu ] \times [\ell ]\), there exists a unique \((j, t) \in [\mu ] \times [\ell ]\) such that \(i \ne j\) and either \(((i, s), (j, t)) \in S\) or \(((j, t), (i, s)) \in S\).

In other words, a set with a valid pairing S partners off each oracle \(\pi ^{s}_{i}\) and \(\pi ^{t}_{j}\) in a way that the pairing is unique and no oracle is left out without a pair. We define correctness of an AKE protocol as follows.

Definition 3.2

(\((1-\delta )\)-Correctness). An AKE protocol \(\varPi _\mathsf {AKE}\) is \((1-\delta )\)-correct if for any set with a valid pairing \(S \subseteq ([\mu ] \times [\ell ])^{2}\), when we execute the AKE protocol faithfully between all the oracle pairs included in S, it holds that

$$ (1-\delta ) \le \Pr \left[ \begin{array}{l} \pi ^{s}_{i} \text { and }\pi ^{t}_{j} \text { are partners } \wedge \varPsi ^{s}_{i} = \varPsi ^{t}_{j} = \mathtt {accept}\\ \wedge \mathsf {k}^{s}_{i} = \mathsf {k}^{t}_{j} \ne \bot \text { for all }((i, s), (j, t)) \in S \end{array}\right] , $$

where the probability is taken over the randomness used in the oracles.

3.2 Security Game

We define security of an AKE protocol via the following game, denoted by \(G_{\varPi _\mathsf {AKE}}(\mu , \ell )\), played between an adversary \(\mathcal {A}\) and a challenger \(\mathcal {C}\). The security game is parameterized by two integers \(\mu \) (the number of honest parties) and \(\ell \) (the maximum number of protocol executions per party), and is run as follows:

  • Setup: \(\mathcal {C}\) first chooses a secret bit . Then \(\mathcal {C}\) generates the public parameter of \(\varPi _\mathsf {AKE}\) and \(\mu \) long-term key pair \(\left\{ (\mathsf {lpk} _{i}, \mathsf {lsk}_{i}) \mid i \in [\mu ] \right\} \), and initializes the collection of oracles \(\left\{ \pi ^{s}_{i} \mid i \in [\mu ], s \in [\ell ] \right\} \). \(\mathcal {C}\) runs \(\mathcal {A}\) providing the public parameter and all the long-term public keys \(\left\{ \mathsf {lpk} _{i} \mid i \in [\mu ] \right\} \) as input.

  • Phase 1: \(\mathcal {A}\) adaptively issues the following queries any number of times in an arbitrary order:

    • \(\mathsf {Send}(i, s, m)\): This query allows \(\mathcal {A}\) to send an arbitrary message m to oracle \(\pi ^{s}_{i}\). The oracle will respond according to the protocol specification and its current internal state. To start a new oracle, the message m takes a special form: \(\langle \mathtt {START}: \mathsf {role}{}, j \rangle \); \(\mathcal {C}\) initializes \(\pi ^{s}_{i}\) in the role \(\mathsf {role}\), having party \( P _{ j }\) as its peer, that is, \(\mathcal {C}\) sets \(\mathsf {Pid}^{s}_{i} \mathrel {\mathop :}=j\) and \(\mathsf {role}^{s}_{i} \mathrel {\mathop :}=\mathsf {role}\). If \(\pi ^{s}_{i}\) is an initiator (i.e., \(\mathsf {role}{} = \mathtt {init}{}\)), then \(\mathcal {C}\) returns the first message of the protocol.Footnote 10

    • \(\mathsf {RevLTK}(i)\): For \(i \in [\mu ]\), this query allows \(\mathcal {A}\) to learn the long-term secret key \(\mathsf {lsk}_{i}\) of party \( P _{ i }\). After this query, \( P _{ i }\) is said to be corrupted.

    • \(\mathsf {RegisterLTK}(i, \mathsf {lpk} _{i})\): For \(i \in \mathbb {N}\setminus [\mu ]\), this query allows \(\mathcal {A}\) to register a new party \( P _{ i }\) with public key \(\mathsf {lpk} _{i}\). We do not require that the adversary knows the corresponding secret key. After the query, the pair \((i, \mathsf {lpk} _{i})\) is distributed to all other oracles. Parties registered by \(\mathsf {RegisterLTK}\) are corrupted by definition.

    • \(\mathsf {RevState}(i, s)\): This query allows \(\mathcal {A}\) to learn the session-state \(\mathsf {state}^{s}_{i}\) of oracle \(\pi ^{s}_{i}\). After this query, \(\mathsf {state}^{s}_{i}\) is said to be revealed.

    • \(\mathsf {RevSessKey}(i, s)\): This query allows \(\mathcal {A}\) to learn the session key \(\mathsf {k}^{s}_{i}\) of oracle \(\pi ^{s}_{i}\).

  • Test: Once \(\mathcal {A}\) decides that Phase 1 is over, it issues the following special \(\mathsf {Test}\)-query which returns a real or a random key depending on the secret bit b.

    • \(\mathsf {Test}(i, s)\): If \((i, s) \notin [\mu ] \times [\ell ]\) or \(\varPsi ^{s}_{i} \ne \mathtt {accept}\), \(\mathcal {C}\) returns \(\bot \). Else, \(\mathcal {C}\) returns \(k_{b}\), where \(k_{0} \mathrel {\mathop :}=\mathsf {k}^{s}_{i}\) and (where \(\mathcal {K}\) is the session key space).

    After this query, \(\pi ^{s}_{i}\) is said to be tested.

  • Phase 2: \(\mathcal {A}\) adaptively issues queries as in Phase 1.

  • Guess: Finally, \(\mathcal {A}\) outputs a guess \(b' \in \{ 0,1 \} \). At this point, the tested oracle must be fresh. Here, an oracle \(\pi ^{s}_{i}\) with \(\mathsf {Pid}^{s}_{i} = j\)Footnote 11 is fresh if all the following conditions hold:

    1. 1.

      \(\mathsf {RevSessKey}(i, s)\) has not been issued;

    2. 2.

      if \(\pi ^{s}_{i}\) has a partner \(\pi ^{t}_{j}\) for some \(t \in [\ell ]\), then \(\mathsf {RevSessKey}(j, t)\) has not been issued;

    3. 3.

      \( P _{ i }\) is not corrupted or \(\mathsf {state}^{s}_{i}\) is not revealed;

    4. 4.

      if \(\pi ^{s}_{i}\) has a partner \(\pi ^{t}_{j}\) for some \(t \in [\ell ]\), then \( P _{ j }\) is not corrupted or \(\mathsf {state}^{t}_{j}\) is not revealed;

    5. 5.

      if \(\pi ^{s}_{i}\) has no partner oracle, then \( P _{ j }\) is not corrupted.

    If the tested oracle is not fresh, \(\mathcal {C}\) aborts the game and outputs a random bit \(b'\) on behalf of \(\mathcal {A}\). Otherwise, we say \(\mathcal {A}\) wins the game if \(b = b'\).

The advantage of \(\mathcal {A}\) in the security game \(G_{\varPi _\mathsf {AKE}}(\mu , \ell )\) is defined as \( \mathsf {Adv}^{\mathsf {AKE}}_{\varPi _\mathsf {AKE}}(\mathcal {A}) \mathrel {\mathop :}=\left| \Pr \left[ b = b'\right] - \frac{1}{2} \right| .\)

Definition 3.3

(Security of AKE Protocol). An AKE protocol \(\varPi _\mathsf {AKE}\) is secure if \(\mathsf {Adv}^{\mathsf {AKE}{}}_{\varPi _\mathsf {AKE}}(\mathcal {A})\) is negligible for any \(\mathsf {QPT}\) adversary \(\mathcal {A}\).

3.3 Security Properties

In this section, we explain the security properties captured by our security model. Comparison between other protocols is differed to Sect. 3.5.

The freshness clauses Items 1 and 2 imply that we only exclude the reveal of session keys for the tested oracle and its partner oracles. This captures key independence; if the revealed keys are different from the tested oracle’s key, then such keys must not enable computing the session key. Note that key independence implies resilience to “no-match attacks” presented by Li and Schäge [43]. This is because revealed keys have no information on the tested oracle’s key. Moreover, the two items capture implicit authentication between the involved parties. This is because an oracle \(\pi \) that computes the same session key as the tested oracle but disagrees on the peer would not be a partner of the tested oracle, and hence, an adversary can obtain the tested oracle’s key by querying the session key computed by \(\pi \). Specifically, our model captures resistance to unknown key-share (UKS) attacks [12]; a successful UKS attack is a specific type of attack that breaks implicit authentication where two parties compute the same session key but have different views on whom they are communicating with.

The freshness clauses Items 3 to 5 indicate that the game allows the adversary to reveal any subset of the four secret information—the long-term secret keys and the session-states of the two parties (where one party being the party defined by the tested oracle and the other its peer)—except for the combination where both the long-term secret key and session-state of one of the party is revealed. These clauses capture weak forward secrecy [39]: the adversary can obtain the long-term secret keys of both parties if it has been passive in the protocol run of the two oracles. Another property captured by our model is resistance to key-compromise impersonation (KCI) attacks [11]. Recall that KCI attacks are those where the adversary uses a party \( P _{ i }\)’s long-term secret key to impersonate other parties towards \( P _{ i }\). This is captured by our model because the adversary can learn the long-term secret key of a tested oracle without any restrictions. Most importantly, our model captures resistance to state leakage [15, 30, 39, 42] where an adversary is allowed to obtain session-states of both parties. We point out that our security model is strictly stronger than the recent models [20, 32] that do not allow the adversary to learn sessions-states. More discussion on state leakage is provided in Sect. 3.5.

3.4 Property for Signal-Conforming AKE: Receiver Obliviousness

In this work, we care for a specific type of (two-round) AKE protocol that is compatible with the X3DH protocol [45] used by the Signal protocol [1]. As explained in Sect. 1.2, the X3DH protocol can be viewed as a special type of AKE protocol where the Signal server acts as an (untrusted) bulletin board, where parties can store and retrieve information from. More specifically, the Signal server can be viewed as an adversary for an AKE protocol that controls the communication channel between the parties. When casting the X3DH protocol as an AKE protocol, one crucial property is that the first message of the initiator is generated independently of the communication partner. This is because, in secure messaging, parties are often offline during the key agreement so if the first message depended on the communication partner, then we must wait until they become online to complete the key agreement. Since we cannot send messages without agreeing on a session key, such an AKE protocol where the first message depends on the communication partner cannot be used as a substitute for the X3DH protocol.

We abstract this crucial yet implicit property achieved by the X3DH protocol as receiver obliviousness.Footnote 12

Definition 3.4

(Receiver Obliviousness/Signal-Conforming). An AKE protocol is receiver oblivious (or Signal-conforming) if it is two-rounds and the initiator can compute the first-message without knowledge of the peer id and long-term public key of the communication peer.

Many Diffie-Hellman type AKE protocols (e.g., the X3DH protocol used in Signal and some CSIDH-based AKE protocols [22, 38]) can be checked to be receiver oblivious. In contrast, known generic AKE protocols such as [30, 31, 34, 54, 55] are not receiver oblivious since the first message requires the knowledge of the receiver’s long-term public key.

3.5 Relation to Other Security Models

In the literature of AKE protocols, many security models have been proposed: the Bellare-Rogaway (BR) model [7], the Canetti-Krawczyk (CK) model [15], the CK+ model [30, 39], the extended CK (eCK) model [42], and variants therein [3, 20, 21, 32, 34, 35]. Although many of these security models are built based on similar motivations, there are subtle differences. (A comparison between our model and the models listed above can be found in the full version.)

4 Generic Construction of Signal-Conforming AKE \(\varPi _\mathsf {SC\text {-}AKE}\)

In this section, we propose a Signal-conforming AKE protocol \(\varPi _\mathsf {SC\text {-}AKE}\) that can be used as a drop-in replacement for the X3DH protocol. Unlike the X3DH protocol, our protocol can be instantiated from post-quantum assumptions, and moreover, it also provides stronger security against state leakage. The protocol description is presented in Fig. 2. Details follow.

Fig. 2.
figure 2

Our Signal-conforming AKE protocol \(\varPi _\mathsf {SC\text {-}AKE}\) .

Building Blocks. Our Signal-conforming AKE protocol \(\varPi _\mathsf {SC\text {-}AKE} \) consists of the following building blocks.

  • \(\varPi _\mathsf {KEM} = (\mathsf {KEM}.\mathsf {Setup}, \mathsf {KEM}.\mathsf {KeyGen}, \mathsf {KEM}.\mathsf {Encap}, \mathsf {KEM}.\mathsf {Decap})\) is a KEM scheme that is \(\mathsf {IND\text {-}CCA}\) secure and assume we have \((1-\delta _\mathsf {KEM})\)-correctness.Footnote 13

  • \(\varPi _\mathsf{wKEM}= (\mathsf {wKEM}.\mathsf {Setup}, \mathsf {wKEM}.\mathsf {KeyGen}, \mathsf {wKEM}.\mathsf {Encap}, \mathsf {wKEM}.\mathsf {Decap})\) is a KEM schemes that is \(\mathsf {IND\text {-}CPA}\) secure (and not \(\mathsf {IND\text {-}CCA}\) secure) and assume we have \((1-\delta _\mathsf {wKEM})\)-correctness.

  • \(\varPi _{\mathsf{SIG}} = (\mathsf{SIG}.\mathsf{Setup}, \mathsf{SIG}.\mathsf{KeyGen}, \mathsf{SIG}.\mathsf{Sign}, \mathsf{SIG}.\mathsf{Verify})\) is a signature scheme that is \(\mathsf {EUF\text {-}CMA}\) secure and \((1-\delta _\mathsf{SIG})\)-correctness. We denote \(d\) as the bit length of the signature generated by \(\mathsf{SIG}.\mathsf{Sign}\).

  • \(\mathsf {F}: \mathcal {FK}{} \times \{ 0,1 \} ^{*} \rightarrow \{ 0,1 \} ^{\kappa + d}\) is a pseudo-random function family with key space \(\mathcal {FK}\).

  • \(\mathsf {Ext}: \mathcal {S}{} \times \mathcal {KS}{} \rightarrow \mathcal {FK}{}\) is a strong randomness extractor.

Public Parameters. All the parties in the system are provided with the following public parameters as input: \(( s , \mathsf {pp}{}_{\mathsf {KEM}}, \mathsf {pp}{}_{\mathsf {wKEM}}, \mathsf {pp}{}_{\mathsf{SIG}{}})\). Here, \( s {}\) is a random seed chosen uniformly from \(\mathcal {S}\), and \(\mathsf{pp}_\mathsf{X}\) for \(\mathsf{X} \in \left\{ \mathsf {KEM}, \mathsf {wKEM}, \mathsf{SIG} \right\} \) are public parameters generated by \(\mathsf{X.Setup}\).

Long-Term Public and Secret Keys. Each party \( P _{ i }\) runs \((\mathsf {ek}_{i}, \mathsf {dk}_{i}) \leftarrow \mathsf {KEM}.\mathsf {KeyGen}(\mathsf {pp}_{\mathsf {KEM}})\) and \(({\mathsf{vk}}_{i}, {\mathsf{sk}}_{i}) \leftarrow \mathsf{SIG}.\mathsf{KeyGen}(\mathsf {pp}{}_{\mathsf{SIG}{}})\). Party \( P _{ i }\)’s long-term public key and secret key are set as \(\mathsf {lpk} _{i} = (\mathsf {ek}_{i}, {\mathsf{vk}}_{i})\) and \(\mathsf {lsk}_{i} = (\mathsf {dk}_{i}, {\mathsf{sk}}_{i})\), respectively.

Construction. A key exchange between an initiator \( P _{ i }\) in the s-th session (i.e., \(\pi ^s_i\)) and responder \( P _{ j }\) in the t-th session (i.e., \(\pi ^t_j\)) is executed as in Fig. 2. More formally, we have the following.

  1. 1.

    Party \( P _{ i }\) sets \(\mathsf {Pid}^s_{i} \mathrel {\mathop :}={} j\) and \(\mathsf {role}^s_{i} \mathrel {\mathop :}={} \mathtt {init}\). \( P _{ i }\) computes \((\mathsf {dk}_{T}, \mathsf {ek}_{T}) \leftarrow \mathsf {wKEM}.\mathsf {KeyGen}(\mathsf {pp}_{\mathsf {wKEM}})\) and sends \(\mathsf {ek}_{T}\) to party \( P _{ j }\). \( P _{ i }\) stores the ephemeral decapsulation key \(\mathsf {dk}_{T}\) as the session-state i.e., \(\mathsf {state}^s_{i} \mathrel {\mathop :}={} \mathsf {dk}_{T}\).Footnote 14

  2. 2.

    Party \( P _{ j }\) sets \(\mathsf {Pid}^t_{j} \mathrel {\mathop :}={} i\) and \(\mathsf {role}^t_{j} \mathrel {\mathop :}={} \mathtt {resp}\). Upon receiving \(\mathsf {ek}_{T}\), \( P _{ j }\) first computes \((\mathsf {K}, \mathsf {C}) \leftarrow \mathsf {KEM}.\mathsf {Encap}(\mathsf {ek}_{i})\) and \((\mathsf {K}_{T}, \mathsf {C}_{T}) \leftarrow \mathsf {wKEM}.\mathsf {Encap}(\mathsf {ek}_{T})\). Then \( P _{ j }\) derives two PRF keys \(\mathsf {K}_{1} \leftarrow \mathsf {Ext}_{ s }(\mathsf {K}{})\), \(\mathsf {K}_{2} \leftarrow \mathsf {Ext}_{ s }(\mathsf {K}_{T})\). It then defines the session-identifier as \(\mathsf {sid}^t_{j} \mathrel {\mathop :}= P _{ i } \Vert P _{ j } \Vert \mathsf {lpk} _{i} \Vert \mathsf {lpk} _{j} \Vert \mathsf {ek}_{T} \Vert \mathsf {C}\Vert \mathsf {C}_{T}\) and computes \(\mathsf {k}_{j} \Vert \tilde{k}\leftarrow \mathsf {F}_{\mathsf {K}_{1}}(\mathsf {sid}_{j}) \oplus \mathsf {F}_{\mathsf {K}_{2}}(\mathsf {sid}_{j})\), where \(\mathsf {k}_{j} \in \{ 0,1 \} ^\kappa \) and \(\tilde{k}{} \in \{ 0,1 \} ^{d}\), and sets the session key as \(\mathsf {k}^{t}_{j} := \mathsf {k}_j\). \( P _{ j }\) then signs \(\sigma \leftarrow \mathsf{SIG}.\mathsf{Sign}{}({\mathsf{sk}}_{j}, \mathsf {sid}^{t}_{j})\) and encrypts it as \(\mathsf {c}\leftarrow \sigma \oplus \tilde{k}{}\). Finally, it sends \((\mathsf {C}, \mathsf {C}_{T}, \mathsf {c})\) to \( P _{ i }\) and sets \(\varPsi _{j} \mathrel {\mathop :}={} \mathtt {accept}{}\). Here, note that \( P _{ j }\) does not require to store any session-state, i.e., \(\mathsf {state}^t_{j} = \bot \).

  3. 3.

    Upon receiving \((\mathsf {C}, \mathsf {C}_{T}, \mathsf {c})\), \( P _{ i }\) first decrypts \(\mathsf {K}\leftarrow \mathsf {KEM}.\mathsf {Decap}(\mathsf {dk}_{i}, \mathsf {C})\) and \(\mathsf {K}_{T} \leftarrow \mathsf {wKEM}.\mathsf {Decap}(\mathsf {dk}_{T}, \mathsf {C}_{T})\), and derives two PRF keys \(\mathsf {K}_{1} \leftarrow \mathsf {Ext}_{ s }(\mathsf {K}{})\) and \(\mathsf {K}_{2} \leftarrow \mathsf {Ext}_{ s }(\mathsf {K}_{T})\). It then sets the session-identifier as \(\mathsf {sid}^s_{i} \mathrel {\mathop :}= P _{ i } \Vert P _{ j } \Vert \mathsf {lpk} _{i} \Vert \mathsf {lpk} _{j} \Vert \mathsf {ek}_{T} \Vert \mathsf {C}\Vert \mathsf {C}_{T}\) and computes \(\mathsf {k}_{i} \Vert \tilde{k}\leftarrow \mathsf {F}_{\mathsf {K}_{1}}(\mathsf {sid}_{i}) \oplus \mathsf {F}_{\mathsf {K}_{2}}(\mathsf {sid}_{i})\), where \(\mathsf {k}_{j} \in \{ 0,1 \} ^\kappa \) and \(\tilde{k}{} \in \{ 0,1 \} ^{d}\). \( P _{ i }\) then decrypts \(\sigma \leftarrow \mathsf {c}\oplus \tilde{k}{}\) and checks whether \(\mathsf{SIG}.\mathsf{Verify}({\mathsf{vk}}_{j}, \mathsf {sid}^{s}_{i}, \sigma ) = 1\) holds. If not, \( P _{ i }\) sets \((\varPsi _{i}, \mathsf {k}^s_i, \mathsf {state}_i) \mathrel {\mathop :}=(\mathtt {reject}, \bot , \bot )\) and stops. Otherwise, it sets \((\varPsi _{i}, \mathsf {k}^s_i, \mathsf {state}_i) \mathrel {\mathop :}=(\mathtt {accept}, \mathsf {k}_i, \bot )\). Here, note that \( P _{ i }\) deletes the session-state \(\mathsf {state}^s_{i} = \mathsf {dk}_{T}\) at the end of the key exchange.

Remark 4.1

(A Note on Session-State). The session-state of the initiator \( P _{ i }\) contains the ephemeral decryption key \(\mathsf {dk}_{T}\) and \( P _{ i }\) must store it until the peer responds. Any other information that is computed after receiving the message from the peer is immediately erased when the session key is established. In contrast, the responder \( P _{ j }\) has no session-state because the responder directly computes the session key after receiving the initiator’s message and does not have to store any session-specific information. That is, all states can be erased as soon as a session key is computed.

Remark 4.2

(Signed Prekeys). In the X3DH protocol, the initiator sends the first message with a signature attached called signed prekey. Informally, this allows Bob to explicitly authenticate Alice, while otherwise without the signature, Bob can only implicitly authenticate Alice. Moreover, this signature enhances the X3DH protocol to be perfect forward secret rather than being only weak forward secret, where the former allows the adversary to be active in the protocol run of the two oracles. Indeed, according to [45], the X3DH is considered to have perfect forward secrecy. We observe that adding such signature in our protocol has the same effect as long as the added signature is not included in the session-identifier. This is due to Li and Schäge [43, Appendix D], who showed that adding new messages to an already secure protocol cannot lower the security as long as the derived session keys and the session-identifiers remain the same as the original protocol. Here, note the latter implies that the partnering relation remains the same. Similarly, Cremers and Feltz [21] show that adding a signature to the exchanged messages can enhance weak forward secrecy to perfect forward secrecy for natural classes of AKE protocols.

Security. Correctness holds by a routine check. The following establishes the security or \(\varPi _\mathsf {SC\text {-}AKE}\). We provide a proof overview and refer the full proof to the full version.

Theorem 4.1

(Security of \(\varPi _\mathsf {SC\text {-}AKE}\) ). Assume \(\varPi _\mathsf{wKEM}\) is \(\mathsf {IND\text {-}CPA}\) secure, \(\varPi _\mathsf {KEM}\) is \(\mathsf {IND\text {-}CCA}\) secure, \(\varPi _{\mathsf{SIG}}\) is \(\mathsf {EUF\text {-}CMA}\) secure, and \(\mathsf {F}\) is secure PRF. Then \(\varPi _\mathsf {SC\text {-}AKE}\) is secure AKE protocol with respect to Definition 3.3.

Proof Sketch

Let \(\mathcal {A}\) be an adversary that plays the security game \(G_{\varPi _\mathsf {SC\text {-}AKE}}(\mu , \ell )\). We distinguish between all possible strategies that can be taken by \(\mathcal {A} \). Specifically, \(\mathcal {A} \)’s strategy can be divided into the eight types of strategies listed in Table 1.

Table 1. The strategy taken by the adversary in the security game when the tested oracle is fresh. “Yes” means the tested oracle has some (possibly non-unique) partner oracles and “No” means it has none. “ ” means the secret-key/session-state is revealed to the adversary, “ ” means the secret-key/session-state is not revealed. “-” means the session-state is not defined.

Here, each strategy is mutually independent and covers all possible (non-trivial) strategies. We point out that for our specific AKE construction we have \(\mathsf {state}_\mathtt {resp}:= \bot \) since the responder does not maintain any states (see Remark 4.1). Therefore, the Type-1 (resp. Type-3, Type-7) strategy is strictly stronger than the Type-2 (resp. Type-4, Type-8) strategy. Concretely, for our proof, we only need to consider the following four cases and to show that \(\mathcal {A}\) has no advantage in each cases: (a) \(\mathcal {A}\) uses the Type-1 strategy; (b) \(\mathcal {A}\) uses the Type-3 strategy; (c) \(\mathcal {A}\) uses the Type-5 or Type-6 strategy; (d) \(\mathcal {A}\) uses the Type-7 strategy.

In cases (a), (b) and (d), the session key is informally protected by the security properties of KEM, PRF, and randomness extractor. In case (a), since the ephemeral decapsulation key \(\mathsf {dk}_T\) is not revealed, \(\mathsf {K}_T\) is indistinguishable from a random key due to the \(\mathsf {IND\text {-}CPA}\) security of \(\varPi _\mathsf{wKEM}\). On the other hand, in case (b) and (d), since the initiator’s decapsulation key \(\mathsf {dk}_{\mathtt {init}}\) is not revealed, \(\mathsf {K}\) is indistinguishable from a random key due to the \(\mathsf {IND\text {-}CCA}\) security of \(\varPi _\mathsf {KEM}\). Here, we require \(\mathsf {IND\text {-}CCA}\) security because there are initiator oracles other than the tested oracle that uses \(\mathsf {dk}_{\mathtt {init}}\), which the reduction algorithm needs to simulate. This is in contrast to case (a) where \(\mathsf {dk}_T\) is only used by the tested oracle. Then, in all cases, since either \(\mathsf {K}_T\) or \(\mathsf {K}\) has sufficient high min-entropy from the view of the adversary, \(\mathsf {Ext}{}\) on input \(\mathsf {K}_T\) or \(\mathsf {K}\) outputs a uniformly random PRF key. Finally, we can invoke the pseudo-randomness of the PRF and argue that the session key in the tested oracle is indistinguishable from a random key.

In case (c), the session key is informally protected by the security property of the signature scheme. More concretely, in case (c), the tested oracle is an initiator and the signing key \({\mathsf{sk}}_{\mathtt {resp}}\) included in the long-term key of its peer is not revealed. Then, due to the \(\mathsf {EUF\text {-}CMA}\) security of \(\varPi _{\mathsf{SIG}}\), \(\mathcal {A}\) cannot forge the signature for the session-identifier of the tested oracle \(\mathsf {sid}_{\mathsf {test}}\). In addition, since the tested oracle has no partner oracles, no responder oracle ever signs \(\mathsf {sid}_{\mathsf {test}}\). Therefore, combining these two, we conclude that the tested oracle cannot be in the \(\mathtt {accept}\) state unless \(\mathcal {A} \) breaks the signature scheme. In other words, when \(\mathcal {A}\) queries \(\mathsf {Test}\), the tested oracle always returns \(\bot \). Thus the session key of the tested oracle is hidden from \(\mathcal {A}\).

5 Instantiating Post-quantum Signal-Conforming AKE \(\varPi _\mathsf {SC\text {-}AKE}\)

In this section, we present the implementation details of our post-quantum Signal-conforming AKE protocol \(\varPi _\mathsf {SC\text {-}AKE}\). We take existing implementations of post-quantum KEMs and signature schemes submitted for the NIST PQC standardization. To instantiate our Signal-conforming AKE we pair variants of KEMs and signature schemes corresponding to the same security level. We consider security levels 1, 3 and 5 as defined by NIST for the PQC standardization. With more than 30 variants of KEM and 13 variants of signature schemes, we can create at least 128 different instantiations of post-quantum Signal-conforming AKE protocols. The provided implementation simulates post-quantum, weakly deniable authenticated key exchange between two entities. We study the efficiency of our instantiations through two metrics—the total amount of data exchanged between parties and run-time performance. Our implementation is available at the URL [41].

5.1 Instantiation Details

Our implementation is instantiated with the following building blocks:

  • \( s \): (pseudo)-randomly generated 32 bytes of data calculated at session initialization phase,

  • \(\mathsf {Ext}_{ s }\): uses HMAC-SHA256 as a strong randomness extractor. As an input message we use a key \(\mathsf {K}_{T}\) prepended with byte 0x02 which works as a domain separator (since we also use HMAC-SHA256 as a PRF). Security of using HMAC as a strong randomness extractor is studied in [28],

  • PRF: uses HMAC-SHA256 as a PRF. The session-specific \(\mathsf {sid}\) is used as an input message to HMAC, prepended with byte 0x01. An output from \(\mathsf {Ext}_{ s }\) is used as a key. Security of using HMAC as a PRF is studied in [4],

  • b: depends on the security level of the underlying post-quantum KEM scheme, where \(b \in \{128,192,256\}\),

  • d: depends on the byte length of the signature generated by the post-quantum signature scheme \(\varPi _{\mathsf{SIG}}\),

  • \(\varPi _\mathsf {KEM}\), \(\varPi _\mathsf{wKEM}\), \(\varPi _{\mathsf{SIG}}\): to instantiate \(\varPi _\mathsf {SC\text {-}AKE}\), implementation uses pairs of KEM and signature schemes. List of the schemes used can be found in the table (Table 2) below. We always use the same KEM scheme for \(\varPi _\mathsf {KEM}\) and \(\varPi _\mathsf{wKEM}\).

Table 2. Considered KEM and signature schemes under NIST security level 1, 3, and 5.

At a high level, the implementation is split into 3 main parts. The initiator’s ephemeral KEM key generation (offer function), the recipient’s session key generation (accept function), and initiator’s session key generation (finalize function). Additionally there is an initialization part which performs the generation and exchange of the long-term public keys as well as dynamic initialization of memory. To evaluate the computational cost of \(\varPi _\mathsf {SC\text {-}AKE}\), we instantiate it with concrete parameters as described above. The implementation runs 3 main functions in a loop for a fixed amount of time. We do not include the time spent in the initialization phase, hence the cost of key generation and memory initialization has no impact on the results.

Finally, we use an implementation of post-quantum algorithms that can be found in libOQSFootnote 15. We also use LibTomCryptFootnote 16 which provides an implementation of the building blocks HMAC, HKDF and SHA-256.

5.2 Efficiency Analysis

In this subsection, we provide an assessment of the costs related to running the concrete instantiation of \(\varPi _\mathsf {SC\text {-}AKE}\). We provide two metrics:

  • Communication cost: the amount of data exchanged between two parties trying to establish a session key.

  • Computational cost: number of CPU cycles spent in computation during session establishment by both parties.

The computational cost of the protocol depends on the performance of the cryptographic primitives used. More precisely, the most expensive operations are those done by the post-quantum schemes. \(\varPi _\mathsf {SC\text {-}AKE}\) performs 7 such operations during a session agreement: the initiator runs a KEM key generation, two KEM decapsulations and one signature verification, and the recipient performs two KEM encapsulations and one signing.

For benchmarking, we modeled a scenario in which two parties try to establish a session key. Alice generates and makes her long-term public key \(\mathsf {lpk} _\mathsf {A} \) and ephemeral KEM key \(\mathsf {ek}_{T}\) publicly available. Bob retrieves the pair (\(\mathsf {lpk} _\mathsf {A} \), \(\mathsf {ek}_{T}\)) and uses it to perform his part of the session establishment. Namely, Bob generates the triple (\(\mathsf {C}, \mathsf {C}_{T}, \mathsf {c}\)) and sends it to Alice along with its long-term public key \(\mathsf {lpk} _\mathsf {B} \). Upon receipt, Alice finalizes the process by computing the session key on her side. We note that in the case of the Signal protocol, both parties communicate with a server (e.g., the Signal server), and not directly. For simplicity, we abstract this fact out of our scenario. Further note that in the Signal protocol, the long-term public keys \(\mathsf {lpk} \) must be fetched from the server as the parties do not store the keys \(\mathsf {lpk} \) corresponding to those that they have not communicated with beforeFootnote 17

Table 3 provides the results for Round 3 candidates of the NIST PQC standardization process.Footnote 18 The CPU cycles column is related to the computational cost. It is the number of cycles needed on both the initiator and responder side to run the protocol for a given instantiation. We run benchmarking on the Intel Xeon E3-1220v3 @3.1 GhZ with Turbo Boost disabled. The last four columns relate to communication cost. They contain the byte size of the data exchanged during session key establishment. In particular, the \(\mathsf {lpk} \) column contains the size of the long-term public key. The \(\mathsf {ek}_{T}\) column contains the size of the ephemeral KEM key. The (\(\mathsf {C}, \mathsf {C}_{T}, \mathsf {c}\)) column is the size of the triple generated by Bob. Here, the amount of data transferred from Alice to Bob is the sum of \(\mathsf {lpk} \) and \(\mathsf {ek}_{T}\), while the amount of data transferred from Bob to Alice is the sum of \(\mathsf {lpk} \) and \(\mathsf {C}, \mathsf {C}_{T}, \mathsf {c}\). Finally, the column Total contains the total size of data exchanged between Alice and Bob.

Table 3. Computational and communication cost of running \(\varPi _\mathsf {SC\text {-}AKE}\) instantiated with various post-quantum schemes.

In a scenario as described above, instantiations with Falcon, Dilithium, Saber and Kyber schemes seem to be the most promising when it comes to computational cost. The communication cost can be minimized by using the SIKE scheme as \(\varPi _\mathsf {KEM}\) and \(\varPi _\mathsf{wKEM}\), but this significantly increases the computational cost.

We note that the computational cost is far less absolute as it depends on the concrete implementation of the post-quantum schemes. Our implementation is biased by the fact that it uses unoptimized, portable C code. There are two reasons for such a choice. First, our goal was to show the expected results on a broad number of platforms. Second, the libOQS library that we used does not provide hardware-assisted optimizations for all schemes, hence enabling those optimizations only for some algorithms would provide biased results.

Our implementation is based on open-source libraries, which makes it possible to perform fine-tuning and further analysis. For example, one could imagine a scenario for IoT devices that knows in advance which devices it may communicate with. Then, the long term keys of the devices can be exchanged prior to the session key establishment. In such a scenario, schemes with larger public keys may become more attractive since transferring long-term public keys could be done ahead of time.

Note on Low Quality Network Links. We anticipate \(\varPi _\mathsf {SC\text {-}AKE}\) to be used with handheld devices and areas with a poor quality network connection. In such cases, larger key, ciphertext and signature sizes generated may negatively impact the quality of the connection. Network packet loss is an additional factor which should be considered when choosing schemes for concrete instantiation.

Data on the network is exchanged in packets. The maximum transmission unit (MTU) defines the maximal size of a single packet, usually set to 1500 bytes. Ideally, the size of data sent between participants in a single pass is less than MTU. Network quality is characterized by a packet loss rate. When a packet is lost, the TCP protocol ensures that it is retransmitted, where each retransmission causes a delay. A typical data loss on a high-quality network link is below 1%, while data loss on a mobile network depends on the strength of the network signal.

Depending on the scheme used, increased packet loss may negatively impact session establishment time (see [47]). For example, a scheme instantiated with Falcon512/NTRU hps2048509 requires exchange of \(npacks=7\) packets over the network, where instantiation with SPHINCS-SHAKE256-128f-simple/Saber Light requires 16. Assuming increased packet rate loss of \(5\%\), the probability of losing a packet in the former case is \(1-(1-rate)^{npacks}=30\%\), where in the latter it is \(56\%\). In the latter case, at the median, every other session key establishment will experience packet retransmission and hence a delay.

6 Adding Deniability to Our Signal-Conforming AKE \(\varPi _\mathsf {SC\text {-}AKE}\)

In this section, we provide a theory-oriented discussion on the deniability aspect of our Signal-conforming AKE protocol \(\varPi _\mathsf {SC\text {-}AKE}\) . In the following, we first informally show that \(\varPi _\mathsf {SC\text {-}AKE}\) already has a very weak form of deniability that may be acceptable in some applications. We then show that we can slightly modify \(\varPi _\mathsf {SC\text {-}AKE}\) to satisfy a more stronger notion of deniability. As it is common with all deniable AKE protocols secure against key-compromise attacks [24, 53, 57], we prove deniability by relying on strong knowledge-type assumptions , including a variant of the plaintext-awareness (PA) for the KEM scheme [5, 6, 8].

Weak Deniability of \(\varPi _\mathsf {SC\text {-}AKE}\) . Our Signal-conforming AKE protocol \(\varPi _\mathsf {SC\text {-}AKE}\) already satisfies a weak notion of deniability, where the communication transcript does not leave a trace of the two parties if both parties honestly executed the AKE protocol. Namely, an adversary that is passively collecting the communication transcript cannot convince a third party that communication between two parties took place. Informally, this can be observed by checking that all the contents in the transcript can be simulated by the adversary on its own. We discuss a stronger notion of deniability next.

6.1 Definition of Deniability and Tool Preparation

We follow a simplified definition of deniability for AKE protocols introduced by Di Raimondo et al. [24]. Discussion on the simplification is provided in Remark 6.2. Let \(\varPi \) be an AKE protocol and \(\mathsf {KeyGen}\) be the key generation algorithm. That is, for any integer \(\mu = \mu (\kappa )\) representing the number of parties in the system, define \(\mathsf {KeyGen} (1^\kappa , \mu ) \rightarrow (\mathsf {pp}, \overrightarrow{\mathsf {lpk}}, \overrightarrow{\mathsf {lsk}})\), where \(\mathsf {pp} \) is the public parameter used by the system and \(\overrightarrow{\mathsf {lpk}}:= \left\{ \mathsf {lpk} _i \mid i \in [\mu ] \right\} \) and \(\overrightarrow{\mathsf {lsk}}:= \left\{ \mathsf {lsk}_i \mid i \in [\mu ] \right\} \) are the corresponding long-term public and secret keys of the \(\mu \) parties, respectively.

Let \(\mathcal {M}\) denote an adversary that engages in an AKE protocol with \(\mu \)-honest parties in the system with long-term public keys \(\overrightarrow{\mathsf {lpk}} \), acting as either an initiator or a responder. \(\mathcal {M}\) may run individual sessions against an honest party in a concurrent manner and may deviate from the AKE protocol in an arbitrary fashion. The goal of \(\mathcal {M}\) is not to impersonate someone to an honest party \( P _{ }\) but to collect (cryptographic) evidence that an honest party \( P _{ }\) interacted with \(\mathcal {M}\). Therefore, when \(\mathcal {M}\) interacts with \( P _{ }\), it can use a long-term public key \(\mathsf {lpk} _\mathcal {M} \) that can be either associated to or not to \(\mathcal {M}\) ’s identity (that may possibly be generated maliciously). We then define the view of the adversary \(\mathcal {M}\) as the entire sets of input and output of \(\mathcal {M}\) and the session keys computed in all the protocols in which \(\mathcal {M}\) participated with an honest party. Here, we assume in case the session is not completed by \(\mathcal {M} \), the session key is defined as \(\bot \). We denote this view as \(\mathsf {View}_{\mathcal {M}}(\mathsf {pp}, \overrightarrow{\mathsf {lpk}}, \overrightarrow{\mathsf {lsk}})\).

In order to define deniability, we consider a simulator \(\mathsf {SIM}\) that simulates the view of honest parties (both initiator and responder) to the adversary \(\mathcal {M}\) without knowledge of the corresponding long-term secret keys \(\overrightarrow{\mathsf {lsk}}\) of the honest parties. Specifically, \(\mathsf {SIM}\) takes as input all the input given to the adversary \(\mathcal {M}\) (along with the description of \(\mathcal {M}\)) and simulates the view of \(\mathcal {M}\) with the real AKE protocol \(\varPi \). We denote this simulated view as \(\mathsf {SIM}_{\mathcal {M}}(\mathsf {pp}, \overrightarrow{\mathsf {lpk}})\). Roughly, if the view simulated by \(\mathsf {SIM}_{\mathcal {M}}\) is indistinguishable from those generated by \(\mathsf {View}_{\mathcal {M}}\), then we say the AKE protocol is deniable since \(\mathcal {M} \) could have run \(\mathsf {SIM}_{\mathcal {M}}\) (which does not take any secret information as input) to generate its view in the real protocol. More formally, we have the following.

Definition 6.1

(Deniability). We say an AKE protocol \(\varPi \) with key generation algorithm \(\mathsf {KeyGen} \) is deniable, if for any integer \(\mu = \mathsf {poly}(\kappa )\) and PPT adversary \(\mathcal {M}\), there exist a PPT simulator \(\mathsf {SIM} _\mathcal {M} \) such that the following two distributions are (computationally) indistinguishable for any PPT distinguisher \(\mathcal {D}\):

$$\begin{aligned} \mathcal {F}_\mathsf {Real}&:= \{ \mathsf {pp}, \overrightarrow{\mathsf {lpk}}, \mathsf {View}_{\mathcal {M}}( \mathsf {pp}, \overrightarrow{\mathsf {lpk}}, \overrightarrow{\mathsf {lsk}}) : (\mathsf {pp}, \overrightarrow{\mathsf {lpk}}, \overrightarrow{\mathsf {lsk}}) \leftarrow \mathsf {KeyGen} ( 1^\kappa , \mu ) \},\\ \mathcal {F}_\mathsf {Sim}&:= \{ \mathsf {pp}, \overrightarrow{\mathsf {lpk}}, \mathsf {SIM}_{\mathcal {M}}( \mathsf {pp}, \overrightarrow{\mathsf {lpk}}) : (\mathsf {pp}, \overrightarrow{\mathsf {lpk}}, \overrightarrow{\mathsf {lsk}}) \leftarrow \mathsf {KeyGen} ( 1^\kappa , \mu ) \}. \end{aligned}$$

When \(\mathcal {M}\) is semi-honest (i.e., it follows the prescribed protocol), we say \(\varPi \) is deniable against semi-honest adversaries. When \(\mathcal {M}\) is malicious (i.e., it takes any efficient strategy), we say \(\varPi \) is deniable against malicious adversaries.

Remark 6.1

(Including Public Information and Session Keys). It is crucial that the two distributions \(\mathcal {F}_\mathsf {Real}\) and \(\mathcal {F}_\mathsf {Sim}\) include the public information \((\mathsf {pp}, \overrightarrow{\mathsf {lpk}})\). Otherwise, \(\mathsf {SIM}_{\mathcal {M}}\) can simply create its own set of \((\mathsf {pp} ', \overrightarrow{\mathsf {lpk}} ', \overrightarrow{\mathsf {lsk}} ')\) and simulate the view to \(\mathcal {M} \). However, this does not correctly capture deniability in the real-world since \(\mathcal {M}\) would not be able to convince anybody with such a view using public information that it cooked up on its own. In addition, it is essential that the value of the session key is part of the output of \(\mathsf {SIM}_{\mathcal {M}}\). This guarantees that the contents of the sessions authenticated by the session key can also be denied.

Remark 6.2

(Comparison between Prior Definition). Our definition is weaker than the deniability notion originally proposed by Di Raimondo et al. [24]. In their definition, an adversary \(\mathcal {M}\) (and therefore the simulator \(\mathsf {SIM}_{\mathcal {M}}\)) is also provided as input some auxiliary information \(\mathsf {aux} \) that can depend non-trivially on \((\mathsf {pp}, \overrightarrow{\mathsf {lpk}}, \overrightarrow{\mathsf {lsk}})\).Footnote 19 For instance, this allows to capture information that \(\mathcal {M}\) may have obtained by eavesdropping conversations between honest parties (which is not modeled by \(\mathsf {View}_{\mathcal {M}}\)). Since our goal is to provide a minimal presentation on the deniability of our protocol, we only focus on the weaker definition where \(\mathcal {M}\) does not obtain such auxiliary information. We leave it as future work to prove our protocol deniable in the sense of Di Raimondo et al. [24]. We also note that stronger forms of deniability are known and formalized in the universally composable (UC) model [25, 51, 52], however, AKE protocols satisfying such a strong deniability notion are known to achieve weaker security guarantees. For instance, as noted in [52], an AKE protocol cannot be on-line deniable while also being secure against KCI attacks.

Remark 6.3

(Extending to Malicious Quantum Adversaries). We only consider classical deniability above. Although we can show deniability for semi-honest quantum adversaries, we were not able to do so for malicious quantum adversaries. This is mainly due to the fact that to prove deniability against malicious classical adversaries, we require a strong knowledge type assumption (i.e., plaintext-awareness for KEM) that assumes an extractor can invoke the adversary multiple of times on the same randomness. We leave it as an interesting problem to formally define a set of tools that allow to show deniability even against malicious quantum adversaries.

Required Tools. To argue deniability in the following section we rely on the following tools: ring signature, plaintext-aware (\(\mathsf {PA\text {-}1}\)) secure KEM scheme, and a non-interactive zero-knowledge (\(\mathsf{NIZK}\)) argument.Footnote 20 We use standard notions of ring signatures and \(\mathsf{NIZK}\) arguments. On the other hand, we use a slightly stronger variant of \(\mathsf {PA\text {-}1}\) secure KEM schemes than those originally defined in [5, 6, 8]. Informally, a KEM scheme is \(\mathsf {PA\text {-}1}\) secure if for any adversary \(\mathcal {M} \) that outputs a valid ciphertext \(\mathsf {C}\), there is an extractor \(\mathsf {Ext}_\mathcal {M} \) that outputs the matching session key \(\mathsf {K}\). In our work, we require \(\mathsf {PA\text {-}1}\) security to hold even when \(\mathcal {M} \) is given multiple public keys rather than a single public key [46]. We note that although Di Raimondo et al. [24] considered the standard notion of \(\mathsf {PA\text {-}1}\) security, we observe that their proof only works in the case where multiple public keys are considered. Finally, we further require the extractor \(\mathsf {Ext}_\mathcal {M} \) to be efficiently computable given \(\mathcal {M} \).

6.2 Deniable Signal-Conforming AKE \(\varPi _\mathsf {SC\text {-}DAKE}\) Against Semi-Honest Adversaries

We first provide a Signal-conforming AKE protocol \(\varPi _\mathsf {SC\text {-}DAKE}\) that is deniable against semi-honest adversaries. The construction of \(\varPi _\mathsf {SC\text {-}DAKE}\) is a simple modification of \(\varPi _\mathsf {SC\text {-}AKE}\) where a standard signature is replaced by a ring signature. In the subsequent section, we show how to modify \(\varPi _\mathsf {SC\text {-}DAKE} \) to a protocol that is deniable against malicious adversaries by relying on further assumptions. The high-level idea presented in this section naturally extends to the malicious setting. An overview of \(\varPi _\mathsf {SC\text {-}DAKE}\) and \(\varPi '_\mathsf {SC\text {-}DAKE}\) is provided in Fig. 3.

Fig. 3.
figure 3

Deniable Signal-conforming AKE protocol \(\varPi _\mathsf {SC\text {-}DAKE}\) and \(\varPi '_\mathsf {SC\text {-}DAKE}\). The components that differ from the non-deniable protocol \(\varPi _\mathsf {SC\text {-}AKE}\) is indicated by a box. The protocol with (resp. without) the gray and dotted-box component satisfies deniability against malicious (resp. semi-honest) adversaries.

Building Blocks. Our deniable Signal-conforming AKE protocol \(\varPi _\mathsf {SC\text {-}DAKE}\) against semi-honest adversaries consists of the following building blocks.

  • \(\varPi _\mathsf {KEM} = (\mathsf {KEM}.\mathsf {Setup}, \mathsf {KEM}.\mathsf {KeyGen}, \mathsf {KEM}.\mathsf {Encap}, \mathsf {KEM}.\mathsf {Decap})\) is a KEM scheme that is \(\mathsf {IND\text {-}CCA}\) secure and assume we have \((1-\delta _\mathsf {KEM})\)-correctness.Footnote 21

  • \(\varPi _\mathsf{wKEM}= (\mathsf {wKEM}.\mathsf {Setup}, \mathsf {wKEM}.\mathsf {KeyGen}, \mathsf {wKEM}.\mathsf {Encap}, \mathsf {wKEM}.\mathsf {Decap})\) is a KEM schemes that is \(\mathsf {IND\text {-}CPA}\) secure (and not \(\mathsf {IND\text {-}CCA}\) secure) and assume we have \((1-\delta _\mathsf {wKEM})\)-correctness.

  • \(\varPi _\mathsf{RS} = (\mathsf {RS}.\mathsf {Setup}, \mathsf {RS}.\mathsf {KeyGen}, \mathsf {RS}.\mathsf {Sign}, \mathsf {RS}.\mathsf {Verify})\) is a ring signature scheme that is anonymous and unforgeable and assume we have \((1-\delta _\mathsf {RS})\)-correctness. We denote \(d\) as the bit length of the signature generated by \(\mathsf {RS}.\mathsf {Sign}\).

  • \(\mathsf {F}: \mathcal {FK}{} \times \{ 0,1 \} ^{*} \rightarrow \{ 0,1 \} ^{\kappa +d}\) is a pseudo-random function family with key space \(\mathcal {FK}\).

  • \(\mathsf {Ext}: \mathcal {S}{} \times \mathcal {KS}{} \rightarrow \mathcal {FK}{}\) is a strong randomness extractor.

Public Parameters. All the parties in the system are provided the following public parameters as input: \(( s , \mathsf {pp}_{\mathsf {KEM}}, \mathsf {pp}_{\mathsf {wKEM}}, \mathsf {pp}_{\mathsf {RS}})\). Here, \( s {}\) is a random seed chosen uniformly from \(\mathcal {S}\), and \(\mathsf{pp}_\mathsf{X}\) for \(\mathsf{X} \in \left\{ \mathsf {KEM}, \mathsf {wKEM}, \mathsf {RS} \right\} \) are public parameters generated by \(\mathsf{X.Setup}\).

Long-Term Public and Secret Keys. Each party \( P _{ i }\) runs \((\mathsf {ek}_{i}, \mathsf {dk}_{i}) \leftarrow \mathsf {KEM}.\mathsf {KeyGen}(\mathsf {pp}_{\mathsf {KEM}})\) and \((\mathsf {vk}_{i}, \mathsf {sk}_{i}) \leftarrow \mathsf {RS}.\mathsf {KeyGen}(\mathsf {pp}_{\mathsf {RS}{}})\). Party \( P _{ i }\)’s long-term public key and secret key are set as \(\mathsf {lpk} _{i} = (\mathsf {ek}_{i}, \mathsf {vk}_{i})\) and \(\mathsf {lsk}_{i} = (\mathsf {dk}_{i}, \mathsf {sk}_{i})\) , respectively.

Construction. A key exchange between an initiator \( P _{ i }\) in the s-th session (i.e., \(\pi ^s_i\)) and responder \( P _{ j }\) in the t-th session (i.e., \(\pi ^t_j\)) is executed as in Fig. 2. More formally, we have the following.

  1. 1.

    Party \( P _{ i }\) sets \(\mathsf {Pid}^s_{i} \mathrel {\mathop :}={} j\) and \(\mathsf {role}^s_{i} \mathrel {\mathop :}={} \mathtt {init}\). \( P _{ i }\) computes \((\mathsf {dk}_{T}, \mathsf {ek}_{T}) \leftarrow \mathsf {wKEM}.\mathsf {KeyGen}(\mathsf {pp}_{\mathsf {wKEM}})\) and \((\mathsf {vk}_{T}, \mathsf {sk}_{T}) \leftarrow \mathsf {RS}.\mathsf {KeyGen}(\mathsf {pp}_{\mathsf {RS}})\), and sends \((\mathsf {ek}_{T}, \mathsf {vk}_{T})\) to party \( P _{ j }\). \( P _{ i }\) erases the signing key \(\mathsf {sk}_{T}\) and stores the ephemeral decapsulation key \(\mathsf {dk}_{T}\) as the session-state i.e., \(\mathsf {state}^s_{i} \mathrel {\mathop :}={} \mathsf {dk}_{T}\).Footnote 22

  2. 2.

    Party \( P _{ j }\) sets \(\mathsf {Pid}^t_{j} \mathrel {\mathop :}={} i\) and \(\mathsf {role}^t_{j} \mathrel {\mathop :}={} \mathtt {resp}\). Upon receiving \((\mathsf {ek}_{T}, \mathsf {vk}_{T})\), \( P _{ j }\) first computes \((\mathsf {K}, \mathsf {C}) \leftarrow \mathsf {KEM}.\mathsf {Encap}(\mathsf {ek}_{i})\) and \((\mathsf {K}_{T}, \mathsf {C}_{T}) \leftarrow \mathsf {wKEM}.\mathsf {Encap}(\mathsf {ek}_{T})\) and derives two PRF keys \(\mathsf {K}_{1} \leftarrow \mathsf {Ext}_{ s }(\mathsf {K}{})\), \(\mathsf {K}_{2} \leftarrow \mathsf {Ext}_{ s }(\mathsf {K}_{T})\). It then defines the session-identifier as \(\mathsf {sid}^t_{j} \mathrel {\mathop :}= P _{ i } \Vert P _{ j } \Vert \mathsf {lpk} _{i} \Vert \mathsf {lpk} _{j} \Vert \mathsf {ek}_{T} \Vert \mathsf {vk}_T \Vert \mathsf {C}\Vert \mathsf {C}_{T}\) and computes \(\mathsf {k}_{j} \Vert \tilde{k}\leftarrow \mathsf {F}_{\mathsf {K}_{1}}(\mathsf {sid}_{j}) \oplus \mathsf {F}_{\mathsf {K}_{2}}(\mathsf {sid}_{j})\), where \(\mathsf {k}_{j} \in \{ 0,1 \} ^\kappa \) and \(\tilde{k}{} \in \{ 0,1 \} ^{d}\). \( P _{ j }\) sets the session key as \(\mathsf {k}^{t}_{j} := \mathsf {k}_j\). \( P _{ j }\) then signs \(\sigma \leftarrow \mathsf {RS}.\mathsf {Sign}{}(\mathsf {sk}_{j}, \mathsf {sid}^{t}_{j}, \left\{ \mathsf {vk}_{T}, \mathsf {vk}_{j} \right\} )\) and encrypts it as \(\mathsf {c}\leftarrow \sigma \oplus \tilde{k}{}\). Finally, it sends \((\mathsf {C}, \mathsf {C}_{T}, \mathsf {c})\) to \( P _{ i }\) and sets \(\varPsi _{j} \mathrel {\mathop :}={} \mathtt {accept}{}\). Here, note that \( P _{ j }\) does not require to store any session-state, i.e., \(\mathsf {state}^t_{j} = \bot \).

  3. 3.

    Upon receiving \((\mathsf {C}, \mathsf {C}_{T}, \mathsf {c})\), \( P _{ i }\) first decrypts \(\mathsf {K}\leftarrow \mathsf {KEM}.\mathsf {Decap}(\mathsf {dk}_{i}, \mathsf {C})\) and \(\mathsf {K}_{T} \leftarrow \mathsf {wKEM}.\mathsf {Decap}(\mathsf {dk}_{T}, \mathsf {C}_{T})\), and derives two PRF keys \(\mathsf {K}_{1} \leftarrow \mathsf {Ext}_{ s }(\mathsf {K}{})\) and \(\mathsf {K}_{2} \leftarrow \mathsf {Ext}_{ s }(\mathsf {K}_{T})\). It then sets the session-identifier as \(\mathsf {sid}^s_{i} \mathrel {\mathop :}= P _{ i } \Vert P _{ j } \Vert \mathsf {lpk} _{i} \Vert \mathsf {lpk} _{j} \Vert \mathsf {ek}_{T} \Vert \mathsf {vk}_T\Vert \mathsf {C}\Vert \mathsf {C}_{T}\) and computes \(\mathsf {k}_{i} \Vert \tilde{k}\leftarrow \mathsf {F}_{\mathsf {K}_{1}}(\mathsf {sid}_{i}) \oplus \mathsf {F}_{\mathsf {K}_{2}}(\mathsf {sid}_{i})\), where \(\mathsf {k}_{i} \in \{ 0,1 \} ^\kappa \) and \(\tilde{k}{} \in \{ 0,1 \} ^{d}\). \( P _{ i }\) then decrypts \(\sigma \leftarrow \mathsf {c}\oplus \tilde{k}{}\) and checks whether \(\mathsf {RS}.\mathsf {Verify}(\left\{ \mathsf {vk}_{T}, \mathsf {vk}_{j} \right\} , \mathsf {sid}^{s}_{i}, \sigma ) = 1\) holds. If not, \( P _{ i }\) sets \((\varPsi _{i}, \mathsf {k}^s_i, \mathsf {state}_i) \mathrel {\mathop :}=(\mathtt {reject}, \bot , \bot )\) and stops. Otherwise, \( P _{ i }\) sets \((\varPsi _{i}, \mathsf {k}^s_i, \mathsf {state}_i) \mathrel {\mathop :}=(\mathtt {accept}, \mathsf {k}_i, \bot )\). Here, note that \( P _{ i }\) deletes the session-state \(\mathsf {state}^s_{i} = \mathsf {dk}_{T}\) at the end of the key exchange.

Security. We first check that \(\varPi _\mathsf {SC\text {-}DAKE}\) is correct and secure as a standard AKE protocol. Since the proof is similar in most parts to the non-deniable protocol \(\varPi _\mathsf {SC\text {-}AKE}\), we defer the details to the full version. The main difference from the security proof of \(\varPi _\mathsf {SC\text {-}AKE}\) is that we have to make sure that using a ring signature instead of a standard signature does not allow the adversary to mount a key-compromise impersonation (KCI) attack (see Sect. 3.3 for the explanation on KCI attacks).

The following guarantees deniability of our protocol \(\varPi _\mathsf {SC\text {-}DAKE}\) against semi-honest adversaries.

Theorem 6.1

(Deniability of \(\varPi _\mathsf {SC\text {-}DAKE}\) against Semi-Honest Adversaries). Assume \(\varPi _\mathsf{RS} \) is anonymous. Then, the Signal-conforming protocol \(\varPi _\mathsf {SC\text {-}DAKE}\) is deniable against semi-honest adversaries.

Proof

Let \(\mathcal {M} \) be any PPT semi-honest adversary. We explain the behavior of the simulator \(\mathsf {SIM}_{\mathcal {M}}\) by considering three cases: (a) \(\mathcal {M} \) initializes an initiator \( P _{ i }\), (b) \(\mathcal {M} \) queries the initiator \( P _{ i }\) on message \((\mathsf {C}, \mathsf {C}_T, \mathsf {c})\), and (c) \(\mathcal {M} \) queries the responder \( P _{ j }\) on message \((\mathsf {ek}_T, \mathsf {vk}_T)\). In case (a), \(\mathsf {SIM}_{\mathcal {M}}\) runs the honest initiator algorithm and returns \((\mathsf {ek}_T, \mathsf {vk}_T)\) as specified by the protocol. In case (b), since \(\mathcal {M} \) is semi-honest, we are guaranteed that it runs the honest responder algorithm to generate \((\mathsf {C}, \mathsf {C}_T, \mathsf {c})\). In particular, since \(\mathcal {M} \) is run on randomness sampled by \(\mathsf {SIM}_{\mathcal {M}}\), \(\mathsf {SIM}_{\mathcal {M}}\) gets to learn the key \(\mathsf {K}\) that was generated along with \(\mathsf {C}\). Therefore, \(\mathsf {SIM}_{\mathcal {M}}\) runs the real initiator algorithm except that it uses \(\mathsf {K}\) extracted from \(\mathcal {M}\) rather than computing \(\mathsf {K}\leftarrow \mathsf {KEM}.\mathsf {Decap}(\mathsf {dk}_i, \mathsf {C})\). Here, note that \(\mathsf {SIM}_{\mathcal {M}}\) cannot run the latter since it does not know the corresponding \(\mathsf {dk}_i\) held by an honest initiator party \( P _{ i }\). In case (c), similarly to case (b), \(\mathsf {SIM}_{\mathcal {M}}\) learns \(\mathsf {dk}_T\) and \(\mathsf {sk}_T\) used by \(\mathcal {M} \) to generate \(\mathsf {ek}_T\) and \(\mathsf {vk}_T\). Therefore, \(\mathsf {SIM}_{\mathcal {M}}\) runs the honest responder algorithm except that it runs \(\sigma \leftarrow \mathsf {RS}.\mathsf {Sign}{}(\mathsf {sk}_{T}, \mathsf {sid}_{j}, \left\{ \mathsf {vk}_{T}, \mathsf {vk}_{j} \right\} )\) instead of running \(\sigma \leftarrow \mathsf {RS}.\mathsf {Sign}{}(\mathsf {sk}_{j}, \mathsf {sid}_{j}, \left\{ \mathsf {vk}_{T}, \mathsf {vk}_{j} \right\} )\) as in the real protocol. Here, note that \(\mathsf {SIM}_{\mathcal {M}}\) cannot run the latter since it does not know the corresponding \(\mathsf {sk}_j\) held by an honest responder party \( P _{ j }\).

Let us analyze \(\mathsf {SIM}_{\mathcal {M}}\). First, for case (a), the output by \(\mathsf {SIM}_{\mathcal {M}}\) is distributed exactly as in the real transcript. Next, for case (b), the only difference between the real distribution and \(\mathsf {SIM}_{\mathcal {M}}\)’s output distribution (which is the derived session key \(\mathsf {k}\)) is that \(\mathsf {SIM}_{\mathcal {M}}\) uses the KEM key \(\mathsf {K}\) output by \(\mathsf {KEM}.\mathsf {Encap}\) to compute the session key rather than using the KEM key decrypted using \(\mathsf {KEM}.\mathsf {Decap}\) with the initiator party \( P _{ i }\)’s decryption key \(\mathsf {dk}_i\). However, by \((1-\delta _\mathsf {KEM})\)-correctness of \(\varPi _\mathsf {KEM}\), these two KEM keys are identical with probability at least \((1-\delta _\mathsf {KEM})\). Hence, the output distribution of \(\mathsf {SIM}_{\mathcal {M}}\) and the real view are indistinguishable. Finally, for case (c), the only difference between the real distribution and \(\mathsf {SIM}_{\mathcal {M}}\)’s output distribution (which is the derived session key and the message sent \((\mathsf {C}, \mathsf {C}_T, \mathsf {c})\)) is how the ring signature is generated. While the real protocol uses the signing key \(\mathsf {sk}_j\) of the responder party \( P _{ j }\), the simulator \(\mathsf {SIM}_{\mathcal {M}}\) uses \(\mathsf {sk}_T\). However, the signatures outputted by these two distributions are computationally indistinguishable assuming the anonymity of \(\varPi _\mathsf{RS}\). Hence, the output distribution of \(\mathsf {SIM}_{\mathcal {M}}\) and the real view are indistinguishable.

Combining everything together, we conclude the proof.    \(\square \)

6.3 Deniable Signal-Conforming AKE \(\varPi '_\mathsf {SC\text {-}DAKE}\) Against Malicious Adversaries

We discuss security of our Signal-conforming AKE protocol \(\varPi '_\mathsf {SC\text {-}DAKE}\) against malicious adversaries. As depicted in Fig. 3, to achieve deniability against malicious adversaries, we modify the protocol so that the initiator party adds a \(\mathsf{NIZK}\) proof attesting to the fact that it constructed the verification key of the ring signature \(\mathsf {vk}_T\) honestly. Formally, we require the following additional building blocks.

Building Blocks. Our deniable Signal-conforming AKE protocol \(\varPi '_\mathsf {SC\text {-}DAKE}\) against malicious adversaries requires the following primitives in addition to those required by \(\varPi _\mathsf {SC\text {-}DAKE}\) in the previous section.

  • \(\varPi _\mathsf {KEM} = (\mathsf {KEM}.\mathsf {Setup}, \mathsf {KEM}.\mathsf {KeyGen}, \mathsf {KEM}.\mathsf {Encap}, \mathsf {KEM}.\mathsf {Decap})\) is an \(\mathsf {IND\text {-}CCA}\) secure KEM scheme as in the previous section that additionally satisfies \(\mathsf {PA_{\mu }\text {-}1}\) security with an efficiently constructible extractor, where \(\mu \) is the number of parties in the system.

  • \(\varPi _\mathsf{NIZK} = (\mathsf{NIZK}.\mathsf {Setup}, \mathsf{NIZK}.\mathsf {Prove}, \mathsf{NIZK}.\mathsf {Verify})\) is a \(\mathsf{NIZK}\) argument system for the relation \(\mathcal {R} _\mathsf {RS}\) where \((\mathsf {X}, \mathsf {W}) \in \mathcal {R} _\mathsf {RS}\) if and only if the statement \(\mathsf {X}= (\mathsf {pp}, \mathsf {vk})\) and witness \(\mathsf {W}= (\mathsf {sk}, \mathsf {rand})\) satisfy \((\mathsf {vk}, \mathsf {sk}) = \mathsf {RS}.\mathsf {KeyGen}(\mathsf {pp}; \mathsf {rand})\).

Additional Assumption. We require a knowledge-type assumption to prove deniability against malicious adversaries. Considering that all of the previous AKE protocols satisfying a strong form of security and deniability require such knowledge-type assumptions [24, 53, 57], this seems unavoidable. On the other hand, there are protocols achieving a strong form of deniability from standard assumptions [25, 51, 52], however, they make a significant compromise in the security such as being vulnerable to KCI attacks and state leakages.

The following knowledge assumption is defined similarly in spirit to those of Di Raimondo et al. [24] that assumed that for any adversary \(\mathcal {M} \) that outputs a valid MAC, then there exists an extractor algorithm \(\mathsf {Ext}\) that extracts the corresponding MAC key. Despite it being a strong knowledge-type assumption in the standard model, we believe it holds in the random oracle model if we further assume the \(\mathsf{NIZK}\) comes with an online knowledge extractorFootnote 23 like those provide by Fischlin’s \(\mathsf{NIZK}\) [27]. We leave it to future works to investigate the credibility of the following assumption and those required to prove deniability of the X3DH protocol [53].

Assumption 6.2

(Key-Awareness Assumption for \(\varPi '_\mathsf {SC\text {-}DAKE}\) ). We say that \(\varPi '_\mathsf {SC\text {-}DAKE} \) has the key-awareness property if for all PPT adversaries \(\mathcal {M} \) interacting with a real protocol execution in the deniability game as in Definition 6.1, there exists a PPT extractor \(\mathsf {Ext}_\mathcal {M} \) such that for any choice of \((\mathsf {pp}, \overrightarrow{\mathsf {lpk}}, \overrightarrow{\mathsf {lsk}}) \in \mathsf {KeyGen} ( 1^\kappa , \mu )\), whenever \(\mathcal {M} \) outputs a ring signature verification key \(\mathsf {vk}\) and a \(\mathsf{NIZK}\) proof \(\pi \) for the language \(\mathcal {L} _\mathsf {RS}\), then \(\mathsf {Ext}_\mathcal {M} \) taking input the same input as \(\mathcal {M}\) (including its randomness) outputs a signing key \(\mathsf {sk}\) such that \((\mathsf {vk}, \mathsf {sk}) \in \mathsf {RS}.\mathsf {KeyGen}(\mathsf {pp}_{\mathsf {RS}})\) for any \(\mathsf {pp}_{\mathsf {RS}} \in \mathsf {RS}.\mathsf {Setup}(1^\kappa )\).

With the added building blocks along with the key-awareness assumption, we prove the following theorem. The high-level approach is similar to the previous proof against semi-honest adversaries but the concrete proof requires is rather involved. The main technicality is when invoking the \(\mathsf {PA_{\mu }\text {-}1} \) security: if we do the reduction naively, the extractor needs the randomness used to sample the ring signature key pairs of the honest party but the simulator of the deniability game does not know such randomness. We circumvent this issue by hard-wiring the verification key of the ring signature of the adversary and considering \(\mathsf {PA_{\mu }\text {-}1} \) security against non-uniform adversary. The proof is presented in the full version.

Theorem 6.3

(Deniability of \(\varPi '_\mathsf {SC\text {-}DAKE}\) against Malicious Adversaries). Assume \(\varPi _\mathsf {KEM} \) is \(\mathsf {PA_{\mu }\text {-}1} \) secure with an efficiently constructible extractor, \(\varPi _\mathsf{RS} \) is anonymous, \(\varPi _\mathsf{NIZK} \) is sound,Footnote 24 and the key-awareness assumption in Assumption 6.2 holds. Then, the Signal-conforming protocol \(\varPi '_\mathsf {SC\text {-}DAKE}\) with \(\mu \) parties is deniable against malicious adversaries.

Finally, we show \(\varPi '_\mathsf {SC\text {-}DAKE}\) is correct and secure as a standard Signal-conforming AKE protocol in the full version.