Secure Primitive for Big Data Utilization
Abstract
In this chapter, we describe two security primitives for big data utilization. The first is privacy-preserving data integration among databases distributed across different organizations. This primitive integrates the records that the databases have in common while keeping every record held by only one organization secret from the other organizations. The second is privacy-preserving classification. This primitive applies the server’s classification rule to the client’s input database and outputs only the result to the client, while keeping the client’s input database secret from the server and the server’s classification rule secret from the client. These primitives can be executed not only independently but also jointly. That is, after integrating databases from distributed organizations by executing the privacy-preserving data integration, we can execute the privacy-preserving classification.
3.1 Privacy-Preserving Data Integration
3.1.1 Introduction
Medical organizations often store the data accumulated through medical analyses. However, detailed data analysis sometimes requires separate datasets to be integrated without violating patient or commercial privacy. Consider the scenario in which the occurrence of similar accidents can be attributed to a particular defective product. Such defective products should be identified as quickly as possible. However, the databases related to accidents are maintained separately by different organizations. Thus, investigating the causes of accidents is often time-consuming. For example, assume child A has broken a leg at school, but it is not clear whether the accident was caused by defective equipment. In this case, information relating to A’s injury, such as the patient’s name and type of injury, is stored in the hospital database \(S_{1}\). Information pertaining to A’s accident, such as the name and the location of the swing at the school, is stored in database \(S_{2}\), which is held by the fire department. Finally, information relating to the insurance claim following A’s accident, such as the name and medical costs, is maintained in the insurance company’s database, \(S_{3}\). Computing the intersection of these databases, \(S_{1} \cap S_{2} \cap S_{3}\), without compromising privacy would enable us to combine the separate sets of information, which may allow the cause of the accident to be identified. Let us consider another situation. Several clinics, denoted as \(\mathsf{P}_i\), maintain separate databases, represented as \(S_{i}\). The clinics wish to know the patients they have in common to enable them to share treatment details; however, \(\mathsf{P}_i\) should not be able to access any information about patients not stored in its own dataset. In this case, computing the intersection of the sets must not reveal private information.
These examples illustrate the need for a Multiparty Private Set Intersection (MPSI) protocol [1, 2, 3, 4]. MPSI is executed by multiple parties who jointly compute the intersection of their private datasets. Ultimately, only designated parties can access this intersection. Previous protocols are impractical because the bulk of the computation grows with the number of players. Moreover, two previous studies [1, 2] required the datasets maintained by the different players to be of equal size. Another study [3] computed only the approximate number of intersections, whereas [4] required more than two trusted third parties.
In this section, we propose a practical MPSI with the following features:
1. The size of the datasets maintained by each party is independent of those maintained by the other parties.
2. The computational complexity for each party is independent of the number of parties. This is accomplished by introducing an outsourcing provider, \(\mathcal{O}\). In fact, all computations related to the number of parties are carried out by \(\mathcal{O}\). Thus, the number of parties is irrelevant.
3.1.2 Preliminaries
In this section, we summarize the DDH assumption, the Bloom filter, and ElGamal encryption. We consider security in the honest-but-curious model [5]: all players act according to their prescribed actions in the protocol. A protocol that is secure in the honest-but-curious model does not allow any player to gain information about other players’ private input sets beyond what can be deduced from the result of the protocol. Note that the term adversary here refers to insiders, i.e., protocol participants. Outsider adversaries are not considered, because their behavior can be mitigated via standard network security techniques.
Our protocol is based on the following security assumption.
Definition 3.1
(DDH Assumption) Let \(\kappa \) be a security parameter. A decisional Diffie–Hellman (DDH) parameter generator \(\mathcal {IG}\) is a probabilistic polynomial time (ppt) algorithm that takes \(1^{\kappa }\) as input and outputs a finite field \({\mathbb F}_{p}\) and a basepoint \(g \in {\mathbb F}_{p}\) with prime order q. We say that \(\mathcal {IG}\) satisfies the DDH assumption if \(\left| p_1 - p_2\right| \) is negligible (in \(\kappa \)) for all ppt algorithms A, where \(p_1={\small \Pr } [ ({\mathbb F}_{p}, g) \leftarrow \mathcal {IG}(1^{\kappa }); y_1=g^{x_1}, y_2= g^{x_2} \leftarrow {\mathbb F}_{p}: A({\mathbb F}_{p}, g, y_1, y_2, g^{x_1x_2}) = 0]\) and \(p_2={\small \Pr } [ ({\mathbb F}_{p}, g) \leftarrow \mathcal {IG}(1^{\kappa }); y_1=g^{x_1}, y_2= g^{x_2}, z \leftarrow {\mathbb F}_{p}: A({\mathbb F}_{p}, g, y_1, y_2, z) = 0]\).
A Bloom filter [6], denoted by \(\mathsf{BF}\), is a space-efficient probabilistic data structure consisting of an array of m bits. It encodes a set S of at most w elements and can then check whether an element x is included in S. The encoded Bloom filter of S is denoted by \(\mathsf{BF}(S)\).
The \(\mathsf{BF}\) uses a set of k independent uniform hash functions \(\mathcal {H}= \left\{ H_0, \ldots , H_{k-1} \right\} \), where \(H_i:\{0, 1 \}^* \longrightarrow \{ 0,1, \ldots , m-1 \}\) for \(0 \le i \le k-1\). The \(\mathsf{BF}\) provides two functions: \(\mathsf{Const}\) embeds a given set S into \(\mathsf{BF}(S)\), and \(\mathsf{ElementCheck}\) checks whether an element x is included in S. \(\mathsf{SetCheck}\), an extension of \(\mathsf{ElementCheck}\), checks for each element x of a set \(S'\) whether x is in \(S' \cap S\) (see Algorithm 3.3). In \(\mathsf{Const}\) (see Algorithm 3.1), \(\mathsf{BF}(S)\) is constructed for a given set S by first setting all bits in the array to 0. To embed an element \(x \in S\) into the filter, the element is hashed using the k hash functions to obtain k index numbers, and the bits at these indexes are set to 1, i.e., set \(\mathsf{BF}[H_i(x)] = 1\) for \(0 \le i \le k-1\). In \(\mathsf{ElementCheck}\) (see Algorithm 3.2), we check all locations to which x is hashed; x is considered to be not in S if any bit at these locations is 0; otherwise, x is probably in S.
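The three Bloom filter functions described above can be sketched in Python as follows. This is a minimal illustration, not the chapter's Algorithms 3.1 to 3.3: the function names `indexes`, `const`, `element_check`, and `set_check` are hypothetical, and deriving \(H_0, \ldots , H_{k-1}\) from SHA-256 with an index prefix is an assumption made only so that the example is self-contained.

```python
import hashlib


def indexes(x, m, k):
    """Return the k array indexes H_0(x), ..., H_{k-1}(x) in {0, ..., m-1}.

    Assumption: H_i(x) is modelled as SHA-256 of the string "i|x" reduced mod m.
    """
    out = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}|{x}".encode()).digest()
        out.append(int.from_bytes(digest, "big") % m)
    return out


def const(s, m, k):
    """Const: start from an all-zero array and set the k bits of every x in S."""
    bf = [0] * m
    for x in s:
        for j in indexes(x, m, k):
            bf[j] = 1
    return bf


def element_check(bf, x, k):
    """ElementCheck: False means x is certainly not in S; True means probably in S."""
    return all(bf[j] == 1 for j in indexes(x, len(bf), k))


def set_check(bf, s_prime, k):
    """SetCheck: return the elements of S' that pass ElementCheck against BF(S)."""
    return {x for x in s_prime if element_check(bf, x, k)}
```

For example, `set_check(const({"apple", "banana"}, 128, 4), {"banana", "cherry"}, 4)` always contains `"banana"`; it may also contain `"cherry"` in the rare case of a false positive, which is why the filter reports membership only as probable.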
Homomorphic encryption under addition is useful for processing encrypted data. A typical homomorphic encryption under addition was proposed by Paillier [8]. However, because Paillier encryption cannot reduce the order of a composite group, it is computationally expensive compared with the following ElGamal encryption. Our protocol requires matching without revealing the original messages, for which exponential ElGamal encryption (exElGamal) is sufficient [9]. In fact, decryption in exElGamal yields \(g^m\) rather than m itself, which is enough to determine whether two messages \(m_1\) and \(m_2\) are equal even though the messages are not recovered. Furthermore, exElGamal can be used in (n, n)-threshold distributed decryption [10], where the decryption must be performed by all players acting together. An exElGamal encryption with (n, n)-threshold distributed decryption consists of three functions:
Key generation:
Let \({\mathbb F}_{p}\) be a finite field, \(g \in {\mathbb F}_{p}\), with prime order q. Each player \(\mathsf{P}_i\) chooses \(x_i \in {\mathbb Z}_{q}\) at random and computes \(y_i=g^{x_i} \pmod {p}\). Then, \(y=\prod _{i=1}^{n}y_i \pmod {p}\) is a public key and each \(x_i\) is a share for each player to decrypt a ciphertext.
Encryption: \(\mathsf{thrEnc}[m] \rightarrow (u,v)\)
Let \(m \in {\mathbb Z}_{q}\) be a message and y a public key. Choose \(r \in {\mathbb Z}_{q}\) at random, and compute both \(u=g^r \pmod {p}\) and \(v=g^my^r \pmod {p}\). Output (u, v) as a ciphertext of m.
Decryption: \(\mathsf{thrDec}[(u,v)] \rightarrow g^m\)
Each player \(\mathsf{P}_i\) computes \(z_i = u^{x_i} \pmod {p}\). All players then compute \(z = \prod _{i=1}^{n} z_i \pmod {p}\) jointly. Finally, each player can decrypt the ciphertext as \(g^m = v/z \pmod {p}\).
ExElGamal encryption with (n, n)-threshold decryption has the following features:
(1) homomorphic under addition: \(\mathsf{Enc}(m_1) \cdot \mathsf{Enc}(m_2)=\mathsf{Enc}(m_1 + m_2)\) for messages \(m_1, m_2 \in {\mathbb Z}_{q}\), where ciphertexts are multiplied componentwise.
(2) homomorphic under scalar operations: \(\mathsf{Enc}(m)^k = \mathsf{Enc}(km)\) for a message m and \(k \in {\mathbb Z}_{q}\), where both ciphertext components are raised to the power k.
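The key generation, encryption, joint decryption, and the two homomorphic features above can be checked with a small Python sketch. The group parameters (the safe prime p = 2039 with q = 1019 and g = 4) are toy values assumed purely for illustration and offer no security; the helper names `keygen`, `thr_enc`, `thr_dec`, `ct_mul`, and `ct_pow` are likewise hypothetical. The modular inverse via `pow(z, -1, P)` requires Python 3.8 or later.

```python
import secrets

# Toy parameters (assumption, insecure): p = 2q + 1 with p, q prime,
# and g = 4 generates the subgroup of prime order q.
P, Q, G = 2039, 1019, 4


def keygen(n):
    """Each of the n players picks a share x_i; the public key is y = prod g^{x_i}."""
    shares = [secrets.randbelow(Q) for _ in range(n)]
    y = 1
    for x in shares:
        y = y * pow(G, x, P) % P
    return shares, y


def thr_enc(y, m):
    """thrEnc[m] -> (u, v) with u = g^r and v = g^m * y^r."""
    r = secrets.randbelow(Q)
    return pow(G, r, P), pow(G, m % Q, P) * pow(y, r, P) % P


def thr_dec(shares, ct):
    """thrDec[(u, v)] -> g^m; every player contributes z_i = u^{x_i}."""
    u, v = ct
    z = 1
    for x in shares:
        z = z * pow(u, x, P) % P
    return v * pow(z, -1, P) % P


def ct_mul(c1, c2):
    """Componentwise product: Enc(m1) * Enc(m2) = Enc(m1 + m2)."""
    return c1[0] * c2[0] % P, c1[1] * c2[1] % P


def ct_pow(c, k):
    """Componentwise power: Enc(m)^k = Enc(k * m)."""
    return pow(c[0], k, P), pow(c[1], k, P)
```

Because decryption returns \(g^m\), a caller tests a hypothesis about m by comparing against \(g^{m'}\); for instance, `thr_dec(shares, ct_mul(thr_enc(y, 2), thr_enc(y, 3))) == pow(G, 5, P)` holds for any key produced by `keygen`.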
3.1.3 Previous Work
This section summarizes prior works on PSI between a server and a client and MPSI among n players. In PSI, let \(S=\{s_1,\ldots ,s_v\}\) and \(C=\{c_1,\ldots ,c_w\}\) be server and client datasets, respectively, where \(|S|=v\) and \(|C|=w\). In MPSI [1], we assume that each player holds a dataset of the same size.
PSI protocol based on polynomial representation: The main idea is to represent the elements in C as the roots of a polynomial. The encrypted polynomial is sent to the server, where it is evaluated on the elements in S, as originally proposed by Freedman et al. [11]. This is secure against honest-but-curious adversaries under secure public key encryption. The computational complexity is O(vw) exponentiations, and the communication overhead is \(O(v+w)\). The computational complexity can be reduced to \(O(v \log \log w)\) exponentiations using the balanced allocation technique [12]. Kissner and Song extended this protocol to MPSI [1], which requires \(O(nw^2)\) exponentiations and O(nw) communication overhead. The MPSI version is secure against honest-but-curious and malicious adversaries (in the random oracle model) using generic zero-knowledge proofs.
PSI protocol based on DH key agreement: The main idea here is to apply the DH key agreement protocol [13]: after representing the server and client datasets as hash values \(\{h(s_i)\}\) and \(\{h(c_i)\}\), respectively, the client encrypts its dataset as \(\{h(c_i)^{r_i}\}\) using random numbers \(r_i\) and sends the encrypted set to the server. The server encrypts the client set \(\{h(c_i)^{r_i}\}\) and the server set \(\{h(s_i)\}\) using a random number r, which gives \(\{h(c_i)^{rr_i}\}\) and \(\{h(s_i)^{r}\}\), respectively, and returns these sets to the client. Finally, the client removes its exponents \(r_i\) to obtain \(\{h(c_i)^{r}\}\) and evaluates \(S \cap C\) by comparing these values with \(\{h(s_i)^{r}\}\). This is secure against honest-but-curious adversaries under the DDH assumption. The total computational complexity is \(O(v+w)\) exponentiations, and the total communication overhead is \(O(v+w)\). The security of this approach can be enhanced against malicious adversaries in the random oracle model [14] by using a blind signature. However, no extensions to MPSI based on the DH key agreement protocol have been proposed.
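The message flow of the DH-based PSI described above can be illustrated with the following Python sketch. It is not the construction of [13] itself: the toy group (p = 2039, q = 1019), the hash-to-group function `h` built from SHA-256 and squaring, and the function names are all assumptions for illustration. With such a small group, unrelated elements can collide, so the sketch never misses a common element but may occasionally report an extra one.

```python
import hashlib
import secrets

P, Q = 2039, 1019  # toy safe-prime group (assumption, insecure): P = 2Q + 1


def h(x):
    """Hash x into the order-Q subgroup by squaring a non-zero residue (toy choice)."""
    d = int.from_bytes(hashlib.sha256(x.encode()).digest(), "big")
    return pow(d % (P - 1) + 1, 2, P)


def client_blind(c_list):
    """Client: pick a non-zero r_i per element and compute h(c_i)^{r_i}."""
    exps = [1 + secrets.randbelow(Q - 1) for _ in c_list]
    blinded = [pow(h(c), r_i, P) for c, r_i in zip(c_list, exps)]
    return exps, blinded


def server_respond(s_list, blinded):
    """Server: raise the client values and its own hashes to a secret exponent r."""
    r = 1 + secrets.randbelow(Q - 1)
    doubled = [pow(b, r, P) for b in blinded]        # h(c_i)^{r r_i}
    server_tags = {pow(h(s), r, P) for s in s_list}  # h(s_j)^r
    return doubled, server_tags


def client_intersect(c_list, exps, doubled, server_tags):
    """Client: strip r_i (inverse taken mod Q) and compare h(c_i)^r with the tags."""
    result = set()
    for c, r_i, d in zip(c_list, exps, doubled):
        if pow(d, pow(r_i, -1, Q), P) in server_tags:
            result.add(c)
    return result
```

The unblinding step works because every \(h(c_i)\) lies in the subgroup of order q, so raising to \(r_i^{-1} \bmod q\) cancels the client exponent and leaves \(h(c_i)^r\).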
PSI protocol based on BF: This protocol was originally proposed in [4]. As the Bloom filter itself reveals information about the other player’s dataset, the set of players is separated into two groups: input players who have datasets and privacy players who perform private computations under shared secret information. In [15], the privacy of each player’s dataset is protected by encrypting each array of the Bloom filter using Goldwasser–Micali encryption [16]. In the honest-but-curious version, the computational complexity is O(kw) hash operations and O(m) public key operations, and the communication overhead is O(m), where m and k are the number of arrays and hash functions, respectively, used in the Bloom filter. The Bloom filter is also used together with the oblivious transfer extension [17, 18] and the newly constructed garbled Bloom filter [19]. The main novelty of the garbled Bloom filter is that each array holds \(\lambda \) bits rather than the single bit of the conventional Bloom filter. To embed an element \(x \in S\) into a garbled Bloom filter, x is split into k shares of \(\lambda \) bits using XOR-based secret sharing \((x=x_1 \oplus \cdots \oplus x_k)\). Each share \(x_i\) is then stored at index \(H_i(x)\). An element y is queried by XORing all bit strings at the indexes \(H_i(y)\). If the result is y, then y is in S; otherwise, y is not in S. The client uses a Bloom filter \(\mathsf{BF}(C)\), and the server uses a garbled Bloom filter \(\mathsf{GBF}(S)\). If x is in \(C \cap S\), then for every position i it hashes to, \(\mathsf{BF}(C)[i]\) must be 1 and \(\mathsf{GBF}(S)[i]\) must be \(x_i\). Thus, the client can compute \(C \cap S\). The computational complexity of this method is O(kw) hash operations and O(m) public key operations, and the communication overhead is O(m). The number of public key operations can be reduced to \(O(\lambda )\) using the oblivious transfer extension. This is secure against honest-but-curious adversaries if the oblivious transfer protocol is secure. Finally, some researchers have computed the approximate number of multiparty set unions [3].
3.1.4 Practical MPSI
This section presents a practical MPSI that is secure under the honest-but-curious model.
3.1.4.1 Notation and Privacy Definition
In the remainder of this paper, the following notations are used.

\(\mathsf{P}_i\): ith player, \(i = 1, \ldots , n\)

\(\mathcal{O}\): outsourcing provider with no knowledge of the inputs or outputs

\(S_i = \{ s_{i,1}, s_{i, 2},\ldots , s_{i, \omega _i} \}\): dataset held by \(\mathsf{P}_i\), where \(|S_i| = \omega _i\)

\(\cap S_j\): intersection of the datasets of all n players

\(\mathsf{thrEnc}\) and \(\mathsf{thrDec}\): (n, n)-threshold exElGamal encryption and decryption, respectively

m and k: number of arrays and hashes used in \(\mathsf{BF}\)

\(\varvec{\ell }=[\ell , \ldots , \ell ]\) (\(1 \le \ell \le n\)): an m-dimensional array (the same length as the Bloom filter), where all entries in the array are set to \(\ell \)

\(\mathsf{BF}(S_i)= [\mathsf{BF}_{i}[0], \ldots , \mathsf{BF}_{i}[m-1]]\): Bloom filter applied to a set \(S_i\)

\(\mathsf{IBF}(\cap S_i)=[ \sum _{i=1}^{n} \mathsf{BF}_i[0], \ldots , \sum _{i=1}^{n} \mathsf{BF}_i[m-1]]\): integrated Bloom filter of n sets \(\{S_i\}\), where \(\sum _{i=1}^{n} \mathsf{BF}_i[j]\) is the sum of all players’ arrays at position j
We introduce an outsourcing provider \(\mathcal{O}\) to reduce the computational burden on all players. The provider has no information regarding the elements of any player’s set. The privacy requirements of MPSI with an outsourcing provider can be informally written as follows.
Definition 3.2
\(\mathsf{P}_i\) does not learn anything about the elements of other players’ datasets except for the elements that \(\mathsf{P}_i\) originally possesses.
The outsourcing provider \(\mathcal{O}\) does not learn anything about the elements of any player’s set.
3.1.4.2 Proposed MPSI
Our MPSI comprises four phases: (i) initialization, (ii) Bloom filter construction and encryption by each \(\mathsf{P}_i\), (iii) randomization of \(\mathsf{thrEnc}(\mathsf{IBF}(\cap S_i) - \mathbf {n})\) by \(\mathcal{O}\), and (iv) computation of \(\cap S_i\). The computation of \(\cap S_i\) consists of three steps: (a) joint decryption of the (n, n)-threshold exElGamal ciphertexts among the n players, (b) Bloom filter check, and (c) output of the intersection.
Our protocol proceeds as follows.
(i) Initialization:
1. \(\mathsf{P}_i\) generates \(x_i \in {\mathbb Z}_{q}\), computes \(y_i=g^{x_i} \in {\mathbb F}_{p}\), and publishes \(y_i\) to the other players as a public key, where the corresponding secret key is \(x_i\).
2. \(\mathsf{P}_i\) computes \(y=\prod _i y_i\), where y is the n-player public key. Note that no player knows the corresponding secret key \(x = \sum x_i\) before executing the joint decryption.
(ii) Bloom filter construction and encryption by \(\mathsf{P}_i\):
1. \(\mathsf{P}_i\) executes \(\mathsf{Const}(S_i) \longrightarrow \mathsf{BF}(S_i)=[\mathsf{BF}_i[0], \ldots , \mathsf{BF}_i[m-1]]\) (Algorithm 3.1).
2. \(\mathsf{P}_i\) encrypts \(\mathsf{BF}(S_i) - \varvec{1}\) using \(\mathsf{thrEnc}_y\), where y is the n-player public key:
$$ \mathsf{thrEnc}_y(\mathsf{BF}(S_i) - \varvec{1}) =[\mathsf{thrEnc}_y(\mathsf{BF}_i[0] - 1), \ldots , \mathsf{thrEnc}_y(\mathsf{BF}_i[m-1] - 1)]. $$
3. \(\mathsf{P}_i\) sends \(\mathsf{thrEnc}_y(\mathsf{BF}(S_i) - \varvec{1})\) to \(\mathcal{O}\).
(iii) Randomization by \(\mathcal{O}\):
1. \(\mathcal{O}\) computes an encryption of \(\mathsf{IBF}(\cap S_i) - \mathbf {n}\), without learning \(\mathsf{IBF}(\cap S_i)\), by multiplying the ciphertexts \(\mathsf{thrEnc}_y(\mathsf{BF}(S_i) - \varvec{1})\) received from all players and using the additive homomorphic feature:
$$ \mathsf{thrEnc}_y(\mathsf{IBF}(\cap S_i) - \mathbf {n}) = \prod _{i=1}^{n} \mathsf{thrEnc}_y(\mathsf{BF}(S_i) - \varvec{1}). $$
2. \(\mathcal{O}\) randomizes \(\mathsf{thrEnc}_y(\mathsf{IBF}(\cap S_i) - \mathbf {n})\) by \(\mathbf {r}= [r_0, \ldots , r_{m-1}] \in {\mathbb Z}_{q}^m\):
$$ \mathsf{thrEnc}_y(\mathbf {r}(\mathsf{IBF}(\cap S_i) - \mathbf {n})) =(\mathsf{thrEnc}_y(\mathsf{IBF}(\cap S_i) - \mathbf {n}))^{\mathbf {r}}. $$
3. \(\mathcal{O}\) broadcasts \(\mathsf{thrEnc}_y(\mathbf {r}(\mathsf{IBF}(\cap S_i) - \mathbf {n}))\) to \(\mathsf{P}_i\).
(iv) Computation of \(\cap S_i\):
1. All players decrypt \(\mathsf{thrEnc}_y(\mathbf {r}(\mathsf{IBF}(\cap S_i) - \mathbf {n}))\) jointly.
2. \(\mathsf{P}_i\) computes \(\mathsf{SetCheck}(\mathbf {r}(\mathsf{IBF}(\cap S_i) - \mathbf {n}), S_i)\) and obtains \(\cap S_i\).
The above protocol satisfies the correctness requirement. At every array position j to which an element \(x \in \cap S_i\) is mapped by the hash functions, all n players have set the bit, so \(\sum _{i=1}^{n} \mathsf{BF}_i[j] - n = 0\) and the corresponding component of \(\mathsf{thrEnc}_y(\mathbf {r}(\mathsf{IBF}(\cap S_i) - \mathbf {n}))\) is decrypted to \(g^0 = 1\); in contrast, every array position that is not set by all n players is decrypted to a random value.
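The four phases can be traced end to end with the following single-process Python simulation. It is a sketch under stated assumptions rather than the authors' implementation: the toy group (p = 2039, q = 1019, g = 4), the Bloom filter size `M = 64` with `K = 3` SHA-256-based hashes, and all function names are hypothetical, and the parties are simulated inside one function instead of communicating over a network. The random exponents \(r_j\) are drawn from the non-zero elements of \({\mathbb Z}_{q}\) so that a non-zero plaintext never becomes 0.

```python
import hashlib
import secrets

P, Q, G = 2039, 1019, 4  # toy group (assumption, insecure): P = 2Q + 1, G has order Q
M, K = 64, 3             # toy Bloom filter length and number of hash functions


def positions(x):
    """Indexes H_0(x), ..., H_{K-1}(x), modelled with SHA-256 (assumption)."""
    return [int.from_bytes(hashlib.sha256(f"{i}|{x}".encode()).digest(), "big") % M
            for i in range(K)]


def const(s):
    """Const: Bloom filter BF(S) as a list of M bits."""
    bf = [0] * M
    for x in s:
        for j in positions(x):
            bf[j] = 1
    return bf


def enc(y, m):
    """Exponential ElGamal encryption of m under the n-player public key y."""
    r = secrets.randbelow(Q)
    return pow(G, r, P), pow(G, m % Q, P) * pow(y, r, P) % P


def mpsi(sets):
    n = len(sets)
    # (i) Initialization: secret shares x_i and joint public key y = prod g^{x_i}.
    xs = [secrets.randbelow(Q) for _ in range(n)]
    y = 1
    for x in xs:
        y = y * pow(G, x, P) % P
    # (ii) Each player encrypts BF(S_i) - 1 position by position.
    cts = [[enc(y, b - 1) for b in const(s)] for s in sets]
    # (iii) The provider multiplies the n ciphertexts per position (adding the
    # plaintexts to IBF - n) and raises the result to a random non-zero r_j.
    randomized = []
    for j in range(M):
        u = v = 1
        for ct in cts:
            u, v = u * ct[j][0] % P, v * ct[j][1] % P
        r = 1 + secrets.randbelow(Q - 1)
        randomized.append((pow(u, r, P), pow(v, r, P)))
    # (iv-a) Joint decryption: z = prod u^{x_i}, then g^m = v / z.
    dec = []
    for u, v in randomized:
        z = 1
        for x in xs:
            z = z * pow(u, x, P) % P
        dec.append(v * pow(z, -1, P) % P)
    # (iv-b, c) SetCheck: keep x if all of its positions decrypted to 1 = g^0.
    return [{x for x in s if all(dec[j] == 1 for j in positions(x))} for s in sets]
```

Calling `mpsi([{"alice", "bob"}, {"bob", "carol"}, {"bob", "dave"}])` returns one set per player, each containing `"bob"`. As with any Bloom filter, a player may additionally see a false positive among its own elements, with a probability governed by the choice of m and k.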
3.1.4.3 Security Proof
The security of our MPSI protocol is as follows.
Theorem 3.1
For any coalition of fewer than n players, the MPSI is player-private against an honest-but-curious adversary under the DDH assumption.
Proof
1. Set the n-player public key \(y = \overline{g}^{\beta }\) and choose random numbers \(d_0,\ldots ,d_{m-1}\) and \(r_1,\ldots ,r_{m-1}\) from \({\mathbb Z}_{q}\).
2. Send \([(\overline{g}^\alpha , \overline{g}^{d_0} \cdot \overline{g}^\gamma ), ((\overline{g}^\alpha )^{r_1},\overline{g}^{d_1} \cdot (\overline{g}^\gamma )^{r_1}), \ldots , ((\overline{g}^\alpha )^{r_{m-1}},\overline{g}^{d_{m-1}} \cdot (\overline{g}^\gamma )^{r_{m-1}}) ]\) as \(\overline{\mathsf{thrEnc}_y(\mathsf{BF}_{m,k}(S_i))}\) to \(\mathcal {D}\).
If \((\overline{g}, \overline{g}^\alpha , \overline{g}^\beta , \overline{g}^\gamma )\) is a DH tuple, i.e., \(\gamma =\alpha \beta \), then \(\overline{\mathsf{thrEnc}_y(\mathsf{BF}_{m,k}(S_i))}\) is distributed in the same way as when constructed by the MPSI scheme. Thus, \(\mathcal {D}\) must output 1. If \((\overline{g}, \overline{g}^\alpha , \overline{g}^\beta , \overline{g}^\gamma )\) is not a DH tuple, then \(\overline{\mathsf{thrEnc}_y(\mathsf{BF}_{m,k}(S_i))}\) is randomly distributed, and \(\mathcal {D}\) has to output 0. Therefore, \(\overline{\text{ SIM }}\) can use the output of \(\mathcal {D}\) to respond to the DDH challenge correctly. Hence, under the DDH assumption, \(\mathcal {D}\) can answer correctly with only negligible advantage over random guessing. Furthermore, as all inputs of each player are encrypted until the decryption is performed, and decryption cannot be performed by fewer than n players, nothing can be learned by any player prior to decryption.
As for the views of \(\mathsf{thrEnc}_y(\mathbf {r}(\mathsf{IBF}_{m,k}(\cap S_i) - \mathbf {n}))\), the same argument holds. Therefore, for any coalition of fewer than n players, MPSI is player-private under the honest-but-curious model.
Next, we present d-and-over MPSI. The procedures of d-and-over MPSI are the same as those of MPSI until \(\mathcal{O}\) computes \(\mathsf{thrEnc}_y(\mathsf{IBF}(\cap S_i))\). Thus, we describe only the procedure after \(\mathcal{O}\) computes \(\mathsf{thrEnc}_y(\mathsf{IBF}(\cap S_i))\).
\(\mathcal{O}\) proceeds as follows:
1. For each \(\ell \) with \(d \le \ell \le n\), encrypt \(\mathsf{IBF}(\cap S_i) - \varvec{\ell }\) randomized by \(\mathbf {r}= [r_0, \ldots , r_{m-1}] \in {\mathbb Z}_{q}^m\): \( \mathsf{thrEnc}_y(\mathbf {r}(\mathsf{IBF}(\cap S_i) - \varvec{\ell })) =(\mathsf{thrEnc}_y(\mathsf{IBF}(\cap S_i)) \cdot \mathsf{thrEnc}_y (-\varvec{\ell }))^{\mathbf {r}}.\)
2. Broadcast \(\{ \mathsf{thrEnc}_y(\mathbf {r}(\mathsf{IBF}(\cap S_i) - \varvec{\ell })) \}_{\ell }\) (\(d \le \ell \le n\)) to \(\mathsf{P}_i\).
Each \(\mathsf{P}_i\) proceeds as follows:
1. All \(\mathsf{P}_i\) jointly decrypt \(\{ \mathsf{thrEnc}_y(\mathbf {r}(\mathsf{IBF}(\cap S_i) - \varvec{\ell })) \}_{\ell }\).
2. Let \(\mathsf{CBF}_{\ell }\) be an m-array for \(d \le \ell \le n\), where an array is set to 1 if and only if the corresponding array of the decryption of \(\mathbf {r}(\mathsf{IBF}(\cap S_i) - \varvec{\ell })\) is 1, and the others are set to 0.
3. Set \(\mathsf{CBF}= \mathsf{CBF}_{d} \vee \cdots \vee \mathsf{CBF}_n\).
4. Execute \(\mathsf{SetCheck}_{m,k}(\mathsf{CBF}, S_i) \longrightarrow \cap ^{\ge d} S[i]\) and output \(\cap ^{\ge d} S[i]\).
The correctness of d-and-over MPSI follows from the fact that if an element x is contained in \(\ell \) sets for some \(d \le \ell \le n\), then at each array location to which x is mapped by the k hashes, \(\mathsf{IBF}(\cap S_i) - \mathbf {j}\) is 0 for some \(\ell \le j \le n\); that location is therefore an encryption of 0, which is decrypted to \(g^0 = 1\). Otherwise, it is an encryption of a randomized value.
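The counting logic behind d-and-over MPSI can be illustrated on plaintext Bloom filters, as in the Python sketch below. Encryption, randomization, and joint decryption are deliberately omitted (an assumption made to keep the example short), so the test `ibf[j] - ell == 0` stands in for "the decryption of the randomized ciphertext equals 1". The names `positions`, `const`, and `d_and_over`, and the parameters `M = 256`, `K = 3`, are hypothetical.

```python
import hashlib

M, K = 256, 3  # toy Bloom filter length and number of hashes (assumption)


def positions(x):
    """Indexes H_0(x), ..., H_{K-1}(x), modelled with SHA-256 (assumption)."""
    return [int.from_bytes(hashlib.sha256(f"{i}|{x}".encode()).digest(), "big") % M
            for i in range(K)]


def const(s):
    """Const: Bloom filter BF(S) as a list of M bits."""
    bf = [0] * M
    for x in s:
        for j in positions(x):
            bf[j] = 1
    return bf


def d_and_over(sets, d):
    """Return, for each player, its elements that pass SetCheck against CBF."""
    n = len(sets)
    filters = [const(s) for s in sets]
    # Integrated Bloom filter: per-position sum of all players' bits.
    ibf = [sum(f[j] for f in filters) for j in range(M)]
    # CBF = CBF_d OR ... OR CBF_n, where CBF_ell[j] = 1 iff IBF[j] - ell = 0.
    cbf = [0] * M
    for ell in range(d, n + 1):
        cbf_ell = [1 if ibf[j] - ell == 0 else 0 for j in range(M)]
        cbf = [a | b for a, b in zip(cbf, cbf_ell)]
    return [{x for x in s if all(cbf[j] for j in positions(x))} for s in sets]
```

For `sets = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"}]` and `d = 2`, the first two players always obtain at least `{"b", "c"}` and the third at least `{"c"}`; Bloom filter false positives can add further elements with small probability.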
3.1.5 Efficiency
Although many PSI protocols have been proposed, to the best of our knowledge, relatively few consider the multiparty scenario [1, 2, 3, 4]. Our target is multiparty private set intersection, and the final result must be obtained by all players acting together, without a trusted third party (TTP). Among previous MPSI protocols, the approach in [3] computes only the approximate number of intersections, and that in [4] requires more than two TTPs. In contrast, [2] follows almost the same method as [1] and thus has a similar complexity. The only difference lies in the security model. Hence, we only compare our scheme with that of [1].
The computational and communication efficiency of the proposed protocol and [1] are compared in Table 3.1. These approaches are secure against honest-but-curious adversaries without a TTP under exElGamal encryption (DDH security) and Paillier encryption (Decisional Composite Residuosity (DCR) security), respectively. The Bloom filter parameters (m, k) used in our protocol are set as follows: \(k = 80\) and \(m=80 \omega /\ln 2\), where \(\omega = \max _i \omega _i\) is the maximum dataset size \(|S_i| = \omega _i\). Then, the probability of false positives is given by \(p=2^{-80}\).
Table 3.1 Efficiency of [1] and the proposed protocol
| | [1] | Ours |
| --- | --- | --- |
| Computational complexity | \(O(n\omega ^2)\) | \(\mathsf{P}_i: O(\omega _i)\), \(\mathcal{O}: O(n\omega )\) |
| Communication overhead | \(O(n\omega )\) | \(\mathsf{P}_i: O(\omega + n)\), \(\mathcal{O}: O(n\omega )\) |
| Restriction on set size | \(\lvert S_1 \rvert =\cdots =\lvert S_n \rvert \) | None |
| Protected values | \(S_i (\forall i \in [1,n])\) | \(S_i, \lvert S_i \rvert (\forall i \in [1,n])\) |
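The parameter choice \(k = 80\), \(m = 80\omega /\ln 2\) stated above can be checked numerically with the standard Bloom filter approximation \(p \approx (1 - e^{-k\omega /m})^k\). The short Python sketch below uses hypothetical helper names and rounds m up to an integer, which is an assumption; it works with \(\log _2 p\) to avoid floating-point underflow.

```python
import math


def bloom_parameters(omega, security_bits=80):
    """Return (m, k) with k = security_bits and m = ceil(k * omega / ln 2)."""
    k = security_bits
    m = math.ceil(k * omega / math.log(2))
    return m, k


def false_positive_log2(m, k, omega):
    """log2 of the approximate false-positive probability (1 - e^{-k*omega/m})^k."""
    return k * math.log2(1.0 - math.exp(-k * omega / m))


m, k = bloom_parameters(10_000)          # omega = 10,000 entries per party
log2_p = false_positive_log2(m, k, 10_000)
```

With this choice, \(e^{-k\omega /m} = 1/2\), so each of the k probed bits is set with probability about one half and the false-positive probability is about \(2^{-k} = 2^{-80}\). For \(\omega = 10{,}000\), the filter needs roughly 1.15 million array positions.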
3.1.6 System and Performance
PSI and MPSI implicitly assume that every attendee can provide data, that any attendee can retrieve data from the shared data, and that all attendees can communicate with each other. A straightforward implementation of PSI or MPSI would therefore resemble a peer-to-peer (P2P) network system. Although a fully distributed system such as a P2P network has attractive features, such as high availability and scalability, it also has some unfavorable ones.
Network address and port translation (NAPT) is a major obstacle for P2P network systems. Modern P2P network systems use NAPT traversal technologies to overcome it, but these make the architecture complex and costly. The absence of a trusted node is also an obstacle to attendee or group management. Reaching consensus on a P2P network system is difficult or highly costly. Additionally, unpredictable joining and leaving of nodes makes P2P network systems complex. To avoid the complexities of P2P networks, we designed a system based on the client-server model.
Next, we discuss the design of the client-server model for PSI or MPSI. PSI and MPSI have two main functionalities: (1) data sharing, which shares data among attendees, and (2) data retrieval, which retrieves data from the shared data. Any attendee can retrieve data from the shared data, but the retrieval avoids collecting privacy-sensitive data by using the privacy-preserving techniques described above.
However, we do not assume that every attendee both provides and retrieves data. Imagine an incident analysis situation in which data are provided by several organizations that employ labor and operate machines, and a research institute collects the data from these organizations and analyzes them. In such a situation, data providers do not need the data retrieval functionality, and data analysts do not need the data sharing functionality.
Therefore, we define 3 roles for our MPSI application design as follows.

Parties: entities for data providing

Clients: entities for data retrieving

Dealer: an entity for forwarding requests between parties and clients
We measured the performance of our MPSI application, written in Python, on an Amazon EC2 server (2.4 GHz CPU, 1 GB memory). Figure 3.4 shows the results for 2 to 4 parties that provide data comprising 10,000 entries. The results show that data retrieval takes approximately 280 s and that the amount of computation does not depend on the number of parties.
3.2 Classification
In this section, we present a secure classification protocol, which is a type of secure computation protocol. We assume two participants in the protocol, Alice and Bob. Alice has private data x, and Bob has a classification model C. The task is for Alice to learn C(x) at the end of the protocol while the privacy of x and C is preserved. That is, Alice learns only C(x), and Bob learns nothing. Our construction is based on a code-based public-key encryption scheme called HQC [20], which is a candidate in NIST’s Post-Quantum Cryptography standardization [21].
3.2.1 Error-Correcting Code
We start with several fundamental notions for error-correcting codes.
Definition 3.3
(Linear code) A code \(\mathbb {C}\) such that \(c_1+c_2 \in \mathbb {C}\) always holds for any codewords \(c_1, c_2 \in \mathbb {C}\) is called a linear code. A linear code \(\mathbb {C}\) of code length n with k information bits is described as an \([n, k]\) code.
Definition 3.4
Definition 3.5
Definition 3.6
Definition 3.7
(Cyclic shift) For an n-dimensional vector \((c_0,\dots ,c_{n-1})\), the operation of shifting each \(c_i~(i=0,\dots ,n-2)\) to the right by one position and moving \(c_{n-1}\) to the beginning of the vector is called a cyclic shift. That is, for any n-dimensional vector \((c_0,\dots ,c_{n-1})\), it is the mapping \(\sigma :(c_0,c_1,\dots ,c_{n-1})\mapsto (c_{n-1},c_0,\dots ,c_{n-2})\).
Definition 3.8
(Quasi-cyclic code) Let \(\varvec{c}=(\varvec{c}_0,\dots ,\varvec{c}_{s-1})\in (\mathbb {F}_2^n)^s\) be an arbitrary codeword of a code \(\mathbb {C}\) and let \(\sigma \) be the cyclic shift operation. If \((\sigma (\varvec{c}_0),\dots ,\sigma (\varvec{c}_{s-1}))\in \mathbb {C}\) for every such codeword, \(\mathbb {C}\) is called an s-quasi-cyclic code. In particular, when s = 1, \(\mathbb {C}\) is called a cyclic code.
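The cyclic shift \(\sigma \) and the quasi-cyclic property can be made concrete with a few lines of Python. The function names are hypothetical, and the brute-force check over an explicitly listed code is only meant for tiny examples.

```python
def sigma(c):
    """Cyclic shift: (c_0, ..., c_{n-1}) -> (c_{n-1}, c_0, ..., c_{n-2})."""
    c = tuple(c)
    return c[-1:] + c[:-1]


def blockwise_shift(codeword):
    """Apply sigma to each of the s blocks of a codeword (c_0, ..., c_{s-1})."""
    return tuple(sigma(block) for block in codeword)


def is_quasi_cyclic(code):
    """Check that the block-wise shift of every codeword is again a codeword."""
    return all(blockwise_shift(c) in code for c in code)


# A tiny 2-quasi-cyclic code with block length 3: closed under block-wise shifts.
example_code = {
    ((0, 0, 0), (0, 0, 0)),
    ((1, 0, 0), (0, 1, 0)),
    ((0, 1, 0), (0, 0, 1)),
    ((0, 0, 1), (1, 0, 0)),
}
```

Here `sigma((1, 2, 3, 4))` gives `(4, 1, 2, 3)`, and `is_quasi_cyclic(example_code)` is true because shifting both blocks of any listed codeword yields another listed codeword. The example set is closed under shifts but is not claimed to be linear; it only illustrates the shift condition.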
Definition 3.9
3.2.2 Security Assumptions
As mentioned above, the security of the public-key cryptosystem HQC is based on the computational difficulty of the quasi-cyclic syndrome decoding problem. More specifically, its security is proved under the following decisional quasi-cyclic syndrome decoding assumption.
Definition 3.10
(Quasi-cyclic syndrome decoding assumption) Let n and w be integers, and let \(s\ge 2\) be the number of blocks of an s-quasi-cyclic code. The decisional quasi-cyclic syndrome decoding problem is the following: given \((\mathbf {H},\varvec{y}^\top )\), where \(\mathbf {H}{\mathop {\longleftarrow }\limits ^{\$}}\mathbb {F}^{(sn-n)\times sn}\) is the parity-check matrix of a random systematic quasi-cyclic code and \(\varvec{y}{\mathop {\longleftarrow }\limits ^{\$}}\mathbb {F}^{sn-n}\), decide whether \((\mathbf {H},\varvec{y}^\top )\) follows the quasi-cyclic syndrome decoding distribution or the uniform distribution over \(\mathbb {F}^{(sn-n)\times sn}\times \mathbb {F}^{(sn-n)}\). The assumption states that every efficient algorithm distinguishes the two distributions with only negligible advantage.
As will be described later, the security of the secure computation protocol proposed in this section is reduced to the security of HQC; hence, the protocol is proved to be secure under the same assumption as HQC.
3.2.3 Security Requirements for 2PC
Secure two-party computation (2PC) is a subproblem of secure multiparty computation. It has been studied by many researchers because it is closely related to many cryptographic protocols. The purpose of 2PC is to construct a general-purpose protocol so that arbitrary functions can be jointly computed without either party sharing its input values with the other. One of the best-known examples of 2PC is Yao’s millionaires’ problem [22], in which Alice and Bob decide who is richer without revealing how much money they have. Specifically, suppose that Alice has \(a\) yen and Bob has \(b\) yen. The problem is to decide whether \(a\ge b\) or not while keeping a and b secret from each other. Generally speaking, the security requirement of 2PC is that the computation of any function is performed by a protocol that does not leak either input to the other party and reveals only the computation result.
A two-party linear function evaluation is a kind of 2PC that satisfies the 2PC security requirements. In other words, the participants perform the evaluation without disclosing their inputs to the other party, and the function computed by the protocol is a linear function. Specifically, a secure linear function computation protocol computes \(f(m)=a\cdot m+b\). The participants in the protocol are called Alice and Bob. Alice’s input is m, and Bob’s input consists of the linear function parameters a, b. Alice obtains only the result \(f(m)=a\cdot m+b\) through the protocol, and Bob obtains nothing.
Below we define the security requirements for two-party secure linear function computation.
Definition 3.11
(Security against semi-honest adversaries) Let \(f=(f_A,f_B)\) be the function that maps the input x of Alice (A) and the input y of Bob (B) to \((f_A(x,y), f_B(x,y))\). A aims to obtain \(f_A(x,y)\) and B aims to obtain \(f_B(x,y)\).
Let \(f=(f_A,f_B)\) be a probabilistic polynomial time function, and let \(\pi \) be a two-party protocol for computing f. For an execution \(\pi (x,y)\) on inputs (x, y) with security parameter n, let \(\mathrm{view}^\pi _A(x,y,n)\) denote the view of A and \(\mathrm{view}^\pi _B (x,y,n)\) the view of B. The output of A is \(\mathrm{output}^\pi _A(x,y,n)\) and the output of B is \(\mathrm{output}^\pi _B(x,y,n)\). In addition, the joint output of the two is denoted as \(\mathrm{output}^\pi (x,y,n)=(\mathrm{output}^\pi _A(x,y,n),\mathrm{output}^\pi _B(x,y,n))\).
3.2.4 HQC Encryption Scheme
The protocols proposed in this section are based on the Hamming Quasi-Cyclic (HQC) cryptosystem of Gaborit et al. [20], a public-key cryptosystem based on the quasi-cyclic syndrome decoding problem, which we introduce first. This cryptosystem uses two kinds of codes: a quasi-cyclic code and an error-correcting code \(\mathbb {C}\). The error-correcting code \(\mathbb {C}\) is an arbitrary linear code (such as a BCH code) with sufficient error correction capability, used for message encoding and decoding. The quasi-cyclic code serves the security requirement of the cryptosystem: it generates noise that an adversary cannot remove.
 1.
Global parameter settings:
Parameters param = \((n,k,\delta ,w_x,w_r,w_e)\) and the generator matrix \({\mathbb G}\in \mathbb {F}^{k \times n}\) of the code \(\mathbb {C}\).
 2.
Key generation:
A generates a random \(\varvec{h} {\mathop {\longleftarrow }\limits ^{\$}} \mathbb {R}\).
Furthermore, A generates \((\varvec{x}, \varvec{y}) {\mathop {\longleftarrow }\limits ^{\$}} \mathbb {R}^2\), where \(\varvec{x}\) and \(\varvec{y}\) each have Hamming weight \(w_x\).
The secret information is sk = \((\varvec{x}, \varvec{y})\), and the public information is pk = \((\varvec{h}, \varvec{s} = \varvec{x} + \varvec{h} \cdot \varvec{y})\). A sends the public information pk to B.
 3.
Encryption:
B generates random \(\varvec{e}{\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}\) and \((\varvec{r_1},\varvec{r_2}){\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}^2\).
The Hamming weight of \(\varvec{e}\) is \(w_e\), and the Hamming weight of \(\varvec{r_1}\) and \(\varvec{r_2}\) is \(w_r\).
Then, on input \(\varvec{m}\), B computes \(\varvec{u}=\varvec{r_1 + h} \cdot \varvec{r_2}\) and \(\varvec{v}=\varvec{m}\cdot {\mathbb G}+ \varvec{s} \cdot \varvec{r_2 + e}\). B sends the ciphertext \((\varvec{u},\varvec{v})\) back to A.
 4.
Decryption:
A uses the decoding function \(\mathbb {C}\).Decode\((\varvec{v - u \cdot y})\) of the error-correcting code \(\mathbb {C}\) to recover B’s message \(\varvec{m}\).
In the HQC cryptosystem, noise derived from the public information \(\varvec{s}\) is added to the message \(\varvec{m}\), encoded by the error-correcting code, when it is encrypted. Since \(\varvec{s}\) is noise with a large Hamming weight generated by the quasi-cyclic code, security is guaranteed by the decisional quasi-cyclic syndrome decoding assumption introduced above. In the decryption stage, A can use the secret key on the error-protected ciphertext to remove most of the noise contributed by \(\varvec{s}\). However, the noise \(\varvec{x\cdot r_2 - r_1\cdot y+e}\) remains. If the weight of this noise does not exceed the maximum number of correctable errors \(\delta \) of the error-correcting code, correct decoding is possible. The analysis assumes Hamming weights \(w_x,w_r,w_e = \mathcal {O}(\sqrt{n})\). Moreover, the paper of Gaborit et al. shows that the probability that \(\omega (\varvec{x\cdot r_2 - r_1\cdot y+e})\le \delta \) increases as the code length n becomes larger. In addition, the HQC cryptosystem is IND-CPA secure under the decisional quasi-cyclic syndrome decoding assumption.
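The four steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the HQC reference implementation: \(\mathbb {R}\) is taken to be \(\mathbb {F}_2[X]/(X^n-1)\), the code \(\mathbb {C}\) is replaced by a length-n repetition code with \(k=1\) (so \({\mathbb G}\) is the all-ones row and decoding is a majority vote), and the parameter values as well as the helper names `sparse`, `add`, `mul`, `keygen`, `encrypt`, and `decrypt` are chosen here for illustration only and provide no security.

```python
import random

def sparse(n, w, rng):
    """Random binary vector of length n with Hamming weight w."""
    v = [0] * n
    for i in rng.sample(range(n), w):
        v[i] = 1
    return v

def add(*vs):
    """Addition in F_2^n (the same operation as subtraction)."""
    return [sum(col) % 2 for col in zip(*vs)]

def mul(a, b):
    """Product in F_2[X]/(X^n - 1): cyclic convolution modulo 2."""
    n = len(a)
    out = [0] * n
    ones_b = [j for j, bj in enumerate(b) if bj]
    for i, ai in enumerate(a):
        if ai:
            for j in ones_b:
                out[(i + j) % n] ^= 1
    return out

def keygen(n, w_x, rng):
    """Step 2: sk = (x, y), pk = (h, s = x + h*y)."""
    h = [rng.randrange(2) for _ in range(n)]
    x, y = sparse(n, w_x, rng), sparse(n, w_x, rng)
    return (x, y), (h, add(x, mul(h, y)))

def encrypt(pk, m, w_r, w_e, rng):
    """Step 3: u = r1 + h*r2, v = m*G + s*r2 + e for a single bit m."""
    h, s = pk
    n = len(h)
    r1, r2 = sparse(n, w_r, rng), sparse(n, w_r, rng)
    e = sparse(n, w_e, rng)
    codeword = [m] * n                      # m*G for the repetition code
    return add(r1, mul(h, r2)), add(codeword, mul(s, r2), e)

def decrypt(sk, ciphertext):
    """Step 4: decode v - u*y = m*G + x*r2 - r1*y + e by majority vote."""
    x, y = sk
    u, v = ciphertext
    noisy = add(v, mul(u, y))
    return int(sum(noisy) > len(noisy) // 2)

rng = random.Random(0)
sk, pk = keygen(101, 3, rng)
assert all(decrypt(sk, encrypt(pk, m, 3, 3, rng)) == m for m in (0, 1))
```

With these toy values the residual noise has weight at most 3·3 + 3·3 + 3 = 21, well below the 50 errors that a length-101 repetition code tolerates, so decryption always succeeds here.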
3.2.5 Proposed Protocol
3.2.5.1 Linear Function Evaluation
We introduce a protocol for the secure evaluation of linear functions between two parties.
Following the HQC cryptosystem of Gaborit et al., we use two codes: a quasi-cyclic code and an arbitrary error-correcting code \(\mathbb {C}\). The participants in the protocol are Alice (A) and Bob (B). A’s input is \(m\in \mathbb {F}_2\), B’s input is \(a,b\in \mathbb {F}_2\), B’s output is nothing, and A’s output is \(a\cdot m+b\). The protocol is given in Protocol 3.2.5.1.
Protocol
input  A: \(m\in \mathbb {F}_2\) B: \(a,b\in \mathbb {F}_2\) 
output  A: \(a\cdot m+b\) B: \(\perp \) 
 1.
The global parameters param = \((n,k, \delta , w_x, w_r, w_e)\) and the generator matrix \({\mathbb G}\in \mathbb {F}^{k \times n}\) of the code \(\mathbb {C}\) are chosen.
 2.
A generates a random \(\varvec{h} {\mathop {\longleftarrow }\limits ^{\$}} \mathbb {R}\). Furthermore, \((\varvec{x},\varvec{y}){\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}^2\) is generated, where \(\varvec{x}\) and \(\varvec{y}\) each have Hamming weight \(w_x\). The secret information is sk = \((\varvec{x}, \varvec{y})\), and the public information is pk = \((\varvec{h}, \varvec{s} = \varvec{x} + \varvec{h} \cdot \varvec{y})\).
 3.
By padding the input m with 0, A makes \(\varvec{m} = (m, 0, \dots , 0)\) of dimension k. A generates random \(\varvec{r_A, r_u, r_v} {\mathop {\longleftarrow }\limits ^{\$}} \mathbb {R}\), each of Hamming weight \(w_r\). Then, A computes \((\varvec{u = h \cdot r_A + r_u}, \varvec{v = m} \cdot {\mathbb G}+ \varvec{s \cdot r_A + r_v})\). A sends the public information \(\varvec{h, s}\) and the ciphertext pair \(\varvec{u, v}\) to B.
 4.
B sets \(\varvec{b} = (b,0, \dots ,0)\) and generates \(\varvec{r_B}{\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}\) and \((\varvec{e_u},\varvec{e_v}){\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}^2\). Here, the Hamming weight of \(\varvec{r_B}\) is \(w_r\), and the Hamming weight of \(\varvec{e_u}\) and \(\varvec{e_v}\) is \(w_e\). B computes \(\varvec{u}'=a\cdot \varvec{u+h\cdot r_B + e_u}\) and \(\varvec{v}'=a\cdot \varvec{v+b\cdot {\mathbb G}+s\cdot r_B + e_v}\). B sends \(\varvec{u}', \varvec{v}'\) back to A.
 5.
A computes \(\mathbb {C}\).Decode(\(\varvec{v' - u' \cdot y}\)) with the decoding function of the error-correcting code \(\mathbb {C}\), and recovers \(a\cdot m+b\) by taking the first bit of the result.
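The message flow of the protocol can be sketched as follows. This is a minimal toy sketch, assuming \(\mathbb {R}=\mathbb {F}_2[X]/(X^n-1)\), a length-n repetition code with \(k=1\) in place of \(\mathbb {C}\), and illustrative parameter values; the constants `N`, `W_X`, `W_R`, `W_E` and the function names `alice_round1`, `bob_round`, and `alice_round2` are hypothetical and do not come from the source.

```python
import random

N, W_X, W_R, W_E = 101, 3, 3, 3   # toy parameters, chosen for illustration

def sparse(w, rng):
    """Random binary vector of length N with Hamming weight w."""
    v = [0] * N
    for i in rng.sample(range(N), w):
        v[i] = 1
    return v

def add(*vs):
    """Addition in F_2^N (the same operation as subtraction)."""
    return [sum(col) % 2 for col in zip(*vs)]

def scale(a, v):
    """Multiply a vector by the scalar a in F_2."""
    return [a & vi for vi in v]

def mul(a, b):
    """Product in F_2[X]/(X^N - 1): cyclic convolution modulo 2."""
    out = [0] * N
    ones_b = [j for j, bj in enumerate(b) if bj]
    for i, ai in enumerate(a):
        if ai:
            for j in ones_b:
                out[(i + j) % N] ^= 1
    return out

def alice_round1(m, rng):
    """Steps 2-3: A generates keys and encrypts her bit m."""
    h = [rng.randrange(2) for _ in range(N)]
    x, y = sparse(W_X, rng), sparse(W_X, rng)
    s = add(x, mul(h, y))
    r_A, r_u, r_v = sparse(W_R, rng), sparse(W_R, rng), sparse(W_R, rng)
    u = add(mul(h, r_A), r_u)
    v = add([m] * N, mul(s, r_A), r_v)      # [m] * N is m*G here
    return (x, y), (h, s, u, v)

def bob_round(a, b, message, rng):
    """Step 4: B evaluates a*m + b on the ciphertext and re-randomizes."""
    h, s, u, v = message
    r_B = sparse(W_R, rng)
    e_u, e_v = sparse(W_E, rng), sparse(W_E, rng)
    u2 = add(scale(a, u), mul(h, r_B), e_u)
    v2 = add(scale(a, v), [b] * N, mul(s, r_B), e_v)
    return u2, v2

def alice_round2(sk, reply):
    """Step 5: A decodes v' - u'*y by majority vote."""
    x, y = sk
    u2, v2 = reply
    noisy = add(v2, mul(u2, y))
    return int(sum(noisy) > N // 2)

rng = random.Random(0)
sk, message = alice_round1(1, rng)
result = alice_round2(sk, bob_round(1, 1, message, rng))
assert result == (1 * 1 + 1) % 2
```

In this toy setting the residual noise has weight at most 18 + 18 + 6 = 42, which stays below the 50 errors tolerated by the repetition code, so A always recovers \(a\cdot m+b\).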
First, the global parameters are set: n is the code length, k is the number of information bits, \(\delta \) is the maximum number of errors correctable by the error-correcting code, and \(w_x, w_r, w_e\) are Hamming weights fixed in advance, for example half of the \(\mathcal {O}(\sqrt{n})\) weight assumed by Gaborit et al. The public parameter \({\mathbb G}\) is a generator matrix of the error-correcting code \(\mathbb {C}\), which maps messages to codewords, \(\mathbb {F}^k_2\rightarrow \mathbb {F}^n_2\).
A generates random \(\varvec{h}{\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}\) and \((\varvec{x},\varvec{y}){\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}^2\) and computes \(\mathbf {s=x + h \cdot y}\). Here, \(\varvec{x}\) and \(\varvec{y}\) have Hamming weight \(w_x\).
A pads the input m with 0, making \(\varvec{m} = (m, 0, \dots , 0)\) with dimension k. A generates \(\varvec{r_A,r_u,r_v}{\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}\), encodes \(\varvec{m}\) with the error-correcting code, and randomizes it. A generates the ciphertext pair \((\varvec{u=h \cdot r_A + r_u}, \varvec{v=m} \cdot {\mathbb G}+ \varvec{s \cdot r_A + r_v})\) and sends it to B. From B’s point of view, \(\varvec{v}\) contains noise derived from \(\varvec{s}\) that cannot be decoded, and B has no secret information with which to remove it, so B cannot learn \(\varvec{m}\).
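For steps 4 and 5, the quantity that A decodes can be expanded from the definitions in steps 2 to 4. The expansion below is derived here from those definitions rather than quoted from the source; it uses \(\varvec{s}=\varvec{x}+\varvec{h}\cdot \varvec{y}\) and the commutativity of the product in \(\mathbb {R}\):
$$\begin{aligned} \varvec{v}' - \varvec{u}'\cdot \varvec{y} =&~(a\cdot \varvec{m}+\varvec{b})\cdot {\mathbb G}\\&+ \varvec{x}\cdot (a\cdot \varvec{r_A}+\varvec{r_B}) \\&- \varvec{y}\cdot (a\cdot \varvec{r_u}+\varvec{e_u}) \\&+ (a\cdot \varvec{r_v}+\varvec{e_v}). \end{aligned}$$
The terms in \(\varvec{h}\) cancel, so decoding returns \(a\cdot \varvec{m}+\varvec{b}\) whenever the Hamming weight of the last three terms together does not exceed \(\delta \).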
3.2.5.2 Correctness and Security of the Proposed Protocol
The security requirements of the proposed protocol are described above. In this section, we prove security against semi-honest adversaries.
Theorem 3.2
Under the quasi-cyclic syndrome decoding assumption, the 2PC protocol securely computes linear functions in the presence of semi-honest adversaries.
Proof
 1.
Generate \(\varvec{\widetilde{h},\widetilde{r_0},\widetilde{r_A},\widetilde{r_u},\widetilde{r_v},\widetilde{u'},\widetilde{v'}}{\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}\) randomly.
Here, the Hamming weight of \(\varvec{\widetilde{r_A},\widetilde{r_u},\widetilde{r_v}}\) is \(w_r\).
 2.
Output \((\varvec{m},\varvec{x},\varvec{y};\varvec{\widetilde{h},\widetilde{r_A},\widetilde{r_u},\widetilde{r_v};\widetilde{u'},\widetilde{v'}})\).
 1.
Randomly generate \(\varvec{\widetilde{h},\widetilde{s},\widetilde{u},\widetilde{v},\widetilde{r_B},\widetilde{e_u},\widetilde{e_v}}{\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}\). Here, the Hamming weight of \(\varvec{\widetilde{r_B}}\) is \(w_r\), and the Hamming weight of \(\varvec{\widetilde{e_u}}\) and \(\varvec{\widetilde{e_v}}\) is \(w_e\).
 2.
Output \((a,b;\varvec{\widetilde{h},\widetilde{s},\widetilde{u},\widetilde{v},\widetilde{r_B},\widetilde{e_u},\widetilde{e_v}})\).
The above protocol works over \(\mathbb {F}_2\), but it can easily be extended to a larger field \(\mathbb {F}_q\) by using appropriate error-correcting linear codes over \(\mathbb {F}_q\).
3.2.5.3 Secure Comparison
The two-party secure comparison protocol proposed in this section is based on the comparison method used in the secure decision tree classification protocol of Wu et al. [23]. We use the criterion given in Proposition 3.1 for comparison.
Proposition 3.1
In this section, we introduce the proposed two-party secure comparison protocol. The protocol uses a quasi-cyclic code and an arbitrary error-correcting code (for example, a Reed-Solomon code) over \({\mathbb F}_{q}\). The participants in the protocol are Alice (A) and Bob (B). The input of A is \(c\in \mathbb {N}\), and the input of B is \(d\in \mathbb {N}\). The output of A is the result of the comparison between c and d, and B outputs nothing.
The flow of the two-party secure comparison is as follows:
Protocol
Input  A : \(c\in \mathbb {N}\) B : \(d\in \mathbb {N}\) 
Output  A : Comparison result of c and d B : \(\perp \) 
 1.
A and B expand their inputs c and d in binary so that \(\varvec{c}=c_1c_2\dots c_l, \varvec{d}=d_1d_2\dots d_l\). Then, each bit \(c_i,d_i\) is padded to a vector \(\varvec{c_i, d_i}, i\in [l]\) of k bits. In addition, they set the global parameters param = \((n,k,\delta ,w_x,w_r)\) and the generator matrix \({\mathbb G}\in {\mathbb F}_{q}^{k \times n}\) of the code \(\mathbb {C}\).
 2.
A generates a random \(\varvec{h}{\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}\). Furthermore, \((\varvec{x},\varvec{y}){\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}^2\) with Hamming weight \(w_x\) is generated. The private key is \(sk = (\varvec{x},\varvec{y})\), and the public key is \(pk = (\varvec{h,s=x + h \cdot y})\).
 3.
A generates random \(\varvec{r_{Ai}},\varvec{r_{ui}},\varvec{r_{vi}}{\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}, i\in [l]\) with Hamming weight \(w_r\). Then, A computes \(\varvec{u_i=h \cdot r_{Ai}+r_{ui}}\) and \(\varvec{v_i=c_i \cdot G + s \cdot r_{Ai}+r_{vi}}\) for \(i\in [l]\) and sends the l ciphertext pairs \(\varvec{u_i,v_i}\) to B.
 4.
For each \(i\in [l]\), B generates \((\varvec{r_{Bi}},\varvec{e_{ui}},\varvec{e_{vi}}){\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}^3\) with Hamming weight \(w_r^*\) and evaluates the expression \(c_i-d_i+1+3\sum _{w<i}(c_w\oplus d_w)\) on the encrypted \(c_i\). Specifically, B substitutes its plaintext \(d_i\) for \(i\in [l]\) into the above expression and sets the appropriate coefficients \(a_{1i},a_{2i},\dots ,a_{li},\varvec{b_i}\). B computes \(\varvec{u_i}'=a_{1i}\cdot \varvec{u_1}+\dots +a_{li}\cdot \varvec{u_l}+\varvec{h\cdot r_{Bi}+e_{ui}}\) and \(\varvec{v_i}'=a_{1i}\cdot \varvec{v_1}+\dots +a_{li}\cdot \varvec{v_l}+\varvec{b_i}\cdot {\mathbb G}+\varvec{s\cdot r_{Bi}+e_{vi}}\) for all l pairs. Then, B randomly permutes the l pairs \((\varvec{u_i}',\varvec{v_i}')\) and sends them to A.
 5.
A computes \(\varvec{v_i}'-\varvec{u_i}'\cdot \varvec{y}\) for each \(i\in [l]\) and decodes the result. If the first bit of any decoded result is 0, A outputs \(c<d\). Otherwise, A outputs \(c\ge d\).
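Leaving the encryption aside, the comparison logic that the protocol evaluates can be checked on plaintext values. The sketch below is only an illustration of that logic, assuming \(c_1\) is the most significant bit; the function names `to_bits`, `comparison_values`, and `compare` are introduced here and are not from the source. Since each value lies between 0 and \(3l-1\), the sketch also assumes \(q>3l-1\) so that no nonzero value reduces to zero in \({\mathbb F}_{q}\).

```python
import random

def to_bits(value, l):
    """Binary expansion c_1 c_2 ... c_l, most significant bit first."""
    return [(value >> (l - 1 - i)) & 1 for i in range(l)]

def comparison_values(c, d, l):
    """Return c_i - d_i + 1 + 3 * sum_{w<i} (c_w XOR d_w) for i = 1..l."""
    values = []
    diff = 0                      # running sum of c_w XOR d_w over w < i
    for c_i, d_i in zip(to_bits(c, l), to_bits(d, l)):
        values.append(c_i - d_i + 1 + 3 * diff)
        diff += c_i ^ d_i
    return values

def compare(c, d, l, rng=random):
    """Plaintext analogue of steps 4-5: shuffle the values, look for a 0."""
    values = comparison_values(c, d, l)
    rng.shuffle(values)           # B's random reordering of the l results
    return "c<d" if 0 in values else "c>=d"

# Exhaustive check of the criterion for all 4-bit inputs.
for c in range(16):
    for d in range(16):
        assert (compare(c, d, 4) == "c<d") == (c < d)
```

A zero can appear only at the first position where the two expansions differ, and only when \(c_i=0\) and \(d_i=1\); every later position contributes at least 3 through the sum.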
 1.
In step 1, A and B expand their inputs c and d into l-bit binary strings \(\varvec{c}=c_1c_2\dots c_l\) and \(\varvec{d}=d_1d_2\dots d_l\), where \(c_i, d_i, i\in [l]\) are the i-th digits of \(\varvec{c,d}\), and l is the bit length. For encoding, each bit is padded to a vector \(\varvec{c_i, d_i}, i\in [l]\) of length k.
In addition, the global parameters are set: n is the code length, k is the number of information bits, \(\delta \) is the maximum number of errors that can be corrected by the error-correcting code, and \(w_x\) and \(w_r\) are Hamming weights fixed in advance. The public parameter \({\mathbb G}\) is the generator matrix (for example, a Reed-Solomon generator matrix) of the error-correcting code \(\mathbb {C}\), which maps messages to codewords, \({\mathbb F}_{q}^k\rightarrow {\mathbb F}_{q}^n\).
 2.
In step 2, A generates a private key and public key for HQC encryption scheme.
 3.
In step 3, A uses the public key to encrypt each \(\varvec{c_i}\) and sends the resulting ciphertexts \((\varvec{u_i},\varvec{v_i}) , i\in [l]\) to B.
 4.
Step 4 uses Proposition 3.1 to evaluate \(c_i-d_i+1+3\sum _{w<i}(c_w\oplus d_w)\). In other words, \(c<d\) if there exists \(i\in [l]\) such that
$$\begin{aligned} c_i-d_i+1+3\sum _{w<i}(c_w\oplus d_w)=0. \end{aligned}$$(3.20)
In particular, since B has the plaintext \(d_i\) and the encrypted \(c_i\), Eq. (3.20) can be regarded as an expression with \(c_i\) as an unknown and can be computed. In addition, for XOR operations, B can transform \(x_i \oplus y_i\) into
$$\begin{aligned} x_i \oplus y_i = \left\{ \begin{array}{ll} x_i & (y_i = 0) \\ 1-x_i & (y_i = 1). \end{array} \right. \end{aligned}$$(3.21)
Therefore, the XOR operation requires only the additive homomorphism of the HQC encryption scheme. That is, B substitutes the plaintext \(d_i, i\in [l]\) into the above equation, sets the appropriate \(a_{1i},a_{2i},\dots ,a_{li},\varvec{b_i}\), and computes as follows:
$$\begin{aligned} \varvec{u_i}'=a_{1i}\cdot \varvec{u_1}+\cdots +a_{li}\cdot \varvec{u_l}+\varvec{h\cdot r_{Bi}+e_{ui}}, \end{aligned}$$(3.22)
$$\begin{aligned} \varvec{v_i}'=a_{1i}\cdot \varvec{v_1}+\cdots +a_{li}\cdot \varvec{v_l}+\varvec{b_i \cdot G+s\cdot r_{Bi}+e_{vi}}. \end{aligned}$$(3.23)
Here, the Hamming weight of \(\varvec{r_{Bi}},\varvec{e_{ui}},\varvec{e_{vi}}, i\in [l]\) is \(w_r^*\).
Furthermore, so as not to leak to A which bits differ, B must randomly permute the computed pairs \((\varvec{u_i}',\varvec{v_i}')\) before sending them.
 5.
In step 5, A computes \(\varvec{v_i}'-\varvec{u_i}'\cdot \varvec{y}, i\in [l]\). The result is
$$\begin{aligned} \begin{aligned}&\varvec{v_i}' - \varvec{u_i}' \cdot \varvec{y} \\ =&~(a_{1i}\cdot \varvec{c_1}+\cdots +a_{li}\cdot \varvec{c_l}+\varvec{b_i})\cdot {\mathbb G}\\&+ \varvec{x}\cdot (a_{1i}\cdot \varvec{r_{A1}}+\cdots +a_{li}\cdot \varvec{r_{Al}}+\varvec{r_{Bi}}) \\&- \varvec{y}\cdot (a_{1i}\cdot \varvec{r_{u1}}+\cdots +a_{li}\cdot \varvec{r_{ul}}+\varvec{e_{ui}}) \\&+(a_{1i}\cdot \varvec{r_{v1}}+\cdots +a_{li}\cdot \varvec{r_{vl}}+\varvec{e_{vi}}). \end{aligned} \end{aligned}$$(3.24)
Then, the result is decoded with the error-correcting code. A takes the first bit of each of the l decoded results and outputs \(c<d\) if any of them is 0. If there is no 0, \(c\ge d\) is output.
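The coefficients that B sets in step 4 follow mechanically from Eq. (3.21): each term \(c_w\oplus d_w\) becomes \(c_w\) when \(d_w=0\) and \(1-c_w\) when \(d_w=1\). The sketch below works on plaintext bits with ordinary integers (a coefficient of −3 would be \(q-3\) in \({\mathbb F}_{q}\)); the function names `affine_coefficients`, `evaluate`, and `direct` are hypothetical names introduced for this illustration.

```python
def to_bits(value, l):
    """Binary expansion with the most significant bit first."""
    return [(value >> (l - 1 - i)) & 1 for i in range(l)]

def affine_coefficients(d_bits):
    """For each i return (a, b) such that sum_w a[w] * c_w + b equals
    c_i - d_i + 1 + 3 * sum_{w<i} (c_w XOR d_w) for every bit string c."""
    l = len(d_bits)
    rows = []
    for i in range(l):
        a = [0] * l
        a[i] = 1                      # the term c_i
        b = 1 - d_bits[i]             # the constant -d_i + 1
        for w in range(i):
            if d_bits[w] == 0:
                a[w] = 3              # c_w XOR 0 = c_w
            else:
                a[w] = -3             # c_w XOR 1 = 1 - c_w
                b += 3
        rows.append((a, b))
    return rows

def evaluate(rows, c_bits):
    """Apply the affine maps to A's bits, as B does on ciphertexts."""
    return [sum(a_w * c_w for a_w, c_w in zip(a, c_bits)) + b
            for a, b in rows]

def direct(c_bits, d_bits):
    """Evaluate the expression of Eq. (3.20) directly on both inputs."""
    out, diff = [], 0
    for c_i, d_i in zip(c_bits, d_bits):
        out.append(c_i - d_i + 1 + 3 * diff)
        diff += c_i ^ d_i
    return out

# The affine form agrees with the direct evaluation on all 4-bit inputs.
for c in range(16):
    for d in range(16):
        cb, db = to_bits(c, 4), to_bits(d, 4)
        assert evaluate(affine_coefficients(db), cb) == direct(cb, db)
```

Because the coefficients depend only on B’s bits, B can compute them locally and apply them to the ciphertexts \((\varvec{u_w},\varvec{v_w})\) as in Eqs. (3.22) and (3.23).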
3.2.5.4 Correctness and Security of the Proposed Protocol
Correctness
Next, we analyze the validity of the proposed protocol.
The following proposition holds for the Hamming weight of the error.
Proposition 3.2
Proof
The remaining analysis is the same as the correctness analysis of Gaborit et al. [20]. According to the analysis in [20], in the case of \(\mathbb {F}_2\), the decoding failure rate can be controlled by choosing an appropriate code length n and noise Hamming weights \(w_x\) and \(w_r\). Therefore, in the case of \({\mathbb F}_{q}\), the decoding failure rate is also expected to be controllable by choosing appropriate parameters.
Security
This section describes the security of the proposed secure comparison protocol.
 1.
Generates \(\varvec{\widetilde{h}},\{\widetilde{\varvec{r_{Ai}}}\}^l_{i=1},\{\widetilde{\varvec{r_{ui}}}\}^l_{i=1},\{\widetilde{\varvec{r_{vi}}}\}^l_{i=1},\{\widetilde{\varvec{u_i}'}\}^l_{i=1},\{\widetilde{\varvec{v_i}'}\}^l_{i=1}{\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}\) at random. Here, the Hamming weight of \(\{ \widetilde{\varvec{r_{Ai}}}\}^l_{i=1},\{ \widetilde{\varvec{r_{ui}}}\}^l_{i=1},\{ \widetilde{\varvec{r_{vi}}}\}^l_{i=1}\) is \(w_r\). It also selects a random \(i*\in [l]\) such that the first bit of \(\widetilde{\varvec{v_{i*}}'}-\widetilde{\varvec{u_{i*}}'}\cdot \varvec{y}\) is 0 and the first bit of each of the other \(\{ \widetilde{\varvec{v_i}'}-\widetilde{\varvec{u_i}'}\cdot \varvec{y}\}^l_{i=1,i\ne i*}\) is nonzero.
 2.
This randomly permutes \(\{ \widetilde{\varvec{u_i}'}\}^l_{i=1},\{ \widetilde{\varvec{v_i}'}\}^l_{i=1}\) to obtain \(\{ \widetilde{\varvec{u_j}'}\}^l_{j=1},\{ \widetilde{\varvec{v_j}'}\}^l_{j=1}\) in random order.
 3.
This outputs \((c,\varvec{x,y};\varvec{\widetilde{h}},\{ \widetilde{\varvec{r_{Ai}}}\}^l_{i=1},\{ \widetilde{\varvec{r_{ui}}}\}^l_{i=1},\{ \widetilde{\varvec{r_{vi}}}\}^l_{i=1},\{ \widetilde{\varvec{u_j}'}\}^l_{j=1},\{ \widetilde{\varvec{v_j}'}\}^l_{j=1})\).
For a semi-honest adversary A with \(\mathrm{output}_A=(c\ge d)\), the security proof is the same as in the case of \(\mathrm{output}_A=(c<d)\), so the details are omitted.
 1.
Generates \(\varvec{\widetilde{h},\widetilde{s}},\{ \widetilde{\varvec{u_i}}\}^l_{i=1},\{ \widetilde{\varvec{v_i}}\}^l_{i=1},\{ \widetilde{\varvec{r_{Bi}}}\}^l_{i=1},\{ \widetilde{\varvec{e_{ui}}}\}^l_{i=1},\{ \widetilde{\varvec{e_{vi}}}\}^l_{i=1}{\mathop {\longleftarrow }\limits ^{\$}}\mathbb {R}\) at random. Here, the Hamming weight of \(\{ \widetilde{\varvec{r_{Bi}}}\}^l_{i=1},\{ \widetilde{\varvec{e_{ui}}}\}^l_{i=1},\{ \widetilde{\varvec{e_{vi}}}\}^l_{i=1}\) is \(w_r^*\).
 2.
This outputs \((\varvec{d};\varvec{\widetilde{h},\widetilde{s}},\{ \widetilde{\varvec{u_i}}\}^l_{i=1}\!,\!\{ \widetilde{\varvec{v_i}}\}^l_{i=1}\!,\!\{ \widetilde{\varvec{r_{Bi}}}\}^l_{i=1}\!,\!\{ \widetilde{\varvec{e_{ui}}}\}^l_{i=1}\!,\!\{ \widetilde{\varvec{e_{vi}}}\}^l_{i=1})\).
3.2.6 Support Vector Machine from Secure Linear Function Evaluation and Secure Comparison
We can construct a code-based protocol for a support vector machine from the protocols for the evaluation of linear functions and for comparison described above. Note that the result of the secure evaluation of a linear function is an element of \(\mathbb {F}_q\), while the secure comparison protocol operates on a bit string. Therefore, we need a secure bit-decomposition protocol. Bit-decomposition protocols have already been well studied in the area of secure computation, and indeed, we can use the bit-decomposition protocol given in [24] together with the secure computation protocol from threshold homomorphic encryption [25]. (It is straightforward to construct a threshold version of the HQC scheme by setting \(sk_A=(\varvec{x}_1,\varvec{y}_1)\) and \(sk_B=(\varvec{x}_2,\varvec{y}_2)\) as distributed decryption keys for A and B. Then, the encryption key is \((\varvec{h}, (\varvec{x}_1+\varvec{x}_2)+\varvec{h}\cdot (\varvec{y}_1+\varvec{y}_2))\).)
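The key relation in the parenthetical remark can be checked on a toy instance. The sketch below assumes \(\mathbb {R}=\mathbb {F}_2[X]/(X^n-1)\), a repetition code with \(k=1\), and illustrative parameters; it also assumes that joint decryption simply combines the two products \(\varvec{u}\cdot \varvec{y}_1\) and \(\varvec{u}\cdot \varvec{y}_2\), which is a simplification made here and ignores any masking that a real distributed decryption would need. All names (`share_keygen`, `encrypt`, `joint_decrypt`) are hypothetical.

```python
import random

N, W_X, W_R, W_E = 101, 3, 3, 3   # toy parameters, chosen for illustration

def sparse(w, rng):
    """Random binary vector of length N with Hamming weight w."""
    v = [0] * N
    for i in rng.sample(range(N), w):
        v[i] = 1
    return v

def add(*vs):
    """Addition in F_2^N (the same operation as subtraction)."""
    return [sum(col) % 2 for col in zip(*vs)]

def mul(a, b):
    """Product in F_2[X]/(X^N - 1): cyclic convolution modulo 2."""
    out = [0] * N
    ones_b = [j for j, bj in enumerate(b) if bj]
    for i, ai in enumerate(a):
        if ai:
            for j in ones_b:
                out[(i + j) % N] ^= 1
    return out

def share_keygen(h, rng):
    """One party's key share (x_i, y_i) and its syndrome x_i + h*y_i."""
    x, y = sparse(W_X, rng), sparse(W_X, rng)
    return (x, y), add(x, mul(h, y))

def encrypt(h, s, m, rng):
    """Ordinary HQC-style encryption of a bit m under the joint key (h, s)."""
    r1, r2, e = sparse(W_R, rng), sparse(W_R, rng), sparse(W_E, rng)
    return add(r1, mul(h, r2)), add([m] * N, mul(s, r2), e)

def joint_decrypt(sk_a, sk_b, u, v):
    """Combine u*y_1 and u*y_2, then decode by majority vote."""
    noisy = add(v, mul(u, sk_a[1]), mul(u, sk_b[1]))
    return int(sum(noisy) > N // 2)

rng = random.Random(0)
h = [rng.randrange(2) for _ in range(N)]
sk_a, s_a = share_keygen(h, rng)
sk_b, s_b = share_keygen(h, rng)
s = add(s_a, s_b)                 # equals (x_1 + x_2) + h*(y_1 + y_2)
u, v = encrypt(h, s, 1, rng)
assert joint_decrypt(sk_a, sk_b, u, v) == 1
```

The combined secret \((\varvec{x}_1+\varvec{x}_2,\varvec{y}_1+\varvec{y}_2)\) has up to twice the Hamming weight of a single share, so the residual noise grows accordingly (at most 39 here) and the parameters have to leave room for it.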
We describe the overview of the protocol below. For simplification, we denote [m] as the ciphertext for m under HQC encryption scheme over \(\mathbb {F}_q\).
Protocol
Input  A : \(m\in \mathbb {F}_q\) B : \(a,b,t\in \mathbb {F}_q\) 
Output  A : \(a\cdot m+b>t\) or not B : \(\perp \) 
 1.
A and B perform the secure linear function evaluation protocol over \(\mathbb {F}_q\), in which B sends \([a\cdot m+b]\) to A at step 4 of the original protocol.
 2.
A and B start the secure bit-decomposition protocol on \([a\cdot m + b]\).
 3.
From the result of the bit-decomposition protocol, B obtains the binary representation \([(a\cdot m + b)_1],\ldots ,[(a\cdot m + b)_\ell ]\).
 4.
A and B perform the secure comparison protocol from step 4.
Footnotes
 1.
The computational complexity of z for each player can be made independent of the number of players in various ways. For example, set \(z=1\). \(\mathsf{P}_1\) computes \(z=z \cdot z_1\) and sends z to \(\mathsf{P}_2\), \(\mathsf{P}_2\) computes \(z=z \cdot z_2\) and sends z to \(\mathsf{P}_3\), and, finally, \(\mathsf{P}_n\) computes \(z=z \cdot z_n\) and shares z among all players. If we place all players in a binary tree, the communication complexity can be reduced, but each player’s computational complexity is still independent of the number of players.
References
 1. L. Kissner, D. Song, Privacy-preserving set operations, in CRYPTO 2005. LNCS, vol. 3621 (Springer, Berlin, 2005), pp. 241–257
 2. Y. Sang, H. Shen, Efficient and secure protocols for privacy-preserving set operations. ACM Trans. Inf. Syst. Secur. 13(1), 9:1–9:35 (2009)
 3. R. Egert, M. Fischlin, D. Gens, S. Jacob, M. Senker, J. Tillmanns, Privately computing set-union and set-intersection cardinality via bloom filters, in ACISP 2015. LNCS, vol. 9144 (Springer, Berlin, 2015), pp. 413–430
 4. D. Many, M. Burkhart, X. Dimitropoulos, Fast private set operations with SEPIA. Technical Report, 345 (2012)
 5. O. Goldreich, Secure multi-party computation. Manuscript, Preliminary version (1998)
 6. B.H. Bloom, Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)
 7. A. Broder, M. Mitzenmacher, Network applications of bloom filters: a survey. Internet Math. 1(4), 485–509 (2004)
 8. P. Paillier, Public-key cryptosystems based on composite degree residuosity classes, in EUROCRYPT 1999. LNCS, vol. 1592 (Springer, Berlin, 1999), pp. 223–238
 9. R. Cramer, R. Gennaro, B. Schoenmakers, A secure and optimally efficient multi-authority election scheme. Eur. Trans. Telecommun. 8(5), 481–490 (1997)
 10. Y. Desmedt, Y. Frankel, Threshold cryptosystems, in CRYPTO 1989. LNCS, vol. 1462 (Springer, Berlin, 1989), pp. 307–315
 11. M.J. Freedman, K. Nissim, B. Pinkas, Efficient private matching and set intersection, in EUROCRYPT 2004. LNCS, vol. 3027 (Springer, Berlin, 2004), pp. 1–19
 12. Y. Azar, A.Z. Broder, A.R. Karlin, E. Upfal, Balanced allocations. SIAM J. Comput. 29(1), 180–200 (1999)
 13. E. De Cristofaro, G. Tsudik, Practical private set intersection protocols with linear complexity, in FC 2010. LNCS, vol. 6052 (Springer, Berlin, 2010), pp. 143–159
 14. E. De Cristofaro, J. Kim, G. Tsudik, Linear-complexity private set intersection protocols secure in malicious model, in ASIACRYPT 2010. LNCS, vol. 6477 (Springer, Berlin, 2010), pp. 213–231
 15. F. Kerschbaum, Outsourced private set intersection using homomorphic encryption, in ACM CCS 2012 (ACM, 2012), pp. 85–86
 16. S. Goldwasser, S. Micali, Probabilistic encryption. J. Comput. Syst. Sci. 28(2), 270–299 (1984)
 17. Y. Ishai, J. Kilian, K. Nissim, E. Petrank, Extending oblivious transfers efficiently, in CRYPTO 2003. LNCS, vol. 2729 (Springer, Berlin, 2003), pp. 145–161
 18. M.O. Rabin, How to exchange secrets with oblivious transfer. Technical Memo, TR-81 (1981)
 19. C. Dong, L. Chen, Z. Wen, When private set intersection meets big data: an efficient and scalable protocol, in ACM CCS 2013 (ACM, 2013), pp. 789–800
 20. C. Aguilar, O. Blazy, J.C. Deneuville, P. Gaborit, G. Zémor, Efficient encryption from random quasi-cyclic codes. IEEE Trans. Inf. Theory 64(5), 3927–3943 (2018)
 21. National Institute of Standards and Technology. Post-quantum cryptography, round 2 submissions (2019), https://csrc.nist.gov/projects/post-quantum-cryptography/round-2-submissions
 22. A.C.C. Yao, How to generate and exchange secrets, in Proceedings of the 27th Annual IEEE Symposium on Foundations of Computer Science (1986), pp. 162–167
 23. D.J. Wu, T. Feng, M. Naehrig, K. Lauter, Privately evaluating decision trees and random forests, in Proceedings on Privacy Enhancing Technologies, vol. 4 (2016), pp. 1–21
 24. I. Damgaard, M. Fitzi, E. Kiltz, J.B. Nielsen, T. Toft, Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation, in TCC 2006: Theory of Cryptography (2006), pp. 285–304
 25. R. Cramer, I. Damgaard, J.B. Nielsen, Multiparty computation from threshold encryption, in Eurocrypt (2001), pp. 280–299
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.