Tradeoff Between the Price of Distributing a Database and Its Collusion Resistance Based on Concatenated Codes

Bui, Thach V.; Nguyen, Thuc D.; Sonehara, Noboru; Echizen, Isao

doi:10.1007/978-3-319-27122-4_12


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9529)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing


Abstract

The purchasing of customer databases, which is becoming more and more common, has led to a serious problem: the use of purchased databases to mount a collusion attack, in which purchasers of a database illegally combine their versions of it in order to de-anonymize the private information it contains. However, customer databases are currently available for purchase only on the black market. In this paper, we first investigate the relationship between the price of distributing a database and its collusion resistance. A fingerprint is embedded in the database so that illegal distributors can be identified. The fingerprints are constructed on the basis of concatenated codes. After the fingerprint is embedded, the price of distributing the database and its collusion resistance are modelled as decreasing functions. The less expensive the database is, the less collusion resistance the database owner deals with. The collusion capabilities are bounded from above and below. To the best of our knowledge, this scheme is unique in that the tradeoff between the price of distributing a database and its collusion resistance is based on a mathematical model. Second, we propose a guideline for selling customer databases legally with profit and risk evaluation.


Notes

1. A zettabyte is 1,000,000,000,000,000,000,000 bytes. Imagine that every person in Vietnam (population of 92.5 million in 2014) took a digital photo every second of every day for over three months. All of those photos put together would equal about one zettabyte.
2. Big Data is a term that refers to “large, diverse, complex, longitudinal, and/or distributed datasets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources available today and in the future” [30].
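The arithmetic behind Note 1 can be checked with a few lines of Python. This is a minimal sketch: the 92-day duration is our reading of "over three months", and the implied per-photo size is derived here rather than stated in the note.

```python
# Sanity check of Note 1: how large must each photo be for 92.5 million
# people, one photo per second for about three months, to total 1 ZB?
# DAYS = 92 is an assumption for "over three months"; the per-photo size
# is derived, not stated in the note.

ZETTABYTE = 10**21            # bytes, as defined in Note 1
POPULATION = 92_500_000       # Vietnam, 2014 (Note 1)
SECONDS_PER_DAY = 24 * 60 * 60
DAYS = 92                     # assumption: roughly three months


def photos_taken(population: int, days: int) -> int:
    """Total photos if everyone takes one photo per second."""
    return population * SECONDS_PER_DAY * days


def implied_photo_size(total_bytes: int, n_photos: int) -> float:
    """Average bytes per photo needed to reach total_bytes."""
    return total_bytes / n_photos


n = photos_taken(POPULATION, DAYS)
size_mb = implied_photo_size(ZETTABYTE, n) / 10**6
print(f"{n:.3e} photos, about {size_mb:.2f} MB each")
```

Under these assumptions the total is roughly 7.35 × 10^14 photos at about 1.36 MB each, so the "about one zettabyte" figure corresponds to photos of a little over a megabyte apiece.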

References

[1] White House. Big Data: Seizing Opportunities, Preserving Values (2014)
[2] Financial Times. Digital hunter-gatherers (2013). http://www.ft.com/intl/cms/s/0/f840dbc0-d34f-11e2-b3ff-00144feab7de.html#axzz3eS2n1CZx
[3] The Guardian. How much is your personal data worth? (2014). http://www.theguardian.com/news/datablog/2014/apr/22/how-much-is-personal-data-worth
[4] Forbes. The black market price of your personal info (2010). http://www.forbes.com/2010/11/29/black-market-price-of-your-info-personal-finance.html
[5] Gantz, J., Reinsel, D.: The digital universe in 2020: big data, bigger digital shadows, and biggest growth in the far east. IDC iView: IDC Analyze Future 2007, 1–16 (2012)
[6] Kieseberg, P., Schrittwieser, S., Mulazzani, M., Echizen, I., Weippl, E.: An algorithm for collusion-resistant anonymization and fingerprinting of sensitive microdata. Electron. Markets 24(2), 113–124 (2014)
[7] Motwani, R., Xu, Y.: Efficient algorithms for masking and finding quasi-identifiers. In: Proceedings of the Conference on Very Large Data Bases (VLDB), pp. 83–93 (2007)
[8] Lodha, S.P., Thomas, D.: Probabilistic anonymity. In: Bonchi, F., Malin, B., Saygın, Y. (eds.) PInKDD 2007. LNCS, vol. 4890, pp. 56–79. Springer, Heidelberg (2008)
[9] Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertain., Fuzziness Knowl.-Based Syst. 10(05), 571–588 (2002)
[10] Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertain., Fuzziness Knowl.-Based Syst. 10(05), 557–570 (2002)
[11] El Emam, K., Dankar, F.K., Issa, R., Jonker, E., Amyot, D., Cogo, E., Corriveau, J.-P., et al.: A globally optimal k-anonymity method for the de-identification of health data. J. Am. Med. Inform. Assoc. 16(5), 670–682 (2009)
[12] Samarati, P.: Protecting respondents’ identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)
[13] Samarati, P., Sweeney, L.: Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical report, SRI International (1998)
[14] Schrittwieser, S., Kieseberg, P., Echizen, I., Wohlgemuth, S., Sonehara, N., Weippl, E.: An algorithm for k-anonymity-based fingerprinting. In: Shi, Y.Q., Kim, H.-J., Perez-Gonzalez, F. (eds.) IWDW 2011. LNCS, vol. 7128, pp. 439–452. Springer, Heidelberg (2012)
[15] Willenborg, L., Kardaun, J.: Fingerprints in microdata sets. In: CBS (1999)
[16] Bui, T.V., Nguyen, B.Q., Nguyen, T.D., Sonehara, N., Echizen, I.: Robust fingerprinting codes for database. In: Aversa, R., Kołodziej, J., Zhang, J., Amato, F., Fortino, G. (eds.) ICA3PP 2013, Part II. LNCS, vol. 8286, pp. 167–176. Springer, Heidelberg (2013)
[17] Bui, T.V., Nguyen, B.Q., Nguyen, T.D., Sonehara, N., Echizen, I.: Robust fingerprinting codes for database using non-adaptive group testing. Int. J. Big Data Intell. 2(2), 81–90 (2015)
[18] Li, Y., Swarup, V., Jajodia, S.: Fingerprinting relational databases: schemes and specialties. IEEE Trans. Dependable Secure Comput. 2(1), 34–45 (2005)
[19] Guo, F., Wang, J., Li, D.: Fingerprinting relational databases. In: Proceedings of the ACM Symposium on Applied Computing, pp. 487–492. ACM (2006)
[20] Agrawal, R., Kiernan, J.: Watermarking relational databases. In: Proceedings of the 28th International Conference on VLDB, pp. 155–166 (2002)
[21] Forney Jr., G.D.: Concatenated codes. DTIC Document (1965)
[22] Reed, I.S., Solomon, G.: Polynomial codes over certain finite fields. J. Soc. Ind. Appl. Math. 8(2), 300–304 (1960)
[23] Wicker, S.B., Bhargava, V.K.: Reed-Solomon Codes and Their Applications. Wiley-IEEE Press, New York (1999)
[24] Gilbert, E.N.: A comparison of signalling alphabets. Bell Syst. Tech. J. 31(3), 504–522 (1952)
[25] Varshamov, R.R.: Estimate of the number of signals in error correcting codes. Dokl. Akad. Nauk SSSR 117(5), 739–741 (1957)
[26] Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. 105(4), 1118–1123 (2008)
[27] McAfee, A., Brynjolfsson, E.: Big data: the management revolution. Harvard Bus. Rev. 90, 60–68 (2012)
[28] Varian, H.R.: Big data: new tricks for econometrics. J. Econ. Perspect. 28, 3–27 (2014)
[29] Wu, X., Zhu, X., Wu, G.-Q., Ding, W.: Data mining with big data. IEEE Trans. Knowl. Data Eng. 26(1), 97–107 (2014)
[30] NSF-NIH Interagency Initiative. Core techniques and technologies for advancing big data science and engineering (BIGDATA) (2012)
[31] Facebook. http://investor.fb.com/releasedetail.cfm?ReleaseID=893395
[32] Google. http://www.google.com/zeitgeist/2012/#the-world


Author information

Authors and Affiliations

Department of Multidisciplinary Sciences, School of Informatics, SOKENDAI (The Graduate University for Advanced Studies), 1560-35 Kamiyamaguchi, Hayama, Kanagawa Prefecture, 240-0115, Japan
Thach V. Bui, Noboru Sonehara & Isao Echizen
National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan
Thach V. Bui, Noboru Sonehara & Isao Echizen
Faculty of Information Technology, Ho Chi Minh City University of Science, 225 Nguyen Van Cu Street, District 5, Ho Chi Minh City, Vietnam
Thuc D. Nguyen


Corresponding author

Correspondence to Thach V. Bui.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Guojun Wang
The University of Sydney, Sydney, New South Wales, Australia
Albert Zomaya
University of Murcia, Murcia, Murcia, Spain
Gregorio Martinez
Hunan University, Changsha, China
Kenli Li

A Omitted Proofs from Sect. 5

Proof of Theorem 19

Proof

Set $n = n_1 n_2$. Since C is an $n_1 n_2 \times 2^{k_1 k_2}$ matrix, each codeword of C has length $n_1 n_2$. Each database is represented by a codeword of C. Since the Hamming weight of the $j$-th codeword is $n_1 w_j$, each entry of that codeword is modeled as being 1, independently of the other entries, with probability $p_j = \frac{n_1 w_j}{n_1 n_2} = \frac{w_j}{n_2}$.
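To make the structure of C concrete, the sketch below builds a toy concatenated code and checks the two facts used above: every codeword has length $n_1 n_2$ and, with a constant-weight inner code, weight $n_1 w$. It is an illustration under stated assumptions, not the paper's construction: the outer code is a Reed-Solomon code over the prime field GF(5) instead of GF($2^{k_2}$) (so there are $5^{k_1}$ rather than $2^{k_1 k_2}$ codewords), and the inner code, made of the cyclic shifts of 11000, is a hypothetical choice.

```python
from itertools import product

P = 5            # outer alphabet GF(5), a prime field (assumption for simplicity)
N1, K1 = 4, 2    # outer Reed-Solomon code [N1, K1] over GF(P)

# Inner binary constant-weight code: the P cyclic shifts of 11000,
# one inner codeword per outer symbol (illustrative choice).
BASE = [1, 1, 0, 0, 0]
N2, W = len(BASE), sum(BASE)
INNER = [[BASE[(i - s) % N2] for i in range(N2)] for s in range(P)]


def rs_encode(msg):
    """Reed-Solomon: evaluate the message polynomial at x = 0, ..., N1-1 mod P."""
    return [sum(c * pow(x, e, P) for e, c in enumerate(msg)) % P for x in range(N1)]


def concat_encode(msg):
    """Replace each outer symbol by its inner codeword and concatenate."""
    return [bit for sym in rs_encode(msg) for bit in INNER[sym]]


def hamming(a, b):
    """Number of positions where two binary words differ."""
    return sum(x != y for x, y in zip(a, b))


# All P**K1 codewords, i.e. the columns of the (N1*N2) x P**K1 matrix C.
CODE = [concat_encode(m) for m in product(range(P), repeat=K1)]
min_dist = min(hamming(a, b) for i, a in enumerate(CODE) for b in CODE[i + 1:])

print(len(CODE), "codewords of length", N1 * N2, "and weight", N1 * W)
print("minimum distance", min_dist)
```

The minimum distance is at least the product of the outer distance $n_1 - k_1 + 1 = 3$ and the inner distance 2, which is the standard lower bound for concatenated codes.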

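Colluders achieve perfect collusion if $\prod_{i \in C} x_{ij} = 0$ for all $j = 1, \ldots, n$, where, in this condition, $C$ denotes the set of the $c = |C|$ colluders and $x_{ij}$ is the $j$-th entry of the codeword held by colluder $i$. If colluder $i$ holds the $j_i$-th codeword of C, the probability that a row contains at least one 0 is: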

$$\begin{aligned} 1 - \prod _{i=1}^c p_{j_i} = 1 - \frac{\prod _{i=1}^{c}w_{j_i}}{n_2^c} \le 1 - \left( \frac{w_{min}}{n_2} \right) ^c \end{aligned} \tag{3}$$

Because the entries are modeled as independent, the probability that each of the $n$ rows contains at least one 0 is:

$$\begin{aligned} \left( 1 - \prod _{i=1}^c p_{j_i} \right) ^n \le \left( 1 - \left( \frac{w_{min}}{n_2} \right) ^c \right) ^{n_1 n_2} \end{aligned} \tag{4}$$

When $C_{in}$ is a constant-weight code, $w_{min} = w_{max} = w_j$ for every $j$, so Eq. (4) holds with equality.
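The probability model in this proof can be sanity-checked numerically. The sketch below, written under the proof's own assumption that entries are independent Bernoulli($w_j/n_2$) draws, compares the closed form of Eq. (4) for a constant-weight inner code with a Monte Carlo estimate. The parameter values ($n_1 = 2$, $n_2 = 4$, $w = 1$, $c = 2$) and the function names are hypothetical choices for illustration.

```python
import random


def perfect_collusion_prob(w: int, n1: int, n2: int, c: int) -> float:
    """Eq. (4) with equality: constant inner weight w, c colluders."""
    return (1 - (w / n2) ** c) ** (n1 * n2)


def simulate_perfect_collusion(weights, n1, n2, trials=20_000, seed=1):
    """Monte Carlo estimate under the proof's independence model.

    weights[i] is the inner weight w_{j_i} of colluder i's codeword, so each
    of the n1*n2 entries of that codeword is 1 with probability weights[i]/n2.
    A trial counts as perfect collusion when no row has every colluder
    holding a 1, i.e. every row contains at least one 0.
    """
    rng = random.Random(seed)
    n = n1 * n2
    hits = 0
    for _ in range(trials):
        # One boolean per row: True if all colluders drew a 1 in that row.
        row_all_ones = (
            all(rng.random() < w / n2 for w in weights) for _row in range(n)
        )
        if not any(row_all_ones):
            hits += 1
    return hits / trials


exact = perfect_collusion_prob(w=1, n1=2, n2=4, c=2)
estimate = simulate_perfect_collusion([1, 1], n1=2, n2=4)
mixed = simulate_perfect_collusion([1, 2], n1=2, n2=4)
print(f"closed form {exact:.4f}, simulated {estimate:.4f}, mixed weights {mixed:.4f}")
```

With mixed weights ($w = 1$ and $w = 2$) the estimate falls well below the closed form evaluated at $w_{min} = 1$, consistent with the direction of the inequality in Eq. (4). Small parameters are used so that the probability is not vanishingly small; because the exponent is $n_1 n_2$, it shrinks quickly for realistic code lengths.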

Proof of Theorem 22

Proof

In accordance with Definition 12, the price of distributing a database is

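$$\begin{aligned} \frac{1}{wt(\texttt {a~codeword}) + 1} < price(\texttt {a~database}) = \frac{1}{wt(\texttt {a~codeword}) + \frac{sum(\texttt {a~codeword}) - wt(\texttt {a~codeword})}{sum(\texttt {a~codeword})}} < \frac{1}{wt(\texttt {a~codeword})} \le \frac{1}{n_1 w_{min}} \end{aligned} \tag{5}$$

where the last inequality holds because every codeword of C consists of $n_1$ inner codewords, each of weight at least $w_{min}$.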
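As a quick check of the bounds in Eq. (5), the sketch below evaluates the price expression exactly with rational arithmetic. Since Definition 12 is not reproduced in this section, $wt$ and $sum$ are treated as given integers with $0 < wt < sum$; the function names and sample values are hypothetical.

```python
from fractions import Fraction


def database_price(wt: int, s: int) -> Fraction:
    """Price of one database as written in Eq. (5):
    1 / (wt + (s - wt) / s), where wt = wt(codeword) and s = sum(codeword).

    Definition 12 is not reproduced here, so wt and s are hypothetical
    inputs, required to satisfy 0 < wt < s.
    """
    if not 0 < wt < s:
        raise ValueError("expected 0 < wt < s")
    return 1 / (wt + Fraction(s - wt, s))


def check_eq5_bounds(wt: int, s: int, n1: int, w_min: int) -> bool:
    """Check 1/(wt+1) < price < 1/wt <= 1/(n1*w_min); needs wt >= n1*w_min."""
    p = database_price(wt, s)
    return Fraction(1, wt + 1) < p < Fraction(1, wt) <= Fraction(1, n1 * w_min)


print(database_price(6, 10))                  # exact rational price
print(check_eq5_bounds(6, 10, n1=3, w_min=2))
```

Exact fractions avoid floating-point ties at the strict inequalities. For example, $wt = 6$ and $sum = 10$ give $1/(6 + 2/5) = 5/32$, which lies strictly between $1/7$ and $1/6$.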

Since C has $2^{k_1k_2}$ codewords, the total price of distributing the databases embedded with these codewords satisfies

$$\begin{aligned} \sum _{i=1}^{2^{k_1 k_2}} \frac{1}{wt(\texttt {a~codeword}) + \frac{sum(\texttt {a~codeword}) - wt(\texttt {a~codeword})}{sum(\texttt {a~codeword})}} < 2^{k_1 k_2} \times \frac{1}{n_1w_{min}} \end{aligned}$$

Since the number of attributes of the database is unchanged, the block length $n_2$ of $C_{in}$ is unchanged. Suppose that there is another binary code $C'_{in}$ $[n_2, k'_2, d'_2]_2$ whose codewords have weight at most $w'_{max}$. The total price of distributing the databases when using the code $C'$ generated from $C_{out}$ and $C'_{in}$ satisfies:

$$\begin{aligned} \sum _{i=1}^{2^{k_1 k'_2}} \frac{1}{wt(\texttt {a~codeword}) + \frac{sum(\texttt {a~codeword}) - wt(\texttt {a~codeword})}{sum(\texttt {a~codeword})}} > 2^{k_1 k'_2} \times \frac{1}{n_1 w'_{max} + 1} \end{aligned}$$

To complete our proof, we prove that if the price of distributing the databases when using C is larger than the price of distributing the databases when using $C'$, then $k_2 > k'_2$. Indeed, we have:

$$\begin{aligned} 2^{k_1 k_2} \times \frac{1}{n_1w_{min}} > \sum _{i=1}^{2^{k_1 k_2}} \frac{1}{wt(\texttt {a~codeword}) + \frac{sum(\texttt {a~codeword}) - wt(\texttt {a~codeword})}{sum(\texttt {a~codeword})}} \end{aligned} \tag{6}$$

$$\begin{aligned} > \sum _{i=1}^{2^{k_1 k'_2}} \frac{1}{wt(\texttt {a~codeword}) + \frac{sum(\texttt {a~codeword}) - wt(\texttt {a~codeword})}{sum(\texttt {a~codeword})}} \end{aligned} \tag{7}$$

$$\begin{aligned} > 2^{k_1 k'_2} \times \frac{1}{n_1 w'_{max} + 1} \end{aligned} \tag{8}$$

$$\begin{aligned} \Rightarrow 2^{k_1 (k_2 - k'_2)} > \frac{n_1 w_{min}}{n_1 w'_{max} + 1} \end{aligned} \tag{9}$$

We consider three cases:

1. If $k_2 < k'_2$, then
$$\begin{aligned} 2^{k_1 (k_2 - k'_2)} \le \frac{1}{2^{k_1}} < \frac{1}{n_1} < \frac{n_1 w_{min}}{n_1 w'_{max} + 1} \end{aligned} \tag{10}$$
because $0 < w_{min}, w'_{max} \le n_2 \le n_1$. This contradicts inequality (9).
2. If $k_2 = k'_2$, then $d_2 = d'_2$, because $k_2$ and $k'_2$ are the largest numbers such that $k_2 \le n_2 - d_2 + 1$ and $k'_2 \le n_2 - d'_2 + 1$. Therefore, the price of distributing the databases when using $C'$ is equal to the price of distributing the databases when using C. This contradicts our hypothesis.
3. If $k_2 > k'_2$, no contradiction with inequality (9) arises.

Thus, if the price of distributing the databases when using C is larger than the price of distributing the databases when using $C'$, then $k_2 > k'_2$. If $k'_2 < k_2$, then $d'_2 > d_2$.
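Inequality (9) is obtained by comparing only the two ends of the chain (6)-(8). The sketch below, with hypothetical function names and parameter grids, confirms with exact arithmetic that the two comparisons always agree, and evaluates one parameter set from case 1 and one from case 3. It checks the algebra only, not the hypotheses about $n_1$, $n_2$ and the weights.

```python
from fractions import Fraction
from itertools import product


def total_price_bounds(k1, k2, k2p, n1, w_min, w_max_p):
    """Endpoints of the chain (6)-(8).

    upper_c:  2^{k1 k2} / (n1 w_min), bound on the total price with C
    lower_cp: 2^{k1 k2'} / (n1 w'_max + 1), bound on the total price with C'
    """
    upper_c = Fraction(2 ** (k1 * k2), n1 * w_min)
    lower_cp = Fraction(2 ** (k1 * k2p), n1 * w_max_p + 1)
    return upper_c, lower_cp


def endpoints_ordered(*params):
    """True if the upper end of the chain exceeds the lower end."""
    upper_c, lower_cp = total_price_bounds(*params)
    return upper_c > lower_cp


def ineq9(k1, k2, k2p, n1, w_min, w_max_p):
    """Inequality (9): 2^{k1 (k2 - k2')} > n1 w_min / (n1 w'_max + 1)."""
    lhs = Fraction(2) ** (k1 * (k2 - k2p))  # negative exponents stay exact
    rhs = Fraction(n1 * w_min, n1 * w_max_p + 1)
    return lhs > rhs


# (9) is the endpoint comparison rearranged, so both tests must agree.
grid = product(range(1, 4), range(1, 5), range(1, 5),
               range(2, 8), range(1, 4), range(1, 4))
agree = all(endpoints_ordered(*p) == ineq9(*p) for p in grid)

case1 = ineq9(3, 2, 3, 7, 3, 4)  # k2 < k2', with 2^{k1} = 8 > n1 = 7
case3 = ineq9(2, 3, 2, 7, 3, 4)  # k2 > k2'
print(agree, case1, case3)
```

In the case 1 example the left side of (9) is $1/8$ while the right side is $21/29$, so (9) fails as the proof argues; in the case 3 example the left side is $4$ and (9) holds.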

According to Corollary 20, if the relative distance of $C_{in}$ increases (decreases), the probability of perfect collusion decreases (increases). According to Corollary 15, if the code rate of $C_{in}$ increases, the price of distributing databases using C increases. Therefore, the lower the price of distributing a database, the less collusion resistance the database owner deals with.


About this paper

Cite this paper

Bui, T.V., Nguyen, T.D., Sonehara, N., Echizen, I. (2015). Tradeoff Between the Price of Distributing a Database and Its Collusion Resistance Based on Concatenated Codes. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9529. Springer, Cham. https://doi.org/10.1007/978-3-319-27122-4_12


DOI: https://doi.org/10.1007/978-3-319-27122-4_12
Published: 16 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27121-7
Online ISBN: 978-3-319-27122-4
eBook Packages: Computer Science, Computer Science (R0)
