The Conflict Between Big Data and Individual Privacy

Xu, Lei; Jiang, Chunxiao; Qian, Yi; Ren, Yong

doi:10.1007/978-3-319-77965-2_1

Abstract

With the growing popularity of big data applications, data mining technologies have attracted increasing attention in recent years. At the same time, the fact that data mining may pose a serious threat to individual privacy has become a major concern, and resolving the conflict between big data and individual privacy is an urgent issue. In this chapter, we systematically review the privacy issues related to data mining and investigate various approaches that can help protect privacy. Following the basic procedure of data mining, we identify four types of users involved in big data applications, namely the data provider, data collector, data miner and decision maker. For each type of user, we discuss its privacy concerns and the methods it can adopt to protect sensitive information. We introduce the basics of related research topics and review state-of-the-art approaches. We also present some preliminary thoughts on future research directions. In particular, we emphasize the game-theoretic approaches that have been proposed for analyzing the interactions among different users in a data mining scenario. By differentiating the responsibilities of different users with respect to information security, we aim to provide useful insights into the trade-off between data exploration and privacy protection.
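To make the game-theoretic framing in the abstract a little more concrete, here is a toy sketch that is not taken from the chapter. It models an interaction between two of the four user types: a data provider choosing whether to share accurate data or perturb it, and a data collector choosing whether to protect or exploit what it receives. The strategy names, the dictionary `PAYOFFS`, and every payoff number are hypothetical, chosen only so the game has a prisoner's-dilemma shape. The function enumerates pure-strategy Nash equilibria by checking, for each strategy profile, that neither player can gain by deviating alone.

```python
from itertools import product

# Hypothetical payoffs (provider_payoff, collector_payoff) for a toy 2x2 game.
# These numbers are illustrative assumptions, not values from the chapter.
PAYOFFS = {
    ("share_true", "protect"): (3, 3),
    ("share_true", "exploit"): (0, 5),
    ("perturb", "protect"): (2, 1),
    ("perturb", "exploit"): (2, 2),
}
PROVIDER_STRATEGIES = ("share_true", "perturb")
COLLECTOR_STRATEGIES = ("protect", "exploit")


def pure_nash_equilibria(payoffs, rows, cols):
    """Return every profile where no player can improve by deviating unilaterally."""
    equilibria = []
    for r, c in product(rows, cols):
        row_payoff, col_payoff = payoffs[(r, c)]
        # Provider (row player) cannot do better by switching strategy while c is fixed.
        row_ok = all(payoffs[(r2, c)][0] <= row_payoff for r2 in rows)
        # Collector (column player) cannot do better by switching while r is fixed.
        col_ok = all(payoffs[(r, c2)][1] <= col_payoff for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


if __name__ == "__main__":
    print(pure_nash_equilibria(PAYOFFS, PROVIDER_STRATEGIES, COLLECTOR_STRATEGIES))
    # prints [('perturb', 'exploit')]
```

Under these assumed payoffs the only equilibrium is (perturb, exploit) with payoffs (2, 2), even though (share_true, protect) would give both players 3. This is one simple way a game-theoretic model can expose the tension between data exploration and privacy protection; the models surveyed in the chapter are richer than this sketch.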

Author information

Authors and Affiliations

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Lei Xu
Tsinghua Space Center, Tsinghua University, Beijing, China
Chunxiao Jiang
Peter Kiewit Institute 206B, University of Nebraska-Lincoln, Omaha, Nebraska, USA
Yi Qian
Department of Electronic Engineering, Tsinghua University, Beijing, China
Yong Ren

About this chapter

Cite this chapter

Xu, L., Jiang, C., Qian, Y., Ren, Y. (2018). The Conflict Between Big Data and Individual Privacy. In: Data Privacy Games. Springer, Cham. https://doi.org/10.1007/978-3-319-77965-2_1

DOI: https://doi.org/10.1007/978-3-319-77965-2_1
Published: 25 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77964-5
Online ISBN: 978-3-319-77965-2
eBook Packages: Computer Science, Computer Science (R0)