Abstract
With the growing popularity of big data applications, data mining technologies have attracted increasing attention in recent years. Meanwhile, the serious threat that data mining may pose to individual privacy has become a major concern. How to resolve the conflict between big data and individual privacy is an urgent issue. In this chapter, we systematically review the privacy issues related to data mining and investigate various approaches that can help protect privacy. Following the basic procedure of data mining, we identify four types of users involved in big data applications: the data provider, the data collector, the data miner, and the decision maker. For each type of user, we discuss its privacy concerns and the methods it can adopt to protect sensitive information. We introduce the basics of related research topics and review state-of-the-art approaches. We also present some preliminary thoughts on future research directions. In particular, we emphasize the game-theoretic approaches proposed for analyzing the interactions among different users in a data mining scenario. By differentiating the responsibilities of different users with respect to information security, we aim to provide useful insights into the trade-off between data exploration and privacy protection.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Xu, L., Jiang, C., Qian, Y., Ren, Y. (2018). The Conflict Between Big Data and Individual Privacy. In: Data Privacy Games. Springer, Cham. https://doi.org/10.1007/978-3-319-77965-2_1
DOI: https://doi.org/10.1007/978-3-319-77965-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77964-5
Online ISBN: 978-3-319-77965-2
eBook Packages: Computer Science (R0)