The Conflict Between Big Data and Individual Privacy

Chapter in: Data Privacy Games

Abstract

With the growing popularity of big data applications, data mining technologies have attracted increasing attention in recent years. Meanwhile, the serious threat that data mining may pose to individual privacy has become a major concern, and resolving the conflict between big data and individual privacy is an urgent issue. In this chapter, we systematically review the privacy issues related to data mining and investigate various approaches that can help to protect privacy. Following the basic procedure of data mining, we identify four types of users involved in big data applications: data provider, data collector, data miner, and decision maker. For each type of user, we discuss its privacy concerns and the methods it can adopt to protect sensitive information. We introduce the basics of related research topics, review state-of-the-art approaches, and present some preliminary thoughts on future research directions. In particular, we emphasize the game-theoretic approaches that have been proposed for analyzing the interactions among different users in a data mining scenario. By differentiating the responsibilities of different users with respect to information security, we aim to provide useful insights into the trade-off between data exploration and privacy protection.

Notes

  1. https://www.netflix.com

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Xu, L., Jiang, C., Qian, Y., Ren, Y. (2018). The Conflict Between Big Data and Individual Privacy. In: Data Privacy Games. Springer, Cham. https://doi.org/10.1007/978-3-319-77965-2_1

  • DOI: https://doi.org/10.1007/978-3-319-77965-2_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77964-5

  • Online ISBN: 978-3-319-77965-2

  • eBook Packages: Computer Science, Computer Science (R0)
