Knowledge Hiding in Decision Trees for Learning Analytics Applications

Part of the book series: Learning and Analytics in Intelligent Systems (LAIS, volume 14)

Abstract

Educational institutions now hold a wide range of digital information about learners, including performance records, educational resources, student attendance, feedback on course material, course evaluations, and social network data. Collecting, using, and sharing educational data offers substantial potential, but the sensitivity of the data raises legitimate privacy concerns. Sharing data among educational organizations has become increasingly common; however, an organization that must share its data sets with others will most likely want to keep some patterns hidden. This chapter focuses on preserving the privacy of sensitive patterns when inducing decision trees and demonstrates the application of a heuristic to an educational data set. The heuristic hiding method employed makes the sanitized raw data readily available for public use and is therefore preferable to other heuristic solutions, such as output perturbation or cryptographic techniques, which limit the usability of the data.

Author information

Corresponding author

Correspondence to Georgios Feretzakis.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Feretzakis, G., Kalles, D., Verykios, V.S. (2021). Knowledge Hiding in Decision Trees for Learning Analytics Applications. In: Tsihrintzis, G., Virvou, M. (eds) Advances in Core Computer Science-Based Technologies. Learning and Analytics in Intelligent Systems, vol 14. Springer, Cham. https://doi.org/10.1007/978-3-030-41196-1_3

  • DOI: https://doi.org/10.1007/978-3-030-41196-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41195-4

  • Online ISBN: 978-3-030-41196-1

  • eBook Packages: Engineering, Engineering (R0)
