Knowledge Hiding in Decision Trees for Learning Analytics Applications

  • Georgios Feretzakis
  • Dimitris Kalles
  • Vassilios S. Verykios
Part of the Learning and Analytics in Intelligent Systems book series (LAIS, volume 14)


Educational institutions now have access to a wide range of digital information about learners, including performance records, educational resources, student attendance, feedback on course material, course evaluations, and social network data. Although collecting, using, and sharing educational data offers substantial potential, the sensitivity of the data raises legitimate privacy concerns. Sharing data among educational organizations has become increasingly common. However, an organization that must share its datasets with others will most likely try to keep some patterns hidden. This chapter focuses on preserving the privacy of sensitive patterns when inducing decision trees and demonstrates the application of a heuristic to an educational data set. The heuristic hiding method employed allows the sanitized raw data to be readily available for public use and is thus preferable to other solutions, such as output perturbation or cryptographic techniques, which limit the usability of the data.



Copyright information

© Springer Nature Switzerland AG 2021

Authors and Affiliations

  1. School of Science and Technology, Hellenic Open University, Patras, Greece
