Abstract
Educational institutions now have access to a wide range of digital information about learners, including performance records, educational resources, attendance, feedback on course material, course evaluations, and social network data. Although collecting, using, and sharing educational data offers substantial potential, the sensitivity of the data raises legitimate privacy concerns. Sharing data among educational organizations has become increasingly common. However, an organization that must share its datasets with others will most likely want to keep some patterns hidden. This chapter focuses on preserving the privacy of sensitive patterns when inducing decision trees and demonstrates the application of a heuristic to an educational data set. The heuristic hiding method employed makes the sanitized raw data readily available for public use and is therefore preferable to other solutions, such as output perturbation or cryptographic techniques, which limit the usability of the data.
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Feretzakis, G., Kalles, D., Verykios, V.S. (2021). Knowledge Hiding in Decision Trees for Learning Analytics Applications. In: Tsihrintzis, G., Virvou, M. (eds) Advances in Core Computer Science-Based Technologies. Learning and Analytics in Intelligent Systems, vol 14. Springer, Cham. https://doi.org/10.1007/978-3-030-41196-1_3
DOI: https://doi.org/10.1007/978-3-030-41196-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41195-4
Online ISBN: 978-3-030-41196-1
eBook Packages: Engineering, Engineering (R0)