Knowledge Hiding in Decision Trees for Learning Analytics Applications

Feretzakis, Georgios; Kalles, Dimitris; Verykios, Vassilios S.

doi:10.1007/978-3-030-41196-1_3

Chapter
First Online: 19 June 2020

Part of the book series: Learning and Analytics in Intelligent Systems (LAIS, volume 14)

Abstract

Educational institutions now have access to a wide range of digital information about learners, including performance records, educational resources, student attendance, feedback on course material, course evaluations, and social network data. Although collecting, using, and sharing educational data offers substantial potential, the sensitivity of the data raises legitimate privacy concerns. Sharing data among educational organizations has become increasingly common. However, an organization that must share its datasets with others will most likely try to keep some patterns hidden. This chapter focuses on preserving the privacy of sensitive patterns when inducing decision trees and demonstrates the application of a heuristic to an educational data set. The heuristic hiding method employed makes the sanitized raw data readily available for public use and is therefore preferable to other solutions, such as output perturbation or cryptographic techniques, which limit the usability of the data.

Author information

Authors and Affiliations

School of Science and Technology, Hellenic Open University, Patras, Greece
Georgios Feretzakis, Dimitris Kalles & Vassilios S. Verykios

Corresponding author

Correspondence to Georgios Feretzakis.

Editor information

Editors and Affiliations

Department of Informatics, University of Piraeus, Piraeus, Greece
George A. Tsihrintzis
Department of Informatics, University of Piraeus, Piraeus, Greece
Maria Virvou

About this chapter

Cite this chapter

Feretzakis, G., Kalles, D., Verykios, V.S. (2021). Knowledge Hiding in Decision Trees for Learning Analytics Applications. In: Tsihrintzis, G., Virvou, M. (eds) Advances in Core Computer Science-Based Technologies. Learning and Analytics in Intelligent Systems, vol 14. Springer, Cham. https://doi.org/10.1007/978-3-030-41196-1_3

DOI: https://doi.org/10.1007/978-3-030-41196-1_3
Published: 19 June 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41195-4
Online ISBN: 978-3-030-41196-1
eBook Packages: Engineering, Engineering (R0)
