Abstract
Complying with privacy regulations such as GDPR requires organizations to find and classify sensitive and personal data in their datastores. First, data discovery tools are applied to identify the data. Then, data classification tools are applied to the data that was discovered. Organizations must classify the data into concrete categories in order to manage it appropriately. In this paper we focus on multi-value classification, where the classifier assigns a single category to a set of values that all belong to the same category. Traditional classifiers usually apply single-value classification methods to a multi-value data set. However, in many cases this results in an incorrect classification, for example when the domains of different categories overlap. We address this scenario and propose two methods to overcome the problem.
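The difference between single-value and multi-value classification can be illustrated with a minimal sketch. It is not one of the two methods proposed in the paper, which the abstract does not describe. The category names, the numeric ranges in `DOMAINS`, and the functions `classify_single` and `classify_multi` are hypothetical and chosen only to show how overlapping domains can mislead a per-value classifier.

```python
# Hypothetical category domains given as inclusive numeric ranges.
# The ranges overlap between 90 and 120; they are illustrative assumptions only.
DOMAINS = {
    "age": (0, 120),
    "systolic_bp": (90, 200),
}


def classify_single(value):
    """Label one value in isolation: first category whose range contains it."""
    for category, (low, high) in DOMAINS.items():
        if low <= value <= high:
            return category
    return None


def classify_multi(values):
    """Label a whole set of values: category whose range covers the largest share."""
    coverage = {
        category: sum(low <= v <= high for v in values) / len(values)
        for category, (low, high) in DOMAINS.items()
    }
    best = max(coverage, key=coverage.get)
    return best, coverage


# A column whose values are all assumed to come from the same category.
column = [118, 125, 132, 110, 141, 128]

per_value_labels = [classify_single(v) for v in column]
column_label, coverage = classify_multi(column)

print(per_value_labels)  # mixed labels, because 118 and 110 fall in the overlap
print(column_label, coverage)
```

Classifying each value on its own yields inconsistent labels for the values that fall in the overlapping range, while scoring the set as a whole selects the one category that accounts for every value. This coverage score still ties when all values lie inside the overlap, so a practical multi-value classifier would need further evidence, such as how the values are distributed.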
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Assaf, S., Farkash, A., Moffie, M. (2019). Multi-value Classification of Ambiguous Personal Data. In: Attiogbé, C., Ferrarotti, F., Maabout, S. (eds) New Trends in Model and Data Engineering. MEDI 2019. Communications in Computer and Information Science, vol 1085. Springer, Cham. https://doi.org/10.1007/978-3-030-32213-7_16
DOI: https://doi.org/10.1007/978-3-030-32213-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32212-0
Online ISBN: 978-3-030-32213-7
eBook Packages: Computer Science, Computer Science (R0)