Abstract
Identity management is critical to governmental practices ranging from providing services to citizens to enforcing homeland security. Searching for a specific identity is difficult because multiple representations of the same identity may exist as a result of unintentional errors and intentional deception. We propose a Naïve Bayes identity matching model that is more effective than existing techniques. Experiments show that the proposed model performs significantly better than the exact-match technique and achieves higher precision than the record comparison technique. In addition, the model greatly reduces the effort of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset in which only 30% of the instances are labeled, the model achieves performance comparable to that of fully supervised learning.
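The general idea of semi-supervised Naïve Bayes matching can be illustrated with a minimal sketch. It is not the authors' multi-layer model: it assumes a single layer, assumes each pair of identity records has already been reduced to binary per-field agreement flags (the fields name, date of birth, and address are hypothetical), and uses a generic EM loop over unlabeled pairs. All function names are illustrative.

```python
import math

def fit_nb(pairs, weights, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model from (possibly soft) match weights.

    pairs   -- list of tuples of 0/1 agreement flags, one flag per field
    weights -- P(match) for each pair (1.0 or 0.0 for labeled pairs)
    alpha   -- Laplace smoothing constant (assumed value)
    """
    n_fields = len(pairs[0])
    m_total = sum(weights)
    u_total = len(pairs) - m_total
    prior = (m_total + alpha) / (len(pairs) + 2 * alpha)
    p_match, p_non = [], []
    for j in range(n_fields):
        m_agree = sum(w * x[j] for x, w in zip(pairs, weights))
        u_agree = sum((1 - w) * x[j] for x, w in zip(pairs, weights))
        p_match.append((m_agree + alpha) / (m_total + 2 * alpha))
        p_non.append((u_agree + alpha) / (u_total + 2 * alpha))
    return prior, p_match, p_non

def posterior(x, model):
    """Return P(match | agreement flags x) under the fitted model."""
    prior, p_match, p_non = model
    log_m, log_u = math.log(prior), math.log(1 - prior)
    for xj, pm, pu in zip(x, p_match, p_non):
        log_m += math.log(pm if xj else 1 - pm)
        log_u += math.log(pu if xj else 1 - pu)
    top = max(log_m, log_u)  # subtract the max for numerical stability
    em, eu = math.exp(log_m - top), math.exp(log_u - top)
    return em / (em + eu)

def train_semi_supervised(labeled, labels, unlabeled, n_iter=20):
    """Start from the labeled pairs, then refine with EM on unlabeled pairs."""
    hard = [float(y) for y in labels]
    model = fit_nb(labeled, hard)
    for _ in range(n_iter):
        soft = [posterior(x, model) for x in unlabeled]  # E-step
        model = fit_nb(labeled + unlabeled, hard + soft)  # M-step
    return model

# Hypothetical flags: (name agrees, date of birth agrees, address agrees)
labeled = [(1, 1, 1), (1, 0, 1), (0, 0, 0), (0, 1, 0)]
labels = [1, 1, 0, 0]
unlabeled = [(1, 1, 1), (1, 1, 1), (0, 0, 0), (0, 0, 0), (1, 0, 1), (0, 1, 0)]

model = train_semi_supervised(labeled, labels, unlabeled)
print(posterior((1, 1, 1), model), posterior((0, 0, 0), model))
```

In this sketch the labeled pairs anchor the two classes, and the unlabeled pairs only adjust the estimated agreement probabilities. A graded similarity measure per field, rather than a binary flag, would require a different likelihood model.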
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, G.A., Chen, H., Atabakhsh, H. (2006). A Multi-layer Naïve Bayes Model for Approximate Identity Matching. In: Mehrotra, S., Zeng, D.D., Chen, H., Thuraisingham, B., Wang, FY. (eds) Intelligence and Security Informatics. ISI 2006. Lecture Notes in Computer Science, vol 3975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11760146_44
DOI: https://doi.org/10.1007/11760146_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34478-0
Online ISBN: 978-3-540-34479-7
eBook Packages: Computer Science, Computer Science (R0)