Abstract
Poor-quality data, such as data with missing values (or missing records), has negative consequences in many application domains. An important aspect of data quality is completeness. One problem in data completeness is that of missing individuals in data sets, where the individuals are the real-world entities whose information is recorded. So far, however, completeness studies have paid little attention to how missing individuals should be assessed. In this paper, we propose the notion of population-based completeness (PBC) to deal with the missing-individuals problem. Our aim is to investigate what is required to measure PBC and to identify what is needed to support PBC measurements in practice. We explore the need for PBC in microbial genomics, using real sample data sets retrieved from a microbial database called Comprehensive Microbial Resources (CMR).
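The abstract does not give a formula for PBC. As a minimal sketch, assuming PBC can be modelled as the proportion of a reference population's individuals that actually appear in a data set (that is, |D ∩ P| / |P|), the measurement and the list of missing individuals could be computed as below. The function names, the set-based formulation and the identifiers in the usage lines are hypothetical illustrations, not the paper's definition or CMR data.

```python
from typing import Hashable, Iterable, Set


def population_based_completeness(
    data_set_ids: Iterable[Hashable],
    reference_population_ids: Iterable[Hashable],
) -> float:
    """Fraction of a reference population's individuals present in a data set.

    Assumption (hypothetical model): PBC = |D ∩ P| / |P|, where D holds the
    identifiers of individuals recorded in the data set and P holds the
    identifiers of individuals in the chosen reference population.
    """
    population = set(reference_population_ids)
    if not population:
        raise ValueError("reference population must not be empty")
    # Individuals in the data set but outside the population are ignored.
    present = set(data_set_ids) & population
    return len(present) / len(population)


def missing_individuals(
    data_set_ids: Iterable[Hashable],
    reference_population_ids: Iterable[Hashable],
) -> Set[Hashable]:
    """Individuals of the reference population absent from the data set."""
    return set(reference_population_ids) - set(data_set_ids)


if __name__ == "__main__":
    # Hypothetical genome identifiers, for illustration only.
    population = ["g1", "g2", "g3", "g4"]
    data_set = ["g1", "g2", "g5"]
    print(population_based_completeness(data_set, population))  # 0.5
    print(sorted(missing_individuals(data_set, population)))    # ['g3', 'g4']
```

In this sketch the hard part is hidden in the choice of the reference population P: the score is only as meaningful as the population it is measured against, which matches the abstract's focus on what is needed to support PBC measurements in practice.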
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Emran, N.A., Embury, S., Missier, P., Isa, M.N.M., Muda, A.K. (2013). Measuring Data Completeness for Microbial Genomics Database. In: Selamat, A., Nguyen, N.T., Haron, H. (eds) Intelligent Information and Database Systems. ACIIDS 2013. Lecture Notes in Computer Science, vol 7802. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36546-1_20
DOI: https://doi.org/10.1007/978-3-642-36546-1_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36545-4
Online ISBN: 978-3-642-36546-1
eBook Packages: Computer Science, Computer Science (R0)