Abstract
Information incompleteness is a major data quality issue, amplified by the increasing amount of data collected from unreliable sources. Assessing the completeness of data is crucial for determining the quality of the data itself, but also for verifying the validity of query answers over incomplete data. In this article, we tackle the issue of efficiently describing and inferring knowledge about data completeness with respect to a complete reference dataset and study the use of a partition pattern algebra for summarizing the completeness and validity of query answers. We describe an implementation and experiments with a real-world dataset to validate the effectiveness and the efficiency of our approach.
Notes
1. Two pattern tables are equivalent if their instances in R are equal.
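The equivalence stated in this note can be illustrated with a minimal sketch. It rests on assumptions not given on this page: a pattern is taken to be a tuple of constants and wildcards, the instance of a pattern table in the reference relation R is taken to be the set of rows of R matched by at least one pattern, and all names (`WILDCARD`, `matches`, `instance`, `equivalent`) and the sample data are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumption: "*" stands for a wildcard that matches any value.
WILDCARD = "*"

Row = Tuple[str, ...]
Pattern = Tuple[str, ...]


def matches(pattern: Pattern, row: Row) -> bool:
    """A pattern matches a row if every position is a wildcard or an equal constant."""
    return len(pattern) == len(row) and all(
        p == WILDCARD or p == v for p, v in zip(pattern, row)
    )


def instance(pattern_table: Iterable[Pattern], reference: Set[Row]) -> Set[Row]:
    """Rows of the reference relation matched by at least one pattern."""
    patterns = list(pattern_table)
    return {row for row in reference if any(matches(p, row) for p in patterns)}


def equivalent(t1: Iterable[Pattern], t2: Iterable[Pattern], reference: Set[Row]) -> bool:
    """Two pattern tables are equivalent if their instances in the reference are equal."""
    return instance(t1, reference) == instance(t2, reference)


# Hypothetical reference relation R over (floor, room).
R = {
    ("floor1", "room1"),
    ("floor1", "room2"),
    ("floor2", "room1"),
}

T1 = [("floor1", "*")]
T2 = [("floor1", "room1"), ("floor1", "room2")]
T3 = [("*", "room1")]
```

Under these assumptions, `T1` and `T2` are syntactically different but cover the same rows of `R`, so they are equivalent, whereas `T3` covers a different set of rows.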
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hannou, FZ., Amann, B., Baazizi, MA. (2019). Explaining Query Answer Completeness and Correctness with Partition Patterns. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11707. Springer, Cham. https://doi.org/10.1007/978-3-030-27618-8_4
DOI: https://doi.org/10.1007/978-3-030-27618-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27617-1
Online ISBN: 978-3-030-27618-8
eBook Packages: Computer Science, Computer Science (R0)