Conclusion
We have presented a method for estimating the quality (accuracy) of query results derived from the base relations of a relational database. Because the quality of derived data depends on the query that produces it, the emphasis is on estimating the quality of the output of every operator in the relational algebra. Once the effect of each operator on quality has been characterized in theoretical terms, a quality profile can be generated for the output of any query composed of those operators. Some operators tend to raise the attainable quality of the output, while others tend to lower it. This analysis therefore provides a basis for further research on formulating queries in terms of preferred operators, with the objective of obtaining the best achievable quality profile for a given set of data sources and their corresponding quality profiles.
Clearly, the validity of the computed quality depends on the validity of the quality figures for the underlying base relations. Those figures, obtained via sampling and other techniques, are themselves prone to error. In the absence of more detailed information for each base relation, the data quality profile has been assumed to be uniform; however, the impact of non-uniform accuracy profiles has also been studied. Defining rigorous techniques for estimating the idiosyncrasies of individual data parameters remains an area requiring further research.
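The compositional idea summarized above, in which base-relation accuracy estimates are propagated operator by operator through a query, can be illustrated with a minimal sketch. The class names, the `estimate_accuracy` function, and the propagation rules themselves are hypothetical simplifications: the sketch assumes uniform accuracy within each relation and independent errors across relations, and it does not reproduce the operator formulas derived in the chapter.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Base:
    """A base relation with an estimated uniform tuple accuracy in [0, 1]."""
    name: str
    accuracy: float


@dataclass(frozen=True)
class Select:
    """Selection over a child expression (hypothetical node type)."""
    child: "Expr"


@dataclass(frozen=True)
class Join:
    """Join of two child expressions (hypothetical node type)."""
    left: "Expr"
    right: "Expr"


Expr = Union[Base, Select, Join]


def estimate_accuracy(expr: Expr) -> float:
    """Recursively estimate output accuracy for a query expression tree."""
    if isinstance(expr, Base):
        if not 0.0 <= expr.accuracy <= 1.0:
            raise ValueError(f"accuracy of {expr.name} must lie in [0, 1]")
        return expr.accuracy
    if isinstance(expr, Select):
        # ASSUMPTION: selection passes the child's accuracy through unchanged.
        return estimate_accuracy(expr.child)
    if isinstance(expr, Join):
        # ASSUMPTION: a joined tuple is accurate only if both contributing
        # tuples are accurate, and their errors are independent.
        return estimate_accuracy(expr.left) * estimate_accuracy(expr.right)
    raise TypeError(f"unsupported expression node: {expr!r}")


# Illustrative query: a selection on one relation joined with another.
# The relation names and accuracy figures are made up for the example.
query = Join(Select(Base("customers", 0.9)), Base("orders", 0.8))
estimated = estimate_accuracy(query)
```

Because the estimate is computed recursively over the expression tree, two equivalent formulations of a query can be compared by building both trees and evaluating each, which is the kind of operator-preference analysis the conclusion proposes as future work.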
© 2002 Kluwer Academic Publishers
(2002). Developing a Data Quality Algebra. In: Data Quality. Advances in Database Systems, vol 23. Springer, Boston, MA. https://doi.org/10.1007/0-306-46987-1_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-7215-8
Online ISBN: 978-0-306-46987-9