Abstract
We consider the problem of defining an approximation measure for functional dependencies (FDs). An approximation measure for X → Y is a function mapping relation instances, r, to non-negative real numbers. Intuitively, the number to which r is mapped describes the “degree” to which the dependency X → Y holds in r. We develop a set of axioms for measures based on the following intuition: the degree to which X → Y is approximate in r is the degree to which r determines a function from π_X(r) to π_Y(r). The axioms apply to measures that depend only on frequencies, where the frequency of x ∈ π_X(r) is the number of tuples containing x divided by the total number of tuples. We prove that, up to a constant multiple, exactly one measure satisfies these axioms: the information dependency measure of [5]. We do not argue that this result implies that the information dependency measure is the only reasonable frequency-based measure. However, an application designer who decides to use another measure must accept that the measure used violates one of the axioms.
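As a rough illustration of the frequency notion above, the following Python sketch computes frequencies over a projection and an entropy-based score for X → Y. The abstract does not state the formula of the information dependency measure, so the form used here, H(XY) − H(X) (the conditional entropy of Y given X, in bits), is an assumption, as are the function names and the list-of-dicts encoding of a relation instance.

```python
from collections import Counter
from math import log2


def frequencies(rows, attrs):
    """Frequency of each value in the projection of rows onto attrs:
    (number of tuples containing the value) / (total number of tuples)."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    total = len(rows)
    return {value: n / total for value, n in counts.items()}


def entropy(rows, attrs):
    """Shannon entropy (base 2) of the frequency distribution over attrs."""
    return -sum(p * log2(p) for p in frequencies(rows, attrs).values())


def information_dependency(rows, x_attrs, y_attrs):
    """ASSUMED form of the measure: H(XY) - H(X).
    It is 0 when X -> Y holds exactly and grows as the dependency degrades."""
    xy_attrs = list(dict.fromkeys(list(x_attrs) + list(y_attrs)))
    return entropy(rows, xy_attrs) - entropy(rows, x_attrs)


# Hypothetical relation instances over attributes A, B, C.
r_exact = [
    {"A": 1, "B": "a", "C": 0},
    {"A": 1, "B": "a", "C": 1},
    {"A": 2, "B": "b", "C": 0},
    {"A": 2, "B": "b", "C": 1},
]
r_approx = [
    {"A": 1, "B": "a", "C": 0},
    {"A": 1, "B": "b", "C": 1},  # A = 1 now maps to two B values
    {"A": 2, "B": "c", "C": 0},
    {"A": 2, "B": "c", "C": 1},
]

score_exact = information_dependency(r_exact, ["A"], ["B"])
score_approx = information_dependency(r_approx, ["A"], ["B"])
```

In `r_exact` every A value determines a single B value, so the score is 0. In `r_approx` half of the tuples have an A value that leaves one bit of uncertainty about B, so the score under the assumed formula is 0.5.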
Work supported by National Science Foundation grant IIS-0082407.
References
[1] Abiteboul S., Hull R., and Vianu V. Foundations of Databases. Addison-Wesley, Reading, Mass., 1995.
[2] Ash R. Information Theory. Interscience Publishers, John Wiley and Sons, New York, 1965.
[3] Cavallo R. and Pittarelli M. The Theory of Probabilistic Databases. In Proceedings 13th International Conference on Very Large Databases (VLDB), pages 71–81, 1987.
[4] Dalkilic M. Establishing the Foundations of Data Mining. PhD thesis, Indiana University, Bloomington, IN 47404, May 2000.
[5] Dalkilic M. and Robertson E. Information Dependencies. In Proceedings 19th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pages 245–253, 2000.
[6] De Bra P. and Paredaens J. An Algorithm for Horizontal Decompositions. Information Processing Letters, 17:91–95, 1983.
[7] Demetrovics J., Katona G.O.H., and Miklos D. Partial Dependencies in Relational Databases and Their Realization. Discrete Applied Mathematics, 40:127–138, 1992.
[8] Demetrovics J., Katona G.O.H., Miklos D., Seleznjev O., and Thalheim B. Asymptotic Properties of Keys and Functional Dependencies in Random Databases. Theoretical Computer Science, 40(2):151–166, 1998.
[9] Goodman L. and Kruskal W. Measures of Associations for Cross Classifications. Journal of the American Statistical Association, 49:732–764, 1954.
[10] Hilderman R. and Hamilton H. Evaluation of Interestingness Measures for Ranking Discovered Knowledge. In Lecture Notes in Computer Science 2035 (Proceedings Fifth Pacific-Asian Conference on Knowledge Discovery and Data Mining (PAKDD 2001)), pages 247–259, 2001.
[11] Huhtala Y., Kärkkäinen J., Porkka P., and Toivonen H. TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies. The Computer Journal, 42(2):100–111, 1999.
[12] Kantola M., Mannila H., Räihä K., and Siirtola H. Discovering Functional and Inclusion Dependencies in Relational Databases. International Journal of Intelligent Systems, 7:591–607, 1992.
[13] Kivinen J. and Mannila H. Approximate Inference of Functional Dependencies from Relations. Theoretical Computer Science, 149:129–149, 1995.
[14] Lee T. An Information-Theoretic Analysis of Relational Databases - Part I: Data Dependencies and Information Metric. IEEE Transactions on Software Engineering, SE-13(10):1049–1061, 1987.
[15] Lopes S., Petit J., and Lakhal L. Efficient Discovery of Functional Dependencies and Armstrong Relations. In Lecture Notes in Computer Science 1777 (Proceedings 7th International Conference on Extending Database Technology (EDBT)), pages 350–364, 2000.
[16] Malvestuto F. Statistical Treatment of the Information Content of a Database. Information Systems, 11(3):211–223, 1986.
[17] Mannila H. and Räihä K. Dependency Inference. In Proceedings 13th International Conference on Very Large Databases (VLDB), pages 155–158, 1987.
[18] Nambiar K. K. Some Analytic Tools for the Design of Relational Database Systems. In Proceedings 6th International Conference on Very Large Databases (VLDB), pages 417–428, 1980.
[19] Novelli N. and Cicchetti R. Functional and Embedded Dependency Inference: a Data Mining Point of View. Information Systems, 26:477–506, 2001.
[20] Piatetsky-Shapiro G. Probabilistic Data Dependencies. In Proceedings ML-92 Workshop on Machine Discovery, Aberdeen, UK, pages 11–17, 1992.
[21] Ramakrishnan R. and Gehrke J. Database Management Systems, Second Edition. McGraw-Hill Co., New York, 2000.
[22] Wyss C., Giannella C., and Robertson E. FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances. In Lecture Notes in Computer Science 2114 (Proceedings 3rd International Conference on Data Warehousing and Knowledge Discovery (DaWaK)), pages 101–110, 2001.
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Giannella, C. (2002). An Axiomatic Approach to Defining Approximation Measures for Functional Dependencies. In: Manolopoulos, Y., Návrat, P. (eds) Advances in Databases and Information Systems. ADBIS 2002. Lecture Notes in Computer Science, vol 2435. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45710-0_4
DOI: https://doi.org/10.1007/3-540-45710-0_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44138-0
Online ISBN: 978-3-540-45710-7
eBook Packages: Springer Book Archive