FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances (Extended Abstract)
The problem of discovering functional dependencies (FDs) from an existing relation instance has received considerable attention in the database research community. To date, even the most efficient solutions have exponential complexity in the number of attributes of the instance. We develop FastFDs, an algorithm for this problem based on a depth-first, heuristic-driven (DFHD) search for the minimal covers of hypergraphs. Lopes et al. previously applied the technique of reducing FD discovery to finding minimal covers of hypergraphs in their algorithm Dep-Miner. Dep-Miner, however, searches for minimal covers levelwise, whereas FastFDs uses DFHD search. We compare Dep-Miner, FastFDs, and Tane on several distinct benchmark relation instances. Our experimental results indicate that, for many of these instances, DFHD search is more efficient than either Dep-Miner's levelwise search or Tane's partitioning approach.
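The reduction to hypergraph minimal covers and the depth-first, heuristic-driven search can be illustrated with a short Python sketch. It assumes the usual difference-set formulation: for every pair of tuples, record the set of attributes on which they disagree; then X → A holds exactly when X intersects every such set that contains A (with A removed), so the minimal left-hand sides for A are the minimal covers of that family. The branching heuristic shown (try first the attribute that hits the most still-uncovered sets, ties broken by name), the naive quadratic pair scan, and all function names (`difference_sets`, `minimize`, `dfhd_covers`, `mine_fds`) are assumptions for illustration, not the authors' implementation.

```python
from itertools import combinations

# Illustrative sketch only: names and heuristic details are hypothetical,
# not taken from the FastFDs authors' code.


def difference_sets(rows, attrs):
    """Naive O(n^2) scan: for each pair of tuples, the attributes where they differ."""
    diffs = set()
    for t1, t2 in combinations(rows, 2):
        d = frozenset(a for a in attrs if t1[a] != t2[a])
        if d:  # identical tuples impose no constraint
            diffs.add(d)
    return diffs


def minimize(sets):
    """Keep only inclusion-minimal sets; they have the same covers as the full family."""
    return {s for s in sets if not any(other < s for other in sets)}


def is_minimal_cover(x, sets):
    """Assuming x already hits every set: True if no attribute of x can be dropped."""
    return all(any(not (s & (x - {a})) for s in sets) for a in x)


def dfhd_covers(candidates, uncovered, path, all_sets, out):
    """Depth-first search for minimal covers, branching on the most useful attribute first."""
    if not uncovered:
        x = frozenset(path)
        if is_minimal_cover(x, all_sets):
            out.append(x)
        return
    # Hypothetical heuristic: most still-uncovered sets hit first, ties broken by name.
    ordering = sorted(candidates, key=lambda a: (-sum(a in s for s in uncovered), a))
    for i, a in enumerate(ordering):
        if not any(a in s for s in uncovered):
            break  # sorted in descending order, so no later candidate helps either
        rest = [s for s in uncovered if a not in s]
        # Only attributes after `a` stay eligible, so each subset is visited at most once.
        dfhd_covers(ordering[i + 1:], rest, path + [a], all_sets, out)


def mine_fds(rows, attrs):
    """Return minimal, non-trivial FDs as (lhs_tuple, rhs) pairs."""
    diffs = difference_sets(rows, attrs)
    fds = []
    for a in attrs:
        # X -> a holds iff X hits every difference set containing a (with a removed).
        d_a = minimize({s - {a} for s in diffs if a in s})
        covers = []
        dfhd_covers([b for b in attrs if b != a], list(d_a), [], d_a, covers)
        fds.extend((tuple(sorted(x)), a) for x in covers)
    return fds


rows = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 2},
    {"A": 2, "B": 2, "C": 2},
]
print(mine_fds(rows, ["A", "B", "C"]))  # [(('C',), 'B'), (('B',), 'C')]
```

On this toy instance the sketch reports C → B and B → C and no dependency with A on the right-hand side, because the last two tuples agree on B and C but differ on A. Pruning by the `break` is safe only because the candidates are sorted by coverage in descending order; a real implementation would also compute difference sets far more efficiently than the pairwise scan used here.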
Keywords: Functional Dependency; Search Tree; Correlation Factor; Minimal Cover; Relation Instance
- 1. Agrawal, Rakesh; Mannila, Heikki; Srikant, Ramakrishnan; Toivonen, Hannu and Verkamo, A.I. “Fast Discovery of Association Rules.” Advances in KDD, AAAI Press, Menlo Park, CA, pp. 307–328, 1996.
- 2. Demetrovics, J.; Katona, G.; Miklos, D.; Seleznjev, O. and Thalheim, B. “The Average Length of Keys and Functional Dependencies in (Random) Databases.” Lecture Notes in Computer Science, vol. 893, 1995.
- 4. Flach, Peter and Savnik, Iztok. “Database Dependency Discovery: a Machine Learning Approach.” AI Communications, vol. 12, no. 3, pp. 139–160.
- 5. Gunopulos, Dimitrios; Khardon, Roni; Mannila, Heikki and Toivonen, Hannu. “Data Mining, Hypergraph Traversals, and Machine Learning (Extended Abstract).” Proceedings of PODS, pp. 209–216, 1997.
- 6. Huhtala, Ykä; Kärkkäinen, Juha; Porkka, Pasi and Toivonen, Hannu. “TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies.” The Computer Journal, vol. 42, no. 2, 1999.
- 8. Lopes, Stephane; Petit, Jean-Marc and Lakhal, Lotfi. “Efficient Discovery of Functional Dependencies and Armstrong Relations.” Proceedings of EDBT 2000, Lecture Notes in Computer Science, vol. 1777.
- 9. Mannila, Heikki and Räihä, Kari-Jouko. “Dependency Inference (Extended Abstract).” Proceedings of the Very Large Data Bases Conference (VLDB), Brighton, pp. 155–158, 1987.
- 11. Merz, C.J. and Murphy, P.M. UCI Machine Learning Databases, 1996. http://www.ics.uci.edu/~mlearn/MLRepository.html. Irvine, CA: University of California, Department of Information and Computer Science.
- 12. The Tane and Tane/mem source code is available on the web at http://www.cs.helsinki.fi/research/fdk/datamining/tane
- 13. Wyss, C.; Giannella, C. and Robertson, E. “FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances.” Technical Report, Dept. of Computer Science, Indiana University, May 2001.