Revisiting Conditional Functional Dependency Discovery: Splitting the “C” from the “FD”

  • Joeri Rammelaere
  • Floris Geerts
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11052)

Abstract

Many techniques for cleaning dirty data are based on enforcing a set of integrity constraints. Conditional functional dependencies (CFDs) combine traditional functional dependencies (FDs) with association rules, and are widely used as a constraint formalism for data cleaning. However, the discovery of such CFDs has received limited attention. In this paper, we regard CFDs as an extension of association rules, and present three general methodologies for (approximate) CFD discovery, each using a different way of combining pattern mining for discovering the conditions (the “C” in CFD) with FD discovery. We discuss how existing algorithms fit into these three methodologies, and introduce new techniques to improve the discovery process. We show that the right choice of methodology improves performance over the traditional CFD discovery method CTane. Code related to this paper is available at: https://github.com/j-r77/cfddiscovery, https://codeocean.com/2018/06/20/discovering-conditional-functional-dependencies/code.
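
To make the notion of an approximate CFD concrete, the following minimal sketch checks a single, already given CFD against a small relation. It is not one of the discovery methodologies from the paper and is not taken from the linked repository. The function name `cfd_confidence`, the `_` wildcard, the toy data, and the confidence measure (the fraction of matching tuples that can be kept without violating the dependency) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

WILDCARD = "_"  # assumed marker for "any value" in a pattern tuple


def cfd_confidence(rows, lhs, rhs, pattern):
    """Return (support, confidence) of the CFD (lhs -> rhs, pattern) on rows.

    support    = number of tuples matching the constants on the left-hand side
    confidence = largest fraction of those tuples that can be kept so that
                 the dependency holds exactly (an assumed, g3-style measure)
    """
    # The "C": keep only tuples that satisfy the constant conditions on lhs.
    matching = [
        r for r in rows
        if all(pattern[a] in (WILDCARD, r[a]) for a in lhs)
    ]
    if not matching:
        return 0, 1.0

    # The "FD": group by left-hand-side values and count right-hand-side values.
    groups = defaultdict(Counter)
    for r in matching:
        groups[tuple(r[a] for a in lhs)][r[rhs]] += 1

    rhs_const = pattern.get(rhs, WILDCARD)
    kept = 0
    for counts in groups.values():
        if rhs_const == WILDCARD:
            kept += max(counts.values())  # keep the majority value per group
        else:
            kept += counts[rhs_const]     # only the required constant is valid
    return len(matching), kept / len(matching)


# Hypothetical toy relation, invented for this example.
rows = [
    {"country": "BE", "zip": "2000", "city": "Antwerp"},
    {"country": "BE", "zip": "2000", "city": "Antwerp"},
    {"country": "BE", "zip": "2000", "city": "Antwerpen"},
    {"country": "BE", "zip": "9000", "city": "Ghent"},
    {"country": "NL", "zip": "2000", "city": "Haarlem"},
]

# CFD: within country = BE, zip determines city.
cfd = cfd_confidence(rows, ["country", "zip"], "city",
                     {"country": "BE", "zip": WILDCARD, "city": WILDCARD})
# Plain FD: zip determines city, with no condition.
fd = cfd_confidence(rows, ["zip"], "city",
                    {"zip": WILDCARD, "city": WILDCARD})
print(cfd, fd)
```

In this toy relation the conditioned dependency fits the data better than the unconditioned one, which is the kind of rule a CFD discovery method searches for. The sketch separates selecting tuples by the condition from checking the dependency, mirroring the split between the “C” and the “FD” described above.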

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Antwerp, Antwerp, Belgium
