Abstract
Referential integrity is one of the three inherent integrity rules and can be enforced in databases using foreign keys. However, in many real world applications referential integrity is not enforced since foreign keys remain disabled to ease data acquisition. Important applications such as anomaly detection, data integration, data modeling, indexing, reverse engineering, schema design, and query optimization all benefit from the discovery of foreign keys. Therefore, the profiling of foreign keys from dirty data is an important yet challenging task. We raise the challenge further by diverting from previous research in which null markers have been ignored. We propose algorithms for profiling unary and multi-column foreign keys in the real world, that is, under the different semantics for null markers of the SQL standard. While state of the art algorithms perform well in the absence of null markers, it is shown that they perform poorly in their presence. Extensive experiments demonstrate that our algorithms perform as well in the real world as state of the art algorithms perform in the idealized special case where null markers are ignored.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
References
Bauckmann, J., Abedjan, Z., Leser, U., Müller, H., Naumann, F.: Discovering conditional inclusion dependencies. In: Chen, X., Lebanon, G., Wang, H., Zaki, M.J. (eds.) 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, Maui, HI, USA, October 29–November 02, 2012, pp. 2094–2098. ACM (2012)
Bauckmann, J., Leser, U., Naumann, F., Tietz, V.: Efficiently detecting inclusion dependencies. In: Chirkova, R., Dogac, A., Özsu, M.T., Sellis, T.K. (eds.) Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, 15–20 April 2007, pp. 1448–1450. IEEE (2007)
Casanova, M.A., Fagin, R., Papadimitriou, C.H.: Inclusion dependencies and their interaction with functional dependencies. J. Comput. Syst. Sci. 28(1), 29–59 (1984)
Chen, Z., Narasayya, V.R., Chaudhuri, S.: Fast foreign-key detection in Microsoft SQL server PowerPivot for Excel. Proc. VLDB 7(13), 1417–1428 (2014)
Codd, E.: A relational model of data for large shared data banks. Commun. ACM 13(6), 377–387 (1970)
De Marchi, F., Lopes, S., Petit, J.M.: Unary and n-ary inclusion dependency discovery in relational databases. Int. J. Intell. Syst. 32(1), 53–73 (2009)
Heise, A., Quiané-Ruiz, J., Abedjan, Z., Jentzsch, A., Naumann, F.: Scalable discovery of unique column combinations. Proc. VLDB 7(4), 301–312 (2013)
Kantola, M., Mannila, H., Räihä, K.J., Siirtola, H.: Discovering functional and inclusion dependencies in relational databases. Int. J. Intell. Syst. 7(7), 591–607 (1992)
Koeller, A., Rundensteiner, E.A.: Heuristic strategies for the discovery of inclusion dependencies and other patterns. In: Spaccapietra, S., Atzeni, P., Chu, W.W., Catarci, T., Sycara, K. (eds.) Journal on Data Semantics V. LNCS, vol. 3870, pp. 185–210. Springer, Heidelberg (2006)
Lopes, S., Petit, J., Toumani, F.: Discovering interesting inclusion dependencies: application to logical database tuning. Inf. Syst. 27(1), 1–19 (2002)
Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Disc. 1(3), 241–258 (1997)
Melton, J., Simon, A.R.: SQL:1999: Understanding Relational Language Components. Morgan Kaufmann, Boston (2001)
Naumann, F.: Data profiling revisited. SIGMOD Rec. 42(4), 40–49 (2013)
Rostin, A., Albrecht, O., Bauckmann, J., Naumann, F., Leser, U.: A machine learning approach to foreign key discovery. In: 12th International Workshop on the Web and Databases, WebDB 2009, Providence, RI, USA, 28 June 2009 (2009)
Saha, B., Srivastava, D.: Data quality: the other face of big data. In: Cruz, I.F., Ferrari, E., Tao, Y., Bertino, E., Trajcevski, G. (eds.) 30th International Conference on Data Engineering, ICDE 2014, Chicago, IL, USA, 31 March–4 April 2014, pp. 1294–1297. IEEE (2014)
Sismanis, Y., Brown, P., Haas, P.J., Reinwald, B.: GORDIAN: efficient and scalable discovery of composite keys. In: Dayal, U., Whang, K., Lomet, D.B., Alonso, G., Lohman, G.M., Kersten, M.L., Cha, S.K., Kim, Y. (eds.) Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, 12–15 September 2006, pp. 691–702. ACM (2006)
Zhang, M., Hadjieleftheriou, M., Ooi, B.C., Procopiuc, C.M., Srivastava, D.: On multi-column foreign key discovery. Proc. VLDB 3(1), 805–814 (2010)
Acknowledgement
This research is supported by the Marsden fund council from Government funding, administered by the Royal Society of New Zealand.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Memari, M., Link, S., Dobbie, G. (2015). SQL Data Profiling of Foreign Keys. In: Johannesson, P., Lee, M., Liddle, S., Opdahl, A., Pastor López, Ó. (eds) Conceptual Modeling. ER 2015. Lecture Notes in Computer Science(), vol 9381. Springer, Cham. https://doi.org/10.1007/978-3-319-25264-3_17
Download citation
DOI: https://doi.org/10.1007/978-3-319-25264-3_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25263-6
Online ISBN: 978-3-319-25264-3
eBook Packages: Computer ScienceComputer Science (R0)