SQL Data Profiling of Foreign Keys

Memari, Mozhgan; Link, Sebastian; Dobbie, Gillian

doi:10.1007/978-3-319-25264-3_17

Mozhgan Memari¹⁸,
Sebastian Link¹⁸ &
Gillian Dobbie¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9381))

Included in the following conference series:

International Conference on Conceptual Modeling

1925 Accesses
7 Citations

Abstract

Referential integrity is one of the three inherent integrity rules and can be enforced in databases using foreign keys. However, in many real world applications referential integrity is not enforced since foreign keys remain disabled to ease data acquisition. Important applications such as anomaly detection, data integration, data modeling, indexing, reverse engineering, schema design, and query optimization all benefit from the discovery of foreign keys. Therefore, the profiling of foreign keys from dirty data is an important yet challenging task. We raise the challenge further by diverting from previous research in which null markers have been ignored. We propose algorithms for profiling unary and multi-column foreign keys in the real world, that is, under the different semantics for null markers of the SQL standard. While state of the art algorithms perform well in the absence of null markers, it is shown that they perform poorly in their presence. Extensive experiments demonstrate that our algorithms perform as well in the real world as state of the art algorithms perform in the idealized special case where null markers are ignored.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
http://www.tpc.org/tpcc/ and http://www.tpc.org/tpch/.

References

Bauckmann, J., Abedjan, Z., Leser, U., Müller, H., Naumann, F.: Discovering conditional inclusion dependencies. In: Chen, X., Lebanon, G., Wang, H., Zaki, M.J. (eds.) 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, Maui, HI, USA, October 29–November 02, 2012, pp. 2094–2098. ACM (2012)
Google Scholar
Bauckmann, J., Leser, U., Naumann, F., Tietz, V.: Efficiently detecting inclusion dependencies. In: Chirkova, R., Dogac, A., Özsu, M.T., Sellis, T.K. (eds.) Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, 15–20 April 2007, pp. 1448–1450. IEEE (2007)
Google Scholar
Casanova, M.A., Fagin, R., Papadimitriou, C.H.: Inclusion dependencies and their interaction with functional dependencies. J. Comput. Syst. Sci. 28(1), 29–59 (1984)
Article MathSciNet MATH Google Scholar
Chen, Z., Narasayya, V.R., Chaudhuri, S.: Fast foreign-key detection in Microsoft SQL server PowerPivot for Excel. Proc. VLDB 7(13), 1417–1428 (2014)
Article Google Scholar
Codd, E.: A relational model of data for large shared data banks. Commun. ACM 13(6), 377–387 (1970)
Article MATH Google Scholar
De Marchi, F., Lopes, S., Petit, J.M.: Unary and n-ary inclusion dependency discovery in relational databases. Int. J. Intell. Syst. 32(1), 53–73 (2009)
Article Google Scholar
Heise, A., Quiané-Ruiz, J., Abedjan, Z., Jentzsch, A., Naumann, F.: Scalable discovery of unique column combinations. Proc. VLDB 7(4), 301–312 (2013)
Article Google Scholar
Kantola, M., Mannila, H., Räihä, K.J., Siirtola, H.: Discovering functional and inclusion dependencies in relational databases. Int. J. Intell. Syst. 7(7), 591–607 (1992)
Article MATH Google Scholar
Koeller, A., Rundensteiner, E.A.: Heuristic strategies for the discovery of inclusion dependencies and other patterns. In: Spaccapietra, S., Atzeni, P., Chu, W.W., Catarci, T., Sycara, K. (eds.) Journal on Data Semantics V. LNCS, vol. 3870, pp. 185–210. Springer, Heidelberg (2006)
Chapter Google Scholar
Lopes, S., Petit, J., Toumani, F.: Discovering interesting inclusion dependencies: application to logical database tuning. Inf. Syst. 27(1), 1–19 (2002)
Article MATH Google Scholar
Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Disc. 1(3), 241–258 (1997)
Article Google Scholar
Melton, J., Simon, A.R.: SQL:1999: Understanding Relational Language Components. Morgan Kaufmann, Boston (2001)
Google Scholar
Naumann, F.: Data profiling revisited. SIGMOD Rec. 42(4), 40–49 (2013)
Article Google Scholar
Rostin, A., Albrecht, O., Bauckmann, J., Naumann, F., Leser, U.: A machine learning approach to foreign key discovery. In: 12th International Workshop on the Web and Databases, WebDB 2009, Providence, RI, USA, 28 June 2009 (2009)
Google Scholar
Saha, B., Srivastava, D.: Data quality: the other face of big data. In: Cruz, I.F., Ferrari, E., Tao, Y., Bertino, E., Trajcevski, G. (eds.) 30th International Conference on Data Engineering, ICDE 2014, Chicago, IL, USA, 31 March–4 April 2014, pp. 1294–1297. IEEE (2014)
Google Scholar
Sismanis, Y., Brown, P., Haas, P.J., Reinwald, B.: GORDIAN: efficient and scalable discovery of composite keys. In: Dayal, U., Whang, K., Lomet, D.B., Alonso, G., Lohman, G.M., Kersten, M.L., Cha, S.K., Kim, Y. (eds.) Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, 12–15 September 2006, pp. 691–702. ACM (2006)
Google Scholar
Zhang, M., Hadjieleftheriou, M., Ooi, B.C., Procopiuc, C.M., Srivastava, D.: On multi-column foreign key discovery. Proc. VLDB 3(1), 805–814 (2010)
Article Google Scholar

Download references

Acknowledgement

This research is supported by the Marsden fund council from Government funding, administered by the Royal Society of New Zealand.

Author information

Authors and Affiliations

Department of Computer Science, University of Auckland, Auckland, New Zealand
Mozhgan Memari, Sebastian Link & Gillian Dobbie

Authors

Mozhgan Memari
View author publications
You can also search for this author in PubMed Google Scholar
Sebastian Link
View author publications
You can also search for this author in PubMed Google Scholar
Gillian Dobbie
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sebastian Link .

Editor information

Editors and Affiliations

Stockholm University, Kista, Sweden
Paul Johannesson
National University of Singapore, Singapore, Singapore
Mong Li Lee
Brigham Young University, Provo, Utah, USA
Stephen W. Liddle
University of Bergen, Bergen, Norway
Andreas L. Opdahl
Universidad Politécnica de Valencia, Valencia, Spain
Óscar Pastor López

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Memari, M., Link, S., Dobbie, G. (2015). SQL Data Profiling of Foreign Keys. In: Johannesson, P., Lee, M., Liddle, S., Opdahl, A., Pastor López, Ó. (eds) Conceptual Modeling. ER 2015. Lecture Notes in Computer Science(), vol 9381. Springer, Cham. https://doi.org/10.1007/978-3-319-25264-3_17

Download citation

DOI: https://doi.org/10.1007/978-3-319-25264-3_17
Published: 08 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25263-6
Online ISBN: 978-3-319-25264-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics