Skip to main content

SQL Data Profiling of Foreign Keys

  • Conference paper
  • First Online:
Conceptual Modeling (ER 2015)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9381))

Included in the following conference series:

Abstract

Referential integrity is one of the three inherent integrity rules and can be enforced in databases using foreign keys. However, in many real world applications referential integrity is not enforced since foreign keys remain disabled to ease data acquisition. Important applications such as anomaly detection, data integration, data modeling, indexing, reverse engineering, schema design, and query optimization all benefit from the discovery of foreign keys. Therefore, the profiling of foreign keys from dirty data is an important yet challenging task. We raise the challenge further by diverting from previous research in which null markers have been ignored. We propose algorithms for profiling unary and multi-column foreign keys in the real world, that is, under the different semantics for null markers of the SQL standard. While state of the art algorithms perform well in the absence of null markers, it is shown that they perform poorly in their presence. Extensive experiments demonstrate that our algorithms perform as well in the real world as state of the art algorithms perform in the idealized special case where null markers are ignored.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    http://www.tpc.org/tpcc/ and http://www.tpc.org/tpch/.

References

  1. Bauckmann, J., Abedjan, Z., Leser, U., Müller, H., Naumann, F.: Discovering conditional inclusion dependencies. In: Chen, X., Lebanon, G., Wang, H., Zaki, M.J. (eds.) 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, Maui, HI, USA, October 29–November 02, 2012, pp. 2094–2098. ACM (2012)

    Google Scholar 

  2. Bauckmann, J., Leser, U., Naumann, F., Tietz, V.: Efficiently detecting inclusion dependencies. In: Chirkova, R., Dogac, A., Özsu, M.T., Sellis, T.K. (eds.) Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, 15–20 April 2007, pp. 1448–1450. IEEE (2007)

    Google Scholar 

  3. Casanova, M.A., Fagin, R., Papadimitriou, C.H.: Inclusion dependencies and their interaction with functional dependencies. J. Comput. Syst. Sci. 28(1), 29–59 (1984)

    Article  MathSciNet  MATH  Google Scholar 

  4. Chen, Z., Narasayya, V.R., Chaudhuri, S.: Fast foreign-key detection in Microsoft SQL server PowerPivot for Excel. Proc. VLDB 7(13), 1417–1428 (2014)

    Article  Google Scholar 

  5. Codd, E.: A relational model of data for large shared data banks. Commun. ACM 13(6), 377–387 (1970)

    Article  MATH  Google Scholar 

  6. De Marchi, F., Lopes, S., Petit, J.M.: Unary and n-ary inclusion dependency discovery in relational databases. Int. J. Intell. Syst. 32(1), 53–73 (2009)

    Article  Google Scholar 

  7. Heise, A., Quiané-Ruiz, J., Abedjan, Z., Jentzsch, A., Naumann, F.: Scalable discovery of unique column combinations. Proc. VLDB 7(4), 301–312 (2013)

    Article  Google Scholar 

  8. Kantola, M., Mannila, H., Räihä, K.J., Siirtola, H.: Discovering functional and inclusion dependencies in relational databases. Int. J. Intell. Syst. 7(7), 591–607 (1992)

    Article  MATH  Google Scholar 

  9. Koeller, A., Rundensteiner, E.A.: Heuristic strategies for the discovery of inclusion dependencies and other patterns. In: Spaccapietra, S., Atzeni, P., Chu, W.W., Catarci, T., Sycara, K. (eds.) Journal on Data Semantics V. LNCS, vol. 3870, pp. 185–210. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  10. Lopes, S., Petit, J., Toumani, F.: Discovering interesting inclusion dependencies: application to logical database tuning. Inf. Syst. 27(1), 1–19 (2002)

    Article  MATH  Google Scholar 

  11. Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Disc. 1(3), 241–258 (1997)

    Article  Google Scholar 

  12. Melton, J., Simon, A.R.: SQL:1999: Understanding Relational Language Components. Morgan Kaufmann, Boston (2001)

    Google Scholar 

  13. Naumann, F.: Data profiling revisited. SIGMOD Rec. 42(4), 40–49 (2013)

    Article  Google Scholar 

  14. Rostin, A., Albrecht, O., Bauckmann, J., Naumann, F., Leser, U.: A machine learning approach to foreign key discovery. In: 12th International Workshop on the Web and Databases, WebDB 2009, Providence, RI, USA, 28 June 2009 (2009)

    Google Scholar 

  15. Saha, B., Srivastava, D.: Data quality: the other face of big data. In: Cruz, I.F., Ferrari, E., Tao, Y., Bertino, E., Trajcevski, G. (eds.) 30th International Conference on Data Engineering, ICDE 2014, Chicago, IL, USA, 31 March–4 April 2014, pp. 1294–1297. IEEE (2014)

    Google Scholar 

  16. Sismanis, Y., Brown, P., Haas, P.J., Reinwald, B.: GORDIAN: efficient and scalable discovery of composite keys. In: Dayal, U., Whang, K., Lomet, D.B., Alonso, G., Lohman, G.M., Kersten, M.L., Cha, S.K., Kim, Y. (eds.) Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, 12–15 September 2006, pp. 691–702. ACM (2006)

    Google Scholar 

  17. Zhang, M., Hadjieleftheriou, M., Ooi, B.C., Procopiuc, C.M., Srivastava, D.: On multi-column foreign key discovery. Proc. VLDB 3(1), 805–814 (2010)

    Article  Google Scholar 

Download references

Acknowledgement

This research is supported by the Marsden fund council from Government funding, administered by the Royal Society of New Zealand.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sebastian Link .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Memari, M., Link, S., Dobbie, G. (2015). SQL Data Profiling of Foreign Keys. In: Johannesson, P., Lee, M., Liddle, S., Opdahl, A., Pastor López, Ó. (eds) Conceptual Modeling. ER 2015. Lecture Notes in Computer Science(), vol 9381. Springer, Cham. https://doi.org/10.1007/978-3-319-25264-3_17

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-25264-3_17

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25263-6

  • Online ISBN: 978-3-319-25264-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics