Efficient Foreign Key Discovery Based on Nearest Neighbor Search

  • Xiaojie Yuan
  • Xiangrui Cai
  • Man Yu
  • Chao Wang
  • Ying Zhang (Email author)
  • Yanlong Wen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9098)

Abstract

With the rapid growth of data size and schema complexity, many data sets are structured in tables but lack explicit foreign key definitions. Automatically identifying foreign keys among relations benefits query optimization, schema matching, data integration, and database design. This paper formulates foreign key discovery as a nearest neighbor search problem and proposes a fast foreign key discovery algorithm. To reduce the number of foreign key candidates, we first detect inclusion dependencies. We then choose statistical features to represent each attribute and define the distance between two attributes. Finally, foreign keys are discovered by finding the nearest neighbors of all primary keys. Experimental results on real and synthetic data sets show that our algorithm can discover foreign keys efficiently.
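The three steps named in the abstract (inclusion dependency filtering, feature-based distance, nearest neighbor search) can be illustrated with a minimal Python sketch. The abstract does not state which statistical features or which distance the paper uses, so everything concrete here is an assumption made for illustration: the feature vector (minimum, maximum, and mean of the distinct values), the Euclidean distance, the restriction to numeric columns, and all function and column names are hypothetical and are not the authors' implementation.

```python
import math


def is_included(dependent, referenced):
    """Unary inclusion dependency: every dependent value occurs in referenced."""
    return set(dependent) <= set(referenced)


def features(values):
    """Assumed feature vector: (min, max, mean) of the distinct values."""
    distinct = sorted(set(values))
    return (distinct[0], distinct[-1], sum(distinct) / len(distinct))


def distance(a, b):
    """Assumed distance: Euclidean distance between the feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(features(a), features(b))))


def discover_foreign_keys(primary_keys, columns, k=1):
    """For each primary key, return its k nearest inclusion-dependent columns."""
    result = {}
    for pk_name, pk_values in primary_keys.items():
        # Step 1: keep only non-empty columns whose values are contained in the key.
        candidates = [
            (distance(vals, pk_values), name)
            for name, vals in columns.items()
            if name != pk_name and vals and is_included(vals, pk_values)
        ]
        # Steps 2 and 3: rank the candidates by distance and keep the closest k.
        result[pk_name] = [name for _, name in sorted(candidates)[:k]]
    return result


# Hypothetical toy data: one primary key and three candidate columns.
primary_keys = {"customer.id": list(range(1, 11))}
columns = {
    "order.customer_id": [1, 3, 3, 7, 10, 2, 9],  # included, similar spread
    "order.quantity": [1, 2, 1, 1, 2],            # included, but narrow range
    "order.total": [15, 99],                      # not included in the key
}

print(discover_foreign_keys(primary_keys, columns))
```

In this toy data, `order.quantity` satisfies the inclusion dependency only by coincidence, which is why inclusion alone is not enough: its values cover a much narrower range than the key, so it lies farther from `customer.id` than `order.customer_id` does under the assumed distance.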

Keywords

Foreign key, Nearest neighbors, Schema

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Xiaojie Yuan (1)
  • Xiangrui Cai (1)
  • Man Yu (1)
  • Chao Wang (1)
  • Ying Zhang (1), Email author
  • Yanlong Wen (1)
  1. College of Computer and Control Engineering, Nankai University, Tianjin, People’s Republic of China
