Abstract
Real enterprise databases often comprise hundreds of tables, which makes querying them difficult for non-expert users, especially when documentation is lacking. Schema summarization improves database usability by providing a succinct overview of the entire schema. In this paper, we introduce a novel three-step schema summarization method based on label propagation. First, we exploit the varied similarity properties of a database schema and propose a table similarity measure, based on the Radial Basis Function (RBF) kernel, that captures these properties comprehensively. Second, we identify representative tables to serve as labeled data and annotate the schema graph accordingly. Finally, we run a label propagation algorithm on the labeled schema graph to classify the tables in the schema and create a schema summary. Extensive evaluations demonstrate the effectiveness of our approach.
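The three steps can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify which similarity properties are combined or how representative tables are chosen, so the two-dimensional feature vectors, the seed tables, the table names, and the `sigma` bandwidth below are all hypothetical. The propagation loop is a generic iterative scheme with clamped seed labels, in the spirit of standard label propagation.

```python
import math


def rbf_similarity(x, y, sigma=1.0):
    """RBF (Gaussian) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def label_propagation(features, seeds, sigma=1.0, iterations=100):
    """Propagate seed labels over a fully connected table graph.

    features: dict mapping table name -> feature vector (hypothetical inputs)
    seeds:    dict mapping representative table name -> cluster label
    """
    names = list(features)
    labels = sorted(set(seeds.values()))

    # Step 1: edge weights from the RBF kernel (no self-loops).
    weights = {
        a: {b: rbf_similarity(features[a], features[b], sigma)
            for b in names if b != a}
        for a in names
    }

    # Step 2: seed tables start with a one-hot label distribution,
    # all other tables start with zeros.
    dist = {
        n: {l: 1.0 if seeds.get(n) == l else 0.0 for l in labels}
        for n in names
    }

    # Step 3: each unlabeled table repeatedly takes the weighted average
    # of its neighbours' distributions; seed tables stay clamped.
    for _ in range(iterations):
        updated = {}
        for n in names:
            if n in seeds:
                updated[n] = dist[n]
                continue
            total = sum(weights[n].values())
            updated[n] = {
                l: sum(weights[n][m] * dist[m][l] for m in weights[n]) / total
                for l in labels
            }
        dist = updated

    # Assign each table the label with the highest propagated mass.
    return {n: max(labels, key=lambda l: dist[n][l]) for n in names}


# Hypothetical feature vectors for four tables of a toy schema.
features = {
    "customers": (0.0, 0.0),
    "addresses": (0.2, 0.1),
    "orders": (5.0, 5.0),
    "order_items": (5.1, 4.8),
}
# Hypothetical representative tables and their cluster labels.
seeds = {"customers": "customer", "orders": "sales"}

summary = label_propagation(features, seeds)
print(summary)
```

In this toy setup each unlabeled table ends up in the cluster of the representative table it is most similar to; the resulting table-to-cluster mapping plays the role of the schema summary.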
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Yuan, X., Li, X., Yu, M., Cai, X., Zhang, Y., Wen, Y. (2014). Summarizing Relational Database Schema Based on Label Propagation. In: Chen, L., Jia, Y., Sellis, T., Liu, G. (eds) Web Technologies and Applications. APWeb 2014. Lecture Notes in Computer Science, vol 8709. Springer, Cham. https://doi.org/10.1007/978-3-319-11116-2_23
DOI: https://doi.org/10.1007/978-3-319-11116-2_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11115-5
Online ISBN: 978-3-319-11116-2
eBook Packages: Computer Science, Computer Science (R0)