Abstract
Many tools and systems are now available to analyze large graphs. In general, the goal is to summarize the graph and discover interesting patterns hidden in it. At the same time, a great deal of data stored in DBMSs could be analyzed as graphs. External graph data sets can be loaded quickly, and SQL can help prepare graph data sets from raw data. In this paper, we show that SQL queries on a graph stored in relational form as triples can reveal many interesting properties and patterns of the graph more flexibly and efficiently than existing systems. We explain how many interesting graph statistics can be derived with queries combining joins and aggregations. In addition, linearly recursive queries can summarize interesting patterns, including reachability, paths, and connected components. We show experimentally that exploratory queries can be evaluated efficiently based on the input edges and that they perform better than Spark. We also show that vertices with skewed degree, cycles, and cliques are the main reasons exploratory queries become slow.
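The two query families named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a hypothetical edge table E(i, j, v) holding one triple per edge (source, destination, weight), uses SQLite through Python's standard sqlite3 module instead of the DBMS evaluated in the paper, and runs on a made-up five-edge graph. The function names are likewise illustrative.

```python
import sqlite3

# Hypothetical toy graph as (source, destination, weight) triples.
# It contains one cycle (1 -> 2 -> 3 -> 1) and a separate edge (5 -> 6).
EDGES = [(1, 2, 1.0), (2, 3, 1.0), (3, 1, 1.0), (3, 4, 1.0), (5, 6, 1.0)]


def build_graph(edges):
    """Load the triples into an in-memory table E(i, j, v) (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE E (i INTEGER, j INTEGER, v REAL)")
    conn.executemany("INSERT INTO E VALUES (?, ?, ?)", edges)
    return conn


def out_degrees(conn):
    """Aggregation: number of outgoing edges per source vertex."""
    rows = conn.execute(
        "SELECT i, COUNT(*) FROM E GROUP BY i ORDER BY i"
    ).fetchall()
    return dict(rows)


def two_hop_paths(conn):
    """Join plus aggregation: count paths of length 2 with a self-join on E."""
    sql = "SELECT COUNT(*) FROM E AS a JOIN E AS b ON a.j = b.i"
    return conn.execute(sql).fetchone()[0]


def reachable_from(conn, source):
    """Linearly recursive query: vertices reachable from `source`.

    R appears once in the recursive step (linear recursion), and UNION
    removes duplicates, so the query terminates even when the graph has cycles.
    """
    sql = """
        WITH RECURSIVE R(v) AS (
            SELECT ?
            UNION
            SELECT E.j FROM R JOIN E ON E.i = R.v
        )
        SELECT v FROM R ORDER BY v
    """
    return [row[0] for row in conn.execute(sql, (source,))]
```

Calling `reachable_from(build_graph(EDGES), 1)` walks the cycle once and stops. Replacing `UNION` with `UNION ALL` in the recursive query would revisit the cycle indefinitely, a small-scale view of why cycles make such queries expensive.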
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Al-Amin, S.T., Ordonez, C., Bellatreche, L. (2018). Big Data Analytics: Exploring Graphs with Optimized SQL Queries. In: Elloumi, M., et al. Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, vol 903. Springer, Cham. https://doi.org/10.1007/978-3-319-99133-7_7
DOI: https://doi.org/10.1007/978-3-319-99133-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99132-0
Online ISBN: 978-3-319-99133-7
eBook Packages: Computer Science, Computer Science (R0)