Big Data Analytics: Exploring Graphs with Optimized SQL Queries

  • Conference paper
Database and Expert Systems Applications (DEXA 2018)

Abstract

Nowadays there is an abundance of tools and systems to analyze large graphs. In general, the goal is to summarize the graph and discover interesting patterns hidden in it. On the other hand, a lot of data stored in DBMSs can potentially be analyzed as graphs. External graph data sets can be loaded quickly, and SQL can help prepare graph data sets from raw data. In this paper, we show that SQL queries on a graph stored in relational form as triples can reveal many interesting properties and patterns of the graph more flexibly and efficiently than existing systems. We explain how many interesting statistics on the graph can be derived with queries combining joins and aggregations. In addition, linearly recursive queries can summarize interesting patterns, including reachability, paths, and connected components. We show experimentally that exploratory queries can be evaluated efficiently based on the input edges and that this approach performs better than Spark. We also show that vertices with skewed degrees, cycles, and cliques are the main reasons exploratory queries become slow.
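
The abstract names three kinds of exploratory queries over a graph stored as triples: statistics from aggregations, statistics from joins combined with aggregations, and linearly recursive queries for reachability and connected components. The following is a minimal sketch of each kind, not the authors' queries. It assumes a hypothetical edge table E(i, j, v) (source vertex, destination vertex, edge value), a made-up six-vertex sample graph, and SQLite accessed through Python's standard sqlite3 module rather than the DBMSs evaluated in the paper. The helper function names are also illustrative. Connected components are computed here with a simple transitive-closure-plus-MIN query, which is one possible technique and not necessarily the one used in the paper.

```python
import sqlite3


def build_graph(conn: sqlite3.Connection) -> None:
    """Store a small example graph as triples E(i, j, v): source, destination, edge value."""
    conn.execute("CREATE TABLE E (i INTEGER, j INTEGER, v REAL)")
    # Hypothetical sample data: 1 -> 2 -> 3 -> 1 is a cycle, 3 -> 4 hangs off it,
    # and 5 -> 6 is a separate piece of the graph.
    edges = [(1, 2), (2, 3), (3, 1), (3, 4), (5, 6)]
    conn.executemany("INSERT INTO E VALUES (?, ?, 1.0)", edges)


def out_degree_distribution(conn: sqlite3.Connection) -> list:
    """Aggregation only: number of vertices having each out-degree."""
    return conn.execute(
        """
        SELECT degree, COUNT(*) AS num_vertices
        FROM (SELECT i, COUNT(*) AS degree FROM E GROUP BY i) AS D
        GROUP BY degree
        ORDER BY degree
        """
    ).fetchall()


def two_hop_path_count(conn: sqlite3.Connection) -> int:
    """Join plus aggregation: number of directed paths i -> k -> j of length 2."""
    return conn.execute(
        "SELECT COUNT(*) FROM E AS e1 JOIN E AS e2 ON e1.j = e2.i"
    ).fetchone()[0]


def reachable_from(conn: sqlite3.Connection, source: int) -> list:
    """Linear recursion: every vertex reachable from `source` by a path of length >= 1.

    UNION (not UNION ALL) discards rows that were already produced, so the
    recursion terminates even though the graph contains a cycle.
    """
    rows = conn.execute(
        """
        WITH RECURSIVE R(d) AS (
            SELECT j FROM E WHERE i = ?
            UNION
            SELECT E.j FROM R JOIN E ON R.d = E.i
        )
        SELECT d FROM R ORDER BY d
        """,
        (source,),
    ).fetchall()
    return [d for (d,) in rows]


def connected_components(conn: sqlite3.Connection) -> dict:
    """Label each vertex with the smallest vertex id in its undirected component.

    U treats every edge as undirected; R is the reachability closure over U.
    Vertices that have no edges at all do not appear in the result.
    """
    rows = conn.execute(
        """
        WITH RECURSIVE
        U(i, j) AS (SELECT i, j FROM E UNION SELECT j, i FROM E),
        R(src, dst) AS (
            SELECT i, i FROM U
            UNION
            SELECT R.src, U.j FROM R JOIN U ON R.dst = U.i
        )
        SELECT src, MIN(dst) FROM R GROUP BY src ORDER BY src
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_graph(conn)
    print(out_degree_distribution(conn))  # [(1, 3), (2, 1)]
    print(two_hop_path_count(conn))       # 4
    print(reachable_from(conn, 1))        # [1, 2, 3, 4]
    print(connected_components(conn))     # {1: 1, 2: 1, 3: 1, 4: 1, 5: 5, 6: 5}
```

The closure-based component query is easy to read but materializes every reachable (source, destination) pair, so its intermediate result can grow quadratically with component size. This gives one intuition for why high-degree vertices, cycles, and cliques can make recursive exploratory queries expensive.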

Author information

Correspondence to Sikder Tahsin Al-Amin.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Al-Amin, S.T., Ordonez, C., Bellatreche, L. (2018). Big Data Analytics: Exploring Graphs with Optimized SQL Queries. In: Elloumi, M., et al. Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, vol 903. Springer, Cham. https://doi.org/10.1007/978-3-319-99133-7_7

  • DOI: https://doi.org/10.1007/978-3-319-99133-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99132-0

  • Online ISBN: 978-3-319-99133-7

  • eBook Packages: Computer Science, Computer Science (R0)
