Skip to main content
Log in

Hub Labels on the database for large-scale graphs with the COLD framework

  • Published:
GeoInformatica Aims and scope Submit manuscript

Abstract

Shortest-path computation on graphs is one of the most well-studied problems in algorithmic theory. An aspect that has only recently attracted attention is the use of databases in combination with graph algorithms, so-called distance oracles, to compute shortest-path queries on large graphs. To this purpose, we propose a novel, efficient, pure-SQL framework for answering exact distance queries on large-scale graphs, implemented entirely on an open-source database engine. Our COLD framework (COmpressed Labels on the Database) can answer multiple distance queries (vertex-to-vertex, one-to-many, k-Nearest Neighbors, Reverse k-Nearest Neighbors, Reverse k-Farthest Neighbors and Top-k Range) not handled by previous methods, rendering it a complete database solution for a variety of practical large-scale graph applications. Our experimentation shows that COLD outperforms existing approaches (including popular graph databases) in terms of query time and efficiency, while requiring significantly less storage space than these methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

Notes

  1. http://stackoverflow.com/a/23838131.

  2. http://neo4j.com/developer/guide-performance-tuning/.

References

  1. Abraham I, Delling D, Fiat A, Goldberg AV, Werneck RF (2012) Hldb: Location-based services in databases. In: Proceedings of the 20th International Conference on Advances in Geographic Information Systems, pp 339–348

  2. Abraham I, Delling D, Goldberg AV, Werneck RF (2011) A hub-based labeling algorithm for shortest paths in road networks. In: Proc. 10th International Symposium on Experimental Algorithms (SEA), pp 230–241

  3. Abraham I, Delling D, Goldberg AV, Werneck RF (2012) Hierarchical hub labelings for shortest paths. In: Proc. 20th Annual European Symposium on Algorithms (ESA), pp 24–35

  4. Afshani P, Brodal GS, Zeh N (2011) Ordered and unordered top-k range reporting in large data sets. In: Proc. Twenty-second Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp 390–400

  5. Akiba T, Iwata Y, Kawarabayashi K, Kawata Y (2014) Fast shortest-path distance queries on road networks by pruned highway labeling. In: Proc. 16th Workshop on Algorithm Engineering and Experiments (ALENEX), pp 147–154

  6. Akiba T, Iwata Y, Yoshida Y (2013) Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In: Proc. ACM SIGMOD International Conference on Management of Data, pp 349–360

  7. Akiba T, Iwata Y, Yoshida Y (2015) Pruned landmark labeling. https://github.com/iwiwi/pruned-landmark-labeling

  8. Albert R, Jeong H, Barabási A-L (1999) The diameter of the world wide web. CoRR. arXiv:cond-mat/9907038

  9. Bader DA, Meyerhenke H, Sanders P, Wagner D (eds) (2013) Proceedings of the 10th DIMACS Implementation Challenge Workshop Graph Partitioning and Graph Clustering

  10. Bast H, Delling D, Goldberg AV, Muller-Hannemann M, Pajor T, Sanders P, Wagner D, Werneck RF (2015) Route planning in transportation networks. CoRR. arXiv:abs/1504.05140

  11. Borutta F, Nascimento MA, Niedermayer J, Kröger P (2014) Monochromatic rknn queries in time-dependent road networks. In: Proc. Third ACM SIGSPATIAL International Workshop on Mobile Geographic Information Systems, pp 26–33

  12. Cheema MA, Shen Z, Lin X, Zhang W (2014) A unified framework for efficiently processing ranking related queries. In: Proc. 17th International Conference on Extending Database Technology (EDBT), pp 427–438

  13. Cho E, Myers SA, Leskovec J (2011) Friendship and mobility: user movement in location-based social networks. In: Proc. of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 1082–1090

  14. Cohen E, Halperin E, Kaplan H, Zwick U (2002) Reachability and distance queries via 2-hop labels. In: Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp 937–946

  15. Delling D, Dibbelt J, Pajor T, Werneck R (2015) Public transit labeling. In: Proc. 14th International Symposium on Experimental Algorithms(SEA), pp 273–285

  16. Delling D, Goldberg AV, Pajor T, Werneck RF (2011) Customizable route planning. In: Proc. 10th International Conference on Experimental Algorithms (SEA), pp 376–387

  17. Delling D, Goldberg AV, Pajor T, Werneck RF (2014) Robust distance queries on massive networks. In: Proc. 22th Annual European Symposium on Algorithms (ESA), pp 321–333

  18. Delling D, Goldberg AV, Werneck R (2011) Faster batched shortest paths in road networks. In: Proc. 11th Workshop on Algorithmic Approaches for Transportation Modeling, Optimization, and Systems (ATMOS)

  19. Delling D, Goldberg AV, Werneck RF (2013) Hub label compression. In: Proc. 12th International Symposium on Experimental Algorithms (SEA), pp 18–29

  20. Delling D, Werneck R (2015) Customizable point-of-interest queries in road networks. IEEE Trans Knowl Data Eng 27(3):686–698

    Article  Google Scholar 

  21. Delling D, Werneck RFF (2012) Better bounds for graph bisection. In: Proc. 20th Annual European Symposium on Algorithms (ESA), pp 407–418

  22. Efentakis A (2016) Scalable public transportation queries on the database. In: Proc. 19th International Conference on Extending Database Technology (EDBT), pp 527–538

  23. Efentakis A, Efstathiades C, Pfoser D (2015) COLD. revisiting hub labels on the database for large-scale graphs. In: Proc. 14th International Symposium on Advances in Spatial and Temporal Databases (SSTD), pp 22–39

  24. Efentakis A, Pfoser D (2013) Optimizing landmark-based routing and preprocessing. In: Proc. 6th ACM SIGSPATIAL International Workshop on Computational Transportation Science (CTS)

  25. Efentakis A, Pfoser D (2014) GRASP. extending graph separators for the single-source shortest-path problem. In: Proc. 22th Annual European Symposium on Algorithms (ESA), pp 358–370

  26. Efentakis A, Pfoser D (2016) Rehub: Extending hub labels for reverse k-nearest neighbor queries on large-scale networks. J. Exp. Algorithmics 21:1.13:1–1.13:35

  27. Efentakis A, Pfoser D, Vassiliou Y (2015) Salt.aunifiedframeworkforallshortest-path query variants on road networks. In: Proc. 14th International Symposium on Experimental Algorithms (SEA)), pp 298–311

  28. Gavoille C, Peleg D, Pérennes S, Raz R (2001) Distance labeling in graphs. In: Proc. Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), SODA ’01, pp 210–219

  29. Gavoille C, Peleg D, Pérennes S, Raz R (2004) Distance labeling in graphs. J. Algorithms 53(1):85–112

    Article  Google Scholar 

  30. Geisberger R, Sanders P, Schultes D (2008) Better approximation of betweenness centrality. In: Proc. 10th Workshop on Algorithm Engineering and Experiments (ALENEX), pp 90–100

  31. Geisberger R, Sanders P, Schultes D, Delling D (2008) Contraction hierarchies: Faster and simpler hierarchical routing in road networks. In: Proc. 7th International Workshop on Experimental Algorithms (WEA), pp 319–333

  32. Hung H-P, Chuang K-T, Chen M-S (2007) Efficient process of top-k range-sum queries over multiple streams with minimized global error, pp 1404–1419

  33. Jiang M, Fu AW, Wong RC, Xu Y (2014) Hop doubling label indexing for point-to-point distance querying on scale-free networks. PVLDB 7(12):1203–1214

    Google Scholar 

  34. Kumar Y, Janardan R, Gupta P (2008) Efficient algorithms for reverse proximity query problems. In: Proc. 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp 39:1–39:10

  35. Leskovec J, Krevl A (2014) SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data

  36. Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123

    Article  Google Scholar 

  37. Liao B, LHU, Yiu ML, Gong Z (2015) Beyond millisecond latency knn search on commodity machine. IEEE Trans Knowl Data Eng 27(10):2618–2631

    Article  Google Scholar 

  38. Liu J, Chen H, Furuse K, Kitagawa H (2010) An efficient algorithm for reverse furthest neighbors query with metric index. In: Proc. 21st International Conference on Database and Expert Systems Applications (DEXA): Part II, pp 437–451

  39. Luo Z, Ling TW, Ang C-H, Lee SY, Cui B (2001) Range top/bottom k queries in olap sparse data cubes. In: Proc. 12th International Conference on Database and Expert Systems Applications (DEXA), pp 678–687

  40. McAuley JJ, Leskovec J (2012) Learning to discover social circles in ego networks. In: Proc. 26th Annual Conference on Neural Information Processing Systems, pp 548–556

  41. PostgreSQL (2016) The world’s most advanced open source database. http://www.postgresql.org/

  42. Safar M, Ibrahimi D, Taniar D (2009) Voronoi-based reverse nearest neighbor query processing on spatial networks. Multimedia Systems 15(5):295–308

    Article  Google Scholar 

  43. Sankaranarayanan J, Samet H (2010) Query processing using distance oracles for spatial networks. IEEE Trans Knowl Data Eng 22(8):1158–1175

    Article  Google Scholar 

  44. Sheng C, Tao Y (2012) Dynamic top-k range reporting in external memory. In: Proc. 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pp 121–130

  45. Tao Y (2014) A dynamic i/o-efficient structure for one-dimensional top-k range reporting. In: Proc. 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pp 256–265

  46. Tran QT, Taniar D, Safar M (2009) Transactions on large-scale data- and knowledge-centered systems i. chapter Reverse K Nearest Neighbor and Reverse Farthest Neighbor Search on Spatial Networks, pp 353–372. Springer-Verlag

  47. Wang S, Cheema MA, Lin X, Zhang Y, Liu D (2016) Efficiently computing reverse k furthest neighbors. In: Proc. 32nd IEEE International Conference on Data Engineering (ICDE), pp 1110–1121

  48. Wang S, Lin W, Yang Y, Xiao X, Zhou S (2015) Efficient route planning on public transportation networks: A labelling approach. In: Proc. 2015 ACM SIGMOD International Conference on Management of Data, pp 967–982

  49. Yang J, Leskovec J (2012) Defining and evaluating network communities based on ground-truth. In: Proc. 12th IEEE International Conference on Data Mining (ICDM), pp 745–754

  50. Yiu ML, Papadias D, Mamoulis N, Tao Y (2006) Reverse nearest neighbors in large graphs. IEEE Trans Knowl Data Eng 18(4):540–553

    Article  Google Scholar 

  51. Zhong R, Li G, Tan K-L, Zhou L (2013) G-tree: An efficient index for knn search on road networks. In: Proc. 22nd ACM International Conference on Conference on Information Knowledge Management (CIKM), pp 39–48. ACM

Download references

Acknowledgments

This work was partially supported by the project “Research Programs for Excellence 2014-2016 / CitySense-ATHENA R.I.C.” and the EU/Greece funded KRIPIS Action: MEDA Project. D. Pfoser’s work was partially supported by the National Science Foundation under Grant No. 1637541.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Alexandros Efentakis.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Efentakis, A., Efstathiades, C. & Pfoser, D. Hub Labels on the database for large-scale graphs with the COLD framework. Geoinformatica 21, 703–732 (2017). https://doi.org/10.1007/s10707-016-0287-5

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10707-016-0287-5

Keywords

Navigation