Skip to main content

Evaluating Graph Traversal Algorithms for Distributed SPARQL Query Optimization

  • Conference paper
The Semantic Web (JIST 2011)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7185))

Included in the following conference series:

Abstract

Distributed SPARQL queries enable users to retrieve information by exploiting the increasing amount of linked data being published. However, industrial-strength distributed SPARQL query processing is still at its early stage for efficiently answering queries. Previous research shows that it is possible to apply methods from graph theory to optimize the performance of distributed SPARQL. In this paper we describe a framework that can simulate arbitrary RDF data networks to evaluate different approaches of distributed SPARQL query processing. Using this framework we further explore the graph traversal algorithms for distributed SPARQL optimization. We present an implementation of a Minimum-Spanning-Tree-based (MST-based) algorithm for distributed SPARQL processing, the performance of which is compared to other approaches using this evaluation framework. The contribution of this paper is to show that a MST-based approach seems to perform much better than other non graph-traversal-based approaches, and to provide an evaluation framework for evaluating distributed SPARQL processing.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bernstein, A., Kiefer, C., Stocker, M.: OptARQ: A SPARQL Optimization Approach based on Triple Pattern Selectivity Estimation (2007)

    Google Scholar 

  2. Bizer, C., Heath, T., Berners-Lee, T.: Linked data-the story so far. International Journal on Semantic Web and Information Systems 5(3), 1–22 (2009)

    Article  Google Scholar 

  3. Bizer, C., Schultz, A.: The Berlin SPARQL Benchmark. International Journal On Semantic Web and Information Systems-Special Issue on Scalability and Performance of Semantic Web Systems (2009)

    Google Scholar 

  4. Bouquet, P., Ghidini, C., Serafini, L.: Querying the Web of Data: A Formal Approach. In: Gómez-Pérez, A., Yu, Y., Ding, Y. (eds.) ASWC 2009. LNCS, vol. 5926, pp. 291–305. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  5. Correndo, G., Salvadores, M., Millard, I., Glaser, H., Shadbolt, N.: SPARQL query rewriting for implementing data integration over linked data. In: Proceedings of the 2010 EDBT Workshops, pp. 1–11. ACM (2010)

    Google Scholar 

  6. Cyganiak, R.: A relational algebra for SPARQL. Digital Media Systems Laboratory, HP Laboratories Bristol, pp. 2005–170 (2005)

    Google Scholar 

  7. Fletcher, G., Beck, P.: Scalable indexing of RDF graphs for efficient join processing. In: Proceeding of the 18th ACM Conference on Information and Knowledge Management, pp. 1513–1516. ACM (2009)

    Google Scholar 

  8. Gray, A., Gray, N., Ounis, I.: Can RDB2RDF Tools Feasibily Expose Large Science Archives for Data Integration? In: Aroyo, L., Traverso, P., Ciravegna, F., Cimiano, P., Heath, T., Hyvönen, E., Mizoguchi, R., Oren, E., Sabou, M., Simperl, E. (eds.) ESWC 2009. LNCS, vol. 5554, pp. 491–505. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  9. Haas, L., Kossmann, D., Wimmers, E., Yang, J.: Optimizing queries across diverse data sources. In: Proceedings of The International Conference on Very Large Data Bases, pp. 276–285. Citeseer (1997)

    Google Scholar 

  10. Haase, P., Mathäß, T.: An evaluation of approaches to federated query processing over linked data. In: Proceedings of the 6th International Conference on Semantic Systems, pp. 1–9. ACM (2010)

    Google Scholar 

  11. Hartig, O., Heese, R.: The SPARQL Query Graph Model for Query Optimization. In: Franconi, E., Kifer, M., May, W. (eds.) ESWC 2007. LNCS, vol. 4519, pp. 564–578. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  12. Kossmann, D.: The state of the art in distributed query processing. ACM Computing Surveys (CSUR) 32(4), 422–469 (2000)

    Article  Google Scholar 

  13. Ladwig, G., Tran, T.: Linked Data Query Processing Strategies. In: Patel-Schneider, P.F., Pan, Y., Hitzler, P., Mika, P., Zhang, L., Pan, J.Z., Horrocks, I., Glimm, B. (eds.) ISWC 2010, Part I. LNCS, vol. 6496, pp. 453–469. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  14. McGlothlin, J., Khan, L.: RDFJoin: A Scalable Data Model for Persistence and Efficient Querying of RDF Datasets. Database (2009)

    Google Scholar 

  15. Obermeier, L., Nixon, L.: A Cost Model for Querying Distributed RDF-Repositories with SPARQL. In: Proceedings of the Workshop on Advancing Reasoning on the Web: Scalability and Commonsense Tenerife. Citeseer (2008)

    Google Scholar 

  16. Özsu, M.: Distributed and parallel database systems. ACM Computing Surveys (CSUR), 1–21 (1996)

    Google Scholar 

  17. Özsu, M., Valduriez, P.: Principles of distributed database systems. Prentice Hall (1999)

    Google Scholar 

  18. Pérez, J., Arenas, M.: Semantics and Complexity of SPARQL. ACM Transactions on Database Systems, TODS (2009)

    Google Scholar 

  19. Prim, R.: Shortest connection networks and some generalizations. Bell System Technical Journal 36(6), 1389–1401 (1957)

    Google Scholar 

  20. Prud’Hommeaux, E., Seaborne, A.: SPARQL query language for RDF (2008)

    Google Scholar 

  21. Quilitz, B.: Querying Distributed RDF Data Sources with SPARQL. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 524–538. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  22. Schmidt, M., Hornung, T., Küchlin, N., Lausen, G., Pinkel, C.: An Experimental Comparison of RDF Data Management Approaches in a SPARQL Benchmark Scenario. In: Sheth, A.P., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T., Thirunarayan, K. (eds.) ISWC 2008. LNCS, vol. 5318, pp. 82–97. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  23. Schmidt, M., Hornung, T., Lausen, G., Pinkel, C.: SP2Bench: A SPARQL Performance Benchmark. In: IEEE 25th International Conference on Data Engineering, ICDE 2009, pp. 222–233. IEEE (2009)

    Google Scholar 

  24. Schmidt, M., Meier, M., Lausen, G.: Foundations of SPARQL query optimization. In: Proceedings of the 13th International Conference on Database Theory, pp. 4–33. ACM (2010)

    Google Scholar 

  25. Stocker, M., Seaborne, A., Bernstein, A., Kiefer, C., Reynolds, D.: SPARQL basic graph pattern optimization using selectivity estimation. In: Proceeding of the 17th International Conference on World Wide Web, pp. 595–604. ACM (2008)

    Google Scholar 

  26. Stuckenschmidt, H., Vdovjak, R., Houben, G., Broekstra, J.: Index structures and algorithms for querying distributed RDF repositories. In: Proceedings of the 13th International Conference on World Wide Web, pp. 631–639. ACM (2004)

    Google Scholar 

  27. Vandervalk, B.P., McCarthy, E.L., Wilkinson, M.D.: Optimization of Distributed SPARQL Queries Using Edmonds’ Algorithm and Prim’s Algorithm. In: International Conference on Computational Science and Engineering, CSE 2009, vol. 1, pp. 330–337. IEEE (2009)

    Google Scholar 

  28. Wilkinson, K., Sayers, C., Kuno, H., Reynolds, D., et al.: Efficient RDF storage and retrieval in Jena2. In: Proceedings of SWDB, vol. 3, pp. 7–8 (2003)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, X., Tiropanis, T., Davis, H.C. (2012). Evaluating Graph Traversal Algorithms for Distributed SPARQL Query Optimization. In: Pan, J.Z., et al. The Semantic Web. JIST 2011. Lecture Notes in Computer Science, vol 7185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29923-0_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-29923-0_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29922-3

  • Online ISBN: 978-3-642-29923-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics