Abstract
Implementing scalable RDF triple stores that can store many triples and process many queries concurrently is challenging. Several projects have investigated the use of distributed hash tables (DHTs) for this task, but query planning has so far received little attention in this context. Given the distributed nature of DHTs, message latency and limited network bandwidth are crucial factors to consider. Moreover, because no peer in a DHT has global knowledge, query planning differs from that in centralized databases. This book chapter discusses a set of heuristics and evaluates their performance on the Lehigh University Benchmark (LUBM), with an emphasis on network traffic. The results show the importance of query planning in DHT-based RDF triple stores.
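The abstract leaves the heuristics themselves to the chapter. As a rough illustration only, the following Python sketch shows the general shape of traffic-aware planning in a DHT-based store. It assumes that each triple is indexed under the hashes of its subject, predicate, and object (as in RDFPeers), and that, lacking global statistics, the planner asks the peer responsible for a pattern's constant how many triples match. The names `responsible_peer`, `lookup_key`, and `greedy_order`, the routing preference, and the counts in the example are hypothetical. This is a generic greedy join-ordering heuristic, not the set of heuristics evaluated in the chapter.

```python
import hashlib
from typing import Dict, List, Set, Tuple

# A triple pattern (subject, predicate, object); terms starting with "?" are variables.
Pattern = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def variables(pattern: Pattern) -> Set[str]:
    return {t for t in pattern if is_var(t)}


def responsible_peer(term: str, num_peers: int) -> int:
    """Hypothetical placement: hash a term onto one of num_peers peers.

    Simplified to SHA-1 modulo the peer count; a real DHT maps keys onto an
    identifier ring instead.
    """
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_peers


def lookup_key(pattern: Pattern) -> str:
    """Pick the constant used to route a pattern (assumed order: subject, object, predicate).

    Predicates such as rdf:type tend to be unselective, so they are tried last.
    """
    s, p, o = pattern
    for term in (s, o, p):
        if not is_var(term):
            return term
    raise ValueError("a pattern without constants cannot be routed")


def greedy_order(patterns: List[Pattern], estimates: Dict[Pattern, int]) -> List[Pattern]:
    """Order patterns by estimated match count, preferring ones joined to bound variables.

    `estimates` stands in for the counts the responsible peers would report.
    """
    remaining = list(patterns)
    order: List[Pattern] = []
    bound: Set[str] = set()
    while remaining:
        # Patterns sharing a variable with earlier ones avoid shipping a cross product;
        # fall back to all remaining patterns if none is connected (or nothing is bound yet).
        connected = [q for q in remaining if variables(q) & bound]
        candidates = connected or remaining
        nxt = min(candidates, key=lambda q: estimates[q])
        order.append(nxt)
        bound |= variables(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    # A LUBM-style query; the estimates below are made-up numbers for illustration.
    course = "<http://www.Department0.University0.edu/GraduateCourse0>"
    query = [("?x", "rdf:type", "ub:GraduateStudent"), ("?x", "ub:takesCourse", course)]
    estimates = {query[0]: 1000, query[1]: 4}
    for pat in greedy_order(query, estimates):
        print(pat, "-> peer", responsible_peer(lookup_key(pat), 64))
```

Evaluating the selective `ub:takesCourse` pattern first means only a handful of bindings for `?x` need to travel to the peer holding the `rdf:type` triples, instead of every graduate student's binding travelling the other way, which illustrates why pattern order matters for network traffic.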
Bibliography
D. Battré. Query Planning in DHT based RDF stores. In Proceedings of Fourth International IEEE Conference on Signal-Image Technologies and Internet-Based System (SITIS 2008), pp. 187–194, (2008).
T. Berners-Lee, J. Hendler, and O. Lassila, The Semantic Web, Scientific American. 284(5), 28–47 (May, 2001).
E. Oren, B. Heitmann, and S. Decker, ActiveRDF: Embedding Semantic Web data into object-oriented languages, Web Semantics: Science, Services and Agents on the World Wide Web. 6(3), 191–202, (2008).
M. Hepp. GoodRelations: An Ontology for Describing Products and Services Offers on the Web. In Proceedings of the 16th International Conference on Knowledge Engineering and Knowledge Management (EKAW2008), (2008).
N. Shadbolt, T. Berners-Lee, and W. Hall, The Semantic Web Revisited, IEEE Intelligent Systems. 21(3), 96–101, (2006).
C. Bizer, T. Heath, D. Ayers, and Y. Raimond. Linking Open Data (ESWC 2007 Poster), (2007).
M. Hausenblas, W. Halb, Y. Raimond, and T. Heath. What is the Size of the Semantic Web? In Proceedings of I-SEMANTICS 08 - International Conference on Semantic Systems, pp. 9–16, (2008).
E. K. Lua, J. Crowcroft, M. Pias, R. Sharma, and S. Lim, A survey and comparison of peer-to-peer overlay network schemes, IEEE Communications Surveys & Tutorials. 7(2), 72–93, (2005).
D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Scalable semantic web data management using vertical partitioning. In VLDB ’07: Proceedings of the 33rd international conference on Very large data bases, pp. 411–422. VLDB Endowment, (2007). ISBN 978-1-59593-649-3.
K. Wilkinson, C. Sayers, H. A. Kuno, and D. Reynolds. Efficient RDF Storage and Retrieval in Jena2. In Proceedings of SWDB’03, The first International Workshop on Semantic Web and Databases, pp. 131–150, (2003).
J. J. Carroll, I. Dickinson, C. Dollin, D. Reynolds, A. Seaborne, and K. Wilkinson. Jena: implementing the semantic web recommendations. In WWW Alt. ’04: Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters, pp. 74–83, New York, NY, USA, (2004). ACM Press. ISBN 1-58113-912-8. doi: http://doi.acm.org/10.1145/1013367.1013381.
E. I. Chong, S. Das, G. Eadon, and J. Srinivasan. An efficient SQL-based RDF querying scheme. In VLDB ’05: Proceedings of the 31st international conference on Very large data bases, pp. 1216–1227. VLDB Endowment, (2005). ISBN 1-59593-154-6.
J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. In ISWC ’02: Proceedings of the First International Semantic Web Conference on The Semantic Web, pp. 54–68, London, UK, (2002). Springer-Verlag. ISBN 3-540-43760-6.
S. Harris and N. Gibbins. 3store: Efficient Bulk RDF Storage. In eds. R. Volz, S. Decker, and I. F. Cruz, Proceedings of the First International Workshop on Practical and Scalable Semantic Systems, vol. 89, CEUR Workshop Proceedings. CEUR-WS.org, (2003).
T. Neumann and G. Weikum, RDF-3X: a RISC-style engine for RDF, Proceedings of the VLDB Endowment. 1(1), 647–659, (2008). doi: http://doi.acm.org/10.1145/1453856.1453927.
W. Nejdl, B. Wolf, C. Qu, S. Decker, M. Sintek, A. Naeve, M. Nilsson, M. Palmér, and T. Risch. EDUTELLA: A P2P networking infrastructure based on RDF. In WWW ’02: Proceedings of the 11th international conference on World Wide Web, pp. 604–615, New York, NY, USA, (2002). ACM Press. ISBN 1-58113-449-5. doi: http://doi.acm.org/10.1145/511446.511525.
W. Nejdl, W. Siberski, U. Thaden, and W.-T. Balke. Top-k Query Evaluation for Schema-Based Peer-to-Peer Networks. In eds. S. A. McIlraith, D. Plexousakis, and F. van Harmelen, The Semantic Web - ISWC 2004: Third International Semantic Web Conference, Lecture Notes in Computer Science, vol. 3298, pp. 137–151 (Jan., 2004).
W. Nejdl, M. Wolpers, W. Siberski, C. Schmitz, M. Schlosser, I. Brunkhorst, and A. Löser. Super-Peer-Based Routing and Clustering Strategies for RDF-Based Peer-To-Peer Networks. In WWW ’03: Proceedings of the 12th international conference on World Wide Web, pp. 536–543, New York, NY, USA, (2003). ACM Press. ISBN 1-58113-680-3. doi: http://doi.acm.org/10.1145/775152.775229.
I. Brunkhorst, H. Dhraief, A. Kemper, W. Nejdl, and C. Wiesner. Distributed Queries and Query Optimization in Schema-Based P2P-Systems. In International Workshop On Databases, Information Systems and Peer-to-Peer Computing, pp. 184–199, (2003).
G. Kokkinidis, L. Sidirourgos, and V. Christophides. Query Processing in RDF/S-based P2P Database Systems. In Semantic Web and Peer-to-Peer, pp. 59–81. Springer, (2006).
M. Cai and M. Frank. RDFPeers: A Scalable Distributed RDF Repository based on A Structured Peer-to-Peer Network. In Proceedings of the 13th International World Wide Web Conference (WWW2004), pp. 650–657 (May, 2004).
M. Cai, M. Frank, B. Pan, and R. MacGregor, A Subscribable Peer-to-Peer RDF Repository for Distributed Metadata Management, Journal of Web Semantics: Science, Services and Agents on the World Wide Web. 2(2), 109–130, (2004).
M. Cai, M. Frank, J. Chen, and P. Szekely, MAAN: A Multi-Attribute Addressable Network for Grid Information Services, Journal of Grid Computing. 2(1), (2004).
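A. Matono, S. M. Pahlevi, and I. Kojima. RDFCube: A P2P-Based Three-Dimensional Index for Structural Joins on Distributed Triple Stores. In Databases, Information Systems, and Peer-to-Peer Computing (DBISP2P 2005/2006), vol. 4125, Lecture Notes in Computer Science, pp. 323–330. Springer, (2007). ISBN 978-3-540-71660-0.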
F. Heine, M. Hovestadt, and O. Kao. Processing complex RDF queries over P2P networks. In P2PIR’05: Proceedings of the 2005 ACM workshop on Information retrieval in peer-to-peer networks, pp. 41–48. ACM Press, (2005). ISBN 1-59593-164-3. doi: http://doi.acm.org/10.1145/1096952.1096960.
F. Heine. Scalable P2P based RDF Querying. In InfoScale ’06: Proceedings of the 1st international conference on Scalable information systems, p. 17, New York, NY, USA, (2006). ACM Press. ISBN 1-59593-428-6. doi: http://doi.acm.org/10.1145/1146847.1146864.
B. H. Bloom, Space/Time Trade-offs in Hash Coding with Allowable Errors, Communications of the ACM. 13(7), 422–426, (1970).
D. Battré, F. Heine, and O. Kao. Top k RDF Query Evaluation in Structured P2P Networks. In eds. W. Nagel, W. Walter, and W. Lehner, Euro-Par 2006 Parallel Processing: 12th International Euro-Par Conference, vol. 4128, LNCS, pp. 995–1004. Springer Berlin / Heidelberg, (2006). doi: 10.1007/11823285.
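D. Battré, F. Heine, A. Höing, and O. Kao. On Triple Dissemination, Forward-Chaining, and Load Balancing in DHT Based RDF Stores. In Databases, Information Systems, and Peer-to-Peer Computing (DBISP2P 2005/2006), vol. 4125, Lecture Notes in Computer Science, pp. 343–354. Springer, (2007). ISBN 978-3-540-71660-0.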
D. Battré, F. Heine, A. Höing, and O. Kao. Load-balancing in P2P based RDF stores. In Proceedings of Second International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2006), pp. 21–34, (2006).
D. Battré, Caching of intermediate results in DHT-based RDF stores, International Journal of Metadata, Semantics and Ontologies. 3(1), 84–93, (2008).
M. Koubarakis, I. Miliaraki, Z. Kaoudi, M. Magiridou, and A. Papadakis-Pesaresi. Semantic Grid Resource Discovery using DHTs in Atlas. In 3rd GGF Semantic Grid Workshop (Feb., 2006).
E. Liarou, S. Idreos, and M. Koubarakis. Evaluating Conjunctive Triple Pattern Queries over Large Structured Overlay Networks. In eds. I. Cruz, S. Decker, D. Allemang, C. Preist, D. Schwabe, P. Mika, M. Uschold, and L. Aroyo, The Semantic Web – ISWC 2006, vol. 4273, LNCS, pp. 399–413 (Nov., 2006).
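E. Liarou, S. Idreos, and M. Koubarakis. Continuous RDF Query Processing over DHTs. In The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, vol. 4825, Lecture Notes in Computer Science, pp. 324–339. Springer, (2007). ISBN 978-3-540-76297-3.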
Z. Kaoudi, I. Miliaraki, and M. Koubarakis. RDFS Reasoning and Query Answering on Top of DHTs. In 7th International Semantic Web Conference (ISWC 2008), (2008).
S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz. Handling churn in a DHT. In ATEC’04: Proceedings of the USENIX Annual Technical Conference 2004 on USENIX Annual Technical Conference, pp. 10–10, Berkeley, CA, USA, (2004). USENIX Association.
K. Aberer, P. Cudré-Mauroux, M. Hauswirth, and T. van Pelt. GridVine: Building Internet-Scale Semantic Overlay Networks. In International Semantic Web Conference (ISWC), vol. 3298, LNCS, pp. 107–121, (2004).
P. Cudré-Mauroux, S. Agarwal, and K. Aberer, GridVine: An Infrastructure for Peer Information Management, IEEE Internet Computing. 11(5), 36–44, (2007). ISSN 1089-7801. doi: http://doi.ieeecomputersociety.org/10.1109/MIC.2007.108.
K. Aberer, P. Cudré-Mauroux, A. Datta, Z. Despotovic, M. Hauswirth, M. Punceva, and R. Schmidt, P-Grid: a self-organizing structured P2P system, SIGMOD Rec. 32(3), 29–33, (2003). ISSN 0163-5808. doi: http://doi.acm.org/10.1145/945721.945729.
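A. Harth, J. Umbrich, A. Hogan, and S. Decker. YARS2: A Federated Repository for Querying Graph Structured Data from the Web. In The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, vol. 4825, Lecture Notes in Computer Science, pp. 211–224. Springer, (2007). ISBN 978-3-540-76297-3.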
A. Harth and S. Decker. Optimized Index Structures for Querying RDF from the Web. In LA-WEB ’05: Proceedings of the Third Latin American Web Congress, pp. 71–80, Washington, DC, USA, (2005). IEEE Computer Society. ISBN 0-7695-2471-0. doi: http://dx.doi.org/10.1109/LAWEB.2005.25.
O. Hartig and R. Heese. The SPARQL Query Graph Model for Query Optimization. In The Semantic Web: Research and Applications (ESWC 2007), vol. 4519/2007, LNCS, pp. 564–578, (2007). doi: 10.1007/978-3-540-72667-8.
M. Stocker, A. Seaborne, A. Bernstein, C. Kiefer, and D. Reynolds. SPARQL Basic Graph Pattern Optimization Using Selectivity Estimation. In Proceedings of the 17th International World Wide Web Conference (WWW), (2008).
D. Battré. Efficient Query Processing in DHT-based RDF Stores. PhD thesis, Technische Universität Berlin, Germany (Dec., 2008). URL http://nbn-resolving.de/urn:nbn:de:kobv:83-opus-21188.
A. Rao, K. Lakshminarayanan, S. Surana, R. Karp, and I. Stoica. Load Balancing in Structured P2P Systems. In Proceedings of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS 03). Springer, (2003).
S. Surana, B. Godfrey, K. Lakshminarayanan, R. Karp, and I. Stoica, Load Balancing in Dynamic Structured P2P Systems, Performance Evaluation. 63(6), 217–240 (Mar., 2006).
Y. Zhu and Y. Hu, Efficient, Proximity-Aware Load Balancing for DHT-Based P2P Systems, IEEE Transactions on Parallel and Distributed Systems. 16(4), 349–361, (2005).
Y. Guo, Z. Pan, and J. Heflin, LUBM: A Benchmark for OWL Knowledge Base Systems, Journal of Web Semantics. 3(2), 158–182, (2005).
D. E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching. (Addison Wesley, 1998).
F. Heine. P2P based RDF Querying and Reasoning for Grid Resource Description and Matching. PhD thesis, University of Paderborn, Germany (July, 2006).
N. J. A. Harvey, M. B. Jones, S. Saroiu, M. Theimer, and A. Wolman. SkipNet: A Scalable Overlay Network with Practical Locality Properties. In USENIX Symposium on Internet Technologies and Systems, Seattle, WA (Mar., 2003).
P. Ganesan, M. Bawa, and H. Garcia-Molina. Online Balancing of Range-Partitioned Data with Applications to Peer-to-Peer Systems. In eds. M. A. Nascimento, M. T. Özsu, D. Kossmann, R. J. Miller, J. A. Blakeley, and K. B. Schiefer, (e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, pp. 444–455. Morgan Kaufmann, (2004). ISBN 0-12-088469-0.
Y. Chen and W. Benn. Query Evaluation for Distributed Heterogeneous Relational Databases. In COOPIS ’98: Proceedings of the 3rd IFCIS International Conference on Cooperative Information Systems, pp. 44–53, Washington, DC, USA, (1998). IEEE Computer Society.
M.-S. Chen, P. S. Yu, and K.-L. Wu, Optimization of Parallel Execution for Multi-Join Queries, IEEE Transactions on Knowledge and Data Engineering. 8, 416–428, (1996).
G. Moro, S. Bergamaschi, S. Joseph, J.-H. Morin, and A. M. Ouksel, Eds. Databases, Information Systems, and Peer-to-Peer Computing, International Workshops, DBISP2P 2005/2006, Trondheim, Norway, August 28-29, 2005, Seoul, Korea, September 11, 2006, Revised Selected Papers, vol. 4125, Lecture Notes in Computer Science, (2007). Springer. ISBN 978-3-540-71660-0.
K. Aberer, K.-S. Choi, N. F. Noy, D. Allemang, K.-I. Lee, L. J. B. Nixon, J. Golbeck, P. Mika, D. Maynard, R. Mizoguchi, G. Schreiber, and P. Cudré-Mauroux, Eds. The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007, vol. 4825, Lecture Notes in Computer Science, (2007). Springer. ISBN 978-3-540-76297-3.
Copyright information
© 2010 Atlantis Press/World Scientific
About this chapter
Cite this chapter
Battré, D. (2010). Query Planning in DHT Based RDF Stores. In: Web-Based Information Technologies and Distributed Systems. Atlantis Ambient and Pervasive Intelligence, vol 2. Atlantis Press. https://doi.org/10.2991/978-94-91216-32-9_4
DOI: https://doi.org/10.2991/978-94-91216-32-9_4
Publisher Name: Atlantis Press
Online ISBN: 978-94-91216-32-9
eBook Packages: Computer Science, Computer Science (R0)