
Presto-RDF: SPARQL Querying over Big RDF Data

  • Mulugeta Mammo
  • Srividya K. Bansal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9093)

Abstract

The amount of Resource Description Framework (RDF) data on the web has increased rapidly. Processing large volumes of RDF data requires an efficient storage and query-processing engine that scales well with the volume of data. In the past two and a half years, however, heavy users of big data systems, such as Facebook, noted limitations in the query performance of these systems and began to develop new distributed query engines for big data that do not rely on MapReduce. Facebook’s Presto is one such example. This paper proposes an architecture based on Presto, called Presto-RDF, that can be used to process big RDF data. An evaluation of the performance of Presto in processing big RDF data against Apache Hive is also presented. The experimental results show that the Presto-RDF framework performs much better than Apache Hive and the native RDF store 4Store, and that it can be used to process big RDF data.

Keywords

Database performance, Evaluation, Querying, Semantic web data



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Arizona State University, Mesa, USA
