Abstract
In this paper, we investigate the problem of efficiently evaluating SPARQL queries over large amounts of linked data using a distributed NoSQL system. We propose an efficient approach for partitioning large linked data graphs using a distributed framework (MapReduce), as well as an effective data model for storing linked data in a document database with a maximum replication factor of 2 (i.e., in the worst case, the storage size of the data graph is doubled). The proposed model and partitioning approach ensure high-performance query evaluation and horizontal scaling for a class of queries called generalized star queries (i.e., queries allowing both subject-object and object-subject edges from a central node), because no join operations over multiple datasets are required to evaluate them. Furthermore, we present an implementation of our approach using MongoDB, together with an algorithm for translating generalized star queries into the MongoDB query language based on the proposed data model.
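To make the idea concrete, the following is a minimal in-memory sketch of how a replication factor of 2 can let a generalized star query be answered from a single document. It is not the data model or the translation algorithm of the paper, neither of which is specified in this abstract. The document layout (the `out` and `in` fields), the function names, and the sample triples are all assumptions made for illustration, and plain Python dictionaries stand in for MongoDB documents.

```python
from collections import defaultdict


def build_documents(triples):
    """Group triples into one document per node (hypothetical layout).

    Each triple (s, p, o) is stored twice: under "out" in the document of s
    and under "in" in the document of o. For simplicity every object is
    treated as a node, so storage is exactly doubled in this sketch.
    """
    docs = defaultdict(lambda: {"out": defaultdict(list), "in": defaultdict(list)})
    for s, p, o in triples:
        docs[s]["out"][p].append(o)
        docs[o]["in"][p].append(s)
    return {
        node: {"_id": node, "out": dict(d["out"]), "in": dict(d["in"])}
        for node, d in docs.items()
    }


def match_star(doc, out_edges, in_edges):
    """Check a generalized star pattern against a single node document.

    out_edges and in_edges map a predicate to a required neighbour,
    or to None when the neighbour is a variable.
    """
    for side, edges in (("out", out_edges), ("in", in_edges)):
        for pred, value in edges.items():
            neighbours = doc[side].get(pred, [])
            if not neighbours:
                return False
            if value is not None and value not in neighbours:
                return False
    return True


def evaluate_star(docs, out_edges, in_edges):
    """Return the central nodes matching the star; no join is performed."""
    return sorted(
        doc["_id"] for doc in docs.values() if match_star(doc, out_edges, in_edges)
    )


# Hypothetical sample data.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "acme"),
    ("carol", "knows", "alice"),
    ("bob", "worksAt", "acme"),
    ("dave", "knows", "bob"),
]
docs = build_documents(triples)

# Star centred on ?x: (?x worksAt acme) and (?y knows ?x).
centres = evaluate_star(docs, {"worksAt": "acme"}, {"knows": None})
```

Because each document carries both the outgoing and the incoming edges of its node, every candidate can be checked in isolation, which is the property that allows documents to be spread across machines without cross-partition joins.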
Notes
1. In this paper we do not consider typed literals.
2. Note that not all variables of Q necessarily appear in the output pattern O(Q) of Q.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Kalogeros, E., Gergatsoulis, M., Damigos, M. (2019). Document Based RDF Storage Method for Efficient Parallel Query Processing. In: Garoufallou, E., Sartori, F., Siatri, R., Zervas, M. (eds) Metadata and Semantic Research. MTSR 2018. Communications in Computer and Information Science, vol 846. Springer, Cham. https://doi.org/10.1007/978-3-030-14401-2_2
DOI: https://doi.org/10.1007/978-3-030-14401-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14400-5
Online ISBN: 978-3-030-14401-2
eBook Packages: Computer Science, Computer Science (R0)