Abstract
The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scientific data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multi-graph. SPARQL is a query language for RDF that expresses subgraph pattern-finding queries on this implicit multigraph in a SQL-like syntax. SPARQL queries generate complex intermediate join queries; to compute these joins efficiently, this paper presents a new strategy based on bitmap indexes. We store the RDF data in column-oriented compressed bitmap structures, along with two dictionaries. We find that our bitmap index-based query evaluation approach is up to an order of magnitude faster than the state-of-the-art system RDF-3X, for a variety of SPARQL queries on gigascale RDF data sets.
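To make the idea of evaluating joins with bitmap indexes concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single shared dictionary that maps RDF terms to integer IDs (the two-dictionary layout of the paper is not described here), uses plain Python integers as uncompressed bitmaps in place of a compressed bitmap format, and indexes only (predicate, object) pairs over subject IDs. The names `BitmapTripleStore` and `star_join` are hypothetical.

```python
from collections import defaultdict


class BitmapTripleStore:
    """Toy RDF store: one bitmap over subject IDs per (predicate, object) pair."""

    def __init__(self):
        self.term_to_id = {}   # dictionary: term string -> integer ID
        self.id_to_term = []   # reverse mapping: integer ID -> term string
        self.po_index = defaultdict(int)  # (p_id, o_id) -> bitmap as int

    def _encode(self, term):
        # Assign the next free integer ID to a term seen for the first time.
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def add(self, s, p, o):
        s_id, p_id, o_id = self._encode(s), self._encode(p), self._encode(o)
        # Set bit s_id in the bitmap of subjects matching (p, o).
        self.po_index[(p_id, o_id)] |= 1 << s_id

    def subjects(self, p, o):
        """Return the bitmap of subjects s such that (s, p, o) is stored."""
        p_id = self.term_to_id.get(p)
        o_id = self.term_to_id.get(o)
        if p_id is None or o_id is None:
            return 0
        return self.po_index.get((p_id, o_id), 0)

    def decode(self, bitmap):
        """Translate the set bits of a bitmap back into term strings."""
        terms, bit = [], 0
        while bitmap:
            if bitmap & 1:
                terms.append(self.id_to_term[bit])
            bitmap >>= 1
            bit += 1
        return terms


def star_join(store, patterns):
    """Join patterns of the form (?x, p, o) on ?x by AND-ing their bitmaps."""
    if not patterns:
        return []
    result = store.subjects(*patterns[0])
    for p, o in patterns[1:]:
        result &= store.subjects(p, o)
    return store.decode(result)


store = BitmapTripleStore()
store.add("P1", "type", "Protein")
store.add("P2", "type", "Protein")
store.add("P1", "locatedIn", "Nucleus")
store.add("P3", "locatedIn", "Nucleus")

# Roughly: SELECT ?x WHERE { ?x type Protein . ?x locatedIn Nucleus }
print(star_join(store, [("type", "Protein"), ("locatedIn", "Nucleus")]))  # ['P1']
```

In this sketch the join on the shared variable reduces to a bitwise AND, which is the property that makes bitmap indexes attractive for the intermediate joins that SPARQL queries generate. A compressed bitmap representation would replace the Python integers at scale.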
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Madduri, K., Wu, K. (2011). Massive-Scale RDF Processing Using Compressed Bitmap Indexes. In: Bayard Cushing, J., French, J., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2011. Lecture Notes in Computer Science, vol 6809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22351-8_30
DOI: https://doi.org/10.1007/978-3-642-22351-8_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22350-1
Online ISBN: 978-3-642-22351-8
eBook Packages: Computer Science, Computer Science (R0)