GCM-Bench: A Benchmark for RDF Data Management System on Microorganism Data

  • Renfeng Liu
  • Jungang Xu
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 911)


Biological data is growing at an unprecedented scale. One example is the microorganism knowledge graph organized by biologists, which is represented in the Resource Description Framework (RDF) data model. In this paper, we propose GCM-Bench, a new benchmark for evaluating the performance of general-purpose RDF data management systems on microorganism RDF data. GCM-Bench consists of a microorganism RDF data generator, SPARQL query workloads, and an automatic test system that executes the test workloads and monitors resource utilization. Using the automatic test system, we evaluate five RDF data management systems on datasets of different sizes. We believe GCM-Bench will help microbiologists and system developers select a suitable RDF data management system.
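The abstract names three parts: a data generator, SPARQL query workloads, and an automatic test system that runs the workloads and monitors resource utilization. The paper's actual schema and harness are not described here, so the following is only a minimal Python sketch of how such parts can fit together, under stated assumptions. The `http://example.org/gcm/` namespace, the strain/genus/gene vocabulary, the `num_strains` scale factor, and the functions `generate_triples`, `to_ntriples` and `run_workload` are all hypothetical. The queries are plain Python scans over an in-memory triple list, standing in for SPARQL queries sent to a real RDF store. Peak allocation measured with `tracemalloc` stands in for system-level resource monitoring.

```python
import random
import time
import tracemalloc
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical vocabulary: GCM-Bench's real schema is not given in the abstract.
NS = "http://example.org/gcm/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
HAS_GENE = f"<{NS}hasGene>"


def generate_triples(num_strains: int, genes_per_strain: int, seed: int = 42) -> List[Triple]:
    """Build a synthetic microorganism graph whose size grows linearly with
    num_strains, a hypothetical scale factor."""
    rng = random.Random(seed)  # fixed seed: identical data on every run
    genera = [f"<{NS}genus/{g}>" for g in range(max(1, num_strains // 10))]
    triples: List[Triple] = []
    for s in range(num_strains):
        strain = f"<{NS}strain/{s}>"
        triples.append((strain, RDF_TYPE, f"<{NS}Strain>"))
        triples.append((strain, f"<{NS}belongsToGenus>", rng.choice(genera)))
        for g in range(genes_per_strain):
            triples.append((strain, HAS_GENE, f"<{NS}gene/{s}_{g}>"))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize as N-Triples: one 'subject predicate object .' statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


def run_workload(queries: Dict[str, Callable[[], int]], repeats: int = 3) -> Dict[str, dict]:
    """Run each query `repeats` times and record its result size, mean wall-clock
    latency, and peak Python heap allocation (tracemalloc), the latter standing
    in for system-level resource monitoring."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    report: Dict[str, dict] = {}
    for name, query in queries.items():
        latencies = []
        tracemalloc.start()
        for _ in range(repeats):
            start = time.perf_counter()
            rows = query()
            latencies.append(time.perf_counter() - start)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        report[name] = {
            "rows": rows,
            "mean_s": sum(latencies) / len(latencies),
            "peak_bytes": peak,
        }
    return report


if __name__ == "__main__":
    data = generate_triples(num_strains=100, genes_per_strain=5)
    # Toy "queries": Python scans standing in for SPARQL sent to a real store.
    queries = {
        "count_strains": lambda: sum(1 for s, p, o in data if p == RDF_TYPE),
        "count_genes": lambda: sum(1 for s, p, o in data if p == HAS_GENE),
    }
    print(len(data), "triples")
    for name, stats in run_workload(queries).items():
        print(name, stats)
```

With 100 strains and 5 genes per strain, the generator emits 100 × (2 + 5) = 700 triples. Fixing the random seed keeps the dataset identical across runs. This is what allows different systems to be compared on the same data at each size.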


Keywords: Benchmark; RDF; SPARQL; Scientific computing; Evaluation; Microorganism



This work is supported by the National Key Research and Development Plan of China (Grant No. 2016YFB1000600 and 2016YFB1000601).



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. University of Chinese Academy of Sciences, Beijing, China
