Data Engineering in Graph Databases

  • Conference paper
Computer and Information Sciences

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 62))

Abstract

Graph-structured databases have a wide range of emerging applications, e.g., the Semantic Web, eXtensible Markup Language (XML), biological databases and network topologies. To date, a large volume of real-world graph-structured data, possibly cyclic and schemaless, has already accumulated. Data engineering in graph-structured databases has therefore received a lot of attention recently; the area has limitations as well as scope for significant developments. These databases offer many different indexes and query languages, e.g., XQuery, regular expressions, the Web Ontology Language and subgraph isomorphism, yet few graphical user interfaces for effectively querying subgraphs. In this paper, we examine and evaluate the current state of the art in graph-structured databases with respect to (i) query languages, (ii) dynamic aspects, (iii) data mining, (iv) graphical user interfaces, and (v) modern computer architecture on graph-structured data. In addition, the incremental maintenance of graph indexes/views will be addressed.

Corresponding author

Correspondence to Byron Choi.

Copyright information

© 2011 Springer Science+Business Media B.V.

About this paper

Cite this paper

Choi, B., Hu, H., Xu, J., Cheung, W.K.W., Li, CH., Liu, J. (2011). Data Engineering in Graph Databases. In: Gelenbe, E., Lent, R., Sakellari, G., Sacan, A., Toroslu, H., Yazici, A. (eds) Computer and Information Sciences. Lecture Notes in Electrical Engineering, vol 62. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9794-1_26

  • DOI: https://doi.org/10.1007/978-90-481-9794-1_26

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-9793-4

  • Online ISBN: 978-90-481-9794-1

  • eBook Packages: Engineering, Engineering (R0)
