Abstract
Graph-structured databases have a wide range of emerging applications, e.g., the Semantic Web, eXtensible Markup Language (XML), biological databases and network topologies. To date, a voluminous amount of real-world graph-structured data, possibly cyclic and schemaless, has already accumulated. Data engineering in graph-structured databases has therefore received a lot of attention recently, and the area has both limitations and scope for significant development. These databases support many different indexes and query languages, e.g., XQuery, regular expressions, Web Ontology Language and subgraph isomorphism, yet there are few graphical user interfaces for effectively querying subgraphs. In this paper, we examine and evaluate the current state of the art in graph-structured databases with respect to (i) query languages, (ii) dynamic aspects, (iii) data mining, (iv) graphical user interfaces, and (v) modern computer architecture on graph-structured data. In addition, the incremental maintenance of graph indexes/views will be addressed.
Copyright information
© 2011 Springer Science+Business Media B.V.
About this paper
Cite this paper
Choi, B., Hu, H., Xu, J., Cheung, W.K.W., Li, CH., Liu, J. (2011). Data Engineering in Graph Databases. In: Gelenbe, E., Lent, R., Sakellari, G., Sacan, A., Toroslu, H., Yazici, A. (eds) Computer and Information Sciences. Lecture Notes in Electrical Engineering, vol 62. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9794-1_26
DOI: https://doi.org/10.1007/978-90-481-9794-1_26
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-9793-4
Online ISBN: 978-90-481-9794-1
eBook Packages: Engineering, Engineering (R0)