Data Engineering in Graph Databases

Choi, Byron; Hu, Haibo; Xu, Jianliang; Cheung, William K. W.; Li, Chun-Hung; Liu, Jiming

doi:10.1007/978-90-481-9794-1_26

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 62)

Abstract

Graph-structured databases have a wide range of emerging applications, e.g., the Semantic Web, eXtensible Markup Language (XML), biological databases and network topologies. To date, a voluminous amount of real-world (possibly cyclic and schemaless) graph-structured data has already accumulated. Data engineering in graph-structured databases has therefore recently received a lot of attention; the area has limitations as well as scope for significant developments. These databases support many different indexes and query languages, e.g., XQuery, regular expressions, the Web Ontology Language and subgraph isomorphism, yet there are few graphical user interfaces for effectively querying subgraphs. In this paper, we examine and evaluate the current state of the art in graph-structured databases with respect to (i) query languages, (ii) dynamic aspects, (iii) data mining, (iv) graphical user interfaces, and (v) modern computer architecture on graph-structured data. In addition, the incremental maintenance of graph indexes/views will be addressed.

Author information

Authors and Affiliations

Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Byron Choi, Haibo Hu, Jianliang Xu, William K. W. Cheung, Chun-Hung Li & Jiming Liu

Corresponding author

Correspondence to Byron Choi.

Editor information

Editors and Affiliations

EEE Dept., Imperial College, Exhibition Road, London, SW72BT, United Kingdom
Erol Gelenbe
EEE Dept., Imperial College, Exhibition Rd., London, SW72AZ, United Kingdom
Ricardo Lent
EEE Dept., Imperial College, Exhibition Rd., London, SW72AZ, United Kingdom
Georgia Sakellari
School of Biomedical Eng., Sci. and Heal, Drexel University, Bossone 702, 3120 Market Street, Philadelphia, 19104, Pennsylvania, USA
Ahmet Sacan
Dept. of Computer Engineering, Middle East Technical University, Ankara, 06531, Turkey
Hakki Toroslu
Fac. Engineering, Dept. Computer Engineering, Middle East Technical University - METU, Ankara, 06531, Turkey
Adnan Yazici

About this paper

Cite this paper

Choi, B., Hu, H., Xu, J., Cheung, W.K.W., Li, CH., Liu, J. (2011). Data Engineering in Graph Databases. In: Gelenbe, E., Lent, R., Sakellari, G., Sacan, A., Toroslu, H., Yazici, A. (eds) Computer and Information Sciences. Lecture Notes in Electrical Engineering, vol 62. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9794-1_26

DOI: https://doi.org/10.1007/978-90-481-9794-1_26
Published: 18 August 2010
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-9793-4
Online ISBN: 978-90-481-9794-1
eBook Packages: Engineering, Engineering (R0)