Graph Processing with Massive Datasets: A Kel Primer

Bayliss, David; Villanustre, Flavio

doi:10.1007/978-3-319-44550-2_11

Abstract

Graph theory and the study of networks can be traced back to Leonhard Euler’s original paper on the Seven Bridges of Königsberg, in 1736 [1]. Although the mathematical foundations for understanding graphs have been laid out over the last few centuries [2–4], it was not until recently, with the advent of modern computers, that the parsing and analysis of large-scale graphs became tractable [5]. In the last decade, graph theory gained mainstream popularity following the adoption of graph models for new application domains, including social networks and the web of data, both of which generate extremely large and dynamic graphs that cannot be adequately handled by legacy graph management applications [6].

Notes

1. Extract, Transform and Load; a generic term in the data industry for data manipulation that occurs prior to the exercise of real interest.
2. Examples of ‘problem’ are minimal closure, shortest path, subgraph-matching, graph-isomorphism, etc.
3. Based upon a very early prototype of a small sub-set of the proposed KEL language.
4. In PIG you do not need to provide data types or even declare the schema for data, although the PIG manual warns “it may not work very well if you don’t”.
5. The man who created BCPL, which eventually led to C; and a great ‘Compiler Theory’ lecturer too!
6. In fact, some of the elements and benefits of both appear within KEL. However, at all points we believe that the thought process of the encoder should be paramount rather than the academic purity of a particular abstraction.
7. The PARSE format allows entities and facts which have been extracted from text files to appear within the knowledge base.
8. Here the UID does not exist and so is generated based upon those fields.
9. Here the UID already exists and is called UID in the underlying data.
10. Once an entity with a UID has been declared, the type can be used to implicitly declare a foreign key existing within another part of the data.
11. Called allows for both bi-directional and unidirectional links to be used in text.
12. There is a separate category of graph function which returns scalar results; these are covered by the syntax discussed already.
13. At the moment it is envisaged that the outputs of an algorithm are only recorded in the algorithm declaration and implicitly appear as #1, #2 etc. in the production. This may prove too sloppy if algorithms with many, many outputs are invented.

References

1. http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg.
2. Euler L. Solutio Problematis ad Geometriam Situs Pertinentis. Novi Commentarii Academiae Scientiarum Imperialis Petropolitanae 7(1758–59), 9–28.
3. Hierholzer C. Über die Möglichkeit, einen Linienzug ohne Wiederholung und ohne Unterbrechung zu umfahren. Math Ann. 1873;6:30–2.
4. Biggs NL, et al. Graph theory 1736–1936. Oxford: Clarendon Press; 1986.
5. Agnarsson G. Graph theory: modeling, applications, and algorithms. Upper Saddle River: Prentice Hall; 2006.
6. Cudre-Mauroux P, et al. Graph data management systems for new application domains. In: Proceedings of the VLDB Endowment, vol 4, No 12; 2011.
7. Vicknair C, et al. A comparison of a graph database and a relational database. ACMSE ’10, April 15–17, Oxford, MS, USA; 2010.
8. Yang X, et al. Summary graphs for relational database schemas. In: Proceedings of the VLDB Endowment, vol 4, No 12; 2011.
9. Shao B, et al. Managing and mining large graphs: systems and implementations. SIGMOD ’12, May 20–24, Scottsdale, Arizona, USA; 2012.
10. http://en.wikipedia.org/wiki/Social_network_analysis.
11. Singla P, et al. Yes, there is a correlation—from social networks to personal behavior on the web. WWW 2008, April 21–25, Beijing, China; 2008.
12. Malm A, et al. Social network and distance correlates of criminal associates involved in illicit drug production. Secur J. 2008;21:77–94. doi:10.1057/palgrave.sj.8350069.
13. Latour J. Understanding consumer behavior through data analysis and simulation: Are Social Networks changing the World economy? Master Thesis. http://essay.utwente.nl/58146/.
14. Averbuch A, et al. Partitioning graph databases—a quantitative evaluation. Master of Science Thesis, Stockholm, Sweden; 2010. arXiv:1301.5121.
15. Plantikow S, et al. Latency-optimal walks in replicated and partitioned graphs. In: DASFAA Workshops 2011, LNCS 6637, pp 14–27; 2011.
16. Middleton A. Data-intensive technologies for cloud computing. In: Handbook of cloud computing. Berlin: Springer; 2010.
17. http://hpccsystems.com/blog/adventures-graphland-v-graphland-gets-reality-check.

Author information

Authors and Affiliations

LexisNexis, New York, NY, USA
David Bayliss & Flavio Villanustre

About this chapter

Cite this chapter

Bayliss, D., Villanustre, F. (2016). Graph Processing with Massive Datasets: A Kel Primer. In: Big Data Technologies and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-44550-2_11

DOI: https://doi.org/10.1007/978-3-319-44550-2_11
Published: 17 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44548-9
Online ISBN: 978-3-319-44550-2
eBook Packages: Computer Science, Computer Science (R0)
