Abstract
In this chapter, we describe the role of databases in distributed big data analysis. Database types include relational databases, document databases, graph databases, and others, any of which may serve as data sources or sinks in our analytical pipelines. Most of these database types integrate well with Hadoop ecosystem components, as well as with Apache Spark. Connectivity between the various kinds of databases and distributed processing with Hadoop or Apache Spark may be provided by "glueware" such as Spring Data or Apache Camel. We describe relational databases such as MySQL, NoSQL databases such as Cassandra, and graph databases such as Neo4j, and how to integrate them with the Hadoop ecosystem.
Copyright information
© 2017 Kerry Koitzsch
About this chapter
Cite this chapter
Koitzsch, K. (2017). Relational, NoSQL, and Graph Databases. In: Pro Hadoop Data Analytics. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-1910-2_4
DOI: https://doi.org/10.1007/978-1-4842-1910-2_4
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-1909-6
Online ISBN: 978-1-4842-1910-2
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)