Database as a Service (DBaaS) is not only a relatively new term but also a surprisingly generic one. Various companies, products, and services have claimed to offer a DBaaS, and this has led to a fair amount of confusion.

In reality, though, DBaaS is a very specific term, and a DBaaS provides very clear and well-defined benefits. In this chapter, we introduce DBaaS and broadly address the following topics:

  • What is DBaaS

  • The challenge databases pose to IT (information technology) organizations

  • Characteristics of DBaaS

  • The benefits of DBaaS

  • Other similar solutions

  • OpenStack Trove

  • Trove in the OpenStack ecosystem

  • A brief history of Trove

What Is Database as a Service?

As the name implies, DBaaS is a database that is offered to the user as a service. But what does that really mean?

Does it, for example, imply that the DBaaS is involved in the storage and retrieval of data, and the processing of queries? Does the DBaaS perform activities such as data validation, backups, and query optimization and deliver such capabilities as high availability, replication, failover, and automatic scaling?

One way to answer these questions is to decompose a DBaaS into its two constituent parts, namely, the database and the service.

The Database

There was a time when the term database was used synonymously with relational database management system (RDBMS). That is no longer the case. Today the term is used to refer equally to RDBMS and NoSQL database technologies.

A database management system is a piece of technology, sometimes only software, sometimes with customized and specialized hardware, that allows users to store and retrieve data. The Free Online Dictionary of Computing defines a database management system as “A suite of programs which typically manage large structured sets of persistent data, offering ad hoc query facilities to many users.”

The Service

Looking now at the other half—as a Service—we can see that its very essence is the emphasis on the delivery of the service rather than the service being delivered.

In other words, “Something as a Service” makes it easier for an operator to provide the Something for consumption while offering the consumer quick access to, and the benefit of, the Something in question.

For example, consider that Email as a Service offerings from a number of vendors including Google’s Gmail and Microsoft’s Office365 make it easy for end users to consume e-mail services without the challenges of installing and managing servers and e-mail software.

The Service as a Category

The most common use of the term as a Service occurs when referring to the broad category of Software as a Service (SaaS). This term is often used to refer to applications delivered as a service, like the Salesforce.com customer relationship management (CRM) software, which is offered as a hosted, online service. Used broadly, the category also includes Infrastructure as a Service (IaaS) offerings like AWS and Platform as a Service (PaaS) solutions like Cloud Foundry or Engine Yard.

DBaaS is a specific example of SaaS and inherits some of the attributes of SaaS. These include the fact that DBaaS is typically centrally hosted and made available to its consumers on a subscription basis; users only pay for what they use, and when they use it.

DBaaS Defined

One can therefore broadly define a DBaaS to be a technology that

  • Offers database servers “on demand”;

  • Provisions database servers;

  • Configures those database servers, or groups of database servers, potentially in complex topologies;

  • Automates the management of database servers and groups of database servers;

  • Scales the provided database capacity automatically in response to system load; and

  • Optimizes the utilization of the supporting infrastructure resources dynamically.

Clearly, these are very broad definitions of capabilities and different offerings may provide each of these to a different degree.

Just as Amazon offers EC2 as a compute service on its AWS public cloud, it also offers a number of DBaaS products. In particular, it provides Relational Database Service (RDS) for relational databases like MySQL or Oracle, a data warehouse as a service in Redshift, and a couple of NoSQL options in DynamoDB and SimpleDB.

OpenStack is a software platform that allows cloud operators and businesses alike to deliver cloud services to their users. It includes Nova, a computing service similar to Amazon’s EC2, and Swift, an object storage service similar to Amazon’s S3, as well as numerous other services. One of these additional services is Trove, OpenStack’s DBaaS solution.

Unlike Amazon’s DBaaS offerings, which are database specific, Trove allows you to launch a database from a list of popular relational and nonrelational databases. For each of these databases, Trove provides a variety of benefits including simplified management, configuration, and maintenance throughout the life cycle of the database.

The Challenge Databases Pose to IT Organizations

Databases, and the hardware they run on, continue to be a significant part of the cost and burden of operating an IT infrastructure. Database servers are often the most powerful machines in a data center, and they rely on extremely high performance from nearly all of a computer’s subsystems.

The interactions with client applications are network intensive, query processing is memory intensive, indexing is compute intensive, retrieving data requires extremely high random disk access rates, and data loads and bulk updates imply that disk writes be processed quickly. Traditional databases also do not tend to scale across machines very well, meaning that all of this horsepower must be centralized into a single computer or redundant pair with massive amounts of resources.

Of course, new database technologies like NoSQL and NewSQL are changing these assumptions, but they also present new challenges. They may scale out across machines more easily, reducing the oversized hardware requirements, but the coordination of distributed processing can tax network resources to an even greater degree.

The proliferation of these new database technologies also presents another challenge. Managing any particular database technology can require a great deal of specialized technical expertise. Because of this, IT organizations have typically developed expertise in only one database technology or, in some cases, a few. As a result, they have generally offered their customers support for only a limited number of database technologies. In some cases, this was justified, or rationalized, as being a corporate standard.

In recent years, however, development teams and end users have realized that not all databases are created equal. There are now databases that are specialized to particular access patterns like key-value lookup, document management, map traversal, or time series indexing. As a result, there is increasing demand for technologies with which IT has limited experience.

Starting in the latter part of the 2000s, there was an explosion in the so-called NoSQL databases. While it was initially possible to resist these technologies, their benefits and popularity made this extremely difficult.

Yet, how is an IT organization supposed to support all the various flavors of NoSQL and SQL databases without having in-depth knowledge of each of them?

Amazon led the way by making computing ubiquitous and easy to consume with a couple of key-clicks and a credit card. It then automated many of the complexities around databases and offered them as a service, forcing IT organizations to respond. It did this, however, by building up staff with expertise in each of these technologies, and still, like the IT staffs of old, offering only a limited number of options.

OpenStack Trove offers IT organizations the ability to operate a complete DBaaS platform within the enterprise. IT organizations can offer a rich variety of databases to their internal customers with the same ease of use that Amazon offered with its AWS cloud and the RDS product. Users of Trove get this benefit without requiring the same scale or large investment in specialized teams of experts in specialized database technologies that Amazon was able to staff internally.

Characteristics of DBaaS

Given how broadly the term Database as a Service is used, it is worthwhile trying to understand some characteristics of DBaaS. This characterization helps one quickly assess each candidate solution and organize solutions into meaningful groups.

Some common characteristics are

  • The operating plane: data plane vs. management/control plane

  • Tenancy: single tenancy vs. multitenancy

  • Service location: private cloud vs. public cloud vs. hosted private cloud

  • Delivery model: service vs. platform

  • Supported database: single vs. multiple, SQL vs. NoSQL

The Management Plane and the Data Plane

An important characteristic of a DBaaS solution relates to the kind(s) of activities it performs, and we can categorize these activities broadly into two groups.

In the operation of a database, there are a number of activities like provisioning, installing and configuring, backing up and restoring, configuring replication (or mirroring, or clustering), resizing the storage attached to an instance, and other administrative activities. These activities broadly fall into the category of system management and are considered part of the management plane. For these operations, the actual content of the data being managed is opaque to the user issuing the commands.

There are also entirely independent but equally important activities like inserting, querying, and modifying data; creating tables, namespaces, indices, triggers, and views; and inspecting query plans. All of these activities broadly fall into the category of data management and are considered part of the data plane. In these cases the content of the data being stored is what the user is actually accessing and manipulating.

A managed database instance provides the operator and administrator a set of interfaces in the management plane while providing the end user and analyst of the database a set of interfaces in the data plane.
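The split between the two planes can be sketched in a few lines of Python. This is only an illustration: the operation names and the plane_of helper are hypothetical labels for the activities listed above, not part of any DBaaS product's API.

```python
# Illustrative only: these names paraphrase the activities described in the
# text; they are not commands from Trove, RDS, or any other product.
MANAGEMENT_PLANE = {
    "provision", "install", "configure", "backup", "restore",
    "configure_replication", "resize_storage",
}
DATA_PLANE = {
    "insert", "query", "update", "create_table", "create_index",
    "create_trigger", "create_view", "inspect_query_plan",
}


def plane_of(operation: str) -> str:
    """Return which plane a named operation belongs to."""
    if operation in MANAGEMENT_PLANE:
        return "management"
    if operation in DATA_PLANE:
        return "data"
    raise ValueError(f"unknown operation: {operation}")


if __name__ == "__main__":
    for op in ("backup", "query", "resize_storage", "create_index"):
        print(f"{op}: {plane_of(op)} plane")
```

An operator's tooling would issue only the first kind of operation, while an application or analyst would issue only the second.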

Different activities and clients participate in each of these planes, depicted graphically in Figure 1-1.

Figure 1-1. A graphical representation of the data plane and management plane

The database (depicted by the solid block) operates at two distinct planes—the data plane and the management plane.

OpenStack Trove operates almost exclusively in the management plane. Similarly, Amazon’s RDS offering features a database and code developed by Amazon that orchestrates this database. This code that Amazon developed operates almost entirely in the management plane. The same relationship holds between SQL Server and Microsoft’s Azure SQL Database offering. Other DBaaS offerings (such as DynamoDB from Amazon) operate in the data plane as well.

Trove, therefore, gives applications transparent and complete access to the data API (application programming interface) exposed by the database being offered while automating and simplifying the management aspects. For example, when a user uses Trove to provision MySQL, the database server that is provisioned is a standard, unmodified copy of the MySQL server and the user’s subsequent interactions to query and update data on that server are all directly with that underlying server, not with Trove itself.

Tenancy

Tenancy is a very important attribute of a DBaaS solution. There are two commonly understood tenancy models, single tenancy and multitenancy. We examine each in turn.

Figure 1-2 helps describe this concept: single tenancy on the left and multitenancy on the right.

Figure 1-2. Illustrating single and multitenancy with two database servers

Single-Tenant Solution

A single-tenant DBaaS solution is a solution where the database provisioned by each tenant (user, consumer) resides on dedicated resources (database, compute, storage, networking, etc.). In some cases, this means that a user who requests two database instances gets two instances, each of which has its own dedicated resources. In other cases, it means that the two instances may share the same resources but these resources are not shared with any other tenant.

Amazon RDS, Amazon Redshift, and OpenStack Trove are some examples of single-tenant solutions. Each customer request for a database would result in the creation of a single instance (potentially a virtual machine with a database instance on it). While they may be considered multitenant at the compute infrastructure level, at the DBaaS level, they are single-tenant solutions.

The benefit of a single-tenant architecture is that each user’s activity is fairly well isolated. Since each user has a dedicated pool of resources on which to run database functions, one user performing many queries or updates at a particular time is unlikely to affect the performance of the system for other users accessing their data. Note, however, that this isolation can be impacted by the lack of isolation at the infrastructure level if that tier is in fact multitenant.

Multitenant Solution

A multitenant DBaaS solution is one where databases provisioned by different tenants may share the same resources. The sharing may be on a single physical or virtual machine or across a cluster of machines.

Oracle 12c is an example of a database that when offered as a service would constitute a multitenant DBaaS solution. A single database server instance would host one or more container databases. Each container database would have a pluggable database for each customer/tenant/user, and these pluggable databases would house each user’s data. Another example would be Amazon’s DynamoDB where a user’s data is stored alongside the data of other users across a large cluster of underlying shared hardware.

While a multitenant system may result in less isolation and a greater potential for resource conflict among users, it typically provides a more efficient use of resources overall since resources not in use by one user can be more easily consumed by others sharing the same infrastructure.

Service Location

DBaaS solutions can operate in a variety of locations (e.g., a public cloud, a private cloud, or a hosted private cloud).

The Public Cloud

In the public cloud, some third party owns, manages, and operates a computing infrastructure and allows other individuals and companies to purchase and use services on this infrastructure.

Amazon AWS is the most commonly cited example of the public cloud model. Other similar solutions include Microsoft Azure, Google Cloud Platform, and Hewlett Packard’s (HP) Helion Public Cloud.

To the average consumer, public clouds provide little by way of service-level agreements (SLAs) and guarantees on response times or the ability to control the service being provided.

One common issue with public clouds is the fact that they are mostly multitenant and therefore they share infrastructure. One consumer’s compute instance could, for example, be negatively impacted by the behavior of some other compute instance owned by another tenant that just happens to share the same physical machine. This is often referred to as the noisy neighbor effect.

Some public clouds give very little control over the placement of your resources on the infrastructure beyond very coarse controls. In deploying an instance on Amazon, for example, you can choose the Availability Zone (coarse control) but cannot guarantee that two instances in the same Availability Zone aren’t on the same physical machine. This could unexpectedly lead to issues with availability. If the machine hosting both of these instances fails, both services will cease operation.

A major driver of public cloud adoption is that the user pays only for exactly what he or she consumes. This alleviates concerns such as resource utilization and capacity planning and transforms what is a capital expense in the private cloud into a variable, operating expense. In addition to this, it is extremely easy to get up and running on a public cloud. There is no need to set up machines and networks to begin work, and if something goes wrong, the public cloud operator is responsible for fixing it.

The Private Cloud

Many larger enterprises operate their own internal IT infrastructure as a cloud and use tools to provision and manage the various components of this infrastructure.

In many respects, the private cloud is a natural evolution of the corporate data center. These private clouds offer compute, storage, and networking infrastructure as well as corporate identity management.

In addition, some organizations allow end users to provision and consume database services where the data is stored and processed on infrastructure provided by the IT organization within the private cloud.

Private clouds most often provide their customers with SLAs such as guarantees on response times and service characteristics such as outage times and downtime.

Private clouds also often provide users with greater control over the placement and operation of the infrastructure, something that is typically lost in other models like the public cloud.

Typically, considerations such as data privacy, data security, and cost drive the choice of a private cloud solution. Risk aversion and inertia are also significant drivers of the private cloud.

Managed Private Cloud

The managed private cloud is a hybrid of the public and the private cloud whereby the resources are owned and operated by one organization, for another. Often these resources are dedicated to the customer.

Examples of the managed private cloud include companies like Rackspace, BlueBox, Peer1, and Contegix. Many public cloud operators also provide managed private cloud offerings. Amazon’s GovCloud and HP’s Managed Cloud Services are two examples.

As the private cloud is the evolution of the corporate data center, the managed private cloud is the evolution of the outsourced data center.

The customer gets the benefits of the private cloud (SLAs, guarantees, dedicated resources, etc.) while not having to manage the infrastructure themselves. Managed private cloud providers often operate large data centers, and customers get the economies of scale.

In some cases customers have the option of physically isolating their infrastructure (such as in locked cages) for improved security and privacy.

Service vs. Platform

When a user consumes database services from Amazon RDS, it is clear that the user is interacting with a service. The user cannot get access to the software that operates RDS. A service is software that a user can purchase and consume on demand with no concern for, or access to, the systems that are at work to provide the service.

On the other hand, a company could download OpenStack, including OpenStack Trove, from openstack.org and run the software on its own infrastructure. That, by itself, is not a DBaaS. The company still has to install, configure, and operate the software, and do a variety of things (which will be discussed in later chapters), such as building guest images, establishing configuration groups, establishing security groups, and so on. OpenStack Trove therefore represents a platform on which to operate a DBaaS.

Similarly, companies can build and offer platforms that are fully functional “DBaaS in a box” products. These would include all the things one would need to operate a complete DBaaS on one’s own infrastructure. Tesora’s DBaaS platform is one example of such a platform. This platform is based on OpenStack Trove and offers complete Trove capabilities and several extensions, along with certified guest images for many commonly used databases.

The Benefits of DBaaS

DBaaS solutions attempt to automate many of the complex and error-prone procedures, as well as the mundane and repetitive tasks involved in the operation of a database, while providing application developers and client applications access to a scalable and reliable data management infrastructure.

With DBaaS (as with other as-a-Service offerings), clients have access to the database of their choice without having to concern themselves with the underlying operational complexities.

Ease of Provisioning

An immediate benefit of DBaaS is that provisioning a database instance, something which used to take weeks if not months in the old world of centralized IT, can now be performed in a matter of seconds or minutes.

The user gets to choose the database type, version, and some other basic attributes. The database is then quickly provisioned and connection information is returned.
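As a rough illustration, the choices a user makes at provisioning time can be captured in a small request document. The helper below is hypothetical. Its field names (datastore, flavorRef, volume) follow the general shape of a Trove create-instance request as we understand it, but treat them as assumptions and check them against the API reference for the release you run.

```python
import json


def build_provision_request(name, datastore_type, datastore_version,
                            flavor_ref, volume_gb):
    """Assemble a hypothetical create-instance request body."""
    if volume_gb <= 0:
        raise ValueError("volume_gb must be positive")
    return {
        "instance": {
            "name": name,
            "flavorRef": flavor_ref,            # compute size (assumed field name)
            "volume": {"size": volume_gb},      # storage in GB (assumed field name)
            "datastore": {
                "type": datastore_type,         # for example "mysql"
                "version": datastore_version,   # for example "5.6"
            },
        }
    }


if __name__ == "__main__":
    # Sample values only; the instance name and flavor are made up.
    body = build_provision_request("orders-db", "mysql", "5.6", "2", 5)
    print(json.dumps(body, indent=2))
```

The point of the sketch is how little the user has to specify: a name, a database type and version, and a size. Everything else is left to the service.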

Consistent Configurations

The complexities involved in provisioning a database frequently lead to hard-to-detect differences between one instance and the next. Unfortunately, these subtle and seemingly innocuous differences can translate into a critical problem in the middle of the night, often involving the loss or corruption of data.

Automating the provisioning mechanism with a DBaaS solution ensures that each database instance provisioned has exactly the same configuration, with no exceptions.

It also means that when a configuration change is required, it can be easily applied to all the database instances, and any deviation is easier to detect.
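Detecting deviation from a common configuration can be sketched as a comparison against a baseline. The function below is a minimal illustration, not how any particular DBaaS implements it; the setting names and values are sample data chosen for the example.

```python
def find_drift(baseline, instances):
    """Return {instance_name: {setting: (expected, actual)}} for deviations."""
    drift = {}
    for name, config in instances.items():
        diffs = {
            key: (expected, config.get(key))
            for key, expected in baseline.items()
            if config.get(key) != expected
        }
        if diffs:
            drift[name] = diffs
    return drift


if __name__ == "__main__":
    # Sample data only.
    baseline = {"max_connections": 200, "innodb_buffer_pool_size": "1G"}
    instances = {
        "db-1": {"max_connections": 200, "innodb_buffer_pool_size": "1G"},
        "db-2": {"max_connections": 150, "innodb_buffer_pool_size": "1G"},
    }
    print(find_drift(baseline, instances))
```

With the sample data, only db-2 is reported, because its max_connections value differs from the baseline.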

Automated Operations

During the life of a database, numerous management tasks need to be performed. These include generating a backup, updating a configuration parameter, upgrading to new database versions, rebuilding indices, or reclaiming unused space.

Automation can be set up to perform these tasks either on a specific schedule (time-based, full backup on Friday, incremental backup every day) or on a certain event or threshold (when deleted record space exceeds X%, when free space falls below Y%).

Automating these activities considerably simplifies the role of an IT operations team and also ensures that these operations are performed consistently without fail.
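The two kinds of triggers described above, schedules and thresholds, can be sketched as a single decision function. This is an illustrative sketch under stated assumptions: the task names, the default thresholds, and the choice to run an incremental backup on every day except Friday are ours, not taken from any product.

```python
from datetime import date


def due_tasks(today, deleted_space_pct, free_space_pct,
              max_deleted_pct=20.0, min_free_pct=10.0):
    """Return the maintenance tasks that should run for the given state."""
    tasks = []
    # Time-based rule: full backup on Friday, incremental on other days.
    if today.weekday() == 4:  # Monday is 0, so Friday is 4
        tasks.append("full_backup")
    else:
        tasks.append("incremental_backup")
    # Threshold-based rules (task names are hypothetical).
    if deleted_space_pct > max_deleted_pct:
        tasks.append("reclaim_space")
    if free_space_pct < min_free_pct:
        tasks.append("extend_storage")
    return tasks


if __name__ == "__main__":
    print(due_tasks(date(2015, 5, 1), deleted_space_pct=5.0, free_space_pct=50.0))
    print(due_tasks(date(2015, 5, 4), deleted_space_pct=25.0, free_space_pct=5.0))
```

A real scheduler would run a check like this periodically for each instance and dispatch the resulting tasks.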

Autoscaling

Databases face variable workloads and provisioning for peak leads to enormous underutilization during non-peak times. Autoscaling is a capability whereby the resources allocated to the database are right-sized based on workload.

Scaling a database without downtime is possible with many databases, and it is a very attractive feature of operating in the cloud. It is, however, an exacting process, and automation considerably simplifies it.

Autoscaling is an example of an automated operation that can be performed at a threshold like queries per second or levels of resource utilization.
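A threshold-based policy of this kind can be sketched as follows. This is a deliberately simple illustration: the per-node capacity, the thresholds, and the one-node-at-a-time step are all assumptions made for the example, and real autoscalers usually add cooldown periods and smoothing.

```python
def desired_nodes(current_nodes, queries_per_second, qps_per_node=1000,
                  scale_up_at=0.8, scale_down_at=0.3,
                  min_nodes=1, max_nodes=8):
    """Return the node count a simple threshold policy would choose."""
    # Fraction of the current capacity that the workload is using.
    utilization = queries_per_second / (current_nodes * qps_per_node)
    if utilization > scale_up_at:
        target = current_nodes + 1
    elif utilization < scale_down_at:
        target = current_nodes - 1
    else:
        target = current_nodes
    # Never go below the floor or above the ceiling.
    return max(min_nodes, min(max_nodes, target))


if __name__ == "__main__":
    print(desired_nodes(2, 1900))  # busy: utilization 0.95
    print(desired_nodes(4, 800))   # quiet: utilization 0.2
    print(desired_nodes(2, 1000))  # steady: utilization 0.5
```

Keeping a gap between the scale-up and scale-down thresholds avoids flapping, where the system repeatedly adds and removes the same node.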

Improvements to Development Agility

While simplified and automated provisioning makes it easier and quicker to make a database instance available, there is more to development agility than just that. In many fields, like data analysis, for example, the thought process of the analyst is iterative. One doesn’t always know what the right question is, and often the answer to the first question helps frame and qualify the next one.

While quick provisioning helps one quickly cycle through database instances during iterative discovery, it is important to recognize the value of quickly deprovisioning, or destroying, a database instance when it is no longer needed.

If DBaaS just made it easier to provision a database, it wouldn’t help agility if the database came with a long commitment. The benefit of DBaaS is that one can quickly destroy a database when one is done with it, thereby releasing the allocated resources back to the pool.
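The effect of deprovisioning on a shared pool can be illustrated with a toy model. The ResourcePool class below is hypothetical and counts abstract capacity units; it only shows that destroying an instance makes its capacity available to the next request.

```python
class ResourcePool:
    """A toy pool of identical capacity units shared by all users."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.allocated = {}  # instance name -> units held

    @property
    def free(self):
        return self.capacity - sum(self.allocated.values())

    def provision(self, name, units):
        if name in self.allocated:
            raise ValueError(f"{name} already exists")
        if units > self.free:
            raise RuntimeError("insufficient capacity in pool")
        self.allocated[name] = units

    def deprovision(self, name):
        # Destroying the instance returns its units to the pool.
        return self.allocated.pop(name)


if __name__ == "__main__":
    pool = ResourcePool(capacity=10)
    pool.provision("analysis-1", 6)
    print("free after provisioning:", pool.free)
    pool.deprovision("analysis-1")
    print("free after deprovisioning:", pool.free)
```

In this model a second six-unit request fails while the first instance exists and succeeds once it has been destroyed.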

Better Resource Utilization and Planning

With a DBaaS platform, IT organizations can monitor overall database demands and trends within the organization. They can then expand and renew the underlying cloud infrastructure on a regular basis. Such changes may be driven by industry trends, by newer architectures and improved price points for selected hardware configurations, or by changing demands within the organization.

One goal of an IT organization is to maximize resource utilization and deliver the most responsive service while keeping in mind anticipated trends in demand within the organization.

One way to achieve this goal is by operating a pool of resources that can be shared within the organization and allowing users to provision, consume, and only pay for the time(s) when the infrastructure is in use. IT organizations can also resort to the judicious use of overprovisioning in the face of unexpected demand.

This not only improves the bottom line for the organization but also allows the organization to be more responsive to the needs of its customers.

Simplified Role for Provider or Operator

In an enterprise that does not provide DBaaS, the IT organization must be completely knowledgeable in all aspects of the databases that it allows internal customers to use. The customers may have some DBA (database administration) knowledge and skills, but for the most part the administration skills are centralized within the IT organization.

In essence, this means that IT organizations can only allow internal customers to use database technologies in which they have considerable expertise. This is often the rationale for restricting the choice of database technology to the so-called corporate standard.

With the evolution of on-demand services and DBaaS that embody the best practices, the software automates and simplifies most common workflows and administrative activities. This eases the burden on the IT organization and reduces the requirement that the IT organization have deep expertise in every database technology. This also enables more choice for internal customers to select the right database technology for the problem at hand.

Other DBaaS Offerings

Here are some other DBaaS solutions that offer capabilities similar to OpenStack Trove.

Amazon RDS

Amazon RDS is the umbrella product that provides managed database instances running MySQL, Oracle, PostgreSQL, SQL Server, or Amazon’s own MySQL compatible database, Aurora.

All of these are available in the Amazon AWS cloud in a variety of configurations including multi-availability zones. Amazon RDS includes useful features like automatic scaling, instance monitoring and repair, point-in-time recovery, snapshots, self-healing, and data encryption (at rest and in flight).

Amazon Redshift

Amazon Redshift is a fully managed petabyte-scale data warehouse solution that is based on the technology developed by ParAccel (now Actian). It uses standard PostgreSQL front-end connectivity, which makes it easy to use Redshift with standard SQL clients and tools that understand PostgreSQL. It features integrations with many other Amazon solutions like S3, DynamoDB, and Elastic MapReduce (EMR).

Microsoft Azure SQL Database

Microsoft offers a version of its popular relational database, SQL Server, as a service. This service is offered as part of Microsoft’s IaaS offering, Microsoft Azure.

Google Cloud SQL

Google Cloud SQL is another MySQL-compatible DBaaS offering, similar to Amazon’s RDS for MySQL. This offering is delivered as part of Google Cloud Platform, which also includes Google App Engine.

Amazon DynamoDB

Amazon DynamoDB is a fast, flexible NoSQL database service. It is a fully managed database service and supports both document and key-value data models. DynamoDB is also multitenant by design and transparently scalable and elastic. The user need do nothing in order to get the benefits of scalability with DynamoDB; that is all managed completely by the underlying service.

For this reason, DynamoDB is able to promise applications consistent (single-digit millisecond) latency at any scale.

OpenStack Trove

The stated mission of the OpenStack Trove project is as follows:

To provide scalable and reliable Cloud Database as a Service provisioning functionality for both relational and nonrelational database engines, and to continue to improve its fully-featured and extensible open source framework.

It is important, therefore, to understand that unlike the other DBaaS solutions presented previously, Trove attempts, very consciously, to provide a platform for DBaaS that allows users to consume both relational and nonrelational database engines.

This mission statement is carried forward into the architecture (as discussed in a later chapter). Trove aims to provide a platform that would allow consumers to administer their databases in a technology agnostic way, while at the same time orchestrating a number of very different database technologies.

It is for this reason that Trove operates almost exclusively in the management plane and leaves data access to the application (in the native protocols supported by the selected database technology). Contrast this, for example, with DynamoDB, which provides data APIs.

Trove provides reference implementations for a variety of database technologies, or datastores. Trove users are free to modify these datastores and provide additional ones, and to modify the way in which Trove manipulates a specific datastore.

Some users have been able to extend Trove for their own purposes and to provide additional functionality not available in the reference implementations.

A Brief History of Trove

The Trove project came into existence early in 2012 with initial contributions from Rackspace and HP. At the time, the project was called RedDwarf, something that is still seen in many places in the code, such as the redstack tool, or the mysterious rd_ prefix on variables.

Initial code for Trove was available (during incubation) as part of the Grizzly and Havana releases. Trove was formally integrated into OpenStack as part of the Icehouse release.

OpenStack releases come out every six months and are named in alphabetical order.

The initial release of Trove as an integrated project in April 2014 in the Icehouse release of OpenStack featured support for MySQL, MongoDB, Cassandra, Redis, and Couchbase. Each datastore had slightly different capabilities. It also included a basic framework of strategies that enabled extensibility and simplified the addition of capabilities in future releases.

The Juno version of Trove, six months later, debuted two new frameworks: one for replication and the other for clustering. This release included a basic implementation of replication for MySQL and clustering for MongoDB.

The Kilo release extended these frameworks and introduced additional capabilities for MySQL replication. In addition, this release also added support for many new databases, including DB2, CouchDB, and Vertica.

Tenancy in OpenStack Trove

Architecturally, OpenStack Trove is a single-tenant DBaaS platform by default. This means that each request for a new Trove database by a tenant results in the provisioning of one (or in some cases more than one) Nova instance, each with its own dedicated storage and networking resources. These instances are not shared with any other database request from either this user or any other user.

This does not in any way imply that the database instances that are created in response to a tenant’s request will have dedicated hardware. Nova is by default intended to be a multitenant system, but an operator could configure policies or plug-ins that effectively ensure that different instances do not share the same hardware. Trove does not control that.

As described previously, Trove is a platform for DBaaS, and as discussed in later chapters, this means that the tenancy model that Trove implements can in fact be changed by a provider or operator.

Trove in the OpenStack Ecosystem

OpenStack is organized into a collection of services. Each OpenStack service exposes a public API and other services can interact with the service using the public API.

Figure 1-3 shows a simplified representation of such a service.

Figure 1-3. Illustrating a simple OpenStack service

The OpenStack service depicted exposes a RESTful public API. Trove is one such service, and it provides the functionality of a Database as a Service.

In OpenStack, identity is managed by a service called Keystone, networking by Neutron, block storage by Cinder, object storage by Swift, and compute instances by Nova.

Horizon is the dashboard service and presents the web interface. Some other OpenStack services are Heat (orchestration), Ceilometer (telemetry), and Sahara (Hadoop as a Service).

A simple OpenStack deployment typically consists of at least four services: Keystone, Neutron, Cinder, and Nova. Many deployments also include Swift.

Figure 1-4 shows a typical OpenStack deployment.

Figure 1-4. A graphical representation of a simple OpenStack setup

Client applications and other OpenStack services alike access each of these services using their public APIs.

Trove is a client of the other core services, and you can add it to this diagram as shown in Figure 1-5. Trove, shown at the top left of Figure 1-5, exposes its own public API and consumes the other OpenStack core services by calling their respective public APIs.

Figure 1-5. Showing a simple OpenStack setup with Trove as a client of the other services

One of the important tasks performed by Keystone is identity management: validating the credentials that users present when they access the public APIs of the various services. It also serves a second, equally important purpose as the single directory of all OpenStack services.

All OpenStack services are required to register with Keystone; once registered, they become accessible to users who have access to Keystone.

Trove registers as the database service. Therefore, a user who knows the Keystone end point and has access to Keystone can query Keystone to get the DBaaS end point registered by Trove.
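
The directory lookup described above can be sketched in a few lines of Python. This is only an illustration: the catalog below is a hand-written sample, loosely modeled on the list of services and end points that Keystone returns alongside a token, and every name, address, and URL in it is made up. The helper find_endpoint is hypothetical and is not part of any OpenStack client library.

```python
# Hand-written sample catalog; the layout is an assumption made for this
# sketch, and all names, regions, and URLs are invented for illustration.
SAMPLE_CATALOG = [
    {
        "type": "compute",
        "name": "nova",
        "endpoints": [
            {"interface": "public", "region": "RegionOne",
             "url": "http://192.0.2.10:8774/v2.1"},
        ],
    },
    {
        "type": "database",
        "name": "trove",
        "endpoints": [
            {"interface": "internal", "region": "RegionOne",
             "url": "http://10.0.0.5:8779/v1.0"},
            {"interface": "public", "region": "RegionOne",
             "url": "http://192.0.2.10:8779/v1.0"},
        ],
    },
]


def find_endpoint(catalog, service_type, interface="public"):
    """Return the first URL registered for service_type, or None."""
    for service in catalog:
        if service.get("type") != service_type:
            continue
        for endpoint in service.get("endpoints", []):
            if endpoint.get("interface") == interface:
                return endpoint.get("url")
    return None


if __name__ == "__main__":
    # Look up the end point registered under the "database" service type.
    print(find_endpoint(SAMPLE_CATALOG, "database"))
```

A client that holds only the Keystone end point and valid credentials needs nothing else: once it has the catalog, it selects the entry whose type is database and connects to the URL found there.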

Later chapters delve into more detail about how Trove works. At a high level, however, when Trove receives a request (e.g., a request for a new database instance of flavor type m1.large with 300 GB of storage attached to it), Trove will authenticate the client (using Keystone), verify the client’s quotas by inspecting its own persistent datastore, and, if the request appears to be valid, perform the following operations (not necessarily in this order):

  • Request that Cinder create a volume of 300 GB

  • Request that Nova create an instance of type m1.large

  • Request network interfaces from Neutron

Trove does all of this by calling those services on their respective public APIs, whose end points it determines from the directory in Keystone.
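
The sequence above (authenticate, check quota, then request resources) can be sketched as follows. This is a simplified illustration under stated assumptions, not Trove's actual code: handle_create_instance, QuotaExceeded, and the Fake* classes are all hypothetical stand-ins, and no real OpenStack API is called.

```python
class QuotaExceeded(Exception):
    """Raised when a request would exceed the tenant's quota."""


class FakeIdentity:
    """Hypothetical stand-in for Keystone: maps a token to a tenant."""

    def authenticate(self, token):
        if token != "valid-token":
            raise PermissionError("invalid credentials")
        return "demo-tenant"


class FakeQuotas:
    """Hypothetical stand-in for the quota records Trove keeps itself."""

    def __init__(self, limit_gb):
        self.limit_gb = limit_gb

    def allows(self, tenant, volume_gb):
        return volume_gb <= self.limit_gb


class FakeService:
    """Hypothetical stand-in for Cinder, Nova, or Neutron."""

    def __init__(self, name):
        self.name = name

    def create(self, **kwargs):
        # Echo back what was requested instead of creating anything.
        return {"service": self.name, **kwargs}


def handle_create_instance(request, identity, quotas, volumes, compute, network):
    """Validate a create request, then ask the other services for resources."""
    # 1. Authenticate the caller (Keystone's role in the text).
    tenant = identity.authenticate(request["token"])

    # 2. Check the tenant's quota against locally held records.
    if not quotas.allows(tenant, request["volume_gb"]):
        raise QuotaExceeded(tenant)

    # 3. Request resources; the text notes that the order is not fixed.
    return {
        "volume": volumes.create(size_gb=request["volume_gb"]),
        "server": compute.create(flavor=request["flavor"]),
        "port": network.create(),
    }


if __name__ == "__main__":
    request = {"token": "valid-token", "volume_gb": 300, "flavor": "m1.large"}
    print(handle_create_instance(
        request, FakeIdentity(), FakeQuotas(limit_gb=500),
        FakeService("cinder"), FakeService("nova"), FakeService("neutron")))
```

The point of the sketch is the shape of the flow: validation happens first and against Trove's own state, and only then does Trove act as a client of the other services.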

It follows that each service can operate just fine on its own private machine (hardware); it only has to register its publicly accessible IP address as the end point in Keystone.

This architecture of OpenStack makes it particularly well suited for large-scale deployments. Consider an example of an enterprise that would like to offer a highly available OpenStack service. This could be configured as shown in Figure 1-6.

Figure 1-6. Showing a service configured for redundancy with a load balancer

Three copies of the service are launched on three different machines, and a load balancer is placed in front of these three machines.

In this configuration, the “Service” will register the public IP of the load balancer machine in the Keystone service catalog, and therefore any client wishing to contact this service will be told to connect with the load balancer, which will then be able to forward the request appropriately.
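
A minimal sketch of this arrangement follows. The text does not say how the load balancer chooses among the three copies, so round-robin rotation is assumed here purely for illustration; RoundRobinBalancer, the simplified service_catalog dictionary, and all addresses are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hypothetical load balancer that rotates requests across backends."""

    def __init__(self, public_url, backend_urls):
        if not backend_urls:
            raise ValueError("at least one backend is required")
        self.public_url = public_url
        self._rotation = itertools.cycle(backend_urls)

    def forward(self, request):
        """Pick the next backend and return (backend_url, request)."""
        return next(self._rotation), request


# Three copies of the service, each on its own machine (addresses made up).
BACKENDS = [
    "http://10.0.0.11:8779",
    "http://10.0.0.12:8779",
    "http://10.0.0.13:8779",
]
balancer = RoundRobinBalancer("http://203.0.113.7:8779", BACKENDS)

# Only the balancer's address goes into the (simplified) service catalog.
service_catalog = {"database": balancer.public_url}

if __name__ == "__main__":
    for n in range(4):
        backend, _ = balancer.forward({"id": n})
        print(service_catalog["database"], "->", backend)
```

Clients never learn the addresses of the individual machines; they see only the single end point in the catalog, so copies of the service can be added or removed behind the load balancer without any change on the client side.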

Summary

We conclude this introductory chapter with the same question we started with: What is Database as a Service?

Broadly speaking, one can define a DBaaS as software that allows a user to simplify and automate the administrative and management activities that must be performed when using database technologies. This includes capabilities like provisioning, user management, backup and restore, ensuring high availability and data resiliency, self-healing, autoscaling, and patch and upgrade management.

Some DBaaS solutions do this by abstracting away the database and the management activities and inserting themselves into the data path (data plane). Others do this by providing only an abstraction of the management and administrative actions (the management plane), and staying entirely or almost entirely out of the data path.

Some are database specific (e.g., Microsoft’s Azure Cloud Database and Cloudant), others provide database-specific capabilities under the umbrella of a single unified product set (like Amazon RDS), while OpenStack Trove is unique in that it aims to be database agnostic.

Many DBaaS solutions are single tenant by architecture. OpenStack Trove is unique in that it is available as software that a user can deploy in a private cloud, and it is also something that a service provider can deploy and offer as a managed private cloud or a public cloud.

While OpenStack Trove architecturally implements a single-tenant model, extensions can allow users to offer multitenant databases. An example of this is Oracle 12c in the Tesora DBaaS offering.

OpenStack Trove is an open source DBaaS platform that is part of the OpenStack project. Thus, OpenStack Trove can form the basis for a DBaaS solution in the private, public, or managed private cloud.

The next chapter dives headlong into Trove, starting with how to download, install, and configure it.