Building a cross-region disaster recovery solution with IBM Cloud services

When you deploy a solution on IBM Cloud, it generally consists of one or more IBM Cloud services. When the solution is deployed correctly, the services provide a resilient environment for your workload. However, your disaster recovery design can also use a number of services that might not be part of your workload's deployment architecture. Learn more about those services in the following sections.

IBM Cloud Object Storage

One key to DR is the ability to retrieve and recover data in the DR region. For many IBM Cloud services, you can use Object Storage to make data, like backups and snapshots, available in secondary regions by using a cross-regional bucket or a replicated bucket.

Data that is stored in Object Storage is highly resilient and replicated across three locations based on the bucket type and its data dispersal method. Single-site buckets have data dispersed in three locations in a data center. Regional buckets have data dispersed over the region’s three zones. Cross-region buckets have data dispersed over regions in the geo.

Cross-region buckets offer the simplest choice for disaster recovery and are available as follows:

ap-geo – Asia Pacific (Japan, Australia)
us-geo – North America (US, Canada)
eu-geo – Europe (UK, Germany, Spain)

For a cross-region bucket, a tethered endpoint keeps all data ingress and egress within a specified region while still distributing the data. However, this mechanism does not provide an automated failover if the tethered region fails. Therefore, it’s important to connect services to your buckets by using an endpoint that is available in the region where you need it.
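Because a tethered endpoint does not fail over on its own, the calling service has to decide which endpoint to use. The following is a minimal sketch of that decision, not an IBM Cloud API: the `pick_endpoint` helper, the reachability check, and the endpoint host names are all hypothetical placeholders that you would replace with the endpoints published for your bucket.

```python
from typing import Callable, Iterable, Optional


def pick_endpoint(
    candidates: Iterable[str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first reachable endpoint, in the caller's preference order.

    Returns None when no candidate is reachable.
    """
    for endpoint in candidates:
        if is_reachable(endpoint):
            return endpoint
    return None


# Hypothetical endpoint names (assumptions for illustration only):
# the tethered endpoint of the primary region first, then alternatives.
ENDPOINTS = [
    "https://cos.primary-region.example",
    "https://cos.dr-region.example",
    "https://cos.geo.example",
]

if __name__ == "__main__":
    # Simulate an outage of the primary region's tethered endpoint.
    unavailable = {"https://cos.primary-region.example"}
    chosen = pick_endpoint(ENDPOINTS, lambda e: e not in unavailable)
    print(chosen)
```

In practice, the reachability check might be a health probe or a region flag that is set during failover. The point of the sketch is only that endpoint selection is explicit and is driven by where the workload is running.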

Cross-region buckets might not be suitable for users with regulatory compliance concerns. For example, a US-based organization might not be able to store data in Canada. Or, a European Union-based company might not be able to store data outside of the European Union, such as in the UK.

In these cases, bucket replication offers a solution. With bucket replication, objects written to a source bucket are asynchronously copied to a target bucket, helping ensure redundancy with eventual consistency. You must set up and manage replication. By using this method, you can control the locations of the data and its sovereignty.
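Because replication is asynchronous, the target bucket can briefly lag behind the source. The following sketch shows one way to reason about that lag before a failover. It assumes that you already have a listing of each bucket as a mapping of object key to a content fingerprint that is comparable across both buckets. The function name and the sample data are hypothetical, and the sketch makes no calls to Object Storage.

```python
from typing import Dict, List


def pending_replication(source: Dict[str, str], target: Dict[str, str]) -> List[str]:
    """Return the keys that are missing from the target or differ from the source.

    Both arguments map an object key to a content fingerprint
    (an assumption of this sketch, for example a checksum).
    """
    return sorted(
        key for key, fingerprint in source.items()
        if target.get(key) != fingerprint
    )


if __name__ == "__main__":
    # Hypothetical listings: the second backup has not been copied yet.
    source_listing = {
        "backups/db-01.dump": "fingerprint-a",
        "backups/db-02.dump": "fingerprint-b",
    }
    target_listing = {
        "backups/db-01.dump": "fingerprint-a",
    }
    print(pending_replication(source_listing, target_listing))
```

An empty result suggests that the target has caught up with the source at the time the listings were taken. A non-empty result identifies the objects that would be missing if you failed over at that moment.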

Versioning is a requirement for bucket replication, so buckets that are configured with an immutable IBM Cloud policy can't be replicated. Server-side encryption with customer-provided keys is not supported. Where server-side encryption is required, use Key Protect.

IBM Cloud Schematics

Use automation to deploy or recover an environment quickly and accurately. Automating deployments with Infrastructure as Code (IaC), Configuration as Code, and toolchains helps you recover workloads faster and more accurately by re-creating lost environments more precisely than manual methods. Using automation also helps you react faster to unexpected scenarios, especially if you need to create and configure services on-demand in a different recovery region than you first planned.

Schematics is a service that can play an important part in any disaster recovery strategy. Use IBM Cloud Schematics to implement runbooks for recovery processes. Schematics workspaces provide Terraform-as-a-Service and automate the deployment and management of IBM Cloud infrastructure and services. Schematics actions provide Ansible-as-a-Service, automate configuration management, and run scripted day-2 operations. For more information, see Understanding Schematics use cases.

In short, with Schematics, you can deploy infrastructure and services quickly and consistently through IaC, Terraform, Ansible, Helm, and Red Hat OpenShift Operators.

Codifying the workload environment means defining and automating the setup of your workload by using code. This way, the right services can be quickly provisioned and configured in any IBM Cloud region. You can use Terraform and Ansible to provision services, including servers, storage, networking, and databases. They can also be used to configure the services, as can Helm and Operators.

IaC assets should be maintained and version-controlled like application code, and stored where they can be accessed from multiple regions. Using a Git-based repository is recommended.

As with any code, it is best practice not to hardcode elements, such as region or zone names. Instead, use variables, which can be resolved through the Schematics interface.
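The idea of resolving the region at deployment time can be sketched as follows. This is an illustration under stated assumptions, not the Schematics API: the `workspace_variables` helper, the variable names, and the name and value pair layout are hypothetical, and the zone names assume the `<region>-<number>` pattern.

```python
from typing import Dict, List


def workspace_variables(
    region: str,
    zone_count: int = 3,
    prefix: str = "dr",
) -> List[Dict[str, str]]:
    """Build name/value pairs for a deployment from a target region.

    Nothing region-specific is hardcoded; every value is derived from
    the `region` argument. Variable names are hypothetical.
    """
    if zone_count < 1:
        raise ValueError("zone_count must be at least 1")
    # Assumes zones are named <region>-1, <region>-2, and so on.
    zones = [f"{region}-{index}" for index in range(1, zone_count + 1)]
    return [
        {"name": "region", "value": region},
        {"name": "zones", "value": ",".join(zones)},
        {"name": "resource_prefix", "value": f"{prefix}-{region}"},
    ]


if __name__ == "__main__":
    # The same code path serves the primary and the recovery region.
    for target_region in ("us-south", "us-east"):
        print(workspace_variables(target_region))
```

Because the region is an input, switching to a different recovery region than the one you first planned requires a new argument, not a code change.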

Employing DevOps toolchains can automate much of the environment build, including configuration, so invest time in considering how you can use these tools.

Deployable Architectures

A deployable architecture is a preconfigured set of IaC assets that are based on the IBM Cloud for Financial Services reference architecture. These deployable architectures enable you to meet IBM Cloud Framework for Financial Services best practices. With a deployable architecture, you can accurately deploy the same architecture into different regions, which is an important consideration for disaster recovery. Since these preconfigured architectures also provide a starting point for customization, ensure that you maintain any customizations throughout each deployment.

When you use a deployable architecture, you are responsible for the following disaster recovery actions:

Provisioning disaster recovery environments, including any dependencies
Data and configuration backup
Replicating data and configuration to the disaster recovery environment
Managing failover operations

Backup and restore options

IBM Cloud® Backup for VPC is a cloud service that supports creating and managing boot and data storage volume snapshots. Use IBM Cloud® Backup for VPC to schedule regular backups and restore applications that are deployed on Virtual Servers for VPC instances when application-consistent backups are not required.

Boot volume backup: Create snapshots of the boot volume for the Virtual Servers for VPC hosting the application. Restore the bootable snapshots and use them to re-create the application virtual server instances. Bootable snapshots don’t immediately load or "hydrate" all of their data when they’re created. Performance degradation occurs during the restoration because your data is copied from IBM Cloud® Object Storage to Block Storage for VPC in the background. Use the fast restore feature to cache snapshots in a different zone for quick restore of individual volumes.
Data volume backup: Create snapshots of the data volumes that are attached to the application. The snapshots replicate the data to other availability zones and regions to support the recovery of configuration or any other critical files. For more information, see Creating Block Storage for VPC snapshots.

IBM® Storage Protect is an enterprise-level backup solution for virtual, physical, cloud, software-defined environments, and core applications. Use IBM® Storage Protect to create application-consistent backups for database applications that are deployed on Virtual Servers for VPC and to create file or folder level backups.

Comparison of backup options for Virtual Servers for VPC
Backup feature	IBM Storage Protect	IBM Cloud Backup for VPC
Backup capabilities	Agent-based, scheduled backups, backup management	Scheduled backups, backup management, fast restore clone, cross-regional copies
Backup scope	Selected VSIs, selected volumes, or files in VSIs	Selected volumes (boot or data) attached to any VSIs
File or folder level support	Yes	No
Backup storage	Block Storage for VPC or IBM Cloud Object Storage	IBM Cloud Object Storage
Database protection	Application-consistent backups (Oracle, IBM Db2 as a Service, MongoDB, MS SQL Server)	Not supported
Encryption	In-transit and at rest	In-transit and at rest
Recommendation	Database or folder level backup for multiple virtual servers	Complex backup operations for multiple virtual servers that do not require application data consistency
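The comparison above can be read as a small decision rule. The following sketch encodes only two rows of the table (file or folder level support and database protection). The function name and its parameters are hypothetical, and a real decision would also weigh the other rows, such as backup storage and scope.

```python
def recommend_backup(app_consistent: bool, file_level: bool) -> str:
    """Suggest a backup option from two requirements in the comparison table.

    app_consistent: application-consistent database backups are required.
    file_level: file or folder level backup and restore is required.
    """
    if app_consistent or file_level:
        # Per the table, only IBM Storage Protect supports these.
        return "IBM Storage Protect"
    # Volume-level snapshots are sufficient.
    return "IBM Cloud Backup for VPC"


if __name__ == "__main__":
    print(recommend_backup(app_consistent=True, file_level=False))
    print(recommend_backup(app_consistent=False, file_level=False))
```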