Configuring SAP HANA Multitarget System Replication in a RHEL HA Add-On Cluster

The following information describes the configuration of a Red Hat Enterprise Linux (RHEL) HA Add-On cluster for managing SAP HANA® system replication in a multitarget replication scenario. The cluster uses virtual server instances in IBM® Power® Virtual Server as cluster nodes.

You can connect multiple systems in an SAP HANA multitarget system replication topology to achieve a higher level of availability. A third SAP HANA instance runs on a virtual server instance in IBM Power Virtual Server in another workspace. The resource agents for SAP HANA in the Red Hat Enterprise Linux (RHEL) 8 HA Add-On require that the third SAP HANA instance is installed on a virtual server instance outside the cluster and is managed manually.

In a multitarget system replication scenario, one secondary SAP HANA system runs on a virtual server instance in the cluster and another secondary HANA system runs on a virtual server instance that is deployed in a Disaster Recovery (DR) site. The DR site is implemented in a different IBM Power Virtual Server workspace in another geographical location or zone. The SAP HANA system replication operation mode must be identical for all multitarget replication levels.

A takeover of the secondary system in the DR site must be triggered manually.

This information is intended for architects and specialists who are planning a high-availability deployment of SAP HANA on Power Virtual Server.

Before you begin

Review the general requirements, product documentation, support articles, and SAP notes listed in Implementing High Availability for SAP Applications on IBM Power Virtual Server References.

Prerequisites

Setting up a multitarget scenario

A multitarget scenario is an extension of the setup that is described in Configuring SAP HANA Scale-Up System Replication in a RHEL HA Add-On Cluster. Make sure that you complete the setup for the system replication cluster before you continue with the following steps.

To simplify the cluster operations, you can set the AUTOMATED_REGISTER cluster attribute of the SAPHana resource to true. With AUTOMATED_REGISTER=true, the cluster performs an automatic registration of the previous primary as a new secondary after the failed node reappears in the cluster.

On a cluster node, run the following command to verify the AUTOMATED_REGISTER cluster attribute of the resource.

pcs resource config SAPHana_${SID}_${INSTNO}

Sample output:

# pcs resource config SAPHana_${SID}_${INSTNO}
 Resource: SAPHana_HDB_00 (class=ocf provider=heartbeat type=SAPHana)
  Attributes: AUTOMATED_REGISTER=true DUPLICATE_PRIMARY_TIMEOUT=900 InstanceNumber=00 PREFER_SITE_TAKEOVER=True SID=HDB
  Operations: demote interval=0s timeout=3600 (SAPHana_HDB_00-demote-interval-0s)
              methods interval=0s timeout=5 (SAPHana_HDB_00-methods-interval-0s)
              monitor interval=121 role=Slave timeout=700 (SAPHana_HDB_00-monitor-interval-121)
              monitor interval=119 role=Master timeout=700 (SAPHana_HDB_00-monitor-interval-119)
              promote interval=0s timeout=3600 (SAPHana_HDB_00-promote-interval-0s)
              reload interval=0s timeout=5 (SAPHana_HDB_00-reload-interval-0s)
              start interval=0s timeout=3600 (SAPHana_HDB_00-start-interval-0s)
              stop interval=0s timeout=3600 (SAPHana_HDB_00-stop-interval-0s)

If the AUTOMATED_REGISTER cluster attribute is currently set to false, use the following command to enable the automatic registration.

pcs resource update SAPHana_${SID}_${INSTNO} AUTOMATED_REGISTER=true
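
To confirm the change, you can filter the attribute line from the resource configuration. The following command is only a convenience sketch that reuses the pcs resource config command shown earlier in this section.

pcs resource config SAPHana_${SID}_${INSTNO} | grep AUTOMATED_REGISTER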

Providing network connectivity between the workspaces

  1. Use the information in Creating the workspace to create another workspace in a different geographic location or region.

  2. Create subnets and make sure that the IP ranges don't overlap with any subnet of the workspace that hosts the virtual server instances for the cluster. For more information, see Creating private network subnets.

  3. Set up IBM Cloud® connections in both workspaces and activate the Enable IBM Transit Gateway option. For more information, see Creating Power Virtual Server Cloud Connections.

  4. Deploy an IBM Cloud Transit Gateway to interconnect the two IBM Power Virtual Server workspaces.

    IBM Cloud Transit Gateway enables the interconnection of IBM Power Virtual Server, IBM Cloud classic, and Virtual Private Cloud (VPC) infrastructures and keeps data within the IBM Cloud networks. For more information about planning and deploying IBM Cloud Transit Gateway, see Planning for IBM Cloud Transit Gateway and Ordering IBM Cloud Transit Gateway.

  5. To add the connections to your transit gateway and establish network connectivity between your IBM Power Virtual Server workspaces, open the IBM Cloud console and log in to your account.

  6. Select the Menu icon on the upper left and click Interconnectivity.

  7. Click Transit Gateway on the left navigation pane.

  8. Select the name of your transit gateway.

    In the expanded view, click View details.

  9. Click Add connection.

  10. Choose and configure the specific network connections that you want to add to the transit gateway.

  11. Choose Direct Link, and select the names of your IBM Cloud connections.

  12. Click Add to create a connection.

Preparing environment variables on NODE3

To simplify the setup, prepare the following environment variables for user ID root on NODE3. These environment variables are used in subsequent commands in the remainder of the instructions.

On NODE3, create a file with the following environment variables. Then, adapt the variables according to the configuration of your SAP HANA system.

export SID=<SID>            # SAP HANA System ID (uppercase)
export sid=<sid>            # SAP HANA System ID (lowercase)
export INSTNO=<INSTNO>      # SAP HANA Instance Number

export DC3=<Site3>          # HANA System Replication Site Name 3

export NODE1=<Hostname 1>   # Hostname of virtual server instance 1 (production primary)
export NODE2=<Hostname 2>   # Hostname of virtual server instance 2 (production secondary)
export NODE3=<Hostname 3>   # Hostname of virtual server instance 3 (production tertiary)
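
The following example shows such a file with the values filled in. It is only a sketch that reuses the sample hostnames and site name from the outputs in this document; replace the values with the values of your own installation.

export SID=HDB              # SAP HANA System ID (uppercase)
export sid=hdb              # SAP HANA System ID (lowercase)
export INSTNO=00            # SAP HANA Instance Number

export DC3=SiteC            # HANA System Replication Site Name 3

export NODE1=cl-hdb-1       # Hostname of virtual server instance 1 (production primary)
export NODE2=cl-hdb-2       # Hostname of virtual server instance 2 (production secondary)
export NODE3=cl-hdb-3       # Hostname of virtual server instance 3 (production tertiary)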

You must source this file before you can use the sample commands in the remainder of this document.

For example, if you created a file that is named sap_dr_site.sh, run the following command on NODE3 to set the environment variables.

source sap_dr_site.sh

Every time that you start a new terminal session, you must run the previous source command. Alternatively, you can move the environment variables file to the /etc/profile.d directory for the duration of the cluster configuration. Then, the file is sourced automatically each time that you log in to the server.
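
For example, assuming that you saved the variables in a file that is named sap_dr_site.sh in the current directory, the following sketch copies the file to /etc/profile.d and removes it again after you complete the cluster configuration.

# Source the variables automatically in every new login shell on NODE3
cp sap_dr_site.sh /etc/profile.d/sap_dr_site.sh

# Remove the file again after you complete the configuration
rm /etc/profile.d/sap_dr_site.sh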

Verifying network connectivity between the virtual server instances

Verify the network connectivity between the two cluster nodes (NODE1 and NODE2) and NODE3.

  1. Log in to both NODE1 and NODE2, and ping NODE3.

    ping -c 3 ${NODE3}
    

    Sample output:

    # ping -c 3 cl-hdb-3
    PING cl-hdb-3 (10.40.20.70) 56(84) bytes of data.
    64 bytes from 10.40.20.70 (10.40.20.70): icmp_seq=1 ttl=46 time=78.2 ms
    64 bytes from 10.40.20.70 (10.40.20.70): icmp_seq=2 ttl=46 time=78.3 ms
    64 bytes from 10.40.20.70 (10.40.20.70): icmp_seq=3 ttl=46 time=78.2 ms
    
    --- cl-hdb-3 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2003ms
    rtt min/avg/max/mdev = 78.197/78.233/78.264/0.027 ms
    
  2. Log in to NODE3 and ping NODE1.

    ping -c 3 ${NODE1}
    

    Sample output:

    # ping -c 3 cl-hdb-1
    PING cl-hdb-1 (10.40.10.60) 56(84) bytes of data.
    64 bytes from cl-hdb-1 (10.40.10.60): icmp_seq=1 ttl=46 time=78.3 ms
    64 bytes from cl-hdb-1 (10.40.10.60): icmp_seq=2 ttl=46 time=78.2 ms
    64 bytes from cl-hdb-1 (10.40.10.60): icmp_seq=3 ttl=46 time=78.3 ms
    
    --- cl-hdb-1 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2002ms
    rtt min/avg/max/mdev = 78.245/78.268/78.287/0.229 ms
    
  3. Log in to NODE3 and ping NODE2.

    ping -c 3 ${NODE2}
    

    Sample output:

    # ping -c 3 cl-hdb-2
    PING cl-hdb-2 (10.40.10.194) 56(84) bytes of data.
    64 bytes from cl-hdb-2 (10.40.10.194): icmp_seq=1 ttl=46 time=77.6 ms
    64 bytes from cl-hdb-2 (10.40.10.194): icmp_seq=2 ttl=46 time=79.1 ms
    64 bytes from cl-hdb-2 (10.40.10.194): icmp_seq=3 ttl=46 time=77.7 ms
    
    --- cl-hdb-2 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2003ms
    rtt min/avg/max/mdev = 77.649/78.129/79.071/0.703 ms
    

Copying PKI SSFS storage certificate files to NODE3

The SAP HANA 2.0 data and log transmission channels for the replication process require authentication by using the system PKI SSFS storage certificate files.

The system PKI SSFS storage certificate files are stored in /usr/sap/${SID}/SYS/global/security/rsecssfs/ in subdirectories data and key.

On NODE3, run the following commands to copy files SSFS_${SID}.DAT and SSFS_${SID}.KEY from NODE1.

scp ${NODE1}:/usr/sap/${SID}/SYS/global/security/rsecssfs/data/SSFS_${SID}.DAT /usr/sap/${SID}/SYS/global/security/rsecssfs/data/SSFS_${SID}.DAT
scp ${NODE1}:/usr/sap/${SID}/SYS/global/security/rsecssfs/key/SSFS_${SID}.KEY /usr/sap/${SID}/SYS/global/security/rsecssfs/key/SSFS_${SID}.KEY

The copied PKI SSFS storage certificates on NODE3 become active during the start of the SAP HANA system. Therefore, it is recommended to copy the files when the SAP HANA system on NODE3 is stopped.
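
You can verify both the precondition and the result of the copy. The following sketch is only an example; it assumes that the environment variables from the preparation step are set on NODE3 and that the root user on NODE3 can run ssh commands on NODE1.

# Verify that no SAP HANA processes are running on NODE3 before you copy the files
sudo -i -u ${sid}adm -- HDB info

# After the copy, compare the checksums of the certificate files on NODE1 and NODE3
for f in data/SSFS_${SID}.DAT key/SSFS_${SID}.KEY ; do
    ssh ${NODE1} "sha256sum /usr/sap/${SID}/SYS/global/security/rsecssfs/${f}"
    sha256sum /usr/sap/${SID}/SYS/global/security/rsecssfs/${f}
done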

Registering NODE3 as a secondary SAP HANA DR system replication system

Register the SAP HANA system as a secondary DR system replication instance.

  1. On NODE3, stop the SAP HANA system.

    sudo -i -u ${sid}adm -- HDB stop
    
  2. On NODE3, register the secondary SAP HANA instance with NODE1.

    sudo -i -u ${sid}adm -- \
        hdbnsutil -sr_register \
          --name=${DC3} \
          --remoteHost=${NODE1} \
          --remoteInstance=${INSTNO} \
          --replicationMode=async \
          --operationMode=logreplay \
          --online
    
  3. On NODE3, start the secondary SAP HANA instance.

    sudo -i -u ${sid}adm -- HDB start
    

Checking the SAP HANA system replication status

You can monitor the system replication status by using the following tools.

  • SAP HANA cockpit
  • SAP HANA studio
  • hdbnsutil command-line tool
  • systemReplicationStatus.py Python script
  • SQL queries
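
For the SQL query option, you can query the monitoring view M_SERVICE_REPLICATION on the primary system. The following hdbsql call is only a sketch; the SYSTEM user and the <password> placeholder are assumptions, so adapt the user, the credentials handling, and the column list to your environment.

sudo -i -u ${sid}adm -- \
    hdbsql -d SYSTEMDB -u SYSTEM -p <password> \
    "SELECT SITE_NAME, SECONDARY_SITE_NAME, REPLICATION_MODE, REPLICATION_STATUS FROM SYS.M_SERVICE_REPLICATION"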

The full output of the systemReplicationStatus.py script is available only on the primary system, because a database connection is required to obtain some of the status information.

On NODE1, check the system replication status by using the systemReplicationStatus.py Python script.

sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py

Sample output:

# sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py
|Database |Host     |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
|         |         |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
|-------- |-------- |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
|SYSTEMDB |cl-hdb-1 |30001 |nameserver   |        1 |      1 |SiteA     |cl-hdb-3  |    30001 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
|HDB      |cl-hdb-1 |30007 |xsengine     |        2 |      1 |SiteA     |cl-hdb-3  |    30007 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
|HDB      |cl-hdb-1 |30003 |indexserver  |        3 |      1 |SiteA     |cl-hdb-3  |    30003 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
|SYSTEMDB |cl-hdb-1 |30001 |nameserver   |        1 |      1 |SiteA     |cl-hdb-2  |    30001 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |
|HDB      |cl-hdb-1 |30007 |xsengine     |        2 |      1 |SiteA     |cl-hdb-2  |    30007 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |
|HDB      |cl-hdb-1 |30003 |indexserver  |        3 |      1 |SiteA     |cl-hdb-2  |    30003 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |

status system replication site "3": ACTIVE
status system replication site "2": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 1
site name: SiteA

An alternative view of the system replication status is available with the hdbnsutil command.

On all nodes, run the following command to check the system replication status.

sudo -i -u ${sid}adm -- hdbnsutil -sr_state

Sample output on NODE1:

# sudo -i -u hdbadm -- hdbnsutil -sr_state

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~

online: true

mode: primary
operation mode: primary
site id: 1
site name: SiteA

is source system: true
is secondary/consumer system: false
has secondaries/consumers attached: true
is a takeover active: false
is primary suspended: false

Host Mappings:
~~~~~~~~~~~~~~

cl-hdb-1 -> [SiteC] cl-hdb-3
cl-hdb-1 -> [SiteB] cl-hdb-2
cl-hdb-1 -> [SiteA] cl-hdb-1


Site Mappings:
~~~~~~~~~~~~~~
SiteA (primary/primary)
    |---SiteC (async/logreplay)
    |---SiteB (syncmem/logreplay)

Tier of SiteA: 1
Tier of SiteC: 2
Tier of SiteB: 2

Replication mode of SiteA: primary
Replication mode of SiteC: async
Replication mode of SiteB: syncmem

Operation mode of SiteA: primary
Operation mode of SiteC: logreplay
Operation mode of SiteB: logreplay

Mapping: SiteA -> SiteC
Mapping: SiteA -> SiteB

Hint based routing site:
done.

Sample output on NODE2:

# sudo -i -u hdbadm -- hdbnsutil -sr_state

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~

online: true

mode: syncmem
operation mode: logreplay
site id: 2
site name: SiteB

is source system: true
is secondary/consumer system: true
has secondaries/consumers attached: false
is a takeover active: false
is primary suspended: false
is timetravel enabled: false
replay mode: auto
active primary site: 1

primary masters: cl-hdb-1

Host Mappings:
~~~~~~~~~~~~~~

cl-hdb-2 -> [SiteC] cl-hdb-3
cl-hdb-2 -> [SiteB] cl-hdb-2
cl-hdb-2 -> [SiteA] cl-hdb-1


Site Mappings:
~~~~~~~~~~~~~~
SiteA (primary/primary)
    |---SiteC (async/logreplay)
    |---SiteB (syncmem/logreplay)

Tier of SiteA: 1
Tier of SiteC: 2
Tier of SiteB: 2

Replication mode of SiteA: primary
Replication mode of SiteC: async
Replication mode of SiteB: syncmem

Operation mode of SiteA: primary
Operation mode of SiteC: logreplay
Operation mode of SiteB: logreplay

Mapping: SiteA -> SiteC
Mapping: SiteA -> SiteB

Hint based routing site:
done.

Sample output on NODE3:

# sudo -i -u hdbadm -- hdbnsutil -sr_state

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~

online: true

mode: async
operation mode: logreplay
site id: 3
site name: SiteC

is source system: false
is secondary/consumer system: true
has secondaries/consumers attached: false
is a takeover active: false
is primary suspended: false
is timetravel enabled: false
replay mode: auto
active primary site: 1

primary masters: cl-hdb-1

Host Mappings:
~~~~~~~~~~~~~~

cl-hdb-3 -> [SiteC] cl-hdb-3
cl-hdb-3 -> [SiteB] cl-hdb-2
cl-hdb-3 -> [SiteA] cl-hdb-1


Site Mappings:
~~~~~~~~~~~~~~
SiteA (primary/primary)
    |---SiteC (async/logreplay)
    |---SiteB (syncmem/logreplay)

Tier of SiteA: 1
Tier of SiteC: 2
Tier of SiteB: 2

Replication mode of SiteA: primary
Replication mode of SiteC: async
Replication mode of SiteB: syncmem

Operation mode of SiteA: primary
Operation mode of SiteC: logreplay
Operation mode of SiteB: logreplay

Mapping: SiteA -> SiteC
Mapping: SiteA -> SiteB

Hint based routing site:
done.

On all nodes, run the following command to check the replication mode and the operation mode.

sudo -i -u ${sid}adm -- \
    hdbnsutil -sr_state \
        --sapcontrol=1 2>/dev/null | grep -E "site(Operation|Replication)Mode"

Sample output:

# sudo -i -u ${sid}adm -- hdbnsutil -sr_state --sapcontrol=1 2>/dev/null | grep -E "site(Operation|Replication)Mode"
siteReplicationMode/SiteA=primary
siteReplicationMode/SiteC=async
siteReplicationMode/SiteB=syncmem
siteOperationMode/SiteA=primary
siteOperationMode/SiteC=logreplay
siteOperationMode/SiteB=logreplay

Enabling automatic registration of secondaries after a takeover

In multitarget replication scenarios, SAP HANA can automatically reregister the other secondary systems with the new primary after a takeover. To enable this feature, set the register_secondaries_on_takeover parameter in the [system_replication] section of the global.ini file to true. After a failover of the SAP HANA primary system to a secondary, the other secondary system automatically reregisters to the new primary system.

This option must be added to the global.ini file on all potential primary sites.

On all three nodes, run the following command to change the parameter.

sudo -i -u ${sid}adm -- <<EOT
    python \$DIR_INSTANCE/exe/python_support/setParameter.py \
      -set SYSTEM/global.ini/system_replication/register_secondaries_on_takeover=true
EOT

Verify the [system_replication] section in the global.ini configuration file.

cat /hana/shared/${SID}/global/hdb/custom/config/global.ini
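
To display only the relevant entries instead of the full file, you can filter the output. The following grep sketch is only an example; the number of context lines is an assumption and might need to be adjusted for your configuration file.

grep -A 6 "^\[system_replication\]" /hana/shared/${SID}/global/hdb/custom/config/global.ini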

Testing the SAP HANA system replication cluster

It is vital to thoroughly test the cluster configuration to make sure that the cluster is working correctly. The following information provides a few sample failover test scenarios, but is not a complete list of test scenarios.

The description of each test case includes the following information.

  • Component that is tested
  • Description of the test
  • Prerequisites and the initial state before the failover test
  • Test procedure
  • Expected behavior and results
  • Recovery procedure

Test1 - Testing the failure of the primary database instance

Use the following information to test the failure of the primary database instance.

Test1 - Description

Simulate a crash of the primary SAP HANA database instance that runs on NODE1.

Test1 - Prerequisites

  • A functional two-node RHEL HA Add-On cluster for HANA system replication.
  • Both cluster nodes are active.
  • The cluster is started on NODE1 and NODE2.
  • The cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
  • Check SAP HANA system replication status:
    • SAP HANA multitarget system replication is activated and in sync.
    • The primary SAP HANA system runs on NODE1.
    • The secondary SAP HANA system runs on NODE2.
    • Another secondary SAP HANA system runs on NODE3 at the DR site and is registered with NODE1.

Check the current system replication status on NODE1.

sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py

Sample output:

# sudo -i -u hdbadm -- HDBSettings.sh systemReplicationStatus.py
|Database |Host     |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
|         |         |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
|-------- |-------- |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
|SYSTEMDB |cl-hdb-1 |30001 |nameserver   |        1 |      1 |SiteA     |cl-hdb-3  |    30001 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
|HDB      |cl-hdb-1 |30007 |xsengine     |        2 |      1 |SiteA     |cl-hdb-3  |    30007 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
|HDB      |cl-hdb-1 |30003 |indexserver  |        3 |      1 |SiteA     |cl-hdb-3  |    30003 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
|SYSTEMDB |cl-hdb-1 |30001 |nameserver   |        1 |      1 |SiteA     |cl-hdb-2  |    30001 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |
|HDB      |cl-hdb-1 |30007 |xsengine     |        2 |      1 |SiteA     |cl-hdb-2  |    30007 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |
|HDB      |cl-hdb-1 |30003 |indexserver  |        3 |      1 |SiteA     |cl-hdb-2  |    30003 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |

status system replication site "3": ACTIVE
status system replication site "2": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 1
site name: SiteA

Test1 - Test procedure

Crash the SAP HANA primary instance by sending a SIGKILL signal as the user ${sid}adm.

On NODE1, run the following command.

sudo -i -u ${sid}adm -- HDB kill-9

Test1 - Expected behavior

  • The SAP HANA primary instance on NODE1 crashes.
  • The cluster detects the stopped primary and marks the resource as undefined.
  • The cluster promotes the secondary SAP HANA system on NODE2, which takes over as primary.
  • The cluster releases the virtual IP address on NODE1, and acquires it on the primary on NODE2.
  • If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
  • The secondary HANA system that runs on NODE3 at the DR site is automatically reregistered to the new primary that runs on NODE2.
  • The cluster waits until the primary on NODE2 is fully active and registers the failed instance on NODE1 as a secondary.
  • The cluster starts the secondary HANA instance on NODE1.

On NODE1, run the following command to check the cluster status.

pcs status --full

Sample output:

pcs status --full
Cluster name: HDB_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: cl-hdb-1 (1) (version 2.0.5-9.el8_4.5-ba59be7122) - partition with quorum
  * Last updated: Mon Oct  9 10:46:59 2023
  * Last change:  Mon Oct  9 10:46:54 2023 by root via crm_attribute on cl-hdb-2
  * 2 nodes configured
  * 6 resource instances configured

Node List:
  * Online: [ cl-hdb-1 (1) cl-hdb-2 (2) ]

Full List of Resources:
  * res_fence_ibm_powervs       (stonith:fence_ibm_powervs):     Started cl-hdb-1
  * vip_HDB_00_primary  (ocf::heartbeat:IPaddr2):        Started cl-hdb-2
  * Clone Set: SAPHanaTopology_HDB_00-clone [SAPHanaTopology_HDB_00]:
    * SAPHanaTopology_HDB_00    (ocf::heartbeat:SAPHanaTopology):        Started cl-hdb-1
    * SAPHanaTopology_HDB_00    (ocf::heartbeat:SAPHanaTopology):        Started cl-hdb-2
  * Clone Set: SAPHana_HDB_00-clone [SAPHana_HDB_00] (promotable):
    * SAPHana_HDB_00    (ocf::heartbeat:SAPHana):        Slave cl-hdb-1
    * SAPHana_HDB_00    (ocf::heartbeat:SAPHana):        Master cl-hdb-2

Node Attributes:
  * Node: cl-hdb-1 (1):
    * hana_hdb_clone_state              : DEMOTED
    * hana_hdb_op_mode                  : logreplay
    * hana_hdb_remoteHost               : cl-hdb-2
    * hana_hdb_roles                    : 4:S:master1:master:worker:master
    * hana_hdb_site                     : SiteA
    * hana_hdb_sra                      : -
    * hana_hdb_srah                     : -
    * hana_hdb_srmode                   : syncmem
    * hana_hdb_sync_state               : SOK
    * hana_hdb_version                  : 2.00.070.00
    * hana_hdb_vhost                    : cl-hdb-1
    * lpa_hdb_lpt                       : 30
    * master-SAPHana_HDB_00             : 100
  * Node: cl-hdb-2 (2):
    * hana_hdb_clone_state              : PROMOTED
    * hana_hdb_op_mode                  : logreplay
    * hana_hdb_remoteHost               : cl-hdb-1
    * hana_hdb_roles                    : 4:P:master1:master:worker:master
    * hana_hdb_site                     : SiteB
    * hana_hdb_sra                      : -
    * hana_hdb_srah                     : -
    * hana_hdb_srmode                   : syncmem
    * hana_hdb_sync_state               : PRIM
    * hana_hdb_version                  : 2.00.070.00
    * hana_hdb_vhost                    : cl-hdb-2
    * lpa_hdb_lpt                       : 1696841214
    * master-SAPHana_HDB_00             : 150

Migration Summary:
  * Node: cl-hdb-1 (1):
    * SAPHana_HDB_00: migration-threshold=5000 fail-count=1 last-failure='Mon Oct  9 10:39:58 2023'

Failed Resource Actions:
  * SAPHana_HDB_00_monitor_119000 on cl-hdb-1 'master (failed)' (9): call=31, status='complete', exitreason='', last-rc-change='2023-10-09 10:39:58 +02:00', queued=0ms, exec=0ms

Tickets:

PCSD Status:
  cl-hdb-1: Online
  cl-hdb-2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

On NODE2, run the following command to check the system replication status.

sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py

Sample output:

# sudo -i -u hdbadm -- HDBSettings.sh systemReplicationStatus.py
|Database |Host     |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
|         |         |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
|-------- |-------- |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
|SYSTEMDB |cl-hdb-2 |30001 |nameserver   |        1 |      2 |SiteB     |cl-hdb-3  |    30001 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
|HDB      |cl-hdb-2 |30007 |xsengine     |        2 |      2 |SiteB     |cl-hdb-3  |    30007 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
|HDB      |cl-hdb-2 |30003 |indexserver  |        3 |      2 |SiteB     |cl-hdb-3  |    30003 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
|SYSTEMDB |cl-hdb-2 |30001 |nameserver   |        1 |      2 |SiteB     |cl-hdb-1  |    30001 |        1 |SiteA     |YES           |SYNCMEM     |ACTIVE      |               |        True |
|HDB      |cl-hdb-2 |30007 |xsengine     |        2 |      2 |SiteB     |cl-hdb-1  |    30007 |        1 |SiteA     |YES           |SYNCMEM     |ACTIVE      |               |        True |
|HDB      |cl-hdb-2 |30003 |indexserver  |        3 |      2 |SiteB     |cl-hdb-1  |    30003 |        1 |SiteA     |YES           |SYNCMEM     |ACTIVE      |               |        True |

status system replication site "3": ACTIVE
status system replication site "1": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 2
site name: SiteB

The SAP HANA primary runs on NODE2 at SiteB. The secondary on NODE3 is automatically reregistered to the new primary that runs on NODE2. As the cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true, the cluster registers the SAP HANA system on NODE1 automatically as a secondary to the primary on NODE2.
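
To confirm the automatic reregistration of the DR secondary, you can also check the replication state directly on NODE3. The following sketch is only an example; it assumes that the environment variables are set on NODE3 and filters the relevant lines from the hdbnsutil output.

# On NODE3, the primary master is expected to be cl-hdb-2 after the takeover
sudo -i -u ${sid}adm -- hdbnsutil -sr_state | grep -E "primary masters|site name"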

Test2 - Testing the manual move of a SAPHana resource to another node

Use the following information to test the manual move of the SAPHana resource to another node.

Test2 - Description

Use cluster commands to move the primary instance to the other cluster node.

Test2 - Prerequisites

  • A functional two-node RHEL HA Add-On cluster for HANA system replication.
  • Both cluster nodes are active.
  • The cluster is started on NODE1 and NODE2.
  • The cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
  • Check SAP HANA system replication status:
    • HANA system replication is activated and in sync.
    • The primary SAP HANA system runs on NODE2.
    • The secondary SAP HANA system runs on NODE1.
    • Another secondary SAP HANA system runs on NODE3 at the DR site and is registered with NODE2.

Test2 - Test procedure

  1. On a cluster node, run the following command to move the primary back to NODE1.

    pcs resource move SAPHana_${SID}_${INSTNO}-clone
    

    Sample output:

    # pcs resource move SAPHana_${SID}_${INSTNO}-clone
    Warning: Creating location constraint 'cli-ban-SAPHana_HDB_00-clone-on-cl-hdb-2' with a score of -INFINITY for resource SAPHana_HDB_00-clone on cl-hdb-2.
         This will prevent SAPHana_HDB_00-clone from running on cl-hdb-2 until the constraint is removed
         This will be the case even if cl-hdb-2 is the last node in the cluster
    

    After the primary is active on NODE1, SAP HANA automatically reregisters the instance on NODE3 as a secondary to NODE1.

  2. Wait until the primary is up on NODE1. Then, remove the location constraint.

    pcs resource clear SAPHana_${SID}_${INSTNO}-clone
    

    Sample output:

    # pcs resource clear SAPHana_${SID}_${INSTNO}-clone
    Removing constraint: cli-ban-SAPHana_HDB_00-clone-on-cl-hdb-2
    

    This command clears the location constraint that was created by the move command. The cluster starts the SAP HANA system on NODE2.

  3. Verify the system replication status on all three nodes as described in Checking the SAP HANA system replication status.
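
    To check all three nodes from a single terminal, you can use a small loop like the following sketch. It assumes passwordless SSH access as root from the node where you run it to NODE1, NODE2, and NODE3, and that the environment variables are set in that shell.

    for node in ${NODE1} ${NODE2} ${NODE3}; do
        echo "=== ${node} ==="
        ssh ${node} "sudo -i -u ${sid}adm -- hdbnsutil -sr_state --sapcontrol=1" 2>/dev/null \
            | grep -E "site(Operation|Replication)Mode"
    done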

Test2 - Expected behavior

  • The cluster creates a location constraint to move the resource.
  • The cluster triggers a takeover to the secondary HANA system on NODE1.
  • If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
  • The cluster registers the SAP HANA system on NODE2 as a secondary to the new primary on NODE1 because AUTOMATED_REGISTER is set to true.
  • Run the pcs resource clear command to remove the location constraint. This command triggers the start of the secondary instance on NODE2.
  • The secondary HANA system that runs on NODE3 at the DR site is automatically reregistered to the new primary that runs on NODE1.

Test2 - Recovery procedure

No recovery procedure is required. The test sequence reestablished the initial SAP HANA multitarget system replication topology.

Test3 - Testing the failure of the node that runs the primary database

Use the following information to test the failure of the node that runs the primary database.

Test3 - Description

Simulate a crash of the node that runs the primary HANA database.

Test3 - Prerequisites

  • A functional two-node RHEL HA Add-On cluster for HANA system replication.
  • Both cluster nodes are active.
  • The cluster is started on NODE1 and NODE2.
  • The cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
  • Check SAP HANA system replication status:
    • HANA system replication is activated and in sync.
    • The primary SAP HANA system runs on NODE1.
    • The secondary SAP HANA system runs on NODE2.
    • Another secondary SAP HANA system runs on NODE3 at the DR site and is registered with NODE1.

Test3 - Test procedure

Crash the primary on NODE1 by sending a crash system request.

On NODE1, run the following command.

sync; echo c > /proc/sysrq-trigger

Test3 - Expected behavior

  • NODE1 crashes.
  • The cluster detects the failed node and sets its state to OFFLINE.
  • The cluster promotes the secondary HANA database on NODE2 to take over as the new primary.
  • The cluster acquires the virtual IP address on NODE2.
  • If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
  • The secondary SAP HANA system that runs on NODE3 at the DR site is automatically reregistered to NODE2.

Verify the SAP HANA system replication status on NODE2.

sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py

Sample output:

# sudo -i -u hdbadm -- HDBSettings.sh systemReplicationStatus.py
|Database |Host     |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
|         |         |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
|-------- |-------- |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
|SYSTEMDB |cl-hdb-2 |30001 |nameserver   |        1 |      2 |SiteB     |cl-hdb-3  |    30001 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
|HDB      |cl-hdb-2 |30007 |xsengine     |        2 |      2 |SiteB     |cl-hdb-3  |    30007 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
|HDB      |cl-hdb-2 |30003 |indexserver  |        3 |      2 |SiteB     |cl-hdb-3  |    30003 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |

status system replication site "3": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 2
site name: SiteB

Test3 - Recovery procedure

  1. Log in to the IBM Cloud console and start NODE1.

  2. On NODE1, run the following command to start the cluster services.

    pcs cluster start
    
  3. On a cluster node, run the following command to check the cluster status.

    pcs status --full
    
  4. On NODE2, verify the SAP HANA system replication status.

    sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py
    
    

    Sample output:

    # sudo -i -u hdbadm -- HDBSettings.sh systemReplicationStatus.py
    |Database |Host     |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
    |         |         |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
    |-------- |-------- |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
    |SYSTEMDB |cl-hdb-2 |30001 |nameserver   |        1 |      2 |SiteB     |cl-hdb-3  |    30001 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
    |HDB      |cl-hdb-2 |30007 |xsengine     |        2 |      2 |SiteB     |cl-hdb-3  |    30007 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
    |HDB      |cl-hdb-2 |30003 |indexserver  |        3 |      2 |SiteB     |cl-hdb-3  |    30003 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
    |SYSTEMDB |cl-hdb-2 |30001 |nameserver   |        1 |      2 |SiteB     |cl-hdb-1  |    30001 |        1 |SiteA     |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |HDB      |cl-hdb-2 |30007 |xsengine     |        2 |      2 |SiteB     |cl-hdb-1  |    30007 |        1 |SiteA     |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |HDB      |cl-hdb-2 |30003 |indexserver  |        3 |      2 |SiteB     |cl-hdb-1  |    30003 |        1 |SiteA     |YES           |SYNCMEM     |ACTIVE      |               |        True |
    
    status system replication site "3": ACTIVE
    status system replication site "1": ACTIVE
    overall system replication status: ACTIVE
    
    Local System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    mode: PRIMARY
    site id: 2
    site name: SiteB
    
  5. Run the steps in Test2 - Testing the manual move of a SAPHana resource to another node to revert to the initial topology.

Test4 - Testing DR activation on the node that runs at the DR site

Use the following information to test the failure of both nodes in the primary workspace.

Test4 - Description

Simulate a crash of the nodes that run the primary and secondary SAP HANA databases.

Test4 - Prerequisites

  • A functional two-node RHEL HA Add-On cluster for HANA system replication.
  • Both cluster nodes are active.
  • The cluster is started on NODE1 and NODE2.
  • The cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
  • Check SAP HANA system replication status:
    • SAP HANA multitarget system replication is activated and in sync.
    • The primary SAP HANA system runs on NODE1.
    • The secondary SAP HANA system runs on NODE2.
    • Another secondary SAP HANA system runs on NODE3 at the DR site and is registered with NODE1.

Test4 - Test procedure

Crash the primary on NODE1 and the secondary on NODE2 by sending a crash system request on both nodes.

  1. On NODE1, run the following command.

    sync; echo c > /proc/sysrq-trigger
    
  2. On NODE2, run the following command.

    sync; echo c > /proc/sysrq-trigger
    
  3. On NODE3, run the following command to activate the HANA system as primary.

    sudo -i -u ${sid}adm -- hdbnsutil -sr_takeover
    

    Sample output:

    # sudo -i -u hdbadm -- hdbnsutil -sr_takeover
    done.
    

Test4 - Expected behavior

  • NODE1 and NODE2 halt immediately.
  • After the manual takeover, NODE3 runs the primary SAP HANA system.
  • An application, such as SAP NetWeaver, can connect to the SAP HANA system on NODE3.

NODE3 is not part of the cluster and does not take over the virtual IP address after an SAP HANA system replication takeover. The startup of application servers that connect to NODE3 at the DR site requires extra effort, which is not described in this document.

On NODE3, run the following command to verify that the SAP HANA system runs as primary.

sudo -i -u ${sid}adm -- hdbnsutil -sr_state

Sample output:

# sudo -i -u hdbadm -- hdbnsutil -sr_state

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~

online: true

mode: primary
operation mode: primary
site id: 3
site name: SiteC

is source system: true
is secondary/consumer system: false
has secondaries/consumers attached: false
is a takeover active: false
is primary suspended: false

Host Mappings:
~~~~~~~~~~~~~~

cl-hdb-3 -> [SiteC] cl-hdb-3
cl-hdb-3 -> [SiteB] cl-hdb-2


Site Mappings:
~~~~~~~~~~~~~~
SiteC (primary/primary)

Tier of SiteC: 1

Replication mode of SiteC: primary

Operation mode of SiteC: primary


Hint based routing site:
done.

Test4 - Recovery procedure

The recovery procedure after a takeover to the DR site is complex and is documented as a separate test in the Test5 section.

Test5 - Restoring the original SAP HANA multitarget system replication topology

Use the following information to revert to the original system replication topology after a takeover to the SAP HANA system that runs at the DR site.

For more information, see the SAP documentation about SAP HANA multitarget system replication.

Test5 - Description

Restore the original system replication topology and reactivate the cluster in the primary workspace.

Test5 - Prerequisites

  • A two-node RHEL HA Add-On cluster for HANA system replication in the primary workspace.
  • Both virtual server instances of the cluster are stopped.
  • The primary SAP HANA system runs on NODE3 at the DR site.

Test5 - Test procedure

  1. Restart virtual server instances in the primary workspace.

    1. Log in to the IBM Cloud console and start both NODE1 and NODE2.
    2. Wait until both nodes are available.
    3. Make sure that the Red Hat HA Add-On cluster services are stopped on both cluster nodes.
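
      A quick check is to query the service state on each of the two cluster nodes; both services are expected to be inactive at this point. The following sketch is only an example and assumes root access on the cluster nodes.

      # Run on NODE1 and NODE2: both services are expected to report "inactive"
      systemctl is-active corosync pacemaker

      # If the cluster services are still running, stop them on both nodes (run on one cluster node)
      pcs cluster stop --all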
  2. Register the SAP HANA system on NODE1 as a secondary.

    1. On NODE3, verify that SAP HANA system replication is enabled.

      sudo -i -u ${sid}adm -- hdbnsutil -sr_state
      
    2. On NODE1, run the following command to set an environment variable with the hostname of NODE3.

      export NODE3=<Hostname 3>   # Hostname of virtual server instance 3 (production tertiary)
      
    3. On NODE1, run the following command to register the SAP HANA system with the primary on NODE3.

      sudo -i -u ${sid}adm -- \
          hdbnsutil -sr_register \
            --name=${DC1} \
            --remoteHost=${NODE3} \
            --remoteInstance=${INSTNO} \
            --replicationMode=async \
            --operationMode=logreplay \
            --online
      
    4. On NODE1, check the system replication configuration.

      sudo -i -u ${sid}adm -- hdbnsutil -sr_state
      

      Sample output:

      
      System Replication State
      ~~~~~~~~~~~~~~~~~~~~~~~~
      
      online: false
      
      mode: async
      operation mode: unknown
      site id: 1
      site name: SiteA
      
      is source system: unknown
      is secondary/consumer system: true
      has secondaries/consumers attached: unknown
      is a takeover active: false
      is primary suspended: false
      is timetravel enabled: false
      replay mode: auto
      active primary site: 3
      
      primary masters: cl-hdb-3
      done.
      
    5. On NODE1, start the SAP HANA system to start the system replication.

      sudo -i -u ${sid}adm -- HDB start
      
    6. On NODE3, check the system replication status and wait until the secondary on NODE1 is fully synchronized.

      sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py
      

      Sample output:

      # sudo -i -u hdbadm -- HDBSettings.sh systemReplicationStatus.py
      |Database |Host     |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
      |         |         |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
      |-------- |-------- |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
      |SYSTEMDB |cl-hdb-3 |30001 |nameserver   |        1 |      3 |SiteC     |cl-hdb-1  |    30001 |        1 |SiteA     |YES           |ASYNC       |ACTIVE      |               |        True |
      |HDB      |cl-hdb-3 |30007 |xsengine     |        2 |      3 |SiteC     |cl-hdb-1  |    30007 |        1 |SiteA     |YES           |ASYNC       |ACTIVE      |               |        True |
      |HDB      |cl-hdb-3 |30003 |indexserver  |        3 |      3 |SiteC     |cl-hdb-1  |    30003 |        1 |SiteA     |YES           |ASYNC       |ACTIVE      |               |        True |
      
      status system replication site "1": ACTIVE
      overall system replication status: ACTIVE
      
      Local System Replication State
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      mode: PRIMARY
      site id: 3
      site name: SiteC
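
      To wait for the synchronization in a scripted way, you can poll the return code of the systemReplicationStatus.py script, which returns 15 when the overall replication status is ACTIVE. The following loop is only a sketch; run it as root on NODE3 with the environment variables set.

      while true; do
          sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py > /dev/null
          rc=$?
          # Return code 15 indicates that the overall system replication status is ACTIVE
          [ "${rc}" -eq 15 ] && break
          echo "Replication status is not yet ACTIVE (rc=${rc}), waiting ..."
          sleep 30
      done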
      
  3. Initiate a fallback to the primary workspace.

    You need a downtime window to perform the move of the primary role back to NODE1.

    To reduce the downtime window, you can register the SAP HANA system on NODE2 as a secondary to NODE3 before the downtime window starts. The drawback is that more data is transferred between the two Power Virtual Server workspaces.

    In the following steps, the SAP HANA system on NODE2 is registered as a secondary to NODE1 only after NODE1 becomes the primary again.

    1. Stop all applications and SAP application servers that are connected to NODE3.

    2. On NODE1, run the following command to take over the primary role.

      A takeover with handshake suspends all transactions on the primary system on NODE3 and the takeover is executed only after all remaining redo logs are available on NODE1.

      sudo -i -u ${sid}adm -- hdbnsutil -sr_takeover --suspendPrimary
      
    3. On NODE1, check that the HANA system runs as primary.

      sudo -i -u ${sid}adm -- hdbnsutil -sr_state
      
    4. On NODE3, run the following command to verify the system replication status.

      sudo -i -u ${sid}adm -- hdbnsutil -sr_state
      

      Sample output:

      # sudo -i -u ${sid}adm -- hdbnsutil -sr_state
      
      System Replication State
      ~~~~~~~~~~~~~~~~~~~~~~~~
      
      online: true
      
      SUSPEND PRIMARY ACTIVE
      mode: primary
      operation mode: primary
      site id: 3
      site name: SiteC
      
      is source system: true
      is secondary/consumer system: false
      has secondaries/consumers attached: true
      is a takeover active: false
      is primary suspended: true
      
      Host Mappings:
      ~~~~~~~~~~~~~~
      
      cl-hdb-3 -> [SiteC] cl-hdb-3
      cl-hdb-3 -> [SiteB] cl-hdb-2
      cl-hdb-3 -> [SiteA] cl-hdb-1
      
      
      Site Mappings:
      ~~~~~~~~~~~~~~
      SiteC (primary/primary)
          |---SiteA (async/logreplay)
      
      Tier of SiteC: 1
      Tier of SiteA: 2
      
      Replication mode of SiteC: primary
      Replication mode of SiteA: async
      
      Operation mode of SiteC: primary
      Operation mode of SiteA: logreplay
      
      Mapping: SiteC -> SiteA
      
      Hint based routing site:
      done.
      

    The following summary shows the status after these steps.

    • NODE1 runs as primary, but no application is connected.
    • NODE2 is up, but SAP HANA is not started.
    • NODE3 is up and SAP HANA is blocked in suspendPrimary mode.
    • Red Hat HA Add-On cluster services are stopped on NODE1 and NODE2.
  4. Register the SAP HANA system on NODE2 as a secondary.

    1. On NODE2, run the following command to register the SAP HANA instance with the primary on NODE1.

      sudo -i -u ${sid}adm -- \
          hdbnsutil -sr_register \
            --name=${DC2} \
            --remoteHost=${NODE1} \
            --remoteInstance=${INSTNO} \
            --replicationMode=syncmem \
            --operationMode=logreplay \
            --online
      
    2. On NODE2, start SAP HANA to start the replication.

      sudo -i -u ${sid}adm -- HDB start
      
    3. On NODE1, check the system replication status and wait until the secondary on NODE2 is fully synchronized.

      sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py
      

      Sample output:

      # sudo -i -u hdbadm -- HDBSettings.sh systemReplicationStatus.py
      |Database |Host     |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
      |         |         |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
      |-------- |-------- |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
      |SYSTEMDB |cl-hdb-1 |30001 |nameserver   |        1 |      1 |SiteA     |cl-hdb-2  |    30001 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |
      |HDB      |cl-hdb-1 |30007 |xsengine     |        2 |      1 |SiteA     |cl-hdb-2  |    30007 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |
      |HDB      |cl-hdb-1 |30003 |indexserver  |        3 |      1 |SiteA     |cl-hdb-2  |    30003 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |
      
      status system replication site "2": ACTIVE
      overall system replication status: ACTIVE
      
      Local System Replication State
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      mode: PRIMARY
      site id: 1
      site name: SiteA
      

    The following summary shows the status after these steps.

    • NODE1 runs as primary, but no application is connected.
    • NODE2 runs as secondary.
    • NODE3 is up and SAP HANA is blocked in suspendPrimary mode.
    • Red Hat HA Add-On cluster services are stopped on NODE1 and NODE2.
  5. Restart the cluster on NODE1 and NODE2.

    1. Stop the SAP HANA systems on NODE1 and NODE2.

      On NODE1, run

      sudo -i -u ${sid}adm -- HDB stop
      

      On NODE2, run

      sudo -i -u ${sid}adm -- HDB stop
      
    2. On a cluster node, run the following command to start the cluster.

      pcs cluster start --all
      
    3. Check the cluster status and verify that it is fully operational again.

      pcs status --full
      
    4. On NODE1, check the system replication status.

      sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py
      

      Sample output:

      # sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py
      |Database |Host     |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
      |         |         |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
      |-------- |-------- |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
      |SYSTEMDB |cl-hdb-1 |30001 |nameserver   |        1 |      1 |SiteA     |cl-hdb-2  |    30001 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |
      |HDB      |cl-hdb-1 |30007 |xsengine     |        2 |      1 |SiteA     |cl-hdb-2  |    30007 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |
      |HDB      |cl-hdb-1 |30003 |indexserver  |        3 |      1 |SiteA     |cl-hdb-2  |    30003 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |
      
      status system replication site "2": ACTIVE
      overall system replication status: ACTIVE
      
      Local System Replication State
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      mode: PRIMARY
      site id: 1
      site name: SiteA
      

    The following summary shows the status after these steps.

    • NODE1 runs as primary.
    • NODE2 runs as secondary.
    • Red Hat HA Add-On cluster services are started and the cluster manages SAP HANA system replication on NODE1 and NODE2.
    • NODE3 is up and SAP HANA is blocked in suspendPrimary mode.
  6. Register the SAP HANA system on NODE3 as a secondary.

    1. On NODE3, run the following command to register the system with NODE1.

      sudo -i -u ${sid}adm -- \
          hdbnsutil -sr_register \
            --name=${DC3} \
            --remoteHost=${NODE1} \
            --remoteInstance=${INSTNO} \
            --replicationMode=async \
            --operationMode=logreplay \
            --online
      
    2. On NODE1, run the following command to verify the new SAP HANA system replication topology.

      sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py
      

      The previous hdbnsutil -sr_register command triggers a restart of the SAP HANA system. During this restart, you might observe a CONNECTION TIMEOUT status in the output.

      Sample output:

      # sudo -i -u hdbadm -- HDBSettings.sh systemReplicationStatus.py
      |Database |Host     |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary|Secondary |Secondary |Secondary |Secondary          |Replication |Replication |Replication    |Secondary    |
      |         |         |      |             |          |        |          |Host     |Port      |Site ID   |Site Name |Active Status      |Mode        |Status      |Status Details |Fully Synced |
      |-------- |-------- |----- |------------ |--------- |------- |--------- |-------- |--------- |--------- |--------- |------------------ |----------- |----------- |-------------- |------------ |
      |SYSTEMDB |cl-hdb-1 |30001 |nameserver   |        1 |      1 |SiteA     |cl-hdb-2 |    30001 |        2 |SiteB     |YES                |SYNCMEM     |ACTIVE      |               |        True |
      |HDB      |cl-hdb-1 |30007 |xsengine     |        2 |      1 |SiteA     |cl-hdb-2 |    30007 |        2 |SiteB     |YES                |SYNCMEM     |ACTIVE      |               |        True |
      |HDB      |cl-hdb-1 |30003 |indexserver  |        3 |      1 |SiteA     |cl-hdb-2 |    30003 |        2 |SiteB     |YES                |SYNCMEM     |ACTIVE      |               |        True |
      |SYSTEMDB |cl-hdb-2 |30001 |nameserver   |        1 |      2 |SiteB     |cl-hdb-3 |    30001 |        3 |SiteC     |CONNECTION TIMEOUT |ASYNC       |UNKNOWN     |               |       False |
      |HDB      |cl-hdb-2 |30007 |xsengine     |        2 |      2 |SiteB     |cl-hdb-3 |    30007 |        3 |SiteC     |CONNECTION TIMEOUT |ASYNC       |UNKNOWN     |               |       False |
      |HDB      |cl-hdb-2 |30003 |indexserver  |        3 |      2 |SiteB     |cl-hdb-3 |    30003 |        3 |SiteC     |CONNECTION TIMEOUT |ASYNC       |UNKNOWN     |               |       False |
      
      status system replication site "2": ACTIVE
      status system replication site "3": UNKNOWN
      overall system replication status: UNKNOWN
      
      Local System Replication State
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      mode: PRIMARY
      site id: 1
      site name: SiteA
      

      If the system does not restart automatically after the hdbnsutil -sr_register command, you must stop and start it manually.

      The following sample output shows such a situation. The replication status of NODE3 shows IS PRIMARY (e.g. after takeover) and does not change even when you check the status multiple times.

      # sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py
      |Database |Host     |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary|Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication                      |Secondary    |
      |         |         |      |             |          |        |          |Host     |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details                   |Fully Synced |
      |-------- |-------- |----- |------------ |--------- |------- |--------- |-------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------------------------- |------------ |
      |SYSTEMDB |cl-hdb-1 |30001 |nameserver   |        1 |      1 |SiteA     |cl-hdb-3 |    30001 |        3 |          |PRIMARY       |            |            |IS PRIMARY (e.g. after takeover) |       False |
      |HDB      |cl-hdb-1 |30007 |xsengine     |        2 |      1 |SiteA     |cl-hdb-3 |    30007 |        3 |          |PRIMARY       |            |            |IS PRIMARY (e.g. after takeover) |       False |
      |HDB      |cl-hdb-1 |30003 |indexserver  |        3 |      1 |SiteA     |cl-hdb-3 |    30003 |        3 |          |PRIMARY       |            |            |IS PRIMARY (e.g. after takeover) |       False |
      |SYSTEMDB |cl-hdb-1 |30001 |nameserver   |        1 |      1 |SiteA     |cl-hdb-2 |    30001 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |                                 |        True |
      |HDB      |cl-hdb-1 |30007 |xsengine     |        2 |      1 |SiteA     |cl-hdb-2 |    30007 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |                                 |        True |
      |HDB      |cl-hdb-1 |30003 |indexserver  |        3 |      1 |SiteA     |cl-hdb-2 |    30003 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |                                 |        True |
      
      status system replication site "3": ERROR
      status system replication site "2": ACTIVE
      overall system replication status: ERROR
      
      Local System Replication State
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      mode: PRIMARY
      site id: 1
      site name: SiteA
      

      On NODE3, run the following command to restart the secondary HANA system.

      sudo -i -u ${sid}adm -- HDB restart
      

    The following summary shows the final status after these steps.

    • NODE1 runs as primary.
    • NODE2 runs as secondary.
    • NODE3 runs as another secondary at the DR site.
    • NODE2 and NODE3 are both registered to NODE1.
    • Red Hat HA Add-On cluster services are started and the cluster manages SAP HANA system replication on NODE1 and NODE2.

    On NODE1, run the following command to verify the SAP HANA system replication topology.

    sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py
    

    Sample output:

    # sudo -i -u hdbadm -- HDBSettings.sh systemReplicationStatus.py
    |Database |Host     |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
    |         |         |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
    |-------- |---------|----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
    |SYSTEMDB |cl-hdb-1 |30001 |nameserver   |        1 |      1 |SiteA     |cl-hdb-2  |    30001 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |HDB      |cl-hdb-1 |30007 |xsengine     |        2 |      1 |SiteA     |cl-hdb-2  |    30007 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |HDB      |cl-hdb-1 |30003 |indexserver  |        3 |      1 |SiteA     |cl-hdb-2  |    30003 |        2 |SiteB     |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |SYSTEMDB |cl-hdb-2 |30001 |nameserver   |        1 |      2 |SiteB     |cl-hdb-3  |    30001 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
    |HDB      |cl-hdb-2 |30007 |xsengine     |        2 |      2 |SiteB     |cl-hdb-3  |    30007 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
    |HDB      |cl-hdb-2 |30003 |indexserver  |        3 |      2 |SiteB     |cl-hdb-3  |    30003 |        3 |SiteC     |YES           |ASYNC       |ACTIVE      |               |        True |
    
    status system replication site "2": ACTIVE
    status system replication site "3": ACTIVE
    overall system replication status: ACTIVE
    
    Local System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    mode: PRIMARY
    site id: 1
    site name: SiteA