Hybrid HPC with persistent cloud resource pools
This reference architecture summarizes the best practices for deploying a Hybrid High Performance Computing (HPC) environment that connects an on-premises HPC environment to a persistent pool of HPC compute resources on IBM Cloud®. An organization with an existing on-premises HPC facility might decide to augment it with additional cloud-based resources.
In the diagram, an existing on-premises HPC environment is connected to an HPC environment in IBM Cloud. Jobs can be submitted to, and run on, either compute environment. A communications link, for example a Direct Link, connects the on-premises data center and the IBM Cloud data center. The Hybrid HPC environment is enabled by a hybrid HPC capability that hides the complexity of the environment from the user and automates decisions about where compute jobs are processed through a set of rules and policies. Hybrid HPC environments where data needs to be transferred or kept synchronized between on-premises and the cloud also require a capability to enable this data movement.
Architecture diagrams
The architecture for the Hybrid HPC with persistent cloud resource pools has multiple constituent layers:
- Infrastructure layer
- Software layer
- Data layer
Infrastructure layer
Review the following architecture diagram for the infrastructure and cloud services that are used to deliver the Hybrid HPC with persistent cloud resource pools pattern:
From an infrastructure perspective, the HPC environment in IBM Cloud consists of one or more HPC Cluster Management Nodes. Multiple nodes are deployed to provide resilience of the environment. These HPC Cluster Management Nodes distribute workloads across a pool of Compute Nodes. The number and type of Compute Nodes depend on the characteristics of the workloads that need to run within the environment. Both HPC Cluster Management Nodes and Compute Nodes are deployed as Virtual Server Instances (VSIs) within a Virtual Private Cloud (VPC). Some organizations might choose to use VPC Bare Metal servers as Compute Nodes.
Many HPC applications process data. A number of Storage Nodes are deployed to deliver this data to the Compute Nodes. The Storage Nodes typically use attached Block Storage and present a shared file system within which the data is stored. The Compute Nodes read and write data to the shared file system. There is also a requirement for a shared file system to hold the metadata used by the HPC Cluster Management Nodes; this is separate from the file system that stores the application data. It is recommended to use VPC Bare Metal servers as Storage Nodes.
The HPC environment in the cloud is likely to need its own DNS service to provide name resolution, and Virtual Private Endpoints for secure access to other cloud services such as Monitoring, Logging, Identity and Access Management (IAM), and so on.
Network connectivity between the enterprise data center and IBM Cloud is likely to use a Direct Link private network.
Software layer
Review the following architecture diagram for the software that is used to deliver the Hybrid HPC with persistent cloud resource pools pattern:
Execution flow
To understand the interactions between the components of the Hybrid HPC with persistent cloud resource pools architecture, consider how a job is processed within the system. A typical problem that runs on an HPC system is split into multiple jobs, which are then run on the many compute nodes within the system. These jobs are executed as follows:
- The user invokes a computation through an application, web browser, or command line interface.
- The job request is sent to the multicluster manager which uses pre-defined rules and policies to determine whether the job should be processed using on-premises HPC resources or those in the cloud.
- The multicluster manager sends the compute job to the cluster manager for the respective cluster (on-premises or cloud). The job is then queued locally for dispatch to the compute node(s).
- The cluster manager sends the job to the HPC Agent. One agent runs on each of the compute nodes. The agent runs the relevant application executables to perform the computation. As part of the computation, the application might access data stored in the shared file system.
When the computational job completes, notification is sent back along the reverse route: the HPC agent informs the local cluster manager, which in turn informs the multicluster manager, which returns the completion status to the user.
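The routing decision in the second step of this flow is driven by rules and policies. The following minimal sketch in Python illustrates the kind of policy evaluation involved; the job attributes, thresholds, and cluster names are hypothetical and are not part of any specific scheduler's API.

```python
# Hypothetical sketch of the kind of rule evaluation a hybrid HPC capability
# might apply when deciding where a job runs. Job attributes, thresholds, and
# cluster names are illustrative only; real schedulers define such policies in
# their own configuration.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cores: int
    input_data_gb: int

def select_cluster(job: Job, onprem_free_cores: int) -> str:
    """Return the cluster that should run the job, based on simple example policies."""
    if job.input_data_gb > 500:         # keep data-heavy jobs close to the data
        return "on-premises"
    if job.cores <= onprem_free_cores:  # fill spare on-premises capacity first
        return "on-premises"
    return "cloud"                      # otherwise burst to the persistent cloud pool

job = Job(name="cfd-run-42", cores=128, input_data_gb=40)
print(select_cluster(job, onprem_free_cores=64))  # -> cloud
```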
Data layer
For those HPC applications that require data, there needs to be a mechanism to deliver data from on-premises to the cloud environment. There are three potential approaches that might be used to manage the movement of data:
- No data movement occurs. Data is accessed by compute nodes local to that data.
- The data movement is managed by the HPC workload scheduler.
- The data movement is managed by the file system(s) holding the data.
The following is the architecture diagram for the data layer that is used to deliver the Hybrid HPC with persistent cloud resource pools pattern. It outlines the two data movement approaches; for clarity, the approach where data does not move is not depicted.
Execution flow for workload scheduler-managed data movement
Applications process data. This data needs to be available on the compute node(s) where the computation on that data will occur. If the data resides on-premises and the computation will occur in the cloud, then the data will need to be moved to the cloud. This data movement can be controlled by the workload scheduler.
- The user invokes a compute job through an application, web browser or command line interface.
- The job request is sent to the multicluster manager which uses pre-defined rules and policies to determine whether the job should be processed using on-premises HPC resources or in the cloud.
- The multicluster manager communicates with the data manager components, which are responsible for moving the data from the on-premises file system to the cloud file system (see the sketch that follows this flow). The data requirement information includes the candidate in-cloud cluster(s) that the job is eligible to be forwarded to.
- The data manager in the cloud requests that the data file(s) are copied to the local staging area (cache) in the cloud.
- The required files are copied from the on-premises file system(s) to the in-cloud staging area.
- When the files are available in the cloud, the data manager informs the cluster manager that the data is in place and that the compute jobs can be run.
Execution flow for file system-managed data movement
IBM Storage Scale provides a high-performance parallel file system to meet the needs of HPC workloads. Part of Storage Scale is a capability called Active File Management (AFM). Active File Management provides on-demand movement of data that is transparent to the application using the file system. This is illustrated by Path A in the diagram.
When a user attempts to access a file in the cloud that is physically located on-premises, the Active File Management capability transparently manages the transfer of the file to the in-cloud file system. The in-cloud file system can be configured in different ways, depending on whether it is to be used as a read-only cache or as a read-write file system. Data changes made to the in-cloud copy of the file are transferred back to the primary copy stored on-premises.
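The following purely conceptual sketch models this read-through, write-back caching behavior. It is an illustration of the idea only, not how Active File Management is implemented or configured.

```python
# Conceptual model of a caching fileset: reads of a file that is not yet in
# the in-cloud file system trigger a fetch from the home (on-premises) site,
# and local writes are queued for replay back to the home copy. This is an
# illustration of the concept, not AFM's implementation.
class CachedFileset:
    def __init__(self, home_site: dict[str, bytes]):
        self.home = home_site                 # stand-in for the on-premises file system
        self.cache: dict[str, bytes] = {}     # stand-in for the in-cloud file system
        self.pending_writes: list[str] = []

    def read(self, path: str) -> bytes:
        if path not in self.cache:            # cache miss: fetch on demand
            self.cache[path] = self.home[path]
        return self.cache[path]

    def write(self, path: str, data: bytes) -> None:
        self.cache[path] = data               # write locally first
        self.pending_writes.append(path)      # queue for replay to the home copy

    def sync(self) -> None:
        for path in self.pending_writes:      # push changes back to on-premises
            self.home[path] = self.cache[path]
        self.pending_writes.clear()
```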
Design concepts
Review the design considerations and architecture decisions for the following aspects and domains:
- Compute: Virtual Servers
- Storage: Primary Storage
- Networking: Domain Name Services
- Security: Data Security, Identity & Access
- Resiliency: High Availability
- Service Management: Monitoring, Logging, Auditing and Tracking, Management and Orchestration
Design choices
HPC cluster management software
Different HPC problems require different HPC cluster management software solutions. IBM Cloud provides two options. IBM Spectrum LSF (Load Sharing Facility) is a batch scheduler: users submit jobs to a queue, and the jobs are processed in turn according to the policies and rules that have been defined. IBM Spectrum Symphony is a real-time scheduler that is designed to deliver faster response times and is aimed specifically at the needs of the financial services industry. IBM Cloud provides tiles that can automatically deploy these software solutions into an HPC cluster.
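As an illustration of batch submission with LSF, the following sketch wraps the bsub command from Python. The queue name, job name, and application command are placeholders; the options shown (-n, -q, -J, -o) are standard LSF flags, but the queues and resource requirements that apply depend on how your cluster is configured.

```python
# Minimal sketch of submitting a batch job to LSF by wrapping the bsub command.
# The queue, job name, and executable are placeholders for illustration.
import subprocess

def submit_lsf_job(command: str, cores: int, queue: str = "normal") -> str:
    result = subprocess.run(
        ["bsub", "-n", str(cores), "-q", queue,
         "-J", "example-job", "-o", "example-job.%J.out", command],
        capture_output=True, text=True, check=True,
    )
    # bsub prints a confirmation such as: Job <12345> is submitted to queue <normal>.
    return result.stdout.strip()

print(submit_lsf_job("./my_hpc_app --steps 1000", cores=16))
```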
Other HPC cluster scheduler solutions from the open source community such as Slurm and Condor, or from other commercial organizations are also available. These must be manually deployed. Refer to the websites of the open source projects or the documentation provided by commercial organizations for further information on how these can be used within the Hybrid HPC with persistent cloud resource pools architecture.
Storage options
Most HPC environments consume data stored in file systems. IBM Cloud provides two shared file system options. VPC File Storage provides a lower-performance file system that can be used to store the metadata required by the cluster management software, or for low-use data storage for HPC workloads, for example, application binaries. Workloads that need a high-performance parallel file system should use IBM Storage Scale, which is best deployed on VPC Bare Metal servers.
Compute nodes
Computation is performed in Virtual Server Instances (VSIs). There are many different VSI profiles that can be chosen to best meet the compute and memory requirements of the applications being run within the HPC environment. Consider an application that requires 3 vCPUs of compute and 7GB of memory. The needs of this application might be met by the cx2-4x8 VSI profile, which provides 4 vCPUs and 8GB of memory. If this profile is chosen, one instance of the application runs on one compute node at any one time. As a comparison, a VSI profile such as cx2-64x128 could be chosen instead. The workload scheduler can be configured to pack multiple jobs into a single VSI. In this case, 18 instances of the application could run within a single VSI with this profile, because memory is the limiting resource (128GB ÷ 7GB ≈ 18).
In environments where multiple applications with different resource needs run simultaneously, it is recommended that the compute nodes be sized to support the application with the largest CPU and memory footprint. The IBM workload scheduling software can then run multiple instances of workloads with smaller CPU and memory needs on these VSIs to make optimal use of the available compute resources.
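The sizing arithmetic above can be expressed as a simple calculation: the number of application instances a node can host is limited by whichever resource, vCPU or memory, runs out first. The profile figures below are taken from the example; the function itself is only illustrative.

```python
# Worked example of the packing arithmetic: how many 3 vCPU / 7GB application
# instances fit on a given VSI profile, limited by the scarcer resource.
def instances_per_node(node_vcpus: int, node_mem_gb: int,
                       app_vcpus: int, app_mem_gb: int) -> int:
    """Number of application instances a node can host."""
    return min(node_vcpus // app_vcpus, node_mem_gb // app_mem_gb)

print(instances_per_node(4, 8, 3, 7))     # cx2-4x8    -> 1 instance
print(instances_per_node(64, 128, 3, 7))  # cx2-64x128 -> 18 instances (memory-bound)
```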
Requirements
The following table outlines the requirements used in the architecture for each aspect:
Aspect | Requirements |
---|---|
Compute | Provide properly isolated compute resources with adequate compute capacity for the application. Remember to allow for the resource needs of the operating system and any other software needed. |
Storage | Provide storage that meets the application data volume and performance requirements. |
Networking | Deploy workloads in an isolated environment and enforce information flow policies. Provide secure and encrypted connectivity to the cloud's private network for management purposes. Provide name resolution to support the use of hostnames instead of IP addresses. |
Security | Protect the boundaries of the application against denial-of-service and application-layer attacks. If required, encrypt all the application data in transit and at rest to protect it from unauthorized disclosure. Encrypt all the security data, operational logs, and audit logs to protect them from unauthorized disclosure. |
Resiliency | Support application availability targets. Ensure availability of the application in the event of a planned and an unplanned outage. Provide highly available compute, storage, network, and other cloud services to handle application load and performance requirements. Provide highly available storage for security data logs and backup data. Automate recovery tasks to minimize down time. |
Service management | Monitor system and application health metrics and logs to detect issues that might impact the availability of the application. Generate alerts and notifications about issues that might impact the availability of applications to trigger appropriate responses to minimize down time. Monitor audit logs to track changes and detect potential security problems. Provide a mechanism to identify and send notifications about issues found in the audit logs. |
Components
The following table outlines the products or services used in the architecture for each aspect:
| Aspect | Architecture component | How the component is used |
|---|---|---|
| Compute | Virtual Servers for VPC | HPC cluster management nodes and compute nodes |
| Storage | VPC File Storage | Lower-performance storage for HPC management metadata and/or lightweight application data access needs |
| | Storage Scale | High-performance parallel file system for data-intensive HPC workloads |
| Networking | Virtual Private Endpoint (VPE) | For private network access to cloud services, for example, Key Protect, IAM, and so on |
| | Public Gateway | For secure client access to the HPC environment over the Internet |
| | Direct Link | For private, dedicated connectivity between on-premises and cloud HPC resources |
| | DNS | Domain name services for the HPC environment |
| Security | Identity and Access Management (IAM) | Manage authentication and access to cloud resources |
| | Key Protect or Hyper Protect Crypto Services | Hardware security module (HSM) and key management service |
| | Secrets Manager | Certificate and secrets management |
| Resiliency | Virtual Servers for VPC in conjunction with Spectrum LSF or Spectrum Symphony | The HPC workload is split across multiple VSIs. The HPC management software (LSF or Symphony) manages compute node failures by resubmitting failed compute jobs to other VSIs |
| Service Management | Spectrum LSF or Spectrum Symphony | HPC cluster management software provides application and performance status and monitoring, and resource consumption and utilization |
| | IBM Cloud Monitoring | Operational monitoring |
| | IBM Cloud Log Analysis | Operational logs |
| | Activity Tracker Event Routing | Audit logs |