Hybrid HPC with persistent cloud resource pools
This reference architecture summarizes the best practices for deploying a Hybrid High Performance Computing (HPC) environment that connects an on-premises HPC environment to a persistent pool of HPC compute resources on IBM Cloud®. An organization with an existing on-premises HPC facility might decide to augment it with additional cloud-based resources.
In the diagram, an existing on-premises HPC environment is connected to an HPC environment in IBM Cloud. Jobs can be submitted to, and run on, either compute environment. A communications link, for example a Direct Link, connects the on-premises data center and the IBM Cloud data center. The Hybrid HPC environment is enabled by a hybrid HPC capability that hides the complexity of the environment from the user and automates decisions about where compute jobs are processed through a set of rules and policies. Hybrid HPC environments where data needs to be transferred or kept synchronized between on-premises and the cloud also require a capability to enable this data movement.
Architecture diagrams
The architecture for the Hybrid HPC with persistent cloud resource pools has multiple constituent layers:
- Infrastructure layer
- Software layer
- Data layer
Infrastructure layer
Review the following architecture diagram for the infrastructure and cloud services that are used to deliver the Hybrid HPC with persistent cloud resource pools pattern:
From an infrastructure perspective, the HPC environment in IBM Cloud consists of one or more HPC Cluster Management Nodes. Multiple nodes are deployed to provide resilience of the environment. These HPC Cluster Management Nodes distribute workloads across a pool of Compute Nodes. The number and type of Compute Nodes depend on the characteristics of the workloads that need to run within the environment. Both HPC Cluster Management Nodes and Compute Nodes are deployed as Virtual Server Instances (VSIs) within a Virtual Private Cloud (VPC). Some organizations might choose to use VPC Bare Metal servers as Compute Nodes.
Many HPC applications process data. A number of Storage Nodes are deployed to deliver this data to the Compute Nodes. The Storage Nodes typically use attached Block Storage and present a shared file system within which the data is stored. The Compute Nodes read and write data to the shared file system. There is also a requirement for a shared file system to hold the metadata used by the HPC Cluster Management Nodes; this is separate from the file system that stores the application data. It is recommended to use VPC Bare Metal servers as Storage Nodes.
The HPC environment in the cloud is likely to need its own DNS service to provide name resolution, and Virtual Private Endpoints for secure access to other cloud services such as Monitoring, Logging, Identity and Access Management (IAM), and so on.
Network connectivity between the enterprise data center and IBM Cloud is likely to use a Direct Link private network.
Software layer
Review the following architecture diagram for the software that is used to deliver the Hybrid HPC with persistent cloud resource pools pattern:
Execution flow
To understand the interactions between the components of the Hybrid HPC with persistent cloud resource pools architecture, consider how a job is processed within the system. A typical problem that runs on an HPC system is split into multiple jobs, which are then run on the many compute nodes within the system. These jobs are executed as follows:
- The user invokes a computation through an application, web browser, or command line interface.
- The job request is sent to the multicluster manager which uses pre-defined rules and policies to determine whether the job should be processed using on-premises HPC resources or those in the cloud.
- The multicluster manager sends the compute job to the cluster manager for the respective cluster (on-premises or cloud). The job is then queued locally for dispatch to the compute node(s).
- The cluster manager sends the job to the HPC Agent. One agent runs on each of the compute nodes. The agent runs the relevant application executables to perform the computation. As part of the computation, the application might access data stored in the shared file system.
When the computational job completes, notification is sent back along the reverse route: the HPC agent informs the local cluster manager, which in turn informs the multicluster manager, which returns the completion status to the user.
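The routing decision in the second step of this flow is driven by rules and policies. The following minimal sketch in Python illustrates the kind of policy evaluation involved; the job attributes, thresholds, and cluster names are hypothetical and are not part of any specific scheduler's API.

```python
# Hypothetical sketch of the kind of rule evaluation a hybrid HPC capability
# might apply when deciding where a job runs. Job attributes, thresholds, and
# cluster names are illustrative only; real schedulers define such policies in
# their own configuration.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cores: int
    input_data_gb: int

def select_cluster(job: Job, onprem_free_cores: int) -> str:
    """Return the cluster that should run the job, based on simple example policies."""
    if job.input_data_gb > 500:         # keep data-heavy jobs close to the data
        return "on-premises"
    if job.cores <= onprem_free_cores:  # fill spare on-premises capacity first
        return "on-premises"
    return "cloud"                      # otherwise burst to the persistent cloud pool

job = Job(name="cfd-run-42", cores=128, input_data_gb=40)
print(select_cluster(job, onprem_free_cores=64))  # -> cloud
```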
Data layer
For those HPC applications that require data, there needs to be a mechanism to deliver data from on-premises to the cloud environment. There are three potential approaches that might be used to manage the movement of data:
- No data movement occurs. Data is accessed by compute nodes local to that data.
- The data movement is managed by the HPC workload scheduler.
- The data movement is managed by the file system(s) holding the data.
The following is the architecture diagram for the data layer that is used to deliver the Hybrid HPC with persistent cloud resource pools pattern. It outlines the two data movement approaches; for clarity, the approach where data does not move is not depicted.
Execution flow for workload scheduler-managed data movement
Applications process data. This data needs to be available on the compute node(s) where the computation on that data will occur. If the data resides on-premises and the computation will occur in the cloud, then the data will need to be moved to the cloud. This data movement can be controlled by the workload scheduler.
- The user invokes a compute job through an application, web browser or command line interface.
- The job request is sent to the multicluster manager which uses pre-defined rules and policies to determine whether the job should be processed using on-premises HPC resources or in the cloud.
- The multicluster manager communicates with the data manager components, which are responsible for moving the data from the on-premises file system to the cloud file system (see the sketch that follows this flow). The data requirement information includes the candidate in-cloud cluster(s) that the job is eligible to be forwarded to.
- The data manager in the cloud requests that the data file(s) are copied to the local staging area (cache) in the cloud.
- The required files are copied from the on-premises file system(s) to the in-cloud staging area.
- When the files are available in the cloud, the data manager informs the cluster manager that the data is in place and that the compute jobs can be run.
Execution flow for file system-managed data movement
IBM Storage Scale provides a high-performance parallel file system to meet the needs of HPC workloads. Part of Storage Scale is a capability called Active File Management (AFM). Active File Management provides on-demand movement of data that is transparent to the application using the file system. This is illustrated by Path A in the diagram.
When a user attempts to access a file in the cloud that is physically located on-premises, the Active File Management capability transparently manages the transfer of the file to the in-cloud file system. The in-cloud file system can be configured in different ways, depending on whether it is to be used as a read-only cache or as a read-write file system. Data changes made to the in-cloud copy of the file are transferred back to the primary copy stored on-premises.
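The following purely conceptual sketch models this read-through, write-back caching behavior. It is an illustration of the idea only, not how Active File Management is implemented or configured.

```python
# Conceptual model of a caching fileset: reads of a file that is not yet in
# the in-cloud file system trigger a fetch from the home (on-premises) site,
# and local writes are queued for replay back to the home copy. This is an
# illustration of the concept, not AFM's implementation.
class CachedFileset:
    def __init__(self, home_site: dict[str, bytes]):
        self.home = home_site                 # stand-in for the on-premises file system
        self.cache: dict[str, bytes] = {}     # stand-in for the in-cloud file system
        self.pending_writes: list[str] = []

    def read(self, path: str) -> bytes:
        if path not in self.cache:            # cache miss: fetch on demand
            self.cache[path] = self.home[path]
        return self.cache[path]

    def write(self, path: str, data: bytes) -> None:
        self.cache[path] = data               # write locally first
        self.pending_writes.append(path)      # queue for replay to the home copy

    def sync(self) -> None:
        for path in self.pending_writes:      # push changes back to on-premises
            self.home[path] = self.cache[path]
        self.pending_writes.clear()
```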
Design concepts
Review the design considerations and architecture decisions for the following aspects and domains:
- Compute: Virtual Servers
- Storage: Primary Storage
- Networking: Domain Name Services
- Security: Data Security, Identity & Access
- Resiliency: High Availability
- Service Management: Monitoring, Logging, Auditing and Tracking, Management and Orchestration
Design choices
HPC cluster management software
Different HPC problems require different HPC cluster management software solutions. IBM Cloud provides two options. IBM Spectrum LSF (Load Sharing Facility) is a batch scheduler: users submit jobs to a queue, and the jobs are processed in turn according to the policies and rules that have been defined. IBM Spectrum Symphony is a real-time scheduler that is designed to deliver faster response times and is aimed specifically at the needs of the financial services industry. IBM Cloud provides tiles that can automatically deploy these software solutions into an HPC cluster.
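As an illustration of batch submission with LSF, the following sketch wraps the bsub command from Python. The queue name, job name, and application command are placeholders; the options shown (-n, -q, -J, -o) are standard LSF flags, but the queues and resource requirements that apply depend on how your cluster is configured.

```python
# Minimal sketch of submitting a batch job to LSF by wrapping the bsub command.
# The queue, job name, and executable are placeholders for illustration.
import subprocess

def submit_lsf_job(command: str, cores: int, queue: str = "normal") -> str:
    result = subprocess.run(
        ["bsub", "-n", str(cores), "-q", queue,
         "-J", "example-job", "-o", "example-job.%J.out", command],
        capture_output=True, text=True, check=True,
    )
    # bsub prints a confirmation such as: Job <12345> is submitted to queue <normal>.
    return result.stdout.strip()

print(submit_lsf_job("./my_hpc_app --steps 1000", cores=16))
```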
Other HPC cluster scheduler solutions from the open source community such as Slurm and Condor, or from other commercial organizations are also available. These must be manually deployed. Refer to the websites of the open source projects or the documentation provided by commercial organizations for further information on how these can be used within the Hybrid HPC with persistent cloud resource pools architecture.
Storage options
Most HPC environments consume data stored in file systems. IBM Cloud provides two shared file system options. VPC File Storage provides a lower-performance file system that can be used to store the metadata required by the cluster management software, or for low-use data storage for HPC workloads, for example, application binaries. Workloads that need a high-performance parallel file system should use IBM Storage Scale, which is best deployed on VPC Bare Metal servers.
Compute nodes
Computation is performed in Virtual Server Instances (VSIs). There are many different VSI profiles that can be chosen to best meet the compute and memory requirements of the applications being run within the HPC environment. Consider an application that requires 3 vCPUs of compute and 7GB of memory. The needs of this application might be met by the cx2-4x8 VSI profile, which provides 4 vCPUs and 8GB of memory. If this profile is chosen, one instance of the application runs on one compute node at any one time. As a comparison, a VSI profile such as cx2-64x128 could be chosen instead. The workload scheduler can be configured to pack multiple jobs into a single VSI. In this case, 18 instances of the application could run within a single VSI with this profile, because memory is the limiting resource (128GB ÷ 7GB ≈ 18).
In environments where multiple applications with different resource needs run simultaneously, it is recommended that the compute nodes be sized to support the application with the largest CPU and memory footprint. The IBM workload scheduling software can then run multiple instances of workloads with smaller CPU and memory needs on these VSIs to make optimal use of the available compute resources.
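The sizing arithmetic above can be expressed as a simple calculation: the number of application instances a node can host is limited by whichever resource, vCPU or memory, runs out first. The profile figures below are taken from the example; the function itself is only illustrative.

```python
# Worked example of the packing arithmetic: how many 3 vCPU / 7GB application
# instances fit on a given VSI profile, limited by the scarcer resource.
def instances_per_node(node_vcpus: int, node_mem_gb: int,
                       app_vcpus: int, app_mem_gb: int) -> int:
    """Number of application instances a node can host."""
    return min(node_vcpus // app_vcpus, node_mem_gb // app_mem_gb)

print(instances_per_node(4, 8, 3, 7))     # cx2-4x8    -> 1 instance
print(instances_per_node(64, 128, 3, 7))  # cx2-64x128 -> 18 instances (memory-bound)
```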
Requirements
The following table outlines the requirements used in the architecture for each aspect:
Aspect | Requirements |
---|---|
Compute | Provide properly isolated compute resources with adequate compute capacity for the application. Remember to allow for the resource needs of the operating system and any other software needed. |
Storage | Provide storage that meets the application data volume and performance requirements. |
Networking | Deploy workloads in an isolated environment and enforce information flow policies. Provide secure and encrypted connectivity to the cloud's private network for management purposes. Provide name resolution to support the use of hostnames instead of IP addresses. |
Security | Protect the boundaries of the application against denial-of-service and application-layer attacks. If required, encrypt all the application data in transit and at rest to protect it from unauthorized disclosure. Encrypt all the security data, operational logs, and audit logs to protect them from unauthorized disclosure. |
Resiliency | Support application availability targets. Ensure availability of the application in the event of a planned and an unplanned outage. Provide highly available compute, storage, network, and other cloud services to handle application load and performance requirements. Provide highly available storage for security data logs and backup data. Automate recovery tasks to minimize down time. |
Service management | Monitor system and application health metrics and logs to detect issues that might impact the availability of the application. Generate alerts and notifications about issues that might impact the availability of applications to trigger appropriate responses to minimize down time. Monitor audit logs to track changes and detect potential security problems. Provide a mechanism to identify and send notifications about issues found in the audit logs. |
Components
The following table outlines the products or services used in the architecture for each aspect:
| Aspect | Architecture component | How the component is used |
|---|---|---|
| Compute | Virtual Servers for VPC | HPC cluster management nodes and compute nodes |
| Storage | VPC File Storage | Lower-performance storage for HPC management metadata and/or lightweight application data access needs |
| | Storage Scale | High-performance parallel file system for data-intensive HPC workloads |
| Networking | Virtual Private Endpoint (VPE) | For private network access to cloud services, for example, Key Protect, IAM, and so on |
| | Public Gateway | For secure client access to the HPC environment over the Internet |
| | Direct Link | For private, dedicated connectivity between on-premises and cloud HPC resources |
| | DNS | Domain name services for the HPC environment |
| Security | Identity and Access Management (IAM) | Manage authentication and access to cloud resources |
| | Key Protect or Hyper Protect Crypto Services | Hardware security module (HSM) and key management service |
| | Secrets Manager | Certificate and secrets management |
| Resiliency | Virtual Servers for VPC in conjunction with Spectrum LSF or Spectrum Symphony | The HPC workload is split across multiple VSIs. The HPC management software (LSF or Symphony) manages compute node failures by resubmitting failed compute jobs to other VSIs |
| Service Management | Spectrum LSF or Spectrum Symphony | HPC cluster management software provides application and performance status and monitoring, and resource consumption and utilization |
| | IBM Cloud Monitoring | Operational monitoring |
| | IBM Cloud Log Analysis | Operational logs |
| | Activity Tracker Event Routing | Audit logs |