FAQs for Spectrum LSF
This document provides a list of frequently asked questions and answers about IBM Spectrum LSF.
What Spectrum LSF packages are included in a cluster deployed with this offering?
IBM Spectrum LSF Standard Edition is included in this offering.
What locations are available for deploying VPC resources?
The available regions and zones for deploying VPC resources, and their mapping to city locations and data centers, can be found in Locations for resource deployment.
What permissions do you need to create a cluster using the offering?
The instructions to set the appropriate permissions for IBM Cloud services platform roles and service roles are shown in the following screenshots:
[Screenshots: IBM Cloud platform-role and service-role permission settings]
How do I SSH among nodes?
All the nodes in the HPC cluster have the same public key that you register at cluster creation. You can use ssh-agent forwarding, a common technique for accessing remote nodes that share the same public key. It securely forwards your private key to remote nodes automatically, and forwarded keys are deleted immediately after the session is closed.
To securely forward private keys to remote nodes, run ssh-add and then connect with ssh -A:
[your local PC]~$ ssh-add {id_rsa for lsf cluster}
[your local PC]~# ssh -A -J root@jumpbox_fip root@management_private_ip
...
[root@management]~# ssh -A worker_private_ip
For Mac OS X, you can make ssh-add persist by adding the following configuration to .ssh/config:
Host *
UseKeychain yes
AddKeysToAgent yes
You can even omit -A by adding ForwardAgent yes to .ssh/config.
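For example, a minimal sketch that appends both settings to your SSH configuration (the Host * pattern applies them to every host; narrow it to your cluster hosts if you prefer, and note that UseKeychain is macOS-specific):
# Append agent and key-forwarding settings to the SSH configuration.
cat >> ~/.ssh/config <<'EOF'
Host *
    UseKeychain yes
    AddKeysToAgent yes
    ForwardAgent yes
EOF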
How many worker nodes can you deploy in the Spectrum LSF cluster through this offering?
Before deploying a cluster, it is important to ensure that the VPC resource quota settings are appropriate for the size of the cluster that you would like to create (see Quotas and service limits).
The maximum number of worker nodes that are supported for the worker_node_max_count deployment value is 500 (see Deployment values). The worker_node_instance_type variable specifies the number of worker nodes that are provisioned at the time the cluster is created, which exist throughout the life of the cluster. The difference between those two values is the maximum number of worker nodes that the LSF resource connector auto-scaling feature can create or destroy. For example, with worker_node_max_count set to 500 and 100 static worker nodes, the resource connector can add or remove up to 400 dynamic nodes. In configurations where that difference exceeds 250, take caution if the characteristics of the workload are expected to result in more than 250 cluster node join or remove requests at a single point in time. In those cases, pace the job start and stop requests, if possible. Otherwise, you might see noticeable delays in some subset of the nodes joining or being removed from the cluster.
The worker_node_instance_type value supports a combination of multiple instance profiles, each with its own instance count. For example, you can choose to create 100 instances from the bx2-4x16 profile and 10 instances from the mx3d-8x80 profile, for a total of 110 static worker nodes with different instance profiles, based on your requirements.
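As an illustration only, such a mixed pool might be expressed as a list of profile-and-count pairs; the exact schema for this deployment value is documented in Deployment values:
# Hypothetical worker_node_instance_type value for the example above.
worker_node_instance_type='[
  { "count": 100, "instance_type": "bx2-4x16" },
  { "count": 10,  "instance_type": "mx3d-8x80" }
]'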
Why are there two different resource group parameters that can be specified in the IBM Cloud catalog tile?
The first resource group parameter entry in the Configure your workspace section of the IBM Cloud catalog applies to the resource group where the Schematics workspace is provisioned in your IBM Cloud account. The value for this parameter can be different from the one used for the second entry in the Parameters with default values section of the catalog. The second entry applies to the resource group where VPC resources are provisioned. As specified in the description for this second resource_group parameter, only the default resource group is supported for use of the LSF resource connector auto-scaling feature.
Where are the Terraform files used by the IBM Spectrum LSF tile located?
The Terraform-based templates can be found in this GitHub repository.
Where can you find the custom image name to image ID mappings for each cloud region?
The mappings can be found in the image-map.tf file in this GitHub repository.
Which Spectrum LSF and Spectrum Scale versions are used in cluster nodes deployed with this offering?
Cluster nodes that are deployed with this offering include IBM Spectrum LSF 10.1.0.14 Standard Edition, plus Data Manager and License Scheduler. For a brief description of each of those programs, see IBM Spectrum LSF 10 family of products.
If the cluster uses Storage Scale storage, the storage nodes include IBM Storage Scale 5.2.2-0 software. For more information, see the IBM Storage Scale product documentation.
Why is the CPU number displayed on an LSF worker node different than what is shown in the LSF Application Center GUI?
The CPU column in the LSF Application Center GUI (LSF's ncpus value) and the CPU count that the lscpu command reports on an LSF worker node might not show the same value.
The output that you get by running lscpu | egrep 'Model name|Socket|Thread|NUMA|CPU\(s\)' on an LSF worker node shows the number of CPU threads (not physical cores) on that compute instance.
If EGO_DEFINE_NCPUS=threads, then ncpus = number of processors x number of cores x number of threads, and the CPU column value in the LSF Application Center GUI matches what you see when running lscpu on an LSF worker node.
If EGO_DEFINE_NCPUS=cores, then ncpus = number of processors x number of cores, and the CPU column value in the LSF Application Center GUI is half of what you see when running lscpu on an LSF worker node (assuming two threads per core).
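For example (the topology here is purely illustrative: one socket, four cores, and two threads per core):
# Show the CPU topology that the OS sees: 1 socket x 4 cores x 2 threads = 8.
lscpu | egrep 'Model name|Socket|Thread|NUMA|CPU\(s\)'
# Show the ncpus value that LSF computed for this host:
# 8 if EGO_DEFINE_NCPUS=threads, 4 if EGO_DEFINE_NCPUS=cores.
lshosts "$HOSTNAME"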
For more information, see ncpus calculation in LSF.
Which File Storage for IBM Cloud Virtual Private Cloud (VPC) profiles are supported for the IBM® Spectrum LSF cluster shared storage?
IBM Cloud File Storage for VPC is a zonal file storage offering that provides NFS-based file storage services. You create file share mounts from a subnet in an availability zone within a region. You can also share them with multiple virtual server instances within the same zone across multiple VPCs. IBM® Spectrum LSF supports the use of dp2 profiles.
Can you specify the total IOPS (input/output operations per second) for a file share when deploying a Spectrum LSF cluster?
Yes, when you deploy a Spectrum LSF cluster, you can choose the IOPS value appropriate for your file share size.
What are the supported operating systems for dynamic node creation with Spectrum LSF?
You can deploy your Spectrum LSF environment to automatically create Red Hat Enterprise Linux (RHEL) compute nodes. The supported image hpcaas-lsf10-rhel810-compute-v8 is used for the compute_image_name deployment input value to dynamically create nodes for the applicable operating system. Ubuntu-based operating systems are not supported for dynamic node provisioning.
As a cluster administrator, how do I best restart the LSF daemon processes?
A cluster administrator can choose to restart all the cluster daemons. In a Spectrum LSF environment, these daemons are the most used and relevant to LSF:
lim (on all nodes)
res (on all nodes)
sbatchd (on all nodes)
mbatchd (only on the primary management node)
mbschd (only on the primary management node)
Other LSF processes exist, but they are started by these main daemons. Choose between two methods for restarting LSF daemon processes: a wrapper to run on each host, or commands to run to affect all hosts in the cluster.
Restarting LSF daemons on an individual host
To restart the cluster daemons on an individual node, use the lsf_daemons script. To stop all the daemons on a node, run lsf_daemons stop. Likewise, to start all the daemons on a node, run lsf_daemons start.
Repeat these commands on each node if you want to restart the full cluster. Run the commands on both management and compute nodes that join the cluster.
No daemons are run on the login node, as the login node is used for running particular tasks: to submit Spectrum LSF jobs; monitor Spectrum LSF job status; display hosts and their static resource information; display and filter information about LSF jobs; and display the LSF version number, cluster name, and the management hostname.
Restarting LSF daemons for all hosts in the cluster
You can also restart all the daemons on all the hosts in your cluster, including both management nodes and compute nodes that join your cluster.
To restart all the daemons on all the nodes in your cluster, use the lsfrestart command.
To shut down all the daemons on all the nodes in your cluster, use the lsfshutdown command.
LSF also provides an lsfstartup command, which starts all the daemons on all the management (not compute) nodes in your cluster. If you have compute nodes that joined your cluster and you want to continue to use them (for example, after you run lsfshutdown to shut down all daemons on all hosts, including the compute nodes), then you must SSH to each compute node and run the lsf_daemons start script to bring it back.
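A minimal sketch of that loop (the host names are placeholders, and this assumes lsf_daemons is on the PATH of the remote nodes):
# Restart LSF daemons on each compute node that previously joined the cluster.
for host in worker-1 worker-2 worker-3; do
    ssh "$host" 'lsf_daemons start'
done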
Alternatively, because the compute nodes are within your Spectrum LSF environment, you can leave them alone; they are returned to the resource pool after ten minutes (by default), and new compute nodes can join upon new job requests.
How do I secure LSF Application Center connections by importing the cacert.pem certificate into a browser?
LSF Application Center requires that the $GUI_CONFDIR/https/cacert.pem certificate (generated by LSF Application Center) is installed in the browser to secure specific functions, such as remote consoles and HTTPS. Import this certificate into your browser to securely connect with IBM Spectrum LSF Application Center.
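A minimal sketch of copying the certificate to your workstation for import (the host name is a placeholder, and the certificate path depends on where GUI_CONFDIR points in your installation):
# Copy the certificate from the management node; replace /path/to/gui/conf
# with the actual value of $GUI_CONFDIR on that node.
scp root@management_private_ip:/path/to/gui/conf/https/cacert.pem .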
What are the limitations of available profiles for dedicated hosts?
The offering automatically selects instance profiles for dedicated hosts with the same prefix (for example, bx2 or cx2) as the profiles used for worker instances (worker_node_instance_type). However, the available instance prefixes can be limited, depending on your target region. If you use dedicated hosts, run ibmcloud target -r {region_name} and ibmcloud is dedicated-host-profiles to check whether your worker_node_instance_type has an available prefix in your target region.
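For example (us-south and the bx2 prefix are illustrative; substitute your own region and worker profile family):
# Target the deployment region, then list the dedicated host profiles
# that are offered there and filter for the bx2 family.
ibmcloud target -r us-south
ibmcloud is dedicated-host-profiles | grep bx2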