Managing GPUs and accelerators
The GPU-enabled family of profiles provides on-demand, cost-effective access to GPUs and accelerators. GPUs and accelerators reduce the processing time that is required for compute-intensive workloads such as AI, machine learning, and inferencing. To use the GPUs and accelerators, make sure that you install the appropriate driver and associated toolkit for your workloads.
Configuring a virtual server instance with an NVIDIA GPU
- Provision a virtual server instance by choosing an NVIDIA GPU profile in the Profile field. Stock and custom operating system images are supported.
- Install the NVIDIA GPU driver for your virtual server instance's image and GPU profile. The following tables describe minimum driver and CUDA software version levels for Linux and Windows operating systems. For more information, see NVIDIA's Download drivers page. For an overview of drivers for NVIDIA data center products, see NVIDIA Data Center Drivers.
NVIDIA drivers and CUDA version for Linux

GPUs and minimum NVIDIA drivers and CUDA versions
| GPU | NVIDIA driver | CUDA version |
|------|---------------|--------------|
| A100 | 550 | 12.4 |
| L4 | 550 | 12.4 |
| L40s | 550 | 12.4 |
| V100 | 535 | 12.2 |
| H100 | 550 | 12.4 |
| H200 | 570 | 12.8 |

NVIDIA drivers and CUDA version for Windows 2019, 2022

GPUs and minimum NVIDIA drivers and CUDA versions
| GPU | NVIDIA driver | CUDA version |
|------|---------------|--------------|
| A100 | 538 | 12.2 |
| L4 | 538 | 12.2 |
| L40s | 538 | 12.2 |
| V100 | 535 | 12.2 |
| H100 | N/A | N/A |
| H200 | N/A | N/A |

NVIDIA drivers and CUDA version for Windows 2016

GPUs and minimum NVIDIA drivers and CUDA versions
| GPU | NVIDIA driver | CUDA version |
|------|---------------|--------------|
| A100 | 529 | 12.0 |
| L4 | 529 | 12.0 |
| L40s | N/A | N/A |
| V100 | 535 | 12.0 |
| H100 | N/A | N/A |
| H200 | N/A | N/A |

- Install the associated toolkit for your workload. Visit NVIDIA's CUDA toolkit downloads page.
For detailed instructions on installing the driver and toolkit (steps 2 and 3), other GPU tools, and examples, see How to Use V100-Based GPUs on IBM Cloud VPC.
For a Linux-focused guide on installing the NVIDIA drivers, see the NVIDIA Driver Installation Guide.
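After the driver is installed on a Linux instance, you might want to confirm that it meets the minimum level listed for Linux. The following is a minimal sketch, not an official tool. The function names min_driver_for_gpu and driver_meets_minimum, the GPU_MODEL variable, and the GPU keys are hypothetical names chosen for this example, and the script assumes that nvidia-smi is on the PATH after the driver is installed.

```shell
#!/bin/bash
# Sketch: compare the installed NVIDIA driver with the Linux minimums listed above.
# Function names and GPU keys are illustrative, not part of any IBM or NVIDIA tool.

# Print the minimum driver branch for a GPU model (values copied from the Linux table).
min_driver_for_gpu() {
  case "$1" in
    A100|L4|L40s|H100) echo 550 ;;
    V100)              echo 535 ;;
    H200)              echo 570 ;;
    *)                 return 1 ;;   # unknown model
  esac
}

# Succeed if the installed version (for example 550.90.07) is at or above the minimum branch.
driver_meets_minimum() {
  local installed_major="${1%%.*}"   # keep the text before the first dot
  local minimum="$2"
  [ "$installed_major" -ge "$minimum" ]
}

# Only query the driver when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
  gpu_model="${GPU_MODEL:-V100}"     # assumption: set GPU_MODEL to match your profile
  if minimum="$(min_driver_for_gpu "$gpu_model")"; then
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if driver_meets_minimum "$installed" "$minimum"; then
      echo "Driver $installed meets the minimum ($minimum) for $gpu_model"
    else
      echo "Driver $installed is below the minimum ($minimum) for $gpu_model" >&2
    fi
  else
    echo "Unknown GPU model: $gpu_model" >&2
  fi
fi
```

The comparison looks only at the major driver branch, which is the level of detail that the tables provide.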
To automate the installation of the drivers, you can use the User data section of the virtual server. In the user data field, you can enter a script that runs the commands to install the NVIDIA drivers.
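A user data script for this purpose might look like the following sketch. It is an illustration under several assumptions that are not stated in this topic: an Ubuntu guest image that processes user data with cloud-init, the ubuntu-drivers tool from the ubuntu-drivers-common package, and the 550 server driver branch. The DRY_RUN and DRIVER_BRANCH variables and the run and install_nvidia_driver functions are hypothetical names. Check the package names against your distribution before you rely on them.

```shell
#!/bin/bash
# Sketch of a user data script for an Ubuntu image (assumed to run as root at first boot).
# Assumptions: Ubuntu guest, the ubuntu-drivers tool, and the 550 server driver branch.
# DRY_RUN defaults to 1 so the script only prints commands; set DRY_RUN=0 to install.

DRY_RUN="${DRY_RUN:-1}"
DRIVER_BRANCH="${DRIVER_BRANCH:-550}"   # pick the branch that matches your GPU profile

# Run a command, or print it when DRY_RUN is enabled.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_nvidia_driver() {
  run apt-get update
  run apt-get install -y ubuntu-drivers-common
  # Install the compute-only (GPGPU) driver for the chosen branch.
  run ubuntu-drivers install --gpgpu "nvidia:${DRIVER_BRANCH}-server"
  # Assumption: the utilities package that provides nvidia-smi is installed separately.
  run apt-get install -y "nvidia-utils-${DRIVER_BRANCH}-server"
}

install_nvidia_driver
```

The dry-run default lets you review the exact commands before they run on a real instance. Change DRY_RUN to 0 in the script that you paste into the user data field.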
Configuring a virtual server instance with an Intel Gaudi 3 AI Accelerator
- Provision a virtual server instance by choosing the Intel® Gaudi® 3 AI Accelerator instance profile in the Profile field. Stock and custom operating system images are supported.
- Install the Intel Gaudi 3 AI Accelerator software and drivers for your virtual server. To download the drivers, see the Intel Gaudi Driver and Software Installation page.
Configuring a virtual server instance with an AMD Instinct MI300X Accelerator
- Provision a virtual server instance by choosing the AMD Instinct™ MI300X Accelerator instance profile in the Profile field. Stock and custom operating system images are supported.
- Install the necessary drivers for your virtual server. To download the drivers, see the Installing ROCm and machine learning frameworks page.
- If the guest OS for your virtual server is Ubuntu, you must remove nomodeset from the kernel command line and restart the virtual server.
  - These commands must be run as root. Switch to the root user.
    sudo -i
  - Remove nomodeset from the settings file. The following example uses vi.
    vi /etc/default/grub.d/50-cloudimg-settings.cfg
  - Verify that nomodeset is removed from the settings file.
    cat /etc/default/grub.d/50-cloudimg-settings.cfg
  - Update grub.
    update-grub
  - Restart the virtual server.
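If you prefer to script the nomodeset removal rather than edit the file in vi, the edit and the verification can be combined. The following is a minimal sketch. The remove_nomodeset function name is hypothetical, and the sample GRUB line is made up for illustration, so the contents of your settings file might differ. The example runs against a scratch copy so that nothing on the system changes.

```shell
#!/bin/bash
# Sketch: remove nomodeset from a GRUB settings file without an interactive editor.
# The function name is illustrative; run it as root against the real file.

remove_nomodeset() {
  local cfg="$1"
  # Keep a .bak copy, then delete nomodeset and any whitespace directly before it.
  sed -i.bak -E 's/[[:space:]]*nomodeset//g' "$cfg"
  # Verify: succeed only if the parameter no longer appears in the file.
  ! grep -q 'nomodeset' "$cfg"
}

# Example against a scratch file with an illustrative kernel command line.
tmp_cfg="$(mktemp)"
echo 'GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0 nomodeset"' > "$tmp_cfg"
remove_nomodeset "$tmp_cfg" && cat "$tmp_cfg"
# prints: GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0"
```

On the virtual server, you would call remove_nomodeset with /etc/default/grub.d/50-cloudimg-settings.cfg as root, then run update-grub and restart the virtual server as described in the steps above.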
Integrating drivers into a custom image from volume
- Provision a virtual server instance with a GPU and install the drivers.
- Create an image from the boot volume of the virtual server instance that you provisioned from a stock image. For more information, see Creating an image from a volume.
- Repeat the image from volume process as needed to deploy across multiple instances.
Next steps
For more information, see the NVIDIA driver documentation.