Managing GPUs and accelerators

The GPU-enabled family of profiles provides on-demand, cost-effective access to GPUs and accelerators. GPUs and accelerators reduce the processing time that is required for compute-intensive workloads such as AI, machine learning, and inferencing. To use the GPUs and accelerators, make sure that you install the appropriate driver and associated toolkit for your workloads.

Configuring a virtual server instance with an NVIDIA GPU

Provision a virtual server instance by choosing an NVIDIA GPU profile in the Profile field. Stock and custom operating system images are supported.

Install the NVIDIA GPU driver for your virtual server instance's image and GPU profile. The following tables list the minimum NVIDIA driver and CUDA versions for Linux and Windows operating systems. For more information, see NVIDIA's Download drivers page. For an overview of drivers for NVIDIA data center products, see NVIDIA Data Center Drivers.

NVIDIA drivers and CUDA version for Linux
GPUs and minimum NVIDIA drivers and CUDA versions
GPU	NVIDIA driver	CUDA version
A100	550	12.4
L4	550	12.4
L40s	550	12.4
V100	535	12.2
H100	550	12.4
H200	570	12.8
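
The Linux minimums in the table above can be checked from a script. The following is a minimal sketch, not an official tool: the function names `min_linux_driver_for_gpu` and `driver_meets_minimum` are hypothetical, and the comparison assumes that only the leading number of the driver version (the branch, such as 550 in 550.54.15) matters.

```shell
# Print the minimum Linux driver branch for a GPU, taken from the table above.
min_linux_driver_for_gpu() {
  case "$1" in
    A100|L4|L40s|L40S|H100) echo 550 ;;
    V100)                   echo 535 ;;
    H200)                   echo 570 ;;
    *) echo "unknown GPU: $1" >&2; return 1 ;;
  esac
}

# Succeed when a full driver version (for example 550.54.15)
# is at or above the minimum branch given as the second argument.
driver_meets_minimum() {
  installed_branch="${1%%.*}"   # keep only the text before the first dot
  [ "$installed_branch" -ge "$2" ]
}

# On an instance where the driver is already installed, nvidia-smi
# reports the version. This part is skipped when nvidia-smi is absent.
if command -v nvidia-smi >/dev/null 2>&1; then
  installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  echo "Installed driver: ${installed}"
fi
```

For example, `driver_meets_minimum 550.54.15 "$(min_linux_driver_for_gpu A100)"` succeeds, while `driver_meets_minimum 535.183.01 550` fails.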

NVIDIA drivers and CUDA version for Windows 2019, 2022
GPUs and minimum NVIDIA drivers and CUDA versions
GPU	NVIDIA driver	CUDA version
A100	538	12.2
L4	538	12.2
L40s	538	12.2
V100	535	12.2
H100	N/A	N/A
H200	N/A	N/A

NVIDIA drivers and CUDA version for Windows 2016
GPUs and minimum NVIDIA drivers and CUDA versions
GPU	NVIDIA driver	CUDA version
A100	529	12.0
L4	529	12.0
L40s	N/A	N/A
V100	535	12.0
H100	N/A	N/A
H200	N/A	N/A

Install the associated toolkit for your workload. To download the toolkit, see NVIDIA's CUDA toolkit downloads page.
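
After the toolkit is installed, you might want to confirm that its CUDA release meets the minimum for your GPU. The following sketch assumes that `nvcc --version` prints a line of the form `Cuda compilation tools, release 12.4, V12.4.131`. The helper names `cuda_release_from_nvcc` and `cuda_at_least` are hypothetical.

```shell
# Read `nvcc --version` output on stdin and print the major.minor release.
cuda_release_from_nvcc() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed when version $1 (major.minor) is at least version $2.
cuda_at_least() {
  have_major="${1%%.*}"; have_minor="${1#*.}"
  need_major="${2%%.*}"; need_minor="${2#*.}"
  [ "$have_major" -gt "$need_major" ] ||
    { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
}

# Typical use on an instance with the toolkit installed (not run here):
#   installed="$(nvcc --version | cuda_release_from_nvcc)"
#   cuda_at_least "$installed" 12.4 && echo "CUDA ${installed} is sufficient"
```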

For detailed instructions on installing the driver and the toolkit, other GPU tools, and examples, see How to Use V100-Based GPUs on IBM Cloud VPC.

For a Linux-focused guide on installing the NVIDIA drivers, see the NVIDIA Driver Installation Guide.

To automate the installation of the drivers, use the User data field of the virtual server instance. In that field, you can enter a script that runs the commands to install the NVIDIA drivers.
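
A user data script for this purpose might look like the following sketch. It is an illustration under stated assumptions, not a supported installer: it assumes an Ubuntu image on which the `ubuntu-drivers` tool (package `ubuntu-drivers-common`) and the `-server` driver packages are available, and it picks the 550 branch from the Linux table. The `DRIVER_BRANCH` and `DRY_RUN` variables and the `run` wrapper are hypothetical names. As written, the script only prints the commands; set `DRY_RUN=0` before you paste it into the User data field to run them.

```shell
#!/bin/bash
# Sketch of a user data script that installs an NVIDIA driver on Ubuntu.
# Assumption: the image provides apt and the ubuntu-drivers tool.

DRIVER_BRANCH=550   # driver branch to install; adjust for your GPU
DRY_RUN=1           # 1 = print the commands only, 0 = run them

# Run a command, or just print it when DRY_RUN is 1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_nvidia_driver() {
  run apt-get update
  run apt-get install -y ubuntu-drivers-common
  # Install the server (compute) flavor of the chosen driver branch.
  run ubuntu-drivers install --gpgpu "nvidia:${DRIVER_BRANCH}-server"
  # nvidia-smi is shipped in the matching utils package.
  run apt-get install -y "nvidia-utils-${DRIVER_BRANCH}-server"
}

install_nvidia_driver
```

Other distributions need different package commands. See the NVIDIA Driver Installation Guide for the steps that apply to your image.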

Configuring a virtual server instance with an Intel Gaudi 3 AI Accelerator

Provision a virtual server instance by choosing the Intel® Gaudi® 3 AI Accelerator instance profile in the Profile field. Stock and custom operating system images are supported.
Install the Intel Gaudi 3 AI Accelerator software and drivers for your virtual server. To download the drivers, see the Intel Gaudi Driver and Software Installation page.

Configuring a virtual server instance with an AMD Instinct MI300X Accelerator

Provision a virtual server instance by choosing the AMD Instinct™ MI300X Accelerator instance profile in the Profile field. Stock and custom operating system images are supported.
Install the necessary drivers for your virtual server. To download the drivers, see the Installing ROCm and machine learning frameworks page.
If the guest OS for your virtual server is Ubuntu, you must remove nomodeset from the kernel command line and restart the virtual server.
1. Run these commands as root. Switch to the root user.
```
sudo -i
```
2. Remove nomodeset from the settings file. The following example uses vi.
```
vi /etc/default/grub.d/50-cloudimg-settings.cfg
```
3. Verify that nomodeset is removed from the settings file.
```
cat /etc/default/grub.d/50-cloudimg-settings.cfg
```
4. Update grub.
```
update-grub
```
5. Restart the virtual server.
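
If you prefer not to edit the file interactively, the nomodeset removal in the steps above can be scripted. The following sketch uses sed instead of vi. The helper name `remove_nomodeset` is hypothetical, and the patterns assume that nomodeset appears as a separate word inside a quoted kernel parameter string such as `GRUB_CMDLINE_LINUX_DEFAULT="... nomodeset"`.

```shell
# Remove the nomodeset kernel parameter from the file given as $1.
remove_nomodeset() {
  cfg="$1"
  tmp="$(mktemp)" || return 1
  # 1st pattern: nomodeset preceded by whitespace (middle or end of the list).
  # 2nd pattern: nomodeset as the first of several parameters.
  # 3rd pattern: nomodeset as the only parameter.
  sed -E \
    -e 's/[[:space:]]+nomodeset([[:space:]"]|$)/\1/g' \
    -e 's/"nomodeset[[:space:]]+/"/g' \
    -e 's/"nomodeset"/""/g' \
    "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Typical use as root on the virtual server (not run here):
#   remove_nomodeset /etc/default/grub.d/50-cloudimg-settings.cfg
#   grep -w nomodeset /etc/default/grub.d/50-cloudimg-settings.cfg || echo "nomodeset removed"
#   update-grub
```

Restart the virtual server after `update-grub` completes, as in step 5.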

Integrating drivers into a custom image from volume

Provision a virtual server instance with a GPU and install the drivers.
Create an image from the boot volume of the virtual server instance that you provisioned from a stock image. For more information, see Creating an image from a volume.
Repeat the Image from volume process to deploy across multiple instances.

Next steps

For more information, see the NVIDIA driver documentation.