Updating clusters, worker nodes, and cluster components
Review the following sections for steps to keep your cluster master and worker nodes up-to-date.
Updating the master
- How do I know when to update the master?
- You are notified in the console, announcements, and the CLI when updates are available. You can also periodically check the supported versions page.
- How many versions behind the latest can the master be?
- You can update the API server only to the next version ahead of its current version (n+1).
- Can my worker nodes run a later version than the master?
- Your worker nodes can't run a later major.minor Kubernetes version than the master. Additionally, your worker nodes can only be one version behind the master version (n-1). First, update your master to the latest Kubernetes version. Then, update the worker nodes in your cluster.
Worker nodes can run later patch versions than the master, such as patch versions that are specific to worker nodes for security updates.
- How are patch updates applied?
- By default, patch updates for the master are applied automatically over the course of several days, so a master patch version might show up as available before it is applied to your master. The update automation also skips clusters that are
in an unhealthy state or have operations currently in progress. Occasionally, IBM might disable automatic updates for a specific master fix pack, such as a patch that is only needed if a master is updated from one minor version to another.
In any of these cases, you can check the Red Hat OpenShift on IBM Cloud version information for any potential impact and choose to safely run the ibmcloud oc cluster master update command yourself without waiting for the update automation to apply it.
Unlike the master, you must update your workers for each patch version.
- What happens during the master update?
- Your master is highly available with three replica master pods. The master pods have a rolling update, during which only one pod is unavailable at a time. Two instances are up and running so that you can access and change the cluster during the update. Your worker nodes, apps, and resources continue to run.
- Can I roll back the update?
- No, you can't roll back a cluster to a previous version after the update process takes place. Be sure to use a test cluster and follow the instructions to address potential issues before you update your production master.
- What process can I follow to update the master?
- The following diagram shows the process that you can take to update your master.
Steps to update the cluster master
Before you begin, make sure that you have the Operator or Administrator IAM platform access role.
To update the Red Hat OpenShift master major or minor version:
- Review the Red Hat OpenShift on IBM Cloud version information and make any updates marked Update before master.
- Review any Kubernetes helpful warnings, such as deprecation notices.
- Check the add-ons and plug-ins that are installed in your cluster for any impact that might be caused by updating the cluster version.
- Checking add-ons
  - List the add-ons in the cluster.
ibmcloud oc cluster addon ls --cluster CLUSTER
  - Check the supported Red Hat OpenShift version for each add-on that is installed.
ibmcloud oc addon-versions
  - If the add-on must be updated to run in the Red Hat OpenShift version that you want to update your cluster to, update the add-on.
- Checking plug-ins
  - In the Helm catalog, find the plug-ins that you installed in your cluster.
  - From the side menu, expand the SOURCES & TAR FILE section.
  - Download and open the source code.
  - Check the README.md or RELEASENOTES.md files for supported versions.
  - If the plug-in must be updated to run in the Red Hat OpenShift version that you want to update your cluster to, update the plug-in by following the plug-in instructions.
- Update your API server and associated master components by using the IBM Cloud console or running the CLI ibmcloud oc cluster master update command.
- Wait a few minutes, then confirm that the update is complete. Review the API server version on the IBM Cloud clusters dashboard or run ibmcloud oc cluster ls.
- Install the version of the oc CLI that matches the API server version that runs in the master. Kubernetes does not support oc client versions that are two or more versions apart from the server version (n +/- 2).
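For illustration only, an end-to-end master update might look like the following sequence. The cluster name and version string are placeholders, and the --version flag is an assumption based on the CLI's usual pattern; confirm the exact flags with ibmcloud oc cluster master update --help.
# List clusters and note the current master (API server) version.
ibmcloud oc cluster ls
# Update the master to the target version (placeholder cluster name and version).
ibmcloud oc cluster master update --cluster mycluster --version 4.14_openshift
# After a few minutes, confirm that the master reports the new version.
ibmcloud oc cluster ls
# Check that your oc client version is within the supported skew of the API server.
oc version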
When the master update is complete, you can update your worker nodes, depending on the type of cluster infrastructure provider that you have.
Updating classic worker nodes
You notice that an update is available for your worker nodes in a classic infrastructure cluster. What does that mean? As security updates and patches are put
in place for the API server and other master components, you must be sure that the worker nodes remain in sync. You can make two types of updates: updating only the patch version, or updating the major.minor version with the patch version.
- Patch: A worker node patch update includes security fixes. You can update the classic worker node to the latest patch by using the ibmcloud oc worker reload or update commands. Keep in mind that the update command also updates the worker node to the same major.minor version as the master and the latest patch version, if a major.minor version update is also available.
- Major.minor: A major.minor update moves up the Kubernetes version of the worker node to the same version as the master. This type of update often includes changes to the Kubernetes API or other behaviors that you must prepare your cluster for. Remember that your worker nodes can only be one version behind the master version (n-1). You can update the classic worker node to the same version as the master by using the ibmcloud oc worker update command. An example of both commands follows this list.
For more information, see Update types.
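To make the distinction concrete, the two update types map roughly to the following commands. The cluster name and worker ID are placeholders, and the flags follow the pattern used elsewhere in this topic; treat this as a sketch rather than a definitive reference.
# Patch only: reload the worker node so it picks up the latest patch at its current major.minor version.
ibmcloud oc worker reload --cluster mycluster --worker WORKER-NODE-ID
# Major.minor plus latest patch: update the worker node to the master's version.
ibmcloud oc worker update --cluster mycluster --worker WORKER-NODE-ID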
- What happens to my apps during an update?
- If you run apps as part of a deployment on worker nodes that you update, the apps are rescheduled onto other worker nodes in the cluster. These worker nodes might be in a different worker pool, or if you have stand-alone worker nodes, apps might be scheduled onto stand-alone worker nodes. To avoid downtime for your app, you must ensure that you have enough capacity in the cluster to carry the workload.
- How can I control how many worker nodes go down at a time during an update or reload?
- If you need all your worker nodes to be up and running, consider resizing your worker pool or adding stand-alone worker nodes to add more worker nodes. You can remove the additional worker nodes after the update is completed.
In addition, you can create a Kubernetes config map that specifies the maximum number of worker nodes that can be unavailable at a time, such as during an update. Worker nodes are identified by the worker node labels. You can use IBM-provided labels or custom labels that you added to the worker node.
The Kubernetes config map rules are used for updating worker nodes only. These rules do not impact worker node reloads, which means reloading happens immediately when requested.
- What if I choose not to define a config map?
- When the config map is not defined, the default is used. By default, a maximum of 20% of all your worker nodes in each cluster can be unavailable during the update process.
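For example, with the default in place in a cluster of 15 worker nodes, at most 3 worker nodes (20% of 15) can be unavailable at a time while the update runs.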
Prerequisites
Before you update your classic infrastructure worker nodes, review the prerequisite steps.
Updates to worker nodes can cause downtime for your apps and services. Your worker node machine is reimaged, and data is deleted if not stored outside the pod.
- For the latest security patches and fixes, make sure to update your worker nodes to the latest patch as soon as possible after it is available. For more information about the latest updates, review the Red Hat OpenShift on IBM Cloud version information.
- Access your Red Hat OpenShift cluster.
- Update the master. The worker node version can't be higher than the API server version that runs in your Kubernetes master.
- Make any changes that are marked with Update after master in the Red Hat OpenShift version preparation guide.
- If you want to apply a patch update, review the Red Hat OpenShift on IBM Cloud version information.
- Consider adding more worker nodes so that your cluster has enough capacity to reschedule your workloads during the update. For more information, see Adding worker nodes to Classic clusters or Adding worker nodes to VPC clusters.
- Make sure that you have the Operator or Administrator IAM platform access role.
Updating classic worker nodes in the CLI with a configmap
Set up a ConfigMap to perform a rolling update of your classic worker nodes.
- Complete the prerequisite steps.
- List available worker nodes and note their private IP address.
ibmcloud oc worker ls --cluster CLUSTER
- View the labels of a worker node. You can find the worker node labels in the Labels section of your CLI output. Every label consists of a NodeSelectorKey and a NodeSelectorValue.
oc describe node PRIVATE-WORKER-IP
Example output
NAME:               10.184.58.3
Roles:              <none>
Labels:             arch=amd64
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-south
                    failure-domain.beta.kubernetes.io/zone=dal12
                    ibm-cloud.kubernetes.io/encrypted-docker-data=true
                    ibm-cloud.kubernetes.io/iaas-provider=softlayer
                    ibm-cloud.kubernetes.io/machine-type=u3c.2x4.encrypted
                    kubernetes.io/hostname=10.123.45.3
                    privateVLAN=2299001
                    publicVLAN=2299012
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp:  Tue, 03 Apr 2022 15:26:17 -0400
Taints:             <none>
Unschedulable:      false
- Create a config map and define the unavailability rules for your worker nodes. The following example shows four checks: zonecheck.json, regioncheck.json, defaultcheck.json, and a check template. You can use these example checks to define rules for worker nodes in a specific zone (zonecheck.json), region (regioncheck.json), or for all worker nodes that don't match any of the checks that you defined in the config map (defaultcheck.json). Use the check template to create your own check. For every check, to identify a worker node, you must choose one of the worker node labels that you retrieved in the previous step.
For every check, you can set only one value for NodeSelectorKey and NodeSelectorValue. If you want to set rules for more than one region, zone, or other worker node labels, create a new check. Define up to 15 checks in a config map. If you add more checks, only 1 worker node is reloaded at a time until all requested workers are updated.
Example
apiVersion: v1
kind: ConfigMap
metadata:
  name: ibm-cluster-update-configuration
  namespace: kube-system
data:
  drain_timeout_seconds: "120"
  zonecheck.json: |
    {
      "MaxUnavailablePercentage": 30,
      "NodeSelectorKey": "failure-domain.beta.kubernetes.io/zone",
      "NodeSelectorValue": "dal13"
    }
  regioncheck.json: |
    {
      "MaxUnavailablePercentage": 20,
      "NodeSelectorKey": "failure-domain.beta.kubernetes.io/region",
      "NodeSelectorValue": "us-south"
    }
  defaultcheck.json: |
    {
      "MaxUnavailablePercentage": 20
    }
  <check_name>: |
    {
      "MaxUnavailablePercentage": <value_in_percentage>,
      "NodeSelectorKey": "<node_selector_key>",
      "NodeSelectorValue": "<node_selector_value>"
    }
drain_timeout_seconds
- Optional: The timeout in seconds to wait for the drain to complete. Draining a worker node safely removes all existing pods from the worker node and reschedules the pods onto other worker nodes in the cluster. Accepted values are integers in the range 1 - 180. The default value is 30.
zonecheck.json and regioncheck.json
- Two checks that define a rule for a set of worker nodes that you can identify with the specified NodeSelectorKey and NodeSelectorValue. The zonecheck.json identifies worker nodes based on their zone label, and the regioncheck.json uses the region label that is added to every worker node during provisioning. In the example, 30% of all worker nodes that have dal13 as their zone label and 20% of all the worker nodes in us-south can be unavailable during the update.
defaultcheck.json
- If you don't create a config map or the map is configured incorrectly, the Kubernetes default is applied. By default, only 20% of the worker nodes in the cluster can be unavailable at a time. You can override the default value by adding the default check to your config map. In the example, up to 20% of the worker nodes that are not covered by the zone and region checks (dal13 or us-south) can be unavailable during the update.
MaxUnavailablePercentage
- The maximum number of nodes that are allowed to be unavailable for a specified label key and value, expressed as a percentage. A worker node is unavailable during the deploying, reloading, or provisioning process. Queued worker nodes are blocked from updating if the update would exceed any defined maximum unavailable percentage.
NodeSelectorKey
- The label key of the worker node for which you want to set a rule. You can set rules for the default labels that are provided by IBM, as well as on worker node labels that you created. If you want to add a rule for worker nodes that belong to one worker pool, you can use the ibm-cloud.kubernetes.io/machine-type label.
NodeSelectorValue
- The label value that the worker node must have to be considered for the rule that you define.
- Create the configuration map in your cluster.
oc apply -f <filepath/configmap.yaml>
- Verify that the config map is created.
oc get configmap --namespace kube-system
- Update the worker nodes.
ibmcloud oc worker update --cluster CLUSTER --worker WORKER-NODE-1-ID --worker WORKER-NODE-2-ID
- Optional: Verify the events that are triggered by the config map and any validation errors that occur. The events can be reviewed in the Events section of your CLI output.
oc describe -n kube-system cm ibm-cluster-update-configuration
- Confirm that the update is complete by reviewing the Kubernetes version of your worker nodes.
oc get nodes
- Verify that you don't have duplicate worker nodes. Sometimes, older clusters might list duplicate worker nodes with a NotReady status after an update. To remove duplicates, see troubleshooting.
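One generic way to spot such leftovers is to filter the node list for NotReady entries; this uses only standard oc output and grep.
# Show only worker nodes that report a NotReady status.
oc get nodes | grep NotReady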
Next steps:
- Repeat the update process with other worker pools.
- Inform developers who work in the cluster to update their oc CLI to the version of the Kubernetes master.
- If the Kubernetes dashboard does not display utilization graphs, delete the kube-dashboard pod.
Updating classic worker nodes in the console
After you set up the config map for the first time, you can then update worker nodes by using the IBM Cloud console.
To update worker nodes from the console:
- Complete the prerequisite steps and set up a config map to control how your worker nodes are updated.
- From the IBM Cloud console menu, click Red Hat OpenShift.
- From the Clusters page, click your cluster.
- From the Worker Nodes tab, select the checkbox for each worker node that you want to update. An action bar is displayed over the table header row.
- From the action bar, click Update.
If you have Portworx installed in your cluster, you must restart the Portworx pods on updated worker nodes. For more information, see Portworx limitations.
Updating VPC worker nodes
You notice that an update is available for your worker nodes in a VPC cluster. What does that mean? As security updates and patches are put in place for the API server and other master components, you must be sure that the worker nodes remain
in sync. You can make two types of updates: updating only the patch version, or updating the major.minor version with the patch version.
If you have Portworx deployed in your cluster, follow the steps to update VPC worker nodes with Portworx volumes.
If you have OpenShift Data Foundation deployed in your cluster, follow the steps to update VPC worker nodes with OpenShift Data Foundation.
- Patch: A worker node patch update includes security fixes. You can update the VPC worker node to the latest patch by using the ibmcloud oc worker replace command.
- Major.minor: A major.minor update moves up the Kubernetes version of the worker node to the same version as the master. This type of update often includes changes to the Kubernetes API or other behaviors that you must prepare your cluster for. Remember that your worker nodes can only be one version behind the master version (n-1). You can update the VPC worker node to the same version as the master by using the ibmcloud oc worker replace command with the --update option.
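Before you replace anything, it can help to compare worker node versions against the master. Both commands appear elsewhere in this topic; the cluster name is a placeholder.
# Worker node versions as reported by IBM Cloud.
ibmcloud oc worker ls --cluster mycluster
# Kubelet versions as reported by the cluster itself.
oc get nodes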
- What happens to my apps during an update?
- If you run apps as part of a deployment on worker nodes that you update, the apps are rescheduled onto other worker nodes in the cluster. These worker nodes might be in a different worker pool. To avoid downtime for your app, you must ensure that you have enough capacity in the cluster to carry the workload, such as by resizing your worker pools. For more information, see Adding worker nodes to Classic clusters or Adding worker nodes to VPC clusters.
- What happens to my worker node during an update?
- Your VPC worker node is replaced by removing the old worker node and provisioning a new worker node that runs at the updated patch or major.minor version. The replacement worker node is created in the same zone, same worker pool, and with the same flavor as the deleted worker node. However, the replacement worker node is assigned a new private IP address, and loses any custom labels or taints that you applied to the old worker node (worker pool labels and taints are still applied to the replacement worker node).
- What if I replace multiple worker nodes at the same time?
- If you replace multiple worker nodes at the same time, they are deleted and replaced concurrently, not one by one. Make sure that you have enough capacity in your cluster to reschedule your workloads before you replace worker nodes.
- What if a replacement worker node is not created?
- A replacement worker node is not created if the worker pool does not have automatic rebalancing enabled.
Prerequisites
Before you update your VPC infrastructure worker nodes, review the prerequisite steps.
Updates to worker nodes can cause downtime for your apps and services. Your worker node machine is removed, and data is deleted if not stored outside the pod.
- For the latest security patches and fixes, make sure to update your worker nodes to the latest patch as soon as possible after it is available. For more information about the latest updates, review the Red Hat OpenShift on IBM Cloud version information.
- Access your Red Hat OpenShift cluster.
- Update the master. The worker node version can't be higher than the API server version that runs in your Kubernetes master.
- Make any changes that are marked with Update after master in the Red Hat OpenShift version preparation guide.
- If you want to apply a patch update, review the Red Hat OpenShift on IBM Cloud version information.
- Make sure that you have the Operator or Administrator IAM platform access role.
Updating VPC worker nodes in the CLI
Complete the following steps to update your worker nodes by using the CLI.
- Complete the prerequisite steps.
- Optional: Add capacity to your cluster by resizing the worker pool. The pods on the worker node can be rescheduled and continue running on the added worker nodes during the update. For more information, see Adding worker nodes to Classic clusters or Adding worker nodes to VPC clusters.
- List the worker nodes in your cluster and note the ID and Primary IP of the worker node that you want to update.
ibmcloud oc worker ls --cluster CLUSTER
- Replace the worker node to update either the patch version or the major.minor version that matches the master version.
  - To update the worker node to the same major.minor version as the master, include the --update option.
ibmcloud oc worker replace --cluster CLUSTER --worker WORKER-NODE-ID --update
  - To update the worker node to the latest patch version at the same major.minor version, don't include the --update option.
ibmcloud oc worker replace --cluster CLUSTER --worker WORKER-NODE-ID
- Repeat these steps for each worker node that you must update.
- Optional: After the replaced worker nodes are in a Ready status, resize the worker pool to meet the cluster capacity that you want. For more information, see Adding worker nodes to VPC clusters.
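If you added temporary capacity earlier, a sketch of scaling the worker pool back down might look like the following. The pool name and size are placeholders, and the flags mirror the other worker-pool commands in this topic; confirm them with ibmcloud oc worker-pool resize --help.
# Scale the worker pool back to the original number of worker nodes per zone.
ibmcloud oc worker-pool resize --cluster mycluster --worker-pool mypool --size-per-zone 3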
If you are running Portworx in your VPC cluster, you must manually attach your Block Storage for VPC volume to your new worker node.
Updating VPC worker nodes in the console
You can update your VPC worker nodes in the console. Before you begin, consider adding worker nodes to the cluster to help avoid downtime for your apps.
- Complete the prerequisite steps.
- From the IBM Cloud console menu, click Red Hat OpenShift.
- From the Clusters page, click your cluster.
- From the Worker Nodes tab, select the checkbox for each worker node that you want to update. An action bar is displayed over the table header row.
- From the action bar, click Update.
Updating flavors (machine types)
Before you begin:
- Access your Red Hat OpenShift cluster.
- Data on the worker node is deleted. Consider storing your data on persistent storage outside of the worker node.
- Make sure that you have the Operator or Administrator IAM platform access role.
To update flavors:
- List available worker nodes and note their private IP address.
  - List available worker pools in your cluster.
ibmcloud oc worker-pool ls --cluster CLUSTER
  - List the worker nodes in the worker pool. Note the ID and Private IP.
ibmcloud oc worker ls --cluster CLUSTER --worker-pool WORKER-POOL
  - Get the details for a worker node. In the output, note the zone and either the private and public VLAN ID for classic clusters or the subnet ID for VPC clusters.
ibmcloud oc worker get --cluster CLUSTER --worker WORKER-ID
- List available flavors in the zone.
ibmcloud oc flavors --zone <zone>
- Create a worker node with the new machine type.
  - Create a worker pool with the number of worker nodes that you want to replace.
    - Classic clusters:
ibmcloud oc worker-pool create classic --name WORKER-POOL --cluster CLUSTER --flavor FLAVOR --size-per-zone NUMBER-OF-WORKERS-PER-ZONE
    - VPC Generation 2 clusters:
ibmcloud oc worker-pool create vpc-gen2 --name NAME --cluster CLUSTER --flavor FLAVOR --size-per-zone NUMBER-OF-WORKERS-PER-ZONE --label LABEL
  - Verify that the worker pool is created.
ibmcloud oc worker-pool ls --cluster CLUSTER
  - Add the zone that you retrieved earlier to your worker pool. When you add a zone, the worker nodes that are defined in your worker pool are provisioned in the zone and considered for future workload scheduling. If you want to spread your worker nodes across multiple zones, choose a classic or VPC multizone location.
    - Classic clusters:
ibmcloud oc zone add classic --zone ZONE --cluster CLUSTER --worker-pool WORKER-POOL --private-vlan PRIVATE-VLAN-ID --public-vlan PUBLIC-VLAN-ID
    - VPC clusters:
ibmcloud oc zone add vpc-gen2 --zone ZONE --cluster CLUSTER --worker-pool WORKER-POOL --subnet-id VPC-SUBNET-ID
- Wait for the worker nodes to be deployed. When the worker node state changes to Normal, the deployment is finished.
ibmcloud oc worker ls --cluster CLUSTER
- Remove the old worker node. Note: If you are removing a flavor that is billed monthly (such as bare metal), you are charged for the entire month.
  - Remove the worker pool with the old machine type. Removing a worker pool removes all worker nodes in the pool in all zones. This process might take a few minutes to complete.
ibmcloud oc worker-pool rm --worker-pool WORKER-POOL --cluster CLUSTER
  - Verify that the worker pool is removed.
ibmcloud oc worker-pool ls --cluster CLUSTER
- Verify that the worker nodes are removed from your cluster.
ibmcloud oc worker ls --cluster CLUSTER
- Repeat these steps to update other worker pools or stand-alone worker nodes to different flavors.
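As a consolidated sketch of the classic-cluster path, the flavor migration uses only the commands from the steps above; the cluster, pool, flavor, zone, and VLAN values are placeholders.
# Create a replacement worker pool that uses the new flavor.
ibmcloud oc worker-pool create classic --name new-pool --cluster mycluster --flavor b3c.4x16 --size-per-zone 3
# Add the zone and VLANs that the old worker nodes use.
ibmcloud oc zone add classic --zone dal12 --cluster mycluster --worker-pool new-pool --private-vlan PRIVATE-VLAN-ID --public-vlan PUBLIC-VLAN-ID
# Wait until the new worker nodes reach the Normal state.
ibmcloud oc worker ls --cluster mycluster
# Remove the worker pool that uses the old flavor.
ibmcloud oc worker-pool rm --worker-pool old-pool --cluster mycluster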
How are worker pools scaled down?
When the number of worker nodes in a worker pool is decreased, such as during a worker node update or with the ibmcloud oc worker-pool resize command, the worker nodes are prioritized for deletion based on several properties, including state, health, and version.
This priority logic is not relevant to the autoscaler add-on.
The following table shows the order in which worker nodes are prioritized for deletion.
You can run the ibmcloud oc worker ls command to view all the worker node properties listed in the table.
| Priority | Property | Description |
|---|---|---|
| 1 | Worker node state | Worker nodes in non-functioning or low-functioning states are prioritized for removal. This list shows the states ordered from highest to lowest priority: provision_failed, deploy_failed, deleting, provision_pending, provisioning, deploying, provisioned, reloading_failed, reloading, deployed. |
| 2 | Worker node health | Unhealthy worker nodes are prioritized over healthy worker nodes. This list shows the health states ordered from highest to lowest priority: critical, warning, pending, unsupported, normal. |
| 3 | Worker node version | Worker nodes that run on older versions are at a higher priority for deletion. |
| 4 | Desired placement setting | For workers running on a dedicated host only. Worker nodes running on a dedicated host that has the DesiredPlacementDisabled option set to true are at a higher priority for deletion. |
| 5 | Alphabetical order | After worker nodes are prioritized based on the factors listed above, they are deleted in alphabetical order. Note that, based on worker node ID conventions, IDs for workers on classic and VPC clusters correlate with age, so older worker nodes are removed first. |
Updating cluster components
Your Red Hat OpenShift on IBM Cloud cluster comes with components, such as Ingress, that are installed automatically when you provision the cluster. By default, these components are updated automatically by IBM. However, you can disable automatic updates for some components and manually update them separately from the master and worker nodes.
- What default components can I update separately from the cluster?
- You can optionally disable automatic updates for the following components, which are covered later in this topic: the Fluentd component for logging and the Ingress application load balancer (ALB).
- Are there components that I can't update separately from the cluster?
- Yes. Your cluster is deployed with the following managed components and associated resources that can't be changed, except to scale pods or edit configmaps for certain performance benefits. If you try to change one of these deployment components, their original settings are restored on a regular interval when they are updated with the cluster master. However, note that resources that you create that are associated with these components, such as Calico network policies that you create to be implemented by the Calico deployment components, are not updated.
- calico components
- coredns components
- ibm-cloud-provider-ip
- ibm-file-plugin
- ibm-keepalived-watcher
- ibm-master-proxy
- ibm-storage-watcher
- kubernetes-dashboard components
- metrics-server
- olm-operator and catalog components (1.16 and later)
- vpn
- Can I install other plug-ins or add-ons than the default components?
- Yes. Red Hat OpenShift on IBM Cloud provides other plug-ins and add-ons that you can choose from to add capabilities to your cluster. For example, you might want to use Helm charts to install strongSwan VPN. Or you might want to enable IBM-managed add-ons in your cluster, such as the Diagnostics and Debug Tool. You must update these Helm charts and add-ons separately by following the instructions in the Helm chart readme files or by following the steps to update managed add-ons.
Managing automatic updates for Fluentd
When you create a logging configuration for a source in your cluster to forward to an external server, a Fluentd component is created in your cluster. To change your logging or filter configurations, the Fluentd component must be at the latest version. By default, automatic updates to the component are enabled.
You can manage automatic updates of the Fluentd component in the following ways. Note: To run the following commands, you must have the Administrator IBM Cloud IAM platform access role for the cluster.
- Check whether automatic updates are enabled by running the ibmcloud oc logging autoupdate get --cluster CLUSTER command.
- Disable automatic updates by running the ibmcloud oc logging autoupdate disable command.
- If automatic updates are disabled, but you need to change your configuration, you have two options:
  - Turn on automatic updates for your Fluentd pods.
ibmcloud oc logging autoupdate enable --cluster CLUSTER
  - Force a one-time update when you use a logging command that includes the --force-update option. Note: Your pods update to the latest version of the Fluentd component, but Fluentd does not update automatically going forward.
Example command
ibmcloud oc logging config update --cluster CLUSTER --id LOG-CONFIG-ID --type LOG-TYPE --force-update
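Pulled together, managing Fluentd automatic updates might look like the following sequence. The cluster name is a placeholder, and the --cluster flag on the disable command is an assumption based on the other logging commands in this section.
# Check whether automatic Fluentd updates are enabled.
ibmcloud oc logging autoupdate get --cluster mycluster
# Disable automatic updates (--cluster assumed to be required, as for the other commands).
ibmcloud oc logging autoupdate disable --cluster mycluster
# Re-enable automatic updates later so that logging configuration changes can be applied.
ibmcloud oc logging autoupdate enable --cluster mycluster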
Managing automatic updates for Ingress ALBs
Control when the Ingress application load balancer (ALB) component is updated. For information about keeping ALBs up-to-date, see Managing the Ingress ALB lifecycle.
Updating managed add-ons
Managed IBM Cloud Kubernetes Service cluster add-ons are an easy way to enhance your cluster with open-source capabilities, such as Istio. The version of the open-source tool that you add to your cluster is tested by IBM and approved for use in IBM Cloud Kubernetes Service. To update managed add-ons that you enabled in your cluster to the latest versions, see Updating managed add-ons.