Monitoring cluster health

For cluster metrics and app monitoring, Red Hat® OpenShift® on IBM Cloud® clusters include built-in tools to help you manage the health of your single cluster instance. You can also set up IBM Cloud tools for multi-cluster analysis or other use cases, such as IBM Cloud Kubernetes Service cluster add-ons: IBM Cloud Logs and IBM Cloud Monitoring.

Understanding options for monitoring

To help understand when to use the built-in Red Hat OpenShift tools or IBM Cloud integrations, review the following information.

Monitoring limitation in private-only clusters with RHCOS worker nodes: The monitoring agent relies on kernel headers in the operating system, but RHCOS doesn't include kernel headers. In this scenario, the agent reaches back to sysdig.com to use the pre-compiled agent. In clusters with no public network access, this process fails. To allow monitoring on RHCOS clusters, you must either allow outbound traffic or follow the Sysdig documentation for installing the agent in air-gapped environments.

IBM Cloud Monitoring

Review the following details about IBM Cloud Monitoring.

  • Customizable user interface for a unified look at your cluster metrics, container security, resource usage, alerts, and custom events.
  • Quick integration with the cluster via a script.
  • Aggregated metrics and container monitoring across clusters and cloud providers.
  • Historical access to metrics, based on the timeline and plan, and the ability to capture and download trace files.
  • Highly available, scalable, and compliant with industry security standards.
  • Integrated with IBM Cloud IAM for user access management.

Built-in Red Hat OpenShift monitoring tools

OpenShift includes a preconfigured, preinstalled, and self-updating monitoring stack that provides monitoring for core platform components on a per-cluster basis. This monitoring includes built-in Prometheus and Grafana deployments in the openshift-monitoring project for cluster metrics, which is available in a single zone only. You can view and manage your monitoring dashboards, metrics, and alerts from the Red Hat OpenShift web console. For more information, see Monitoring in the Red Hat OpenShift documentation.

By default, the monitoring stack does not use persistent storage to back up metric history, and instead uses a temporary EmptyDir volume in the host file system. The retention period for metrics history ranges from 11 to 15 days, depending on your cluster version. For some workloads, these settings might use a significant amount of disk space and memory, or might not meet requirements for metrics retention. You can configure the monitoring stack to use persistent storage, change the metrics retention policies, or run Prometheus on dedicated nodes. For more information, see Configuring the monitoring stack.

Note that Red Hat OpenShift on IBM Cloud version 4.16 sets a default retention size of 10 GB.
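The persistent storage and retention settings described above are typically set through the `cluster-monitoring-config` ConfigMap in the `openshift-monitoring` project. The following is a minimal sketch, not a recommended configuration: the helper name `print_monitoring_config`, the 15d retention, the 10GB retention size, and the storage class and claim size arguments are all illustrative assumptions. Check the Configuring the monitoring stack documentation for the fields that your cluster version supports.

```shell
#!/bin/sh
# Hypothetical helper: prints a cluster-monitoring-config manifest.
# $1 = storage class name (assumption: a class that exists in your cluster)
# $2 = requested volume size, for example 40Gi (illustrative value)
print_monitoring_config() {
  storage_class="$1"
  storage_size="$2"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 15d
      retentionSize: 10GB
      volumeClaimTemplate:
        spec:
          storageClassName: ${storage_class}
          resources:
            requests:
              storage: ${storage_size}
EOF
}

# Example (not run here): review the manifest, then apply it.
#   print_monitoring_config <storage-class> 40Gi | oc apply -f -
```

Printing the manifest first, rather than applying it directly, makes it easier to review the values before they change the monitoring stack.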

Monitoring Red Hat® OpenShift® on IBM Cloud® storage metrics

Red Hat® OpenShift® on IBM Cloud® clusters include built-in tools to help cluster administrators get information about the availability and capacity of storage volumes.

If you are unable to view storage metrics in the Red Hat OpenShift monitoring dashboard, see Debugging Block Storage for VPC metrics.

The following metrics can be monitored for Red Hat® OpenShift® on IBM Cloud® clusters.

  • kubelet_volume_stats_available_bytes
  • kubelet_volume_stats_capacity_bytes
  • kubelet_volume_stats_inodes
  • kubelet_volume_stats_inodes_free
  • kubelet_volume_stats_inodes_used

Want to set up storage monitoring alerts for platforms such as email or Slack? See Sending notifications to external systems in the Red Hat OpenShift documentation.

Before monitoring metrics for Block Storage for VPC, you must have a cluster with the Block Storage for VPC cluster add-on enabled and you must have a Block Storage for VPC volume attached to a worker node. Red Hat® OpenShift® on IBM Cloud® Storage Metrics are populated only for mounted storage volumes.

  1. Navigate to the Red Hat OpenShift web console and select Monitoring and then Metrics.

  2. Input the metric you want to monitor in the dialog box and select Run queries.

    kubelet_volume_stats_used_bytes{persistentvolumeclaim="NAME OF PVC"} / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="NAME OF PVC"}
    

    Example output

    endpoint       instance      job     metrics_path  namespace  node         persistentvolumeclaim  prometheus               service  value 
    https-metrics  11.111.1.1:XX kubelet /metrics      default    11.111.1.1   PVC-NAME               openshift-monitoring/k8s kubelet  0.003596851526321722
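The query above returns the used fraction of the volume as a ratio between 0 and 1. As a rough sketch, the following hypothetical shell helpers (the names `pvc_usage_query` and `ratio_to_percent` are not part of any IBM or Red Hat tool) build the same query for a given PVC name and convert the returned value into a percentage.

```shell
#!/bin/sh
# Hypothetical helper: print the usage-ratio query for one PVC.
# $1 = name of the persistent volume claim
pvc_usage_query() {
  printf 'kubelet_volume_stats_used_bytes{persistentvolumeclaim="%s"} / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="%s"}\n' "$1" "$1"
}

# Hypothetical helper: convert a ratio such as 0.0036 into a
# percentage with two decimal places.
# $1 = ratio value copied from the query result
ratio_to_percent() {
  awk -v r="$1" 'BEGIN { printf "%.2f\n", r * 100 }'
}

# Example (not run here):
#   pvc_usage_query my-pvc
#   ratio_to_percent 0.003596851526321722
```

With the value from the example output, the volume is about 0.36% full.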
    

For more information, see Monitoring.

If your volume is reaching capacity, try setting up volume expansion.

Migrating logging and monitoring agents to Cloud Logs

The observability CLI plug-in ibmcloud ob and the v2/observe endpoints are deprecated and support ends on 28 March 2025. There is no direct replacement, but you can now manage your logging and monitoring integrations from the console or through the Helm charts. For the latest steps, see Managing the Logging agent for Red Hat OpenShift on IBM Cloud clusters and Working with the Red Hat OpenShift monitoring agent.

What happens after 28 March 2025?
You can no longer use the ob plug-in, Terraform, or the API to install observability agents on a cluster or to modify your existing configuration. Sysdig agents continue to send metrics to the specified IBM Cloud Monitoring instance. LogDNA agents can no longer send logs because IBM Cloud Log Analysis is replaced by IBM Cloud Logs.

What needs to be done before 28 March 2025?
  • If you are still using Activity Tracker, migrate to Cloud Logs.
  • If you use the observability (ob) plug-in to install LogDNA or Sysdig agents on your cluster, uninstall the agents and reinstall them by using the Container dashboard, by using Terraform, or manually.

Reviewing your observability agents

The observability plugin installs Sysdig and LogDNA agents in the ibm-observe namespace.

Before March 28, 2025:

  1. If needed, install the ob plug-in.
  2. List your logging configs.
    ibmcloud ob logging config list --cluster CLUSTER
    
  3. List your monitoring configs.
    ibmcloud ob monitoring config list --cluster CLUSTER
    

If there is no logging or monitoring config, then any observability agents in the cluster were not installed with the IKS observability (ob) plugin.

After March 28, 2025:

  1. Access your Red Hat OpenShift cluster.
  2. Review the configmaps in the ibm-observe namespace.
    kubectl get cm -n ibm-observe
    
    Example output
    
    NAME                                   DATA   AGE
    e405f1fc-feba-4350-9337-e7e249af871c   6      25m
    f59851a6-ede6-4719-afa0-eee7ce65eeb5   6      20m
    
  3. Check the configmap names. Observability agents installed by the observability plug-in use a configmap with the GUID of the IBM Cloud Monitoring instance or the IBM Cloud Log Analysis instance that logs or metrics are being sent to. If your cluster has agents in a namespace other than ibm-observe or the configmaps in ibm-observe are not named with the instance GUIDs, then these agents were not installed with the IKS observability (ob) plugin.
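To make the GUID check less error-prone, the configmap names can be filtered with a small script. This is a sketch that assumes instance GUIDs follow the lowercase 8-4-4-4-12 hexadecimal pattern shown in the example output. The helper names `is_guid` and `filter_guid_names` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 looks like a lowercase GUID
# (8-4-4-4-12 hexadecimal groups, as in the example output).
is_guid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Hypothetical helper: read names from stdin, one per line, and
# print only the ones that look like GUIDs.
filter_guid_names() {
  while read -r name; do
    if is_guid "$name"; then
      printf '%s\n' "$name"
    fi
  done
}

# Example (not run here): list GUID-named configmaps in ibm-observe.
#   kubectl get cm -n ibm-observe --no-headers \
#     -o custom-columns=NAME:.metadata.name | filter_guid_names
```

Any name that the filter prints is a candidate for an agent configuration that the ob plug-in created. An empty result suggests that the agents were installed another way.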

Removing the observability plug-in agents

  • Before 28 March 2025, you can still use the ob plug-in to delete your observability configs.

    ibmcloud ob logging config delete --cluster <cluster> --instance <logging instance guid>
    
    ibmcloud ob monitoring config delete --cluster <cluster> --instance <monitoring instance guid>
    
  • After March 28, 2025, when support for the ob plugin ends, you must delete each component individually.

    1. Clean up the daemonsets and configmaps.
      kubectl delete daemonset logdna-agent -n ibm-observe
      kubectl delete daemonset sysdig-agent -n ibm-observe
      kubectl delete configmap <logdna-configmap> -n ibm-observe
      kubectl delete configmap <sysdig-configmap> -n ibm-observe
      
    2. Optional: After no other resources are running in the namespace, delete the namespace.
      kubectl delete namespace ibm-observe
      

After the plug-in agents are removed, reinstall the Logging and Monitoring agents in your cluster by using the Cluster dashboard, by using Terraform, or manually.

Enabling remote health reporting

Telemetry is a remote health monitoring feature that collects aggregated data about your cluster, such as the health of your components and the number and types of resources in use. If you have a public cluster, you can elect to have your own Telemetry data visible in your account for your use. For more information, see Telemetry for remote health monitoring.