Monitoring cluster health
For cluster metrics and app monitoring, Red Hat® OpenShift® on IBM Cloud® clusters include built-in tools to help you manage the health of your single cluster instance. You can also set up IBM Cloud tools for multi-cluster analysis or other use cases, such as IBM Cloud Kubernetes Service cluster add-ons: IBM Log Analysis and IBM Cloud Monitoring.
Understanding options for monitoring
To understand when to use the built-in Red Hat OpenShift tools or the IBM Cloud integrations, review the following information.
Monitoring limitation in private-only clusters with RHCOS worker nodes: The monitoring agent relies on kernel headers in the operating system, but RHCOS doesn't include kernel headers. In this scenario, the agent reaches back to sysdig.com to use the pre-compiled agent. In clusters with no public network access, this process fails. To allow monitoring on RHCOS clusters, you must either allow outbound traffic or see the Sysdig documentation for installing the agent in air-gapped environments.
IBM Cloud Monitoring
Review the following details about IBM Cloud Monitoring.
- Customizable user interface for a unified look at your cluster metrics, container security, resource usage, alerts, and custom events.
- Quick integration with the cluster via a script.
- Aggregated metrics and container monitoring across clusters and cloud providers.
- Historical access to metrics, based on the timeline and plan, and the ability to capture and download trace files.
- Highly available, scalable, and compliant with industry security standards.
- Integrated with IBM Cloud IAM for user access management.
Built-in Red Hat OpenShift monitoring tools
OpenShift includes a preconfigured, preinstalled, and self-updating monitoring stack that provides monitoring for core platform components on a per-cluster basis. This monitoring includes built-in Prometheus and Grafana deployments in the openshift-monitoring project for cluster metrics, which is available in a single zone only. You can view and manage your monitoring dashboards, metrics, and alerts from the Red Hat OpenShift web console. For more information, see Monitoring in the Red Hat OpenShift documentation.
By default, the monitoring stack does not use persistent storage to back up metric history, and instead uses a temporary EmptyDir volume in the host filesystem. The retention period for metrics history ranges from 11 to 15 days, depending on your cluster version. For some workloads, these settings might use a significant amount of disk space and memory, or might not meet requirements for metrics retention. You can configure the monitoring stack to use persistent storage, change the metrics retention policies, or run Prometheus on dedicated nodes. For more information, see Configuring the monitoring stack.
Note that Red Hat OpenShift on IBM Cloud version 4.16 sets a default size-based retention limit of 10 GB.
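As an illustration of the persistent storage and retention options described above, the following Python sketch builds a ConfigMap manifest that follows the general layout described in the Red Hat OpenShift documentation for configuring the monitoring stack (a cluster-monitoring-config ConfigMap in the openshift-monitoring project). The function name, the storage class name, the storage size, and the retention value are hypothetical placeholders, not recommended settings. Verify the field names against the documentation for your cluster version before you apply anything.

```python
import json


def build_monitoring_configmap(retention="15d",
                               storage_class="example-storage-class",
                               storage_size="40Gi"):
    """Return a ConfigMap manifest (as a dict) for the monitoring stack.

    All default values are illustrative assumptions; replace them with
    values that are valid for your cluster.
    """
    # The monitoring settings are stored as a YAML document inside the
    # ConfigMap's "config.yaml" key.
    config_yaml = "\n".join([
        "prometheusK8s:",
        f"  retention: {retention}",
        "  volumeClaimTemplate:",
        "    spec:",
        f"      storageClassName: {storage_class}",
        "      resources:",
        "        requests:",
        f"          storage: {storage_size}",
    ]) + "\n"

    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "cluster-monitoring-config",
            "namespace": "openshift-monitoring",
        },
        "data": {"config.yaml": config_yaml},
    }


if __name__ == "__main__":
    # Print the manifest as JSON so that it can be reviewed before use.
    print(json.dumps(build_monitoring_configmap(), indent=2))
```

The sketch only generates the manifest for review; it does not contact the cluster.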
Monitoring Red Hat® OpenShift® on IBM Cloud® storage metrics
Red Hat® OpenShift® on IBM Cloud® clusters include built-in tools to help cluster administrators get information about the availability and capacity of storage volumes.
If you are unable to view storage metrics in the Red Hat OpenShift monitoring dashboard, see Debugging Block Storage for VPC metrics.
The following metrics can be monitored for Red Hat® OpenShift® on IBM Cloud® clusters.
- kubelet_volume_stats_available_bytes
- kubelet_volume_stats_capacity_bytes
- kubelet_volume_stats_inodes
- kubelet_volume_stats_inodes_free
- kubelet_volume_stats_inodes_used
Want to set up storage monitoring alerts for platforms such as email or Slack? See Sending notifications to external systems in the Red Hat OpenShift documentation.
Before you monitor metrics for Block Storage for VPC, you must have a cluster with the Block Storage for VPC cluster add-on enabled and a Block Storage for VPC volume attached to a worker node. Storage metrics are populated only for mounted storage volumes.
- Navigate to the Red Hat OpenShift web console and select Monitoring and then Metrics.
- Enter the metric that you want to monitor in the dialog box and select Run queries. The following example query returns the fraction of a PVC's capacity that is in use.
kubelet_volume_stats_used_bytes{persistentvolumeclaim="NAME OF PVC"} / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="NAME OF PVC"}
Example output
endpoint: https-metrics
instance: 11.111.1.1:XX
job: kubelet
metrics_path: /metrics
namespace: default
node: 11.111.1.1
persistentvolumeclaim: PVC-NAME
prometheus: openshift-monitoring/k8s
service: kubelet
value: 0.003596851526321722
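The value that the example query returns is a fraction between 0 and 1, not a percentage. The following Python sketch shows one way to build the same query for a given PVC name and to interpret the result. The function names and the 0.8 threshold are hypothetical choices for illustration and are not part of any IBM Cloud or Red Hat OpenShift tooling.

```python
def pvc_usage_query(pvc_name: str) -> str:
    """Build the used/capacity PromQL query for one PVC."""
    # Doubled braces produce literal { and } in the f-string.
    selector = f'{{persistentvolumeclaim="{pvc_name}"}}'
    return (
        f"kubelet_volume_stats_used_bytes{selector}"
        f" / kubelet_volume_stats_capacity_bytes{selector}"
    )


def usage_percent(ratio: float) -> float:
    """Convert the query result (a 0-1 fraction) to a percentage."""
    return round(ratio * 100, 2)


def is_near_capacity(ratio: float, threshold: float = 0.8) -> bool:
    """Return True when usage meets or exceeds the assumed threshold."""
    return ratio >= threshold


if __name__ == "__main__":
    print(pvc_usage_query("my-pvc"))
    # The value from the example output corresponds to about 0.36% usage.
    print(usage_percent(0.003596851526321722))
    print(is_near_capacity(0.003596851526321722))
```

You can paste the string that pvc_usage_query returns into the Metrics query box in the web console.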
For more information, see Monitoring.
If your volume is reaching capacity, try setting up volume expansion.
Forwarding cluster and app metrics to IBM Cloud Monitoring
The observability CLI plug-in ibmcloud ob and the v2/observe endpoints are deprecated and support ends on 28 March 2025. You can now manage your logging and monitoring integrations from the console or through the Helm charts. For the latest steps, see Working with the Kubernetes agent or Working with the Red Hat OpenShift agent.
Enabling remote health reporting
Telemetry is a remote health monitoring feature that collects aggregated data about your cluster, such as the health of your components and the number and types of resources in use. If you have a public cluster, you can elect to have your own Telemetry data visible in your account for your use. For more information, see Telemetry for remote health monitoring.