Debugging the metrics-server
Applies to: Virtual Private Cloud and Classic infrastructure
The following symptoms might indicate that you need to adjust the metrics-server resources:
- The metrics-server is restarting frequently.
- Deleting a namespace leaves the namespace stuck in a Terminating state, and the output of kubectl describe namespace includes a condition that reports a metrics API discovery error.
- kubectl top pods, kubectl top nodes, other kubectl commands, or applications that use the Kubernetes API log Kubernetes errors such as:
    The server is currently unable to handle the request (get pods.metrics.k8s.io)
    Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
- HorizontalPodAutoscalers (HPAs) do not scale deployments.
- Running kubectl get apiservices v1beta1.metrics.k8s.io returns a status such as:
    NAME                     SERVICE                      AVAILABLE                      AGE
    v1beta1.metrics.k8s.io   kube-system/metrics-server   False (FailedDiscoveryCheck)   139d
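To check the last symptom from a script, the following minimal sketch parses the table that kubectl get apiservices v1beta1.metrics.k8s.io prints. It assumes the default column order shown above (NAME, SERVICE, AVAILABLE, AGE); the function name metrics_api_available is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of
#   kubectl get apiservices v1beta1.metrics.k8s.io
# from stdin and reports whether the AVAILABLE column is True.
# Assumes the default column order NAME SERVICE AVAILABLE AGE.
metrics_api_available() {
  # Skip the header row and take the third column of the first data row.
  # For "False (FailedDiscoveryCheck)" this yields just "False".
  status=$(awk 'NR > 1 { print $3; exit }')
  if [ "$status" = "True" ]; then
    echo "available"
    return 0
  fi
  echo "unavailable: ${status:-no data}"
  return 1
}
```

For example, run kubectl get apiservices v1beta1.metrics.k8s.io | metrics_api_available. Parsing table columns is fragile; reading the condition directly with kubectl get apiservices v1beta1.metrics.k8s.io -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' avoids depending on the column layout.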
Your cluster has a metrics service that is provided by the metrics-server deployment in the kube-system namespace. The metrics-server resource requests and limits are based on the number of nodes in the cluster and are optimized for clusters with 30 or fewer pods per worker node. If the memory requests are too low, the metrics-server can fail with out-of-memory errors or respond very slowly. If the CPU requests are too low, the metrics-server might fail its liveness and readiness probes because of CPU throttling.
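Because the default sizing assumes 30 or fewer pods per worker node, it can help to check whether any node exceeds that density. The following minimal sketch counts pods per node. It assumes the node names come from kubectl get pods -A --no-headers -o custom-columns=NODE:.spec.nodeName, which prints one node name per pod and <none> for unscheduled pods; the function name pods_over_limit is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads one node name per line from stdin and prints
# "<node> <count>" for every node that runs more pods than the limit.
#   $1: pod limit per node (defaults to 30)
pods_over_limit() {
  limit=${1:-30}
  # Count pods per node, skipping blank lines and unscheduled pods (<none>).
  awk -v limit="$limit" '
    NF && $1 != "<none>" { count[$1]++ }
    END {
      for (node in count)
        if (count[node] > limit) print node, count[node]
    }' | sort
}
```

For example: kubectl get pods -A --no-headers -o custom-columns=NODE:.spec.nodeName | pods_over_limit. Empty output means that no node exceeds the limit.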
Problems with the metrics-server can also be caused by problems in other areas. The metrics APIs are not available if the control plane cannot communicate with the metrics-server through Konnectivity. Admission control webhooks can prevent the control plane from creating pods, including the metrics-server pod.
Follow these steps to troubleshoot.
- Verify that metrics-server pods exist.
    kubectl get pod -n kube-system -l k8s-app=metrics-server
  If no pods are listed, there is likely a problem with an admission-control webhook. See "Why do cluster operations fail due to a broken webhook?".
- Verify that the apiserver can connect to the metrics-server.
    kubectl logs POD -n kube-system -c metrics-server --tail 5
  Replace POD with the pod name shown in the previous step. The content of any logs returned does not matter. If you get an error message that contains text such as <workerIP>:10250: getsockopt: connection timed out, see "kubectl commands time out".
- If the previous steps do not show a problem, adjust the resources for the metrics-server. See "Adjusting cluster metrics provider resources".
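The first two checks in these steps can be combined into a script. The following is a minimal sketch, assuming kubectl is configured for the cluster and that the first pod that matches k8s-app=metrics-server is representative; the function names classify_logs_step and check_metrics_server are hypothetical. Because the step ignores log content, the classifier looks only at whether kubectl logs failed and, if so, whether its error mentions a connection timeout on port 10250.

```shell
#!/bin/sh
# Hypothetical helper: classify the result of the "kubectl logs" step.
#   $1:    exit status of kubectl logs
#   stdin: combined stdout/stderr of kubectl logs
# On success the log content is ignored, as the troubleshooting step says.
classify_logs_step() {
  if [ "$1" -eq 0 ]; then
    echo "reachable"
  elif grep -q '10250.*connection timed out'; then
    echo "timeout"
  else
    echo "error"
  fi
}

# Hypothetical wrapper: run the first two steps against the current context.
check_metrics_server() {
  # Step 1: find a metrics-server pod; "-o name" prints "pod/<name>".
  pod=$(kubectl get pod -n kube-system -l k8s-app=metrics-server -o name | head -n 1)
  if [ -z "$pod" ]; then
    echo "No metrics-server pod found: check for a broken admission-control webhook."
    return 1
  fi

  # Step 2: ask the apiserver for the last 5 log lines of the pod.
  out=$(kubectl logs "$pod" -n kube-system -c metrics-server --tail 5 2>&1)
  status=$?
  case $(printf '%s\n' "$out" | classify_logs_step "$status") in
    reachable)
      echo "The apiserver reached $pod. Next, adjust the metrics-server resources." ;;
    timeout)
      echo "Connection to a worker on port 10250 timed out: see 'kubectl commands time out'."
      return 1 ;;
    *)
      printf 'kubectl logs failed:\n%s\n' "$out"
      return 1 ;;
  esac
}
```

Run check_metrics_server after sourcing the script; a non-zero exit status means that one of the first two checks found a problem.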