Debugging webhooks

When running oc commands, you see error messages similar to the following examples.

Error from server (InternalError): error when creating "testjob.yaml": Internal error occurred: failed calling webhook "mywebhook.test.io": Post https://admission-webhook.default.svc:443/validate?timeout=30s: dial tcp 172.21.189.228:443: connect: connection timed out

error creating namespace "test": Internal error occurred: admission plugin "MutatingAdmissionWebhook" failed to complete mutation in 13s

Failing webhooks might also cause problems similar to the following issues.

You can't create or modify pods, secrets, or namespaces.
You can't add worker nodes to a cluster or create a secret that holds the LUKS encryption key.
You can't patch, update, or upgrade, and the underlying failure is related to creating resources in the cluster.

A problem in the called service or the secure tunnel can cause requests to fail due to timeouts. You might be unaware that you have admission control webhooks installed until this happens.

Admission control webhooks validate or modify (mutate) Kubernetes API requests. These webhooks are called from the cluster apiserver or openshift-apiserver and typically call a service running in the cluster. Each webhook has rules that define the kinds of resources it applies to, such as pods or namespaces, and the operations it is called for, such as create, update, or delete. Read requests are not sent to admission webhooks.

Webhooks have a failure policy that indicates whether Kubernetes can ignore connection errors when calling the webhook or whether a connection error must fail the operation. A ValidatingWebhookConfiguration resource inspects the request, while a MutatingWebhookConfiguration resource modifies the request data before it is processed.

Webhooks can also deny requests as part of normal operation. A webhook might deny requests that violate security policies, or it might perform other data validation. In such cases, the failure information contains a "denied the request" response with a reason that indicates the problem, as in the following example.

admission webhook "mutate.configuration.upsert.appconnect.ibm.com" denied the request: version is not supported
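The two kinds of failure are worded differently, so a small helper can tell them apart and pull out the webhook name. The following is a minimal sketch, not part of oc or kubectl: the function name `classify_webhook_error` is hypothetical, and it assumes messages worded like the examples on this page. Anything else is reported as `unknown`.

```shell
# Hypothetical helper: classify a webhook-related error message.
# Prints "call-failure NAME" when the apiserver could not reach the webhook,
# "denied NAME" when the webhook answered and rejected the request,
# or "unknown" when the message matches neither form.
classify_webhook_error() {
  msg=$1
  case "$msg" in
    *'failed calling webhook "'*)
      name=${msg#*'failed calling webhook "'}   # drop text up to the opening quote
      name=${name%%'"'*}                        # drop text from the closing quote
      echo "call-failure $name"
      ;;
    *'admission webhook "'*'" denied the request'*)
      name=${msg#*'admission webhook "'}
      name=${name%%'"'*}
      echo "denied $name"
      ;;
    *)
      echo "unknown"
      ;;
  esac
}
```

A `call-failure` result points toward the service or the tunnel, while a `denied` result points toward the webhook's own policies.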

In Red Hat OpenShift on IBM Cloud, webhooks that call services running in the cluster do so by using a secure tunnel that connects the cluster control plane in an IBM Cloud account to cluster worker nodes in your customer account.

Complete the following steps to identify the webhook that is causing the issue. Then debug the related service and remove or recreate your webhook if needed.

Run the following commands to get the VPN pod logs. If you can't get the VPN logs, follow the steps to Debug common CLI issues and return to this page when you are able to retrieve the logs. If the commands succeed and you can get the logs, the VPN tunnel is working and you can continue to the next step.
```
oc get pods -n kube-system -l app=vpn
```
```
oc logs -n kube-system -l app=vpn
```

Describe your admission control webhooks and save the output to a file called webhooks.txt.

kubectl describe mutatingwebhookconfigurations,validatingwebhookconfigurations > webhooks.txt

Review the webhooks.txt file for error messages. Webhook-related error messages from an application, including oc, can help to identify the webhook.
Review the apiserver metrics for rejection type, count, and the rejection code. You can get a snapshot of the metrics by using the following command.
```
kubectl get --raw /metrics | grep apiserver_admission_webhook_rejection_count
```
```
apiserver_admission_webhook_rejection_count{error_type="calling_webhook_error",name="check-ignore-label.gatekeeper.sh",operation="UPDATE",rejection_code="0",type="validating"} 16
```
A rejection_code value of 0 indicates an error occurred when calling the webhook. A non-zero rejection_code value indicates the webhook rejected the request.

There are three apiserver instances. The command gets metrics from only one of them, so the output reflects the activity on that instance. Each apiserver returns different data, and instances that have not processed the failing requests might not return this metric.
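To read the metric lines more quickly, the rejection code rule can be applied with a small filter. This is a sketch under assumptions: the function name `summarize_rejections` is hypothetical, and it assumes the label order shown in the example output (`name` before `rejection_code`). Lines in any other shape are skipped.

```shell
# Hypothetical helper: read apiserver metrics on stdin and print one line per
# rejection counter in the form: WEBHOOK REJECTION_CODE COUNT INTERPRETATION
summarize_rejections() {
  sed -n -E 's/^apiserver_admission_webhook_rejection_count\{.*name="([^"]*)".*rejection_code="([^"]*)".*\} *([0-9.e+-]+)$/\1 \2 \3/p' |
    while read -r name code count; do
      if [ "$code" = "0" ]; then
        # Code 0: the call to the webhook itself failed.
        echo "$name $code $count call-error"
      else
        # Non-zero code: the webhook responded and rejected the request.
        echo "$name $code $count rejected-by-webhook"
      fi
    done
}

# Usage:
#   kubectl get --raw /metrics | summarize_rejections
```

Because each apiserver instance reports its own counters, running the command several times might give different results.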
Review the command output from the previous steps to identify the specific MutatingWebhookConfiguration or ValidatingWebhookConfiguration. If the errors, logs, or metrics do not help, review the webhook descriptions that you retrieved earlier. Each webhook configuration has a set of rules that specify the kinds of resources and the operations the webhook is called for. You can use this information to identify the webhooks that might be involved.
- If there is an error calling the webhook, review the documentation for that service for product-specific debugging steps.
- If the webhook is rejecting the requests, look at the policies and configuration options for the webhook. It might be possible to adjust them to allow the request. Or, the request might be violating the policies, in which case the request or the application making the request needs to be changed. For more information, see What are the best practices for using webhooks.

Reviewing the service that the webhook is calling

Get the details of the service and its endpoints.
```
kubectl get svc NAME -n NAMESPACE
```
```
kubectl get ep NAME -n NAMESPACE
```
- If the webhook is calling a service that doesn't exist, the webhook might be left over from an incomplete or improper removal of an application. In this case, look for service-specific documentation and follow the steps to uninstall the service.
- If you are unable to uninstall the service, delete the webhook configuration.
```
kubectl delete validatingwebhookconfiguration NAME
```
```
kubectl delete mutatingwebhookconfiguration NAME
```
If the service exists, but has no endpoints, check the health of the pods. First, get the pod labels from the service.
```
kubectl describe svc NAME -n NAMESPACE
```
Example output
```
Selector:          app=my-webhook
```
List the pods that are using the labels. For example, the label in the following command is app=my-webhook.
```
kubectl get pods -n NAMESPACE -l app=my-webhook
```
Review the command output. If the pods are not healthy, check the pod events, logs, worker node health, and other components to troubleshoot. For more information, see Debugging app deployments.
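The selector lookup and the pod listing can be chained so that the label is not copied by hand. The sketch below assumes that the `kubectl describe svc` output contains a `Selector:` line formatted like the example output above. The function name `selector_from_describe` is hypothetical.

```shell
# Hypothetical helper: read `kubectl describe svc` output on stdin and print
# the value of the Selector line. Returns non-zero if the service has no
# selector (empty or "<none>"), because there is then nothing to list pods by.
selector_from_describe() {
  selector=$(awk '$1 == "Selector:" { print $2; exit }')
  if [ -z "$selector" ] || [ "$selector" = "<none>" ]; then
    return 1
  fi
  echo "$selector"
}

# Usage (NAME and NAMESPACE are placeholders, as in the steps above):
#   selector=$(kubectl describe svc NAME -n NAMESPACE | selector_from_describe) &&
#     kubectl get pods -n NAMESPACE -l "$selector"
```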

Disabling or removing a webhook

Temporarily ignore connection errors and timeouts by setting the failure policy to Ignore. Edit the webhook by running the command that matches its type.
```
kubectl edit validatingwebhookconfiguration NAME
```
```
kubectl edit mutatingwebhookconfiguration NAME
```
Search for failurePolicy and change the value to Ignore.
Save the configuration and exit the editor. If adjusting the failure policy doesn't resolve the issue, repeat the previous steps and change the value back to Fail.
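If an interactive editor is not practical, the same change can be made with `kubectl patch` and a JSON patch. The following is a sketch rather than a required procedure. The helper name `failure_policy_patch` is hypothetical, and the index `0` in the usage comment assumes that the entry to change is the first one in the configuration's `webhooks` list. A configuration can contain several entries, each with its own failurePolicy.

```shell
# Hypothetical helper: print a JSON patch that sets failurePolicy on the
# webhook entry at the given index. Only Ignore and Fail are accepted.
failure_policy_patch() {
  index=$1
  policy=$2
  case "$policy" in
    Ignore|Fail) ;;
    *)
      echo "policy must be Ignore or Fail" >&2
      return 1
      ;;
  esac
  printf '[{"op":"replace","path":"/webhooks/%s/failurePolicy","value":"%s"}]\n' "$index" "$policy"
}

# Usage (NAME is the webhook configuration name):
#   kubectl patch validatingwebhookconfiguration NAME --type=json \
#     -p "$(failure_policy_patch 0 Ignore)"
# To revert, generate the patch with Fail instead of Ignore.
```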

Temporarily remove the webhook. Save the existing webhook configuration to a file before deleting it, using the command that matches the webhook type.

kubectl get validatingwebhookconfiguration NAME -o yaml > webhook-config.yaml

kubectl get mutatingwebhookconfiguration NAME -o yaml > webhook-config.yaml

Delete the webhook configuration.

kubectl delete validatingwebhookconfiguration NAME

kubectl delete mutatingwebhookconfiguration NAME

Wait a few minutes, then retry the commands that were failing to see if the problem is resolved.
Recreate the webhook.
```
kubectl apply -f webhook-config.yaml
```
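Because the deletion is only safe if the backup succeeded, the save and delete steps can be wrapped so that the delete runs only when a non-empty backup file exists. This is a minimal sketch with a hypothetical function name, `backup_and_delete_webhook`. Unlike the commands above, it writes to a file named after the resource type and name, so backups of different webhooks do not overwrite each other.

```shell
# Hypothetical helper: back up a webhook configuration, then delete it only
# if the backup file was written and is not empty.
#   $1: validatingwebhookconfiguration or mutatingwebhookconfiguration
#   $2: name of the webhook configuration
backup_and_delete_webhook() {
  kind=$1
  name=$2
  backup="${kind}-${name}.yaml"
  if ! kubectl get "$kind" "$name" -o yaml > "$backup"; then
    rm -f "$backup"          # do not leave an empty or partial backup behind
    return 1
  fi
  if [ ! -s "$backup" ]; then
    echo "backup $backup is empty; not deleting" >&2
    return 1
  fi
  kubectl delete "$kind" "$name"
}

# Usage:
#   backup_and_delete_webhook validatingwebhookconfiguration NAME
# To recreate the webhook later, apply the file that the function wrote:
#   kubectl apply -f validatingwebhookconfiguration-NAME.yaml
```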
If the issue persists, open a support case. In the case details, be sure to include any relevant log files, error messages, or command outputs.