Why do I see DNS failures after adding a custom DNS resolver?
Virtual Private Cloud 1.30 and later
You see DNS failures after creating a custom DNS resolver in your VPC where a 1.30 cluster already exists.
In each of the following cases, sync the kube-<clusterID> security group and replace the worker nodes as described in the following steps. Note that the issue might not appear immediately after the resolver is created and enabled in the VPC. DNS includes a cache, which might continue to resolve name lookups until a worker is replaced or restarted, or until pods are restarted.
- Adding or replacing a worker node in the cluster fails, and ibmcloud ks workers shows a worker status similar to the following.
  Infrastructure instance status is 'failed': Can't start instance because provisioning failed.
- Running oc commands results in an error similar to the following. You can confirm that the failure is DNS related with the lookup sketch after this list.
  dial tcp: lookup s3.direct.eu-de.cloud-object-storage.appdomain.cloud on 172.21.0.10:53: server misbehaving.
- After you restart a worker, pods on that worker are stuck in Terminating state, the OpenShift web console no longer opens, or Ingress is in Critical state.
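To confirm that these symptoms are DNS related, you can run a test lookup from inside the cluster. The following is a minimal sketch, not part of the documented steps; the pod name, the busybox image, and the test hostname are example values, and you can substitute kubectl for oc on a community Kubernetes cluster.
# Sketch: start a temporary pod and test a name lookup through the cluster DNS service.
# The pod name, image, and hostname are examples only.
oc run dns-check --rm -it --restart=Never --image=busybox:1.36 -- nslookup s3.direct.eu-de.cloud-object-storage.appdomain.cloud
# A timeout or "server misbehaving" response indicates that the workers cannot reach the custom DNS resolver.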
If a custom DNS resolver is enabled in your VPC before you create a 1.30 cluster, IBM Cloud Kubernetes Service automatically adds rules that allow traffic to the DNS resolver IP addresses to your IBM Cloud managed security group (kube-<clusterID>). However, if you enable a custom DNS resolver on a VPC that already contains a 1.30 cluster, those existing clusters lose access to DNS because the rules are not added automatically.
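You can check whether the security group already contains rules for the resolver IP addresses. The following is a minimal sketch that assumes the VPC infrastructure CLI plugin (ibmcloud is) is installed.
# Sketch: inspect the rules of the cluster-managed security group. Replace <clusterID>.
ibmcloud is security-group-rules kube-<clusterID>
# If the custom DNS resolver IP addresses do not appear as remotes in any outbound rule,
# the cluster cannot reach the resolver until the security group is synced.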
To correct this problem, allow access to the DNS resolvers on your existing clusters by syncing the kube-<clusterID> security group. Syncing the kube-<clusterID> security group adds rules that allow traffic to the DNS resolver IP addresses.
- Find the security group ID of your kube-<clusterID> security group.
  ibmcloud is security-groups
- Sync the security group. A scripted sketch that combines these steps follows this list.
  ibmcloud ks security-group sync --cluster <clusterID> --security-group <security-group-ID>
- Replace the worker nodes in your cluster.
  ibmcloud ks worker replace --cluster <cluster_name_or_ID> --worker <worker_node_ID>
- If the issue persists, open a support case. In the case details, be sure to include any relevant log files, error messages, or command outputs.
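The following is a minimal shell sketch that combines the previous steps. It assumes the ibmcloud CLI with the ks and is plugins and the jq tool, and that the --output json and --output JSON flags and JSON field names match your CLI versions; review the IDs it selects before you run the sync and replace commands.
# Sketch only: verify the selected IDs before running. Replace <cluster_name_or_ID>.
CLUSTER=<cluster_name_or_ID>
# Look up the cluster ID so that the kube-<clusterID> security group can be matched by name.
CLUSTER_ID=$(ibmcloud ks cluster get --cluster "$CLUSTER" --output json | jq -r '.id')
# Find the ID of the cluster-managed security group.
SG_ID=$(ibmcloud is security-groups --output JSON | jq -r --arg name "kube-$CLUSTER_ID" '.[] | select(.name == $name) | .id')
# Sync the security group so that rules for the custom DNS resolver IP addresses are added.
ibmcloud ks security-group sync --cluster "$CLUSTER" --security-group "$SG_ID"
# Replace each worker so that it picks up the new rules. Consider replacing one worker at a time
# and waiting for it to return to a Normal state before continuing.
for WORKER in $(ibmcloud ks workers --cluster "$CLUSTER" --output json | jq -r '.[].id'); do
  ibmcloud ks worker replace --cluster "$CLUSTER" --worker "$WORKER"
done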
Other scenarios
The following error conditions might be caused by the combination of a custom DNS resolver and 1.30 clusters in the same VPC.
If the cluster is created after the DNS resolver is created
If you create a 1.30 cluster after creating a custom DNS resolver and your workers do not enter a Normal or Active state, the rules for the custom DNS resolver were not added to the kube-<clusterID> security group before they were needed by the logic that deploys the workers.
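To see whether your workers reached a healthy state, list them and check the state and status columns. This is a simple sketch; the output format can vary by CLI version.
# Sketch: check the state and status message for each worker. Replace <cluster_name_or_ID>.
ibmcloud ks workers --cluster <cluster_name_or_ID>
# Workers that stay outside a Normal or Active state with a provisioning failure message
# point to the missing security group rules described above.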
- List the details of the kube-<clusterID> security group to verify that the custom DNS resolver IPs are added as targets in the security group. A parsing sketch follows this list.
  ibmcloud is sg kube-<clusterID>
- Replace the worker nodes in your cluster.
  ibmcloud ks worker replace --cluster <cluster_name_or_ID> --worker <worker_node_ID>
- If the issue persists, open a support case. In the case details, be sure to include any relevant log files, error messages, or command outputs.
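If you prefer to check the rules programmatically, the following is a minimal sketch. The --output JSON flag and the use of grep against the raw JSON are assumptions about your CLI version and environment; the resolver IP values are placeholders that you must replace with the IP addresses of your custom DNS resolver.
# Sketch only: check whether each custom DNS resolver IP appears in the security group rules.
# Replace <clusterID> and the RESOLVER_IPS values with your own.
RESOLVER_IPS="<resolver_IP_1> <resolver_IP_2>"
RULES=$(ibmcloud is security-group-rules kube-<clusterID> --output JSON)
for IP in $RESOLVER_IPS; do
  if echo "$RULES" | grep -q "$IP"; then
    echo "Rule found for $IP"
  else
    echo "No rule found for $IP - sync the kube-<clusterID> security group"
  fi
done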