Why do I see DNS failures after adding a custom DNS resolver?
Virtual Private Cloud 1.30 and later
You see DNS failures after creating a custom DNS resolver in your VPC where a 1.30 cluster already exists.
In each of the following cases, sync the kube-<clusterID> security group and replace the worker nodes as described in the following steps. Note that the issue might not appear immediately after the resolver is created and enabled in the VPC. DNS includes a cache, which might continue to resolve name lookups until a worker is replaced or restarted, or until pods are restarted.
- Adding or replacing a worker node in the cluster fails, and ibmcloud ks workers shows a worker status similar to the following.
  Infrastructure instance status is 'failed': Can't start instance because provisioning failed.
- Running oc commands results in an error similar to the following. You can confirm that the failure is DNS related with the lookup sketch after this list.
  dial tcp: lookup s3.direct.eu-de.cloud-object-storage.appdomain.cloud on 172.21.0.10:53: server misbehaving.
- After you restart a worker, pods on that worker are stuck in Terminating state, the OpenShift web console no longer opens, or Ingress is in Critical state.
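To confirm that these symptoms are DNS related, you can run a test lookup from inside the cluster. The following is a minimal sketch, not part of the documented steps; the pod name, the busybox image, and the test hostname are example values, and you can substitute kubectl for oc on a community Kubernetes cluster.
# Sketch: start a temporary pod and test a name lookup through the cluster DNS service.
# The pod name, image, and hostname are examples only.
oc run dns-check --rm -it --restart=Never --image=busybox:1.36 -- nslookup s3.direct.eu-de.cloud-object-storage.appdomain.cloud
# A timeout or "server misbehaving" response indicates that the workers cannot reach the custom DNS resolver.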
If a custom DNS resolver is enabled in your VPC before you create a 1.30 cluster, IBM Cloud Kubernetes Service automatically adds rules that allow traffic to the DNS resolver IP addresses to your IBM Cloud managed security group (kube-<clusterID>). However, if you enable a custom DNS resolver on a VPC that already contains a 1.30 cluster, those existing clusters lose access to DNS because the rules are not added automatically.
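You can check whether the security group already contains rules for the resolver IP addresses. The following is a minimal sketch that assumes the VPC infrastructure CLI plugin (ibmcloud is) is installed.
# Sketch: inspect the rules of the cluster-managed security group. Replace <clusterID>.
ibmcloud is security-group-rules kube-<clusterID>
# If the custom DNS resolver IP addresses do not appear as remotes in any outbound rule,
# the cluster cannot reach the resolver until the security group is synced.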
To correct this problem, allow access to the DNS resolvers on your existing clusters by syncing the kube-<clusterID> security group. Syncing the kube-<clusterID> security group adds rules that allow traffic to the DNS resolver IP addresses.
- Find the security group ID of your kube-<clusterID> security group.
  ibmcloud is security-groups
- Sync the security group. A scripted sketch that combines these steps follows this list.
  ibmcloud ks security-group sync --cluster <clusterID> --security-group <security-group-ID>
- Replace the worker nodes in your cluster.
  ibmcloud ks worker replace --cluster <cluster_name_or_ID> --worker <worker_node_ID>
- If the issue persists, open a support case. In the case details, be sure to include any relevant log files, error messages, or command outputs.
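The following is a minimal shell sketch that combines the previous steps. It assumes the ibmcloud CLI with the ks and is plugins and the jq tool, and that the --output json and --output JSON flags and JSON field names match your CLI versions; review the IDs it selects before you run the sync and replace commands.
# Sketch only: verify the selected IDs before running. Replace <cluster_name_or_ID>.
CLUSTER=<cluster_name_or_ID>
# Look up the cluster ID so that the kube-<clusterID> security group can be matched by name.
CLUSTER_ID=$(ibmcloud ks cluster get --cluster "$CLUSTER" --output json | jq -r '.id')
# Find the ID of the cluster-managed security group.
SG_ID=$(ibmcloud is security-groups --output JSON | jq -r --arg name "kube-$CLUSTER_ID" '.[] | select(.name == $name) | .id')
# Sync the security group so that rules for the custom DNS resolver IP addresses are added.
ibmcloud ks security-group sync --cluster "$CLUSTER" --security-group "$SG_ID"
# Replace each worker so that it picks up the new rules. Consider replacing one worker at a time
# and waiting for it to return to a Normal state before continuing.
for WORKER in $(ibmcloud ks workers --cluster "$CLUSTER" --output json | jq -r '.[].id'); do
  ibmcloud ks worker replace --cluster "$CLUSTER" --worker "$WORKER"
done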
Other scenarios
The following error conditions might be caused by the combination of a custom DNS resolver and 1.30 clusters in the same VPC.
If the cluster is created after the DNS resolver is created
If you create a 1.30 cluster after creating a custom DNS resolver and your workers do not enter a Normal or Active state, the rules for the custom DNS resolver were not added to the kube-<clusterID> security group before they were needed by the logic that deploys the workers.
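To see whether your workers reached a healthy state, list them and check the state and status columns. This is a simple sketch; the output format can vary by CLI version.
# Sketch: check the state and status message for each worker. Replace <cluster_name_or_ID>.
ibmcloud ks workers --cluster <cluster_name_or_ID>
# Workers that stay outside a Normal or Active state with a provisioning failure message
# point to the missing security group rules described above.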
- List the details of the kube-<clusterID> security group to verify that the custom DNS resolver IPs are added as targets in the security group. A parsing sketch follows this list.
  ibmcloud is sg kube-<clusterID>
- Replace the worker nodes in your cluster.
  ibmcloud ks worker replace --cluster <cluster_name_or_ID> --worker <worker_node_ID>
- If the issue persists, open a support case. In the case details, be sure to include any relevant log files, error messages, or command outputs.
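If you prefer to check the rules programmatically, the following is a minimal sketch. The --output JSON flag and the use of grep against the raw JSON are assumptions about your CLI version and environment; the resolver IP values are placeholders that you must replace with the IP addresses of your custom DNS resolver.
# Sketch only: check whether each custom DNS resolver IP appears in the security group rules.
# Replace <clusterID> and the RESOLVER_IPS values with your own.
RESOLVER_IPS="<resolver_IP_1> <resolver_IP_2>"
RULES=$(ibmcloud is security-group-rules kube-<clusterID> --output JSON)
for IP in $RESOLVER_IPS; do
  if echo "$RULES" | grep -q "$IP"; then
    echo "Rule found for $IP"
  else
    echo "No rule found for $IP - sync the kube-<clusterID> security group"
  fi
done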