Classic clusters: Why does source IP preservation fail when using tainted nodes?
Classic infrastructure
In a classic cluster, you enabled source IP preservation for a version 1.0 load balancer service by changing `externalTrafficPolicy` to `Local` in the service's configuration file. However, no traffic reaches the back-end service for your app.
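For reference, source IP preservation is set in the service spec itself. The following sketch shows a minimal load balancer service with the setting applied; the service name, port, and selector label are placeholders, not values from your cluster.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-lb-service            # placeholder name
spec:
  type: LoadBalancer
  selector:
    app: my-app                  # placeholder label that matches your app pods
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
  # Preserve the client source IP by sending traffic only to app pods
  # on the worker node that receives the request
  externalTrafficPolicy: Local
```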
When you enable source IP preservation for load balancer services, the source IP address of the client request is preserved. To ensure that the request packet's IP address isn't changed, the service forwards traffic only to app pods on the same worker node. Typically, load balancer service pods are deployed to the same worker nodes as the app pods. However, in some situations the service pods and app pods might not be scheduled onto the same worker node. If you use Kubernetes taints on worker nodes, any pods that don't have a taint toleration are prevented from running on the tainted worker nodes. Depending on the type of taint that you used, source IP preservation might fail for one of the following reasons:
- Edge node taints: You added the `dedicated=edge` label to two or more worker nodes on each public VLAN in your cluster to ensure that load balancer pods deploy to those worker nodes only. Then, you also tainted those edge nodes to prevent any other workloads from running on them. However, you didn't add an edge node affinity rule and toleration to your app deployment. Your app pods can't be scheduled on the same tainted nodes as the service pods, and no traffic reaches the back-end service for your app.
- Custom taints: You used custom taints on several nodes so that only app pods with that taint toleration can deploy to those nodes. You added affinity rules and tolerations to the deployments of your app and load balancer service so that their pods deploy to only those nodes. However, the `ibm-cloud-provider-ip` `keepalived` pods that are automatically created in the `ibm-system` namespace ensure that the load balancer and the app pods are always scheduled onto the same worker node. These `keepalived` pods don't have tolerations for the custom taints that you used. They can't be scheduled on the same tainted nodes that your app pods run on, and no traffic reaches the back-end service for your app.
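To confirm which taints are currently applied to your worker nodes, you can query the nodes directly. The following commands are a quick check; the node name is a placeholder.

```sh
# List every node together with any taints that are set on it
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

# Inspect a single node in detail (placeholder node name)
kubectl describe node 10.176.XX.XX | grep Taints
```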
Resolve the issue by choosing one of the following options:
- Edge node taints: To ensure that your load balancer and app pods deploy to tainted edge nodes, add edge node affinity rules and tolerations to your app deployment (see the example snippet after this list). Load balancer pods have these affinity rules and tolerations by default.
- Custom taints: Remove the custom taints that the `keepalived` pods don't have tolerations for. Instead, you can label worker nodes as edge nodes, and then taint those edge nodes.
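For the edge node option, the affinity rule and toleration in your app deployment might look like the following sketch. The `dedicated=edge` key and value match the edge node label and taint described earlier; the taint effect shown here is an assumption, so use whichever effect (`NoSchedule` or `NoExecute`) you applied to the nodes.

```yaml
# Sketch of the pod template section of an app deployment
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: dedicated
                operator: In
                values:
                - edge
      tolerations:
      - key: dedicated
        value: edge
        effect: NoSchedule   # assumption: change to NoExecute if that is how you tainted the nodes
```

For the custom taint option, a taint is removed with the same kubectl taint command that added it, followed by a trailing hyphen on the key, for example `kubectl taint node <node_name> <key>-`, where both values are placeholders.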
If you complete one of the previous options but the `keepalived` pods are still not scheduled, you can get more information about the `keepalived` pods:
- Get the `keepalived` pods.
  kubectl get pods -n ibm-system
- In the output, look for `ibm-cloud-provider-ip` pods that have a Status of `Pending`. Example:
  ibm-cloud-provider-ip-169-61-XX-XX-55967b5b8c-7zv9t   0/1   Pending   0   2m   <none>   <none>
  ibm-cloud-provider-ip-169-61-XX-XX-55967b5b8c-8ptvg   0/1   Pending   0   2m   <none>   <none>
- Describe each `keepalived` pod and look for the Events section. Address any error or warning messages that are listed.
  kubectl describe pod ibm-cloud-provider-ip-169-61-XX-XX-55967b5b8c-7zv9t -n ibm-system
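After the `keepalived` pods are scheduled, you can spot-check that they run on the same worker nodes as your app pods. The namespace and label selector in the second command are placeholders for your own deployment.

```sh
# Show the nodes that the keepalived pods are scheduled on
kubectl get pods -n ibm-system -o wide

# Show the nodes that your app pods run on (placeholder namespace and label)
kubectl get pods -n my-namespace -l app=my-app -o wide
```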