Get started with Taints and Tolerations in Kubernetes

In Kubernetes, taints and tolerations are two distinct concepts used together to control how pods are scheduled on nodes. They help ensure that pods are only scheduled on appropriate nodes.

Taints (Marking Nodes)

A taint is applied to a node to mark it as repelling pods, unless those pods explicitly tolerate the taint. Taints have three properties:

  • Key: The name of the taint.

  • Value: An optional value associated with the key, providing additional specificity.

  • Effect: Describes what happens to pods that do not tolerate the taint. The effects can be:

    • NoSchedule: Pods that do not tolerate this taint are not scheduled on the node. Pods currently running on the node are not evicted.

    • PreferNoSchedule: The scheduler tries to avoid placing pods that do not tolerate this taint on the node, but this is a preference, not a guarantee.

    • NoExecute: Pods that do not tolerate this taint are evicted if they are already running on the node, and new pods will not be scheduled on the node.

Tolerations (Specifying Pod Compatibility)

A toleration is applied to a pod and allows (but does not require) the pod to be scheduled on a node with matching taints. Tolerations specify which taints they can accept. They consist of:

  • Key

  • Value

  • Effect

  • Operator: Determines how to match the taint's key and value. Commonly used operators include Equal (default) and Exists.

  • TolerationSeconds: Only relevant with the NoExecute effect, this specifies how long the pod stays bound to the node after the taint is added, before it is evicted.

A toleration "matches" a taint if the keys are the same and the effects are the same, and:

  • the operator is Exists (in which case no value should be specified), or

  • the operator is Equal and the values are equal.

The default value for operator is Equal.
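
The matching rules above can be illustrated with a few toleration entries. This is only a sketch: the key example-key and the value example-value are hypothetical names. The last two entries rely on the special cases that an empty effect matches every effect and an empty key with Exists matches every taint.

```yaml
# Fragment of a pod spec; "example-key" and "example-value" are hypothetical.
tolerations:
# Equal (the default operator): key, value, and effect must all match the taint.
- key: "example-key"
  operator: "Equal"
  value: "example-value"
  effect: "NoSchedule"
# Exists: matches any taint with this key and effect, whatever its value.
- key: "example-key"
  operator: "Exists"
  effect: "NoExecute"
# Exists with no effect: matches taints with this key for every effect.
- key: "example-key"
  operator: "Exists"
# Exists with no key: matches every taint, so the pod tolerates everything.
- operator: "Exists"
```

The last form is very broad and is usually appropriate only for pods that must run on every node.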

Differences

  • Where they apply: Taints are applied to nodes, while tolerations are applied to pods.

  • Purpose: Taints make a node repel certain pods, while tolerations allow a pod to be scheduled on, or remain on, a node with matching taints. A toleration does not attract the pod to that node; it only removes the restriction.

How They Are Used Together

The combination of taints and tolerations works as a mechanism to ensure that pods are scheduled on suitable nodes. For example, you might have a set of nodes with special hardware like GPUs. You can taint these nodes to repel ordinary workloads and add tolerations to GPU-enabled pods so they can still be scheduled there.
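
As a rough sketch of the GPU case, the pod below tolerates a taint that is assumed to have been placed on the GPU nodes. The taint gpu=true:NoSchedule, the pod name, and the image are hypothetical, and the nvidia.com/gpu resource assumes the NVIDIA device plugin is installed in the cluster.

```yaml
# Sketch only: assumes the GPU nodes carry the taint gpu=true:NoSchedule.
# The taint key/value, pod name, and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  containers:
  - name: trainer
    image: example.com/trainer:latest  # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1  # assumes the NVIDIA device plugin exposes this resource
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
```

Here the toleration only lets the pod onto the tainted nodes; it is the GPU resource request that keeps it off nodes without GPUs.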

Another example is running workloads on spot nodes in AWS. You can taint the spot nodes and add a matching toleration only to workloads that can cope with interruption, so that only those workloads are scheduled on spot instances.
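
For workloads managed by a Deployment, the toleration goes in the pod template rather than at the top level. The following is a minimal sketch, assuming the spot nodes are tainted lifecycle=spot:NoSchedule; that taint, the names, and the labels are hypothetical.

```yaml
# Sketch only: assumes spot nodes are tainted lifecycle=spot:NoSchedule.
# The taint key/value, names, and labels are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      containers:
      - name: worker
        image: nginx
      # Tolerations belong to the pod spec inside the template.
      tolerations:
      - key: "lifecycle"
        operator: "Equal"
        value: "spot"
        effect: "NoSchedule"
```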

These mechanisms can be particularly useful in multi-tenant environments, where certain nodes may be reserved for specific teams or types of workloads, helping in maintaining the desired level of security and performance isolation.

Example of Tainting a Node

First, you can taint a node using the kubectl command line tool. This command taints a node to prevent any pods from being scheduled on it unless they explicitly tolerate the taint:

kubectl taint nodes node1 key1=value1:NoSchedule

This command applies a taint with:

  • key = key1

  • value = value1

  • effect = NoSchedule
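
Once applied, the taint is stored under spec.taints on the Node object. The excerpt below sketches roughly what kubectl get node node1 -o yaml would be expected to show, with all unrelated fields omitted; field order in real output may differ.

```yaml
# Excerpt of the Node object after the taint is applied (other fields omitted).
apiVersion: v1
kind: Node
metadata:
  name: node1
spec:
  taints:
  - key: key1
    value: value1
    effect: NoSchedule
```

To remove the taint again, run the same kubectl taint command with a trailing hyphen: kubectl taint nodes node1 key1=value1:NoSchedule-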

Example Pod Specification with Tolerations

Below is a YAML example of a pod specification that includes a toleration. This pod will tolerate the taint we added to node1:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: mycontainer
    image: nginx
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"

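This configuration means the pod mypod can be scheduled on the node node1 despite its taint. The toleration permits this placement but does not force it; the scheduler may still choose another suitable node.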

More Complex Taint and Toleration Scenario

To evict pods already running on a node and also prevent new ones from being scheduled there, unless they tolerate the taint, use the NoExecute effect:

Tainting the Node

kubectl taint nodes node2 key2=value2:NoExecute

This applies a taint that has the NoExecute effect, which will evict pods that do not tolerate this taint.

Pod with a Toleration Including TolerationSeconds

apiVersion: v1
kind: Pod
metadata:
  name: mypod2
spec:
  containers:
  - name: mycontainer2
    image: nginx
  tolerations:
  - key: "key2"
    operator: "Equal"
    value: "value2"
    effect: "NoExecute"
    tolerationSeconds: 3600

This pod tolerates the NoExecute taint on node2. If the taint is added while the pod is already running on the node, the pod stays bound to the node for 3600 seconds (1 hour) and is then evicted; if the taint is removed before that time elapses, the pod is not evicted. This gives time to handle the pod appropriately, such as gracefully shutting down or relocating its workload.

Best practices

When using taints and tolerations in Kubernetes, it’s important to follow best practices to ensure efficient, secure, and reliable cluster operations. Here are some key best practices:

1. Use Specific Taints for Specialized Hardware

Taints are highly effective for nodes with specialized hardware such as GPUs or high-performance SSDs. Use taints to ensure that only workloads requiring such resources are scheduled on those nodes. This optimizes resource utilization and prevents general workloads from consuming resources that are expensive or scarce.

2. Minimize Use of NoExecute for Stability

The NoExecute taint effect can cause pods to be evicted from a node, which might lead to service disruption. Use this effect sparingly and mainly when it’s crucial to remove pods from a node due to compliance, security, or hardware decommissioning reasons. For pods that should leave eventually, set tolerationSeconds so eviction is delayed rather than immediate; note that a NoExecute toleration without tolerationSeconds keeps the pod on the node indefinitely.
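
As a point of reference, Kubernetes itself relies on tolerationSeconds: unless a pod already declares them, tolerations like the following are normally added automatically for the built-in not-ready and unreachable node taints. The fragment is a sketch of how those entries look in a pod spec; 300 seconds is the usual default, and a given cluster may be configured differently.

```yaml
# Fragment of a pod spec showing the default NoExecute tolerations
# that Kubernetes typically adds (300 seconds is the usual default).
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
```

Declaring the same keys explicitly with a different tolerationSeconds is one way to make a workload fail over faster or slower than this default.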

3. Balance Tolerance and Restriction

While taints can restrict pods from scheduling on certain nodes, excessive use of taints can lead to inefficient scheduling and resource fragmentation. Use taints judiciously and make sure every tainted group of nodes has workloads that tolerate it, so that nodes do not sit idle while pods elsewhere stay pending.

4. Integrate with Affinity/Anti-affinity

Taints and tolerations work best alongside node affinity and pod affinity/anti-affinity when fine-tuning pod placement. A toleration only allows a pod onto a tainted node; it does not attract it there, so use node affinity (or a node selector) to steer pods to the intended nodes. Pod affinity rules can additionally co-locate pods that benefit from running in proximity to each other (e.g., for performance reasons).
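
Because a toleration permits placement on a tainted node without attracting the pod to it, one common pairing is a toleration plus node affinity (rather than pod affinity). The sketch below assumes the target nodes are both tainted dedicated=team-a:NoSchedule and labeled dedicated=team-a; the key, value, and names are hypothetical.

```yaml
# Sketch only: assumes nodes are tainted dedicated=team-a:NoSchedule and
# also labeled dedicated=team-a. The key, value, and names are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: team-a-app
spec:
  containers:
  - name: app
    image: nginx
  # Toleration: allows the pod onto the tainted nodes.
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "team-a"
    effect: "NoSchedule"
  # Node affinity: requires the pod to land on nodes with the matching label.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "dedicated"
            operator: In
            values:
            - "team-a"
```

The taint keeps other pods off the dedicated nodes, and the affinity rule keeps this pod off every other node.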

5. Document and Manage Taints and Tolerations

As the number of taints and tolerations grows, it’s crucial to document why and where they are used. This avoids configuration errors and helps new team members understand the setup. Consider using infrastructure as code (IaC) tools to manage and version control taint and toleration configurations.

6. Security and Multi-tenancy

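In multi-tenant clusters, use taints to isolate nodes dedicated to different tenants, so that workloads from one tenant are not accidentally scheduled on another’s nodes. Taints alone do not stop a user who can edit pod specs from adding a matching toleration, so where protection against deliberate misuse is required, pair them with admission control (for example, the PodTolerationRestriction admission controller). Used this way, taints help enhance security and compliance in shared environments.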

7. Testing and Validation

Regularly test your taints and tolerations setup in a staging environment to ensure that they behave as expected, especially after modifications or updates to your cluster configuration. This can help avoid unexpected scheduling behavior in production.

8. Monitoring and Alerts

Set up monitoring and alerts for the scheduling process. If pods are unschedulable due to taints or if certain nodes are underutilized, you should have visibility into these events to make timely adjustments.
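
One possible way to get this visibility is an alert on pending pods. The rule below is a sketch that assumes Prometheus and kube-state-metrics are installed (neither is mentioned above); the group name, alert name, duration, and severity label are hypothetical choices.

```yaml
# Prometheus alerting rule sketch; assumes kube-state-metrics is being scraped.
groups:
- name: scheduling
  rules:
  - alert: PodUnschedulable
    # kube_pod_status_unschedulable is 1 while the scheduler cannot place the pod.
    expr: kube_pod_status_unschedulable > 0
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been unschedulable for 10 minutes"
```

The alert does not say why a pod is unschedulable; the events shown by kubectl describe pod indicate whether an untolerated taint is the cause.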

References:

  1. Kubernetes Documentation: Taints and Tolerations

  2. Kubecost blog: Kubernetes Taints & Tolerations: Tutorial With Examples