Get started with Taints and Tolerations in Kubernetes
Table of contents
- Taints (Marking Nodes)
- Tolerations (Specifying Pod Compatibility)
- Differences
- How They Are Used Together
- Example of Tainting a Node
- Example Pod Specification with Tolerations
- More Complex Taint and Toleration Scenario
- Best practices
- 1. Use Specific Taints for Specialized Hardware
- 2. Minimize Use of NoExecute for Stability
- 3. Balance Tolerance and Restriction
- 4. Integrate with Affinity/Anti-affinity
- 5. Document and Manage Taints and Tolerations
- 6. Security and Multi-tenancy
- 7. Testing and Validation
- 8. Monitoring and Alerts
In Kubernetes, taints and tolerations are two distinct concepts used together to control how pods are scheduled on nodes. They help ensure that pods are only scheduled on appropriate nodes.
Taints (Marking Nodes)
A taint is applied to a node to mark it as repelling pods, unless those pods explicitly tolerate the taint. Taints have three properties:
- Key: The name of the taint.
- Value: The value associated with the taint, providing additional specificity.
- Effect: Describes what happens to pods that do not tolerate the taint. The effects can be:
  - `NoSchedule`: Pods that do not tolerate this taint are not scheduled on the node. Pods already running on the node are not evicted.
  - `PreferNoSchedule`: Kubernetes will try to avoid placing a pod on the node, but this is not guaranteed.
  - `NoExecute`: Pods that do not tolerate this taint are evicted if they are already running on the node, and new pods are not scheduled on it.
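Each effect is applied with the same `kubectl taint` syntax. As a quick sketch (the node names and the `dedicated=batch` key/value below are placeholders):

```shell
# Block new pods that lack a matching toleration; running pods stay
kubectl taint nodes worker-1 dedicated=batch:NoSchedule

# Prefer, but do not guarantee, keeping non-tolerating pods off the node
kubectl taint nodes worker-2 dedicated=batch:PreferNoSchedule

# Evict running pods without a matching toleration and block new ones
kubectl taint nodes worker-3 dedicated=batch:NoExecute
```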
Tolerations (Specifying Pod Compatibility)
A toleration is applied to a pod and allows (but does not require) the pod to be scheduled on a node with matching taints. Tolerations specify which taints they can accept. They consist of:
- Key
- Value
- Effect
- Operator: Determines how to match the taint's key and value. Commonly used operators are `Equal` (the default) and `Exists`.
- TolerationSeconds: Only relevant with the `NoExecute` effect, this specifies how long a pod may stay on the node after the taint is added.
A toleration "matches" a taint if the keys are the same and the effects are the same, and:
- the operator is `Exists` (in which case no value should be specified), or
- the operator is `Equal` and the values are equal.

The default value for `operator` is `Equal`.
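For example, a toleration using the `Exists` operator omits `value` entirely, so it matches any value of the taint key (the key name here is illustrative):

```yaml
tolerations:
- key: "key1"
  operator: "Exists"
  effect: "NoSchedule"
```

As a special case, an empty `key` with `operator: Exists` matches every taint, which is how some system pods tolerate all taints.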
Differences
- Where they apply: Taints are applied to nodes, while tolerations are applied to pods.
- Purpose: Taints make a node repel certain pods, while tolerations allow a pod to tolerate certain taints, so that it can be scheduled on, or remain on, a node with those taints.
How They Are Used Together
The combination of taints and tolerations works as a mechanism to ensure that pods are scheduled on suitable nodes. For example, you might have a set of nodes with special hardware like GPUs. You can taint these nodes to repel ordinary workloads and add tolerations to GPU-enabled pods so they can still be scheduled there.
Another example is using taints and tolerations to utilize spot nodes in AWS. For workloads that can safely run on interruptible capacity, the idiomatic approach is to taint the spot nodes and add matching tolerations to those workloads, so only they are scheduled onto the tainted spot instances.
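As a sketch of this pattern, you could taint each spot node and give interruption-tolerant workloads a matching toleration; the `lifecycle=spot` key/value below is a site-specific convention, not a Kubernetes built-in:

```yaml
# Taint applied to each spot node, e.g.:
#   kubectl taint nodes spot-node-1 lifecycle=spot:NoSchedule
# Matching toleration on an interruption-tolerant workload:
tolerations:
- key: "lifecycle"
  operator: "Equal"
  value: "spot"
  effect: "NoSchedule"
```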
These mechanisms can be particularly useful in multi-tenant environments, where certain nodes may be reserved for specific teams or types of workloads, helping in maintaining the desired level of security and performance isolation.
Example of Tainting a Node
First, you can taint a node using the `kubectl` command-line tool. This command taints a node to prevent any pods from being scheduled on it unless they explicitly tolerate the taint:

```shell
kubectl taint nodes node1 key1=value1:NoSchedule
```

This command applies a taint with:
- key = `key1`
- value = `value1`
- effect = `NoSchedule`
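You can verify the taint was applied by inspecting the node:

```shell
kubectl describe node node1 | grep Taints
# Or list the taints of every node:
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
```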
Example Pod Specification with Tolerations
Below is a YAML example of a pod specification that includes a toleration. This pod will tolerate the taint we added to `node1`:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: mycontainer
    image: nginx
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
```
This configuration means the pod `mypod` can be scheduled on the node `node1` despite its taint.
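If you later want to remove the taint, append `-` to the same taint specification:

```shell
kubectl taint nodes node1 key1=value1:NoSchedule-
```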
More Complex Taint and Toleration Scenario
If a node has a taint that should evict existing pods and prevent new ones from being scheduled unless they tolerate the taint, you can do the following:
Tainting the Node
```shell
kubectl taint nodes node2 key2=value2:NoExecute
```

This applies a taint with the `NoExecute` effect, which evicts pods that do not tolerate it.
Pod with a Toleration Including TolerationSeconds
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mypod2
spec:
  containers:
  - name: mycontainer2
    image: nginx
  tolerations:
  - key: "key2"
    operator: "Equal"
    value: "value2"
    effect: "NoExecute"
    tolerationSeconds: 3600
```
This pod tolerates the `NoExecute` taint on `node2`. If the taint is added while the pod is already running on the node, the pod remains bound for 3600 seconds (1 hour) and is then evicted. This gives time to handle the pod appropriately, such as gracefully shutting down or relocating its workload.
Best practices
When using taints and tolerations in Kubernetes, following a few best practices helps ensure efficient, secure, and reliable cluster operations:
1. Use Specific Taints for Specialized Hardware
Taints are highly effective for nodes with specialized hardware such as GPUs or high-performance SSDs. Use taints to ensure that only workloads requiring such resources are scheduled on those nodes. This optimizes resource utilization and prevents general workloads from consuming resources that are expensive or scarce.
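For example, GPU nodes are often tainted so that only GPU workloads land on them. The `nvidia.com/gpu` key below follows a common convention (it is the resource name used by the NVIDIA device plugin); adjust it to your environment:

```shell
kubectl taint nodes gpu-node-1 nvidia.com/gpu=present:NoSchedule
```

GPU pods then carry a matching toleration, typically with `operator: Exists` so that any value of the key is tolerated.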
2. Minimize Use of `NoExecute` for Stability
The `NoExecute` taint effect can cause pods to be evicted from a node, which might lead to service disruption. Use this effect sparingly, mainly when it's crucial to remove pods from a node for compliance, security, or hardware-decommissioning reasons. Always define `tolerationSeconds` to manage the eviction timing gracefully.
3. Balance Tolerance and Restriction
While taints can restrict pods from scheduling on certain nodes, excessive use of taints can lead to inefficient scheduling and resource fragmentation. Use taints judiciously and ensure there are enough tolerations in place to maintain a balanced and efficient workload distribution.
4. Integrate with Affinity/Anti-affinity
Taints and tolerations should be used in conjunction with pod affinity and anti-affinity specifications to fine-tune pod placement. For instance, while taints can prevent certain pods from running on a node, affinity rules can be used to co-locate pods that benefit from running in proximity to each other (e.g., for performance reasons).
5. Document and Manage Taints and Tolerations
As the number of taints and tolerations grows, it’s crucial to document why and where they are used. This avoids configuration errors and helps new team members understand the setup. Consider using infrastructure as code (IaC) tools to manage and version control taint and toleration configurations.
6. Security and Multi-tenancy
In multi-tenant clusters, use taints to isolate nodes dedicated to different tenants, ensuring that workloads from one tenant cannot be accidentally or maliciously scheduled on another’s nodes. This is particularly useful for enhancing security and compliance in shared environments.
7. Testing and Validation
Regularly test your taints and tolerations setup in a staging environment to ensure that they behave as expected, especially after modifications or updates to your cluster configuration. This can help avoid unexpected scheduling behavior in production.
8. Monitoring and Alerts
Set up monitoring and alerts for the scheduling process. If pods are unschedulable due to taints or if certain nodes are underutilized, you should have visibility into these events to make timely adjustments.