- 1. Overview
- 2. Directed Scheduling
- 2.1 Overview
- 2.2 nodeName
- 2.3 nodeSelector
- 3. Affinity Scheduling
- 3.1 Overview
- 3.2 nodeAffinity
- 3.2.1 requiredDuringSchedulingIgnoredDuringExecution
- 3.2.2 preferredDuringSchedulingIgnoredDuringExecution
- 3.3 podAffinity
- 3.3.1 requiredDuringSchedulingIgnoredDuringExecution
- 3.4 podAntiAffinity
- 3.4.1 requiredDuringSchedulingIgnoredDuringExecution
- 4. Taints and Tolerations
- 4.1 Taints
- 4.1.1 Taint syntax
- 4.2 Tolerations
- 4.2.1 tolerations in practice
1. Overview
Which Node a Pod runs on is decided, by default, by the Scheduler component using its scheduling algorithms. On top of that, kubernetes provides three categories of manually controlled scheduling:
- Directed scheduling: NodeName, NodeSelector
- Affinity scheduling: NodeAffinity, PodAffinity, PodAntiAffinity
- Taint (toleration) scheduling: Taints, Toleration
2. Directed Scheduling
2.1 Overview
By declaring nodeName or nodeSelector on a Pod, the Pod is bound to a specific Node. This is a hard constraint: if the target Node does not exist, the Pod fails to run.
2.2 nodeName
nodeName schedules the Pod onto the Node with the given name, bypassing the Scheduler's node selection entirely.
Create pod-schedule.yaml with the following content:
[root@k8s-master ~]# cat pod-schedule.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule
  namespace: dev
  labels:
    user: bulut
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
  nodeName: k8s-node1
[root@k8s-master ~]#
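To verify the result, apply the manifest and check which node the Pod landed on; a minimal sketch, assuming the dev namespace already exists:

kubectl apply -f pod-schedule.yaml
kubectl get pod pod-schedule -n dev -o wide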
2.3 nodeSelector
Through the kubernetes label-selector mechanism, nodeSelector schedules the Pod onto a Node carrying the specified labels.
First add a label to the k8s-node1 node, then view the node's labels:
[root@k8s-master ~]# kubectl label node k8s-node1 nodeEnv=production
node/k8s-node1 labeled
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get node k8s-node1 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-node1 Ready <none> 5d8h v1.24.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node1,kubernetes.io/os=linux,nodeEnv=production
[root@k8s-master ~]#
To remove the label, use the kubectl label node k8s-node1 nodeEnv- command.
Create pod-schedule.yaml with the following content:
[root@k8s-master ~]# cat pod-schedule.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule
  namespace: dev
  labels:
    user: bulut
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
  nodeSelector:
    nodeEnv: production
[root@k8s-master ~]#
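Unlike nodeName, nodeSelector still goes through the Scheduler, so if no Node carries the required label the Pod simply stays Pending. A minimal verification sketch:

kubectl apply -f pod-schedule.yaml
kubectl get pod pod-schedule -n dev -o wide
# if no node has nodeEnv=production, the events explain why scheduling failed:
kubectl describe pod pod-schedule -n dev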
3. Affinity Scheduling
3.1 Overview
Affinity scheduling expresses preferences rather than fixed bindings: in its preferred (soft) form the scheduler favors Nodes that satisfy the rules but may fall back to Nodes that do not; in its required (hard) form the rules must be met.
It comes in three flavors:
- nodeAffinity: the Pod is preferentially scheduled onto Nodes with specified labels
- podAffinity: the Pod is preferentially scheduled into the topology domain of specified Pods; useful for Pods that interact frequently, to cut network traffic
- podAntiAffinity: the Pod is kept out of the topology domain of specified Pods; useful for spreading replicas of the same application across Nodes to improve availability
3.2 nodeAffinity
The configurable fields of nodeAffinity:

pod.spec.affinity.nodeAffinity
  requiredDuringSchedulingIgnoredDuringExecution   # the Node must satisfy all specified rules; a hard constraint, similar to directed scheduling
    nodeSelectorTerms        # list of node selector terms
    - matchExpressions       # node selection requirements expressed over node labels
      - key                  # label key
        values               # label values
        operator             # operator; supports Exists, DoesNotExist, In, NotIn, Gt, Lt
  preferredDuringSchedulingIgnoredDuringExecution  # prefer Nodes that satisfy the rules; a soft constraint
  - weight                   # weight of the preference, in the range 1-100
    preference               # a node selector term
      matchExpressions       # node selection requirements expressed over node labels
      - key                  # label key
        values               # label values
        operator             # operator; supports Exists, DoesNotExist, In, NotIn, Gt, Lt
Usage of the operators:

matchExpressions:
- key: nodeEnv             # match nodes that have a label with key nodeEnv
  operator: Exists
- key: nodeEnv             # match nodes whose nodeEnv label value is "xxx" or "yyy"
  operator: In
  values: ["xxx", "yyy"]
- key: nodeEnv             # match nodes whose nodeEnv label value is greater than "100"
  operator: Gt             # Gt/Lt compare the value as an integer, so values must hold a single integer string
  values: ["100"]
Notes on nodeAffinity:
- If both nodeSelector and nodeAffinity are defined, a Node must satisfy both for the Pod to run on it
- If nodeAffinity specifies multiple nodeSelectorTerms, a Node only needs to match one of them (see the sketch after this list)
- If a single nodeSelectorTerms entry contains multiple matchExpressions, a Node must satisfy all of them to match
- If the labels of the Node a Pod is running on change during execution so that they no longer satisfy the Pod's nodeAffinity, the change is ignored and the Pod keeps running
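A minimal sketch of the OR/AND semantics above (the diskType label is a hypothetical example): the two nodeSelectorTerms are ORed, while the two matchExpressions inside the first term are ANDed:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:              # term 1: nodeEnv=production AND diskType=ssd
        - key: nodeEnv
          operator: In
          values: ["production"]
        - key: diskType                # hypothetical label
          operator: In
          values: ["ssd"]
      - matchExpressions:              # term 2, ORed with term 1: nodeEnv=staging
        - key: nodeEnv
          operator: In
          values: ["staging"]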
3.2.1 requiredDuringSchedulingIgnoredDuringExecution
First add a label to the k8s-node1 node:
[root@k8s-master ~]# kubectl label node k8s-node1 nodeEnv=production
node/k8s-node1 labeled
[root@k8s-master ~]#
Create pod-schedule.yaml with the following content:
[root@k8s-master ~]# cat pod-schedule.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule
  namespace: dev
  labels:
    user: bulut
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nodeEnv
            operator: In
            values: ["production", "xxx"]
[root@k8s-master ~]#
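Since this is a hard constraint, the Pod stays Pending if no Node has a nodeEnv label with value "production" or "xxx". A minimal check:

kubectl apply -f pod-schedule.yaml
kubectl get pod pod-schedule -n dev -o wide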
3.2.2 preferredDuringSchedulingIgnoredDuringExecution
First add a label to the k8s-node1 node:
[root@k8s-master ~]# kubectl label node k8s-node1 nodeEnv=production
node/k8s-node1 labeled
[root@k8s-master ~]#
Create pod-schedule.yaml with the following content:
[root@k8s-master ~]# cat pod-schedule.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule
  namespace: dev
  labels:
    user: bulut
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: nodeEnv
            operator: In
            values: ["production", "xxx"]
[root@k8s-master ~]#
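Because this is a soft constraint, the Pod is scheduled even if no Node matches; the weight only biases the scheduler's node scoring. A quick way to confirm the preference took effect:

kubectl apply -f pod-schedule.yaml
kubectl get pod pod-schedule -n dev -o wide   # expected on k8s-node1 while it carries nodeEnv=production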
3.3 podAffinity
Using Pods that are already running as a reference, podAffinity places a newly created Pod in the same topology domain as the reference Pods.
The configurable fields of podAffinity:
pod.spec.affinity.podAffinity
  requiredDuringSchedulingIgnoredDuringExecution   # hard constraint
  - namespaces           # namespaces of the reference Pods, e.g. ["n1", "n2"]; defaults to the namespace of the new Pod
    topologyKey          # the scheduling scope (topology domain)
    labelSelector        # label selector identifying the reference Pods
      matchExpressions   # label requirements
      - key              # label key
        values           # label values
        operator         # operator; supports Exists, DoesNotExist, In, NotIn
      matchLabels        # map of {key: value} pairs, equivalent to In with a single value
  preferredDuringSchedulingIgnoredDuringExecution  # soft constraint
  - weight               # weight of the preference, in the range 1-100
    podAffinityTerm      # the affinity term
      namespaces
      topologyKey
      labelSelector
        matchExpressions
        - key
          values
          operator
        matchLabels
topologyKey specifies the scope (topology domain) used for scheduling:
- if set to kubernetes.io/hostname, each Node is its own domain
- if set to beta.kubernetes.io/os, Nodes are grouped by operating system type
Another common choice is shown in the sketch below.
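A sketch using topology.kubernetes.io/zone (an assumption for illustration; the demos below stick to kubernetes.io/hostname), which co-locates Pods within the same availability zone rather than on the same Node:

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: topology.kubernetes.io/zone   # one domain per zone
      labelSelector:
        matchLabels:
          podEnv: production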
3.3.1 requiredDuringSchedulingIgnoredDuringExecution
Run the reference Pod, as shown below:
[root@k8s-master ~]# cat pod-schedule-target.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule-target
  namespace: dev
  labels:
    podEnv: production
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
  nodeName: k8s-node1
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl apply -f pod-schedule-target.yaml
pod/pod-schedule-target created
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get pod pod-schedule-target -n dev -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-schedule-target 1/1 Running 0 8m2s 10.244.36.85 k8s-node1 <none> <none>
[root@k8s-master ~]#
Run the newly created Pod, as shown below:
[root@k8s-master ~]# cat pod-schedule-new.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule-new
  namespace: dev
  labels:
    user: bulut
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - namespaces: ["dev"]
        topologyKey: kubernetes.io/hostname
        labelSelector:
          matchExpressions:
          - key: podEnv
            operator: In
            values: ["production", "xxx"]
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl apply -f pod-schedule-new.yaml
pod/pod-schedule-new created
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get pod pod-schedule-new -n dev -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-schedule-new 1/1 Running 0 77s 10.244.36.86 k8s-node1 <none> <none>
[root@k8s-master ~]#
3.4 podAntiAffinity
Using running Pods as a reference, podAntiAffinity keeps a newly created Pod out of the topology domain of the reference Pods. It is configured the same way as podAffinity.
3.4.1 requiredDuringSchedulingIgnoredDuringExecution
Run the reference Pod, as shown below:
[root@k8s-master ~]# cat pod-schedule-anti-target.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule-anti-target
  namespace: dev
  labels:
    podEnv: production
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
  nodeName: k8s-node1
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl apply -f pod-schedule-anti-target.yaml
pod/pod-schedule-anti-target created
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get pod pod-schedule-anti-target -n dev -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-schedule-anti-target 1/1 Running 0 50s 10.244.36.88 k8s-node1 <none> <none>
[root@k8s-master ~]#
Run the newly created Pod, as shown below:
[root@k8s-master ~]# cat pod-schedule-new.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule-new
  namespace: dev
  labels:
    user: bulut
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - namespaces: ["dev"]
        topologyKey: kubernetes.io/hostname
        labelSelector:
          matchExpressions:
          - key: podEnv
            operator: In
            values: ["production", "xxx"]
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl apply -f pod-schedule-new.yaml
pod/pod-schedule-new created
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get pod pod-schedule-new -n dev -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-schedule-new 1/1 Running 0 25s 10.244.169.162 k8s-node2 <none> <none>
[root@k8s-master ~]#
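As section 3.1 notes, the typical use of podAntiAffinity is spreading replicas of one application across Nodes for availability. A minimal sketch under that assumption (the Deployment name and app label are hypothetical, not part of the demos above):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-spread           # hypothetical name
  namespace: dev
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-spread
  template:
    metadata:
      labels:
        app: nginx-spread
    spec:
      containers:
      - name: nginx-container
        image: nginx:latest
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname   # each replica repels its peers at the Node level
            labelSelector:
              matchLabels:
                app: nginx-spread

With the required form and only two schedulable nodes, the third replica would stay Pending; switching to preferredDuringSchedulingIgnoredDuringExecution makes the spreading best-effort instead.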
4. Taints and Tolerations
4.1 Taints
The scheduling approaches so far all take the Pod's point of view: we add attributes to the Pod to decide whether it gets scheduled onto a given Node. We can also take the Node's point of view: by adding taints to a Node we create a repelling relationship between the Node and Pods, so the Node can refuse incoming Pods or even evict Pods that are already running on it.
A taint has the format key=value:effect, where key and value are the taint's label and effect describes what the taint does. Three effects are supported:
- PreferNoSchedule: kubernetes tries to avoid placing Pods on a Node with this taint, unless no other Node is schedulable
- NoSchedule: kubernetes will not schedule new Pods onto a Node with this taint, but Pods already on the Node are unaffected
- NoExecute: kubernetes will not schedule new Pods onto the Node, and additionally evicts Pods already running on it
By default, Kubernetes puts the following two taints on master nodes:
Taints: node-role.kubernetes.io/control-plane:NoSchedule
        node-role.kubernetes.io/master:NoSchedule
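You can confirm this on your own cluster (assuming a control-plane node named k8s-master):

kubectl describe node k8s-master | grep Taints -A 1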
4.1.1 Taint syntax
Set a taint:
[root@k8s-master ~]# kubectl taint node k8s-node1 nodeEnv=production:NoSchedule
node/k8s-node1 tainted
[root@k8s-master ~]#
You can check whether the taint was applied with the kubectl describe node k8s-node1 command.
Remove a specific taint:
[root@k8s-master ~]# kubectl taint node k8s-node1 nodeEnv:NoSchedule-
node/k8s-node1 untainted
[root@k8s-master ~]#
Remove all taints with a given key, regardless of effect:
[root@k8s-master ~]# kubectl taint node k8s-node1 nodeEnv-
node/k8s-node1 untainted
[root@k8s-master ~]#
4.2 Tolerations
A toleration lets a Pod be scheduled onto a Node that carries a taint.
Inspect the toleration fields with the kubectl explain pod.spec.tolerations command; summarized below:
pod.spec.tolerations
- key                 # key of the taint to tolerate; empty matches all keys
  value               # value of the taint to tolerate
  operator            # how key and value are matched; supports Equal (the default) and Exists (only the key has to match, the value is ignored)
  effect              # effect of the taint to tolerate; must match the taint's effect; empty matches all effects
  tolerationSeconds   # only meaningful when effect is NoExecute: how long the Pod may stay on the Node after the taint is added (see the sketch after the notes below)
Notes:
- When operator is Equal, if the Node has multiple taints, the Pod must tolerate every one of them to be scheduled there
- When operator is Exists, there are three common forms:
  - tolerate the specified taint, only with the specified effect:
    tolerations:
    - key: "xxx"
      operator: Exists
      effect: NoExecute
  - tolerate the specified taint, regardless of effect:
    tolerations:
    - key: "xxx"
      operator: Exists
  - tolerate every taint (use with caution):
    tolerations:
    - operator: Exists
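A sketch of tolerationSeconds (the key and value here are illustrative): the Pod tolerates a NoExecute taint, but only for 60 seconds after the taint is added, after which it is evicted:

tolerations:
- key: "nodeEnv"
  operator: Equal
  value: "production"
  effect: NoExecute
  tolerationSeconds: 60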
4.2.1 tolerations in practice
First add a taint to the k8s-node1 node:
[root@k8s-master ~]# kubectl taint node k8s-node1 nodeEnv=production:NoSchedule
node/k8s-node1 tainted
[root@k8s-master ~]#
Create pod-schedule.yaml with the following content, then create the Pod:
[root@k8s-master ~]# cat pod-schedule.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule
  namespace: dev
  labels:
    user: bulut
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
  tolerations:
  - key: "nodeEnv"
    operator: Equal
    value: "production"
    effect: NoSchedule
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl apply -f pod-schedule.yaml
pod/pod-schedule created
[root@k8s-master ~]#
Check that the Pod was indeed scheduled onto the k8s-node1 node:
[root@k8s-master ~]# kubectl get pod -n dev -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-schedule 1/1 Running 0 31s 10.244.36.90 k8s-node1 <none> <none>
[root@k8s-master ~]#
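To watch the taint do its job, you could delete the Pod, remove the tolerations block from the manifest, and re-apply; the Pod would then avoid k8s-node1, landing on an untainted node or staying Pending if none exists. A sketch:

kubectl delete -f pod-schedule.yaml
# edit pod-schedule.yaml to remove the tolerations block, then:
kubectl apply -f pod-schedule.yaml
kubectl get pod pod-schedule -n dev -o wide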