
Kubernetes cluster customizations

/Pulling the training image on a CPU node to be used for image streaming
In the kube-client repo, run kubectl apply -f image-prepull-job.yaml (with the appropriate node selector configured in the YAML file) to run an image pre-pull job on a CPU node. This caches the image on the CPU node so that image streaming can quickly deploy it to a new GPU node; a sketch of what such a manifest might look like appears below.
This removes the need to keep a GPU node scaled up constantly, reducing cluster costs.
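A minimal sketch of image-prepull-job.yaml, assuming the training image lives at a registry path like gcr.io/<project>/training:latest and the CPU pool carries the label cloud.google.com/gke-nodepool: basic-pool (both hypothetical; substitute your actual image and node-pool label):

apiVersion: batch/v1
kind: Job
metadata:
  name: image-prepull
spec:
  template:
    spec:
      # Hypothetical node selector; match your CPU (basic) pool's labels.
      nodeSelector:
        cloud.google.com/gke-nodepool: basic-pool
      containers:
        - name: prepull
          # Hypothetical image path; use your training image.
          image: gcr.io/<project>/training:latest
          # Exit immediately; the only goal is to pull the image onto the node.
          command: ["sh", "-c", "true"]
      restartPolicy: Never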
/Creating a DaemonSet that pre-pulls the training image when a CPU node is created
In the kube-client repo:
kubectl apply -f image-prepull-daemonset.yml
The above command creates a DaemonSet on the cluster that pulls the training image onto every new node created in the basic pool (the CPU node pools), automating image pulling on CPU nodes.
To check existing DaemonSets: kubectl get daemonsets
The pre-pull pod pulls the training image and then sleeps for 26 minutes.
Because the pod is run through a DaemonSet, it restarts after finishing. This is safe and does not consume unnecessary resources: after the first run, every restart uses the cached image and goes back to sleep right after container initialization, consuming essentially nothing while sleeping. A sketch of such a manifest follows.
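A minimal sketch of image-prepull-daemonset.yml under the same assumptions (hypothetical image path and node-pool label; the 26-minute sleep is 1560 seconds, and the DaemonSet's default restartPolicy of Always gives the restart-after-finishing behavior described above):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: image-prepull
spec:
  selector:
    matchLabels:
      app: image-prepull
  template:
    metadata:
      labels:
        app: image-prepull
    spec:
      # Hypothetical label; match your basic (CPU) pool.
      nodeSelector:
        cloud.google.com/gke-nodepool: basic-pool
      containers:
        - name: prepull
          # Hypothetical image path; use your training image.
          image: gcr.io/<project>/training:latest
          # Pulling the image is the real work; then sleep for 26 minutes.
          command: ["sh", "-c", "sleep 1560"]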
/Adding anti-affinity rules to pods to enforce a one-to-one correspondence of pods to nodes:
We can create affinity/anti-affinity rules to control how pods select the nodes they run on.
In our case we specified that only a single custom pod can run on a given GPU node: an anti-affinity rule checks whether a pod with the same label is already running on that node.
from kubernetes import client as k8s_client

# `container` is the training container spec defined elsewhere;
# `toleration` is defined in the next section.
template = k8s_client.V1PodTemplateSpec(
    metadata=k8s_client.V1ObjectMeta(labels={"app": "ml"}),
    spec=k8s_client.V1PodSpec(
        restart_policy="Never",
        containers=[container],
        # Schedule only on nodes with a T4 GPU attached.
        node_selector={"cloud.google.com/gke-accelerator": "nvidia-tesla-t4"},
        affinity=k8s_client.V1Affinity(
            pod_anti_affinity=k8s_client.V1PodAntiAffinity(
                required_during_scheduling_ignored_during_execution=[
                    k8s_client.V1PodAffinityTerm(
                        # Repel pods that carry the same app=ml label.
                        label_selector=k8s_client.V1LabelSelector(
                            match_expressions=[
                                k8s_client.V1LabelSelectorRequirement(
                                    key="app",
                                    operator="In",
                                    values=["ml"],
                                )
                            ]
                        ),
                        # Each node is its own topology domain.
                        topology_key="kubernetes.io/hostname",
                    )
                ]
            )
        ),
        tolerations=[toleration],
    )
)
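Because topology_key is kubernetes.io/hostname, each node is its own topology domain, so the required anti-affinity term prevents two pods labeled app=ml from ever landing on the same node; combined with the node selector, this yields exactly one training pod per GPU node.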
/Specifying a purpose for node pools and disallowing system pods to run on them:
Problem statement: GKE would schedule system pods on GPU nodes, causing those nodes to stay up instead of scaling down and adding cost.
Solution:
We added a taint to the GPU node pool so that every node in it carries the taint.
The taint tells GKE not to schedule any system pods on these GPU nodes, solving the issue of nodes not scaling down.
We added tolerations against the node taint to the custom jobs, allowing only those purpose-built jobs to run on the GPU nodes.
Node pool taint command:
gcloud container node-pools create default-pool \
  --cluster=gpu-cluster-auto \
  --zone=us-central1-f \
  --node-taints=temporary=true:NoSchedule \
  --machine-type=n1-standard-4 \
  --accelerator=type=nvidia-tesla-t4,count=1,gpu-driver-version=default \
  --scopes="https://www.googleapis.com/auth/devstorage.full_control","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/trace.append" \
  --num-nodes=0 --min-nodes=0 --max-nodes=10
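To confirm the taint is present, you can describe a node from the pool (node name hypothetical): kubectl describe node <node-name> | grep Taints, which should show temporary=true:NoSchedule.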
Code for adding the matching toleration to custom jobs:

# Matches the temporary=true:NoSchedule taint on the GPU pool.
toleration = k8s_client.V1Toleration(
    key="temporary",
    operator="Equal",
    value="true",
    effect="NoSchedule",
)

The toleration is then attached to the pod's spec via tolerations=[toleration], exactly as in the V1PodTemplateSpec shown in the anti-affinity section above.
