/Pre-pulling the training image on a CPU node for use with image streaming
In the kube-client repo, set the node selector in image-prepull-job.yaml so that it targets a CPU node, then run:
kubectl apply -f image-prepull-job.yaml
This runs an image pre-pull job on the CPU node and caches the training image there, so that image streaming can quickly deploy the image on a new GPU node.
This removes the need to keep a GPU node scaled up constantly, which reduces cluster costs.
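A minimal sketch of what image-prepull-job.yaml might contain. It assumes a GKE cluster (which the use of image streaming suggests) whose CPU node pool is named basic and is selected through the cloud.google.com/gke-nodepool node label. The job name, the image path, and the container command are hypothetical placeholders, and the real file in the kube-client repo may differ.

```yaml
# Hypothetical sketch of image-prepull-job.yaml.
# Assumptions: GKE cluster, CPU node pool named "basic", placeholder image path.
apiVersion: batch/v1
kind: Job
metadata:
  name: image-prepull                  # hypothetical name
spec:
  backoffLimit: 2                      # retry a failed pull at most twice
  ttlSecondsAfterFinished: 600         # delete the finished Job after 10 minutes
  template:
    spec:
      restartPolicy: Never
      # Pin the pod to the CPU node pool (assumed to be named "basic").
      nodeSelector:
        cloud.google.com/gke-nodepool: basic
      containers:
        - name: prepull
          # Placeholder; replace with the real training image.
          image: REGION-docker.pkg.dev/PROJECT/REPO/training:TAG
          imagePullPolicy: IfNotPresent
          # The kubelet pulls the image before starting the container;
          # the command just exits so the Job completes (assumes the image has sh).
          command: ["sh", "-c", "echo image pulled"]
          resources:
            requests:
              cpu: 10m
              memory: 16Mi
```

The container only has to start once: pulling the image is the side effect the job exists for, and the short command lets the Job finish immediately afterwards.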
/Creating a DaemonSet that pre-pulls the training image whenever a CPU node is created
In the kube-client repo, run:
kubectl apply -f image-prepull-daemonset.yml
This creates a DaemonSet on the cluster that pulls the training image onto every node in the basic pool (the CPU node pool), including each new node as it is created.
This automates image pulling on CPU nodes.
To check the existing DaemonSets, run:
kubectl get daemonsets
The pre-pull pod pulls the training image and then sleeps for 26 minutes.
Because the pod is managed by a DaemonSet, its container is restarted each time it finishes. This is safe and does not consume unnecessary resources: after the first pull, every restart uses the cached image, and the container goes back to sleep right after it starts, using negligible CPU and memory while sleeping.
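A minimal sketch of a DaemonSet with the behavior described above, under the same assumptions (GKE, a CPU node pool named basic, a placeholder image path). The names and labels are hypothetical and the real image-prepull-daemonset.yml may differ. The 26-minute sleep is written as 1560 seconds.

```yaml
# Hypothetical sketch of image-prepull-daemonset.yml.
# Assumptions: GKE cluster, CPU node pool named "basic", placeholder image path.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: image-prepull                  # hypothetical name
spec:
  selector:
    matchLabels:
      app: image-prepull
  template:
    metadata:
      labels:
        app: image-prepull
    spec:
      # Run one pod on every node of the CPU pool, existing and future.
      nodeSelector:
        cloud.google.com/gke-nodepool: basic
      containers:
        - name: prepull
          # Placeholder; replace with the real training image.
          image: REGION-docker.pkg.dev/PROJECT/REPO/training:TAG
          # Reuse the image already on the node when the container restarts.
          imagePullPolicy: IfNotPresent
          # Sleep 26 minutes, then exit; the kubelet restarts the container
          # (assumes the image contains a sleep binary).
          command: ["sleep", "1560"]
          resources:
            requests:
              cpu: 1m                  # tiny requests so the pod reserves almost no capacity
              memory: 8Mi
```

DaemonSet pods always use restartPolicy Always, so when sleep exits the kubelet restarts the container in place. Setting imagePullPolicy to IfNotPresent explicitly matters if the image uses the latest tag, because Kubernetes defaults to Always for that tag.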
/Adding anti-affinity rules to pods for a one-to-one mapping of pods to nodes
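Affinity and anti-affinity rules control which nodes a pod can be scheduled on, based on node labels or on the pods already running on a node.
In our case, an anti-affinity rule allows only a single custom pod on each GPU node: before placing the pod, the scheduler checks whether a job pod with the same label is already running on that node, and if so, it does not schedule the new pod there.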
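A minimal sketch of such a rule, placed in a training Job's pod template. The label app: custom-training, the job name, and the image path are hypothetical. The rule uses topologyKey kubernetes.io/hostname so that "the same node" means the same hostname, and the GPU limit assumes GPU nodes expose the nvidia.com/gpu resource.

```yaml
# Hypothetical training Job showing a pod anti-affinity rule.
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job                   # hypothetical name
spec:
  template:
    metadata:
      labels:
        app: custom-training           # label the anti-affinity rule matches on
    spec:
      restartPolicy: Never
      affinity:
        podAntiAffinity:
          # Hard rule: never schedule onto a node that already runs
          # a pod labelled app=custom-training.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: custom-training
              topologyKey: kubernetes.io/hostname
      containers:
        - name: trainer
          # Placeholder; replace with the real training image.
          image: REGION-docker.pkg.dev/PROJECT/REPO/training:TAG
          resources:
            limits:
              nvidia.com/gpu: 1        # request one GPU so the pod lands on a GPU node
```

Because the rule is required rather than preferred, a second pod with the same label stays Pending until a GPU node without such a pod is available, which can prompt the cluster autoscaler to add one.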