Speed up Kubernetes Pod Creation of Large Images
If you work with large Docker images (say, 5 GB), you know that they can slow down pod creation. To speed things up, I’ll show how to pre-pull images onto Kubernetes nodes using a DaemonSet, so that when a pod is scheduled on a node, the relevant Docker image is already there instead of being pulled at deployment time. This is most useful when the image is big and pods are recreated often (for example, in a scale-out scenario).
Of course, everything I share in this post is compatible with Codefresh. If you haven’t already tried Codefresh and created a pipeline, you can create a free account.
Docker in Docker
Creating a container that runs the docker pull command is easy with the ‘Docker in Docker’ container (aka ‘dind’), so we can create a pod that looks like this:
containers:
- name: prepull
  image: docker
  command: ["docker", "pull", "hello-world"]
  volumeMounts:
  - name: docker
    mountPath: /var/run
volumes:
- name: docker
  hostPath:
    path: /var/run
This pod attaches to the host’s Docker daemon and pulls the ‘hello-world’ image. That’s fine, but now we need to make sure this pod runs at least once on every node in the cluster.
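To try the fragment above on its own, it has to be wrapped in a complete Pod manifest. The following is a minimal sketch rather than part of the original gist: the pod name and the restartPolicy: Never setting are assumptions added for illustration, and it assumes the node runs Docker as its container runtime.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: prepull              # hypothetical name
spec:
  restartPolicy: Never       # assumption: run the pull once and stop
  containers:
  - name: prepull
    image: docker
    # Uses the Docker CLI in the container to pull onto the host
    command: ["docker", "pull", "hello-world"]
    volumeMounts:
    - name: docker
      mountPath: /var/run    # exposes the host's Docker socket directory
  volumes:
  - name: docker
    hostPath:
      path: /var/run
```

A standalone pod like this only lands on whichever node the scheduler picks, which is why something more is needed to cover the whole cluster.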
DaemonSet
That sounds like a good fit for a DaemonSet. This Kubernetes entity is suitable for infrastructure services that need to run on all nodes. The usual use cases are log collection and monitoring agents. This sounds perfect for our case, but the issue with a DaemonSet is that it can only have restartPolicy: Always. This means that Kubernetes will keep restarting the pod’s container indefinitely, and each restart will attempt to pull the image again. It’s not a huge disaster, because Docker will detect that the image already exists and return quickly, but it still doesn’t feel right.
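To make the issue concrete, here is a sketch of that naive approach, with the pull running as the DaemonSet’s main container. The name prepull-naive is hypothetical, and apps/v1 is assumed as the API version.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepull-naive        # hypothetical name
spec:
  selector:
    matchLabels:
      name: prepull-naive
  template:
    metadata:
      labels:
        name: prepull-naive
    spec:
      # restartPolicy is left at its default, Always
      containers:
      - name: prepull
        image: docker
        # Exits as soon as the pull finishes, so it gets started again
        command: ["docker", "pull", "hello-world"]
        volumeMounts:
        - name: docker
          mountPath: /var/run
      volumes:
      - name: docker
        hostPath:
          path: /var/run
```

Because the container exits as soon as the pull completes, it is started again and again on every node, which is the repeated pull attempt described above.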
DaemonSet single-use pattern
To work around this limitation, I used the following trick: I kept the same container to pull the image as explained before, but I designated it as an initContainer instead of a regular container. This means it runs only once, when the pod is scheduled, and after it’s done the regular container in the pod runs. I then used the ‘pause’ container as the pod’s main container. This container does practically nothing, which is exactly what we want after the image has been pulled onto that node: keep the DaemonSet pod alive while using as few resources as possible. I guess you can call this pattern a workaround, but it’s a useful trick for scenarios like this.
Result
The end result is shared here: https://gist.github.com/itaysk/7bc3e56d69c4d72a549286d98fd557dd
# The original gist used apps/v1beta2, which Kubernetes 1.16 and later no longer serve
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepull
spec:
  selector:
    matchLabels:
      name: prepull
  template:
    metadata:
      labels:
        name: prepull
    spec:
      # Runs once per pod, before the main container starts
      initContainers:
      - name: prepull
        image: docker
        command: ["docker", "pull", "hello-world"]
        volumeMounts:
        - name: docker
          mountPath: /var/run
      volumes:
      - name: docker
        hostPath:
          path: /var/run
      # Does nothing; it only keeps the pod alive after the pull
      containers:
      - name: pause
        image: gcr.io/google_containers/pause
You can just ‘kubectl create’ it, and it will make sure the ‘hello-world’ image is pulled onto every node in the cluster.
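In practice you would replace ‘hello-world’ with your own large image, and possibly pre-pull more than one. One way to do that, sketched here under assumptions, is to add one init container per image. The DaemonSet name, the container names, and the registry.example.com image references are placeholders, not real images, and registry authentication is not covered.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepull-multi        # hypothetical name
spec:
  selector:
    matchLabels:
      name: prepull-multi
  template:
    metadata:
      labels:
        name: prepull-multi
    spec:
      # Init containers run one at a time, in the order listed
      initContainers:
      - name: prepull-app
        image: docker
        command: ["docker", "pull", "registry.example.com/team/app:1.0"]     # placeholder image
        volumeMounts:
        - name: docker
          mountPath: /var/run
      - name: prepull-worker
        image: docker
        command: ["docker", "pull", "registry.example.com/team/worker:1.0"]  # placeholder image
        volumeMounts:
        - name: docker
          mountPath: /var/run
      volumes:
      - name: docker
        hostPath:
          path: /var/run
      containers:
      - name: pause
        image: gcr.io/google_containers/pause
```

Since init containers run sequentially, the pause container only starts once every listed image has been pulled on that node.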