The Single-use Daemonset Pattern and Pre-pulling Images in Kubernetes

Speed Up Kubernetes Pod Creation for Large Images

If you work with large Docker images (5 GB or so), you know that they can slow down pod creation. To speed things up, I’ll show how to pre-pull images onto Kubernetes nodes using a DaemonSet, so that when a pod is scheduled onto a node, its Docker image is already there instead of being pulled at deployment time. This is most useful when the image is big and pods are recreated often (for example, in a scale-out scenario).

Of course, everything I share in this post is compatible with Codefresh. If you haven’t tried Codefresh and created a pipeline yet, you can create a free account.

Docker in Docker

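Creating a container that runs the docker pull command is pretty easy with the official ‘docker’ image, which ships the Docker CLI. This setup is often loosely called ‘Docker in Docker’ (aka ‘dind’), although here the CLI talks to the host’s Docker daemon instead of running its own. So we could easily create a pod that looks like: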

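apiVersion: v1
kind: Pod
metadata:
  name: prepull
spec:
  # Retry a failed pull, but do not restart once the pull has succeeded.
  restartPolicy: OnFailure
  containers:
  - name: prepull
    image: docker            # official image that ships the docker CLI
    command: ["docker", "pull", "hello-world"]
    volumeMounts:
    - name: docker
      mountPath: /var/run    # the host's /var/run contains docker.sock
  volumes:
  - name: docker
    hostPath:
      path: /var/run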

This pod talks to the host’s Docker daemon through the mounted socket and pulls the image ‘hello-world’ into the node’s local image cache. That’s fine, but now we need to make sure this pod runs at least once on every node in the cluster.

DaemonSet

That sounds like a good fit for a DaemonSet. This Kubernetes object is meant for infrastructure services that need to run on every node; the usual use cases are log collectors and monitoring agents. It sounds perfect for our case, but the catch is that a DaemonSet’s pod template only allows restartPolicy: Always. This means that every time the docker pull container exits, Kubernetes restarts it, indefinitely, so the image pull is attempted over and over. It’s not a huge disaster, because Docker detects that the image already exists and finishes quickly, but it still doesn’t feel right.
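For comparison, here is a sketch of the naive version, with the puller as the DaemonSet’s main container. It is only an illustration of the problem, not a recommendation, and the name prepull-naive is hypothetical. Because the pod template can only use restartPolicy: Always, the kubelet restarts the container each time docker pull exits, with an increasing back-off delay between restarts.

```yaml
# Naive version, for illustration only (name "prepull-naive" is hypothetical).
# The puller is the main container, so with the mandatory
# restartPolicy: Always it is restarted after every successful exit.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepull-naive
spec:
  selector:
    matchLabels:
      name: prepull-naive
  template:
    metadata:
      labels:
        name: prepull-naive
    spec:
      containers:
      - name: prepull
        image: docker
        command: ["docker", "pull", "hello-world"]
        volumeMounts:
        - name: docker
          mountPath: /var/run   # host's Docker socket lives here
      volumes:
      - name: docker
        hostPath:
          path: /var/run
```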

DaemonSet single-use pattern

To work around this limitation, I used a trick: I kept the same container to pull the image, but declared it as an initContainer instead of a regular container. An init container runs to completion when the pod starts, and only after it finishes does the pod’s regular container start. As the pod’s main container I then used the ‘pause’ container, which does practically nothing: it just sleeps until it is stopped. That is exactly what we want once the image is on the node: keep the DaemonSet pod alive while using as few resources as possible. You could call this pattern a workaround, but it’s a useful trick for scenarios like this.

Result

The end result is shared here: https://gist.github.com/itaysk/7bc3e56d69c4d72a549286d98fd557dd

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepull
spec:
  selector:
    matchLabels:
      name: prepull
  template:
    metadata:
      labels:
        name: prepull
    spec:
      # Runs to completion each time the pod starts, before "pause" starts.
      initContainers:
      - name: prepull
        image: docker
        # Uses the node's Docker daemon through the mounted socket.
        command: ["docker", "pull", "hello-world"]
        volumeMounts:
        - name: docker
          mountPath: /var/run
      volumes:
      - name: docker
        hostPath:
          path: /var/run
      # Keeps the pod alive with a near-zero footprint after the pull.
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9

You can simply ‘kubectl create -f’ it, and it will make sure the ‘hello-world’ image is pulled onto every node in the cluster, including nodes added later, since the DaemonSet schedules a pod on each new node.
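The Docker-based puller depends on a Docker daemon being present on every node. Since Kubernetes 1.24 removed dockershim, many clusters run containerd or CRI-O instead, with no docker.sock to mount. A variant that avoids this is to use the image you want to pre-pull as the init container itself, so the kubelet pulls it through the node’s own runtime. This is a sketch under stated assumptions: registry.example.com/my-team/big-image:1.0 is a hypothetical placeholder, the image is assumed to contain sh, and the resource requests on pause are illustrative values rather than tuned numbers.

```yaml
# Variant for nodes without a Docker daemon (containerd, CRI-O, ...).
# ASSUMPTION: the image below is a hypothetical placeholder and is
# assumed to contain "sh".
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepull
spec:
  selector:
    matchLabels:
      name: prepull
  template:
    metadata:
      labels:
        name: prepull
    spec:
      initContainers:
      - name: prepull
        # The kubelet must pull this image to start the container,
        # which is all we want; the command then exits immediately.
        image: registry.example.com/my-team/big-image:1.0
        command: ["sh", "-c", "true"]
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        # Illustrative requests so the pod is not BestEffort QoS.
        resources:
          requests:
            cpu: 1m
            memory: 8Mi
```

With a non-latest tag, imagePullPolicy defaults to IfNotPresent, so a restarted pod does not pull an image that is already cached on the node.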


4 thoughts on “The Single-use Daemonset Pattern and Pre-pulling Images in Kubernetes”

Alexandros Solanos says:

April 28, 2020 at 10:38 am

I found that this didn’t work for me in GKE. Even if I SSH into the node, I can’t use “docker pull” without running “docker-credential-gcr configure-docker” first.

A less hacky (but still a little hacky) solution IMO is to deploy your image in a DaemonSet as a normal container and change its “command” in the YAML to make it sleep. This way the image is pulled normally and your app won’t run, so you don’t need to request a lot of resources either.

Mrinal Kanti Ghosh says:

June 7, 2020 at 6:13 pm

Hi, I also used this approach, and it actually helps a lot. But just to pre-pull the image, I have to keep a pod running on every node, which also eats some resources.
A workaround could be to use regular containers instead of initContainers to fetch the image, and delete the DaemonSet once it has successfully pulled the images.

sriram says:

July 20, 2020 at 7:52 pm

Well, if our nodes are autoscaling, we cannot delete the DaemonSet. Also, the example does not set any resource requests, so this pod gets low priority as long as other pods set their requests.

However, this pattern can show a node as ready before the image pull is complete, so the real pod that needs the image may still end up pulling it again.

Ayush says:

July 22, 2024 at 3:38 am

This image only works on Linux nodes. Is there an alternative for Windows nodes too, with lightweight Windows images?