
Combining Progressive Delivery With GitOps and Continuous Delivery Through Argo CD, Argo Rollouts, and Codefresh


Progressive delivery is arguably the most reliable and advanced set of deployment practices based on a simple idea. Instead of shutting down the old release and deploying a new one in its place, progressive delivery takes an iterative approach. It gradually increases the reach of a new release. That gives us quite a few benefits like zero-downtime deployments, reduced blast radius, increased security, and so on and so forth.

I will not go into depth about what progressive delivery is. Instead, I will assume that you are already familiar with it. If that’s not the case, please watch Progressive Delivery Explained – Big Bang (Recreate), Blue-Green, Rolling Updates, Canaries.

Today, we are going to go further than merely applying progressive delivery alone. Instead, we will combine it with GitOps and wrap it all up in a continuous delivery pipeline. Just as with progressive delivery, I will assume that you do understand what those are. If you don’t, please watch What Is GitOps And Why Do We Want It? and Continuous Delivery (CD) Is Not What Some Are Trying To Sell You.

Now, before we embark on a quest to combine those three practices, we’ll need to pick the tools we will use.

We’ll choose Argo Rollouts for progressive delivery, Argo CD for GitOps-style deployments, and Codefresh for continuous delivery pipelines. Follow the links if you’re not familiar with any of them.

Finally, we’ll need to pick a progressive delivery strategy we will use. Today that will be canary deployments, even though the examples should apply to any other flavor with only a few minor tweaks.

That’s it. Off we go.

Discussing The Pre-Requirements

The first thing we’ll need is a Kubernetes cluster. Create it any way you like. It can be a “real” cluster in Cloud providers like AWS EKS, Google Cloud GKE, or Azure AKS. It can also be on-prem, or it can be a local cluster like Minikube or Docker Desktop. It does not matter how you create a cluster, as long as you have one that can be used for the exercises that follow.

Further on, we’ll use the Istio service mesh. It could have been ALB, NGINX Ingress, or any implementation of Service Mesh Interface (SMI) like, for example, Linkerd. I chose Istio mostly because I had to pick something, and Istio is the one I use a lot lately.

So, go ahead and install Argo CD and Istio if you do not have them already running in your cluster. Once you do, make sure to get the IP through which istio-ingressgateway is accessible and store it in environment variable ISTIO_HOST. We’ll need it later when we generate the address through which we should be able to reach the application we are about to deploy.
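How you retrieve that IP depends on the cluster. As a minimal sketch, assuming that Istio runs in the istio-system Namespace and that the istio-ingressgateway Service is of the LoadBalancer type and exposes an IP (some providers expose a hostname instead, and local clusters usually need a different approach), a helper like the hypothetical get_istio_host function could do the job.

```shell
# Hypothetical helper: prints the external IP of the Istio ingress gateway.
# Assumes the istio-system Namespace and a LoadBalancer Service with an IP.
get_istio_host() {
  kubectl --namespace istio-system \
    get service istio-ingressgateway \
    --output jsonpath="{.status.loadBalancer.ingress[0].ip}"
}

# Usage (requires access to the cluster):
#   export ISTIO_HOST=$(get_istio_host)
#   echo "$ISTIO_HOST"
```

If the output of echo is empty, the Service probably does not have an external IP yet, or your provider exposes it in a different field.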

If you are a Windows user, I will assume that you are running the commands from a Bourne Again Shell (Bash) or a Z Shell (Zsh) and not PowerShell. That should not be a problem if you followed the instructions on setting up Windows Subsystem for Linux (WSL) explained in the Installing Windows Subsystem For Linux (WSL) YouTube video. If you do not like WSL, a Bash emulator like GitBash should do. If none of those is an acceptable option, you might need to modify some of the commands in the examples that follow.

All the commands are available in the progressive.sh Gist. Feel free to use it if you’re too lazy to type. There’s no shame in copy & paste.

That’s it. Those are all the pre-requirements. Now we are ready to set up the production cluster or, to be more precise, a simulation of production. That will be the first use-case for GitOps through Argo CD.

Setting Up Production

Since we are applying GitOps principles, everything is defined as code, and code is stored in Git. Among other things, that means that the desired state of our production environment should be in a Git repo as well.

To make things simple, I already created a repo with all the definitions, so all we have to do is fork it and clone the fork.

If you are a Linux or a WSL user, I will assume that you created the alias open and set it to the xdg-open command. If you do not have the open command (or the alias), you should replace open with echo and copy and paste the output into your favorite browser.
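If you’d rather not think about it, something like the sketch that follows could cover both cases. It uses a shell function instead of an alias (aliases are not expanded in non-interactive scripts), and it falls back to echo when neither open nor xdg-open is available. Treat it as an illustration, not as a required part of the setup.

```shell
# Define `open` only if the shell does not already provide it.
if ! command -v open >/dev/null 2>&1; then
  if command -v xdg-open >/dev/null 2>&1; then
    # Linux or WSL with xdg-open available: delegate to it.
    open() { xdg-open "$@"; }
  else
    # Last resort: print the address so it can be pasted into a browser.
    open() { echo "$@"; }
  fi
fi
```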

Please fork the repository.

If you do not know how to fork a GitHub repo, the only thing I can say is “shame on you”. Google how to do it. I will not spend time explaining that.

Next, we’ll clone the newly forked repository.

Please replace [...] with your GitHub organization in the command that follows. If you forked the repo into your personal account, then the organization is your GitHub username.

If, by any chance, you already had that repository from one of my previous exercises, you might want to merge it with upstream just in case I made some improvements in the meantime.

Please execute the commands that follow only if you already forked the repo before.

Next, we’ll create an Argo CD project that will act as a placeholder for all the applications we’ll deploy to the production environment.

I will not go into the details of how Argo CD works since that would take too much time and is not the main subject of this article. So, I’ll assume that you are already familiar with it. If that’s not the case, please read Creating Temporary Preview Environments Based On Pull Requests With Argo CD And Codefresh as an example of an implementation of Argo CD processes.

Feel free to explore project.yaml file to see all the details of what we defined. For now, the only important note is that the production project we just created will be a placeholder for all the applications we’ll deploy and manage in production.

Next, we’ll need a few tools in production. Specifically, we’ll need Prometheus to retrieve and store metrics that we’ll use to evaluate whether to roll forward or roll back releases of our applications. The definition is stored in the orig directory. However, as you will see soon, we’ll configure Argo CD to manage only the manifests in the helm/templates directory of that same repo, so let’s copy it there.

Let’s take a quick look at the definition.

The output is as follows.

There is nothing special about that Application. It uses the “official” chart stored in the prometheus-community/helm-charts repo, it will be running inside the monitoring Namespace, and synchronization will be fully automated.

Similarly, we’ll need Argo Rollouts as well. Like with Prometheus, we’ll just copy the definition into the helm/templates directory, assuming that Argo CD will make sure that it is up-and-running.

Let’s take a quick look at the definition.

The output is as follows.

As you can see, the definition follows the same pattern as the one we used for Prometheus. It will use the chart from the argoproj/argo-helm (https://github.com/argoproj/argo-helm) repo, and it will deploy it inside the argo-rollouts Namespace.

Finally, there is one more application we’ll need to deploy. We’ll need a demo app to demonstrate the whole continuous delivery process combined with GitOps and progressive delivery. We’ll use one of my applications for that, and the definition is stored in orig/devops-toolkit-rollouts.yaml. But, before we copy it to helm/templates, we’ll need to change the host to something that will work inside your cluster.

We’ll use xip.io since I could not assume that you have a “real” domain that you can use for the exercises or, if you do, that you configured its DNS to point to the cluster.

We’ll use sed to replace the hard-coded devopstoolkitseries.com host with the one based on xip.io and store the output into the helm/templates directory.

The command that follows assumes that you have the IP through which the Istio Gateway Service can be accessed. That is typically the istio-ingressgateway Service.

The output is as follows.

That will be our initial definition of the demo application. It assumes that the base manifests are in the vfarcic/devops-toolkit repo. It also overwrites a few of the default values. It will use the latest tag (a horrible practice that we’ll correct soon), it disables NGINX Ingress, and it enables istio, hpa, and rollout. We’ll see the meaning of some of those variable overwrites later. For now, what matters is that we have a definition of the demo application stored in helm/templates together with the rest of the apps running in the production environment.
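The host replacement itself is nothing more than a search and replace. As a minimal sketch, the commands that follow run the same kind of sed substitution against a throwaway file instead of the real manifest. The IP, the file names, and the devops-toolkit.[IP].xip.io host format are assumptions made for the sake of illustration.

```shell
# Hypothetical IP, used only if ISTIO_HOST is not already set.
ISTIO_HOST=${ISTIO_HOST:-192.168.0.10}

# Throwaway directory with a stand-in for the manifest (not the real file).
WORK_DIR=$(mktemp -d)
mkdir -p "$WORK_DIR/orig" "$WORK_DIR/helm/templates"
printf 'host: devopstoolkitseries.com\n' > "$WORK_DIR/orig/app.yaml"

# Replace the hard-coded host and store the result in helm/templates.
sed -e "s@devopstoolkitseries.com@devops-toolkit.$ISTIO_HOST.xip.io@g" \
  "$WORK_DIR/orig/app.yaml" \
  | tee "$WORK_DIR/helm/templates/app.yaml"
```

Using @ as the sed delimiter avoids escaping in case the replacement ever contains slashes, and tee shows the result while writing it to the destination.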


There is only one more thing missing. We need to create an app of apps. All those applications we defined so far are references to different sources where base definitions are stored. All three of them (Prometheus, Argo Rollouts, and devops-toolkit) are referencing the repositories of those applications. We are now missing an app that will tell Argo CD to look for production manifests in the repository we forked initially.

I already prepared the file. However, it has my GitHub organization hard-coded, so we’ll have to change it to yours. Just as before, we’ll use “sed magic” for that.

The output is as follows.

That definition is still following the same pattern. The Application is almost the same as the one we used for the other apps. The major difference is that, this time, the repoURL is set to the helm path of the argocd-production repo, which holds all the Applications that should run in production. It is an application that defines all the applications for a specific environment.

Let’s push the changes to GitHub before we create the app of the apps. Even though we will create it manually, the definitions should, as a minimum, be stored in Git.

Now we can apply the definitions of the app of the apps.

That’s it. We have the whole production environment defined in Git repositories, with those three apps combined representing the desired state. Argo CD, on the other hand, made sure that the actual state (the cluster) is converged into the desired state. We can confirm that through the Argo CD UI.

Let’s “discover” the Argo CD UI address.

Please note that, depending on how you installed Argo CD, the command might differ in your case.

The output, in my case, is as follows.

If the output is empty or there was an error, you probably did not enable Ingress for Argo CD, or you forgot to specify the address. If that’s the case, you might need to update your Argo CD installation.

Now that we know the address of Argo CD, we can open it in the default browser.

Please note that the command that follows uses HTTP. If you set up TLS for your Argo CD UI, you might need to change http to https in the command that follows.

We should see five applications on the home screen of the Argo CD UI. There should be production, which is the app of the apps. It “discovered” the rest of the apps defined in the production repo, and it deployed the other four: prometheus, argo-rollouts, devops-toolkit, and devops-paradox. If you’re confused about the latter (devops-paradox), it was already defined in the production repo we forked earlier. Still, even though it’s there, we will not use it today, so you can ignore it.


We’re done with the production repo (for now), so we can just as well get out of the local copy.

Now that production is set up, we can turn our attention to the devops-toolkit app. That’s the one we’ll use as a guinea pig for progressive delivery.

Defining The Application

The devops-toolkit demo app is available in the vfarcic/devops-toolkit repo. However, given that we will start making changes to the app, you will have to fork it. Otherwise, I would need to give you write access to my repo, and that’s not something I’m willing to do. I do not trust you enough.

I’m sure you know how to fork a GitHub repo. So just do it. Fork it!

Let’s clone the forked repo.

Just as before, if, by any chance, you already had that repository from one of my previous exercises, you might want to merge it with upstream just in case I made some improvements in the meantime.

Please execute the commands that follow only if you already forked the repo before.

There are a couple of ways we can approach updates of the releases in the production repo. For example, we could modify the file in the production repo every time we want to have a new release of that application. We are not going to do that. That would mean that the pipeline needs to know about all the app definitions in the production repo. More importantly, it would mean that the structural changes to the definition would need to be done there. I prefer that everything related to a single application is in the repo of that app. That way, people in charge of the application can have (almost) full control over it from a single repo.

Long story short, we’ll create a template (of sorts) in the devops-toolkit repo. Later on, we’ll be able to instruct the pipeline to “transform” some key values of that file and copy it into the production repo whenever we need a new release.

By now, you probably know that I do not want to make you work too much, so I have already prepared the file we’ll use. We just need to copy it and, while doing that, replace the hard-coded Istio Gateway address with the one that will work in your setup.

The output is as follows.

It’s yet another Argo CD application with the reference to the repoURL and the path where the definitions are, a few production-specific values we are overwriting, and the ignoreDifferences section. Assuming that you already have a basic understanding of how Argo CD works, you should not have any issue understanding that definition except, maybe, the last part (ignoreDifferences). So let’s talk about it.

Argo CD will ensure that whatever is defined in the Git repos it monitors is applied to the cluster. It will do its best to always keep the actual state the same as the desired state of our applications and all the resources they need. However, that is in stark contrast with the dynamic nature of Kubernetes clusters and the processes running inside them.

Argo Rollouts will roll out new releases gradually. It will let a fraction of our users see the new release, and it will run the analysis to decide whether the reach of that release should be increased to more users or whether it should be rolled back. To do that, it will need to modify the behavior of the application at runtime. In other words, Argo Rollouts will be changing the actual state. To be more specific, Argo Rollouts will be modifying the weight of Istio Virtual Services.

Argo CD, on the other hand, does not like it when someone modifies the actual state. As a matter of fact, it despises it, and it will do its best to undo any divergence from the desired state. So, if Argo Rollouts modifies Virtual Services (the actual state), Argo CD will soon afterward try to undo those changes by synchronizing the actual state (the cluster) into the desired state (Git).

As you can see, Argo CD and Argo Rollouts might be at odds. One is trying to make the cluster resources always the same as the definitions in Git, while the other is trying to modify the resources in the cluster as part of progressive delivery processes.

We are fixing those issues through the ignoreDifferences section of the Application definition. Argo CD will ignore the sections of the VirtualService manifests matched by the specified jsonPointers. Given that the weight we discussed is inside spec.http, it will be ignored, except during the initial deployment.
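To make that more tangible, the sketch that follows prints what such a section of an Argo CD Application could look like. It is an illustration based on the description above and not necessarily the exact content of production.yaml. Note that jsonPointers use the JSON Pointer notation, so spec.http is written as /spec/http.

```shell
# Print an illustrative ignoreDifferences fragment of an Argo CD Application.
# The pointer /spec/http covers everything under spec.http, weights included.
IGNORE_DIFFERENCES=$(cat <<'EOF'
spec:
  ignoreDifferences:
  - group: networking.istio.io
    kind: VirtualService
    jsonPointers:
    - /spec/http
EOF
)
echo "$IGNORE_DIFFERENCES"
```

The downside of ignoring the whole spec.http section is that manual changes to any other field inside it will not be flagged as drift either.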

Now that we have the production environment up-and-running and that we explored the production.yaml file that will be used to define new releases of devops-toolkit, we can turn our attention to the pipeline that will glue it all together into a continuous delivery process.

Creating The Pipeline

Before we create a Codefresh pipeline that will do all the work for us, I should probably comment on a few assumptions I made.

I am assuming that you already have codefresh CLI installed. Also, I am guessing that you created a GitHub context so that Codefresh knows how to use your repositories. If my assumptions are wrong, please go through the Samurais Do NOT Use UIs: Using CLI To Configure Codefresh And Create And Manage Kubernetes Pipelines article. It provides all the instructions you’ll need, and much more.

With that out of the way, let’s double-check whether the CLI works and whether you have a github context.

In my case, the output is as follows.

To simplify the commands that follow, we’ll put the NAME of the git.github context into an environment variable.

Please replace [...] in the commands that follow with the context you’re planning to use.

By now, you probably have the expectation that I prepared everything so that you can be as lazy as you can be. Given that I would feel bad not meeting your expectations, I created a definition of the pipeline we will use. We just need to replace a few hard-coded values.

The output is as follows.

I’ll assume that you already have at least a basic understanding of Codefresh pipeline syntax, so I’ll provide only a quick overview of what it does.

The triggers section will ensure that pipeline builds are triggered whenever we push a change to the master branch of the devops-toolkit repo.

Inside the release stage of the steps section, we are cloning the app repo (main_clone), and we are building the binary (build_app) and the container image (build_image). Those are standard steps that you are probably already doing one way or another.

The deploy stage is more interesting.

In the deploy stage, we are cloning yet another repo (clone_env_repo). This time, we are retrieving the production repository. Further on (define_production), we are checking out a new branch of the production repo and modifying the production.yaml file from the app repo using kyml. It replaces the placeholder in the definition with the new image tag (the one we built in one of the previous steps). Finally, we are pushing the changes (push_env_repo) and creating a new pull request (github_pr).
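The transformation done in define_production boils down to swapping a placeholder with the tag of the image that was just built. The pipeline uses kyml for that. The sketch that follows uses sed instead, only to stay dependency-free, and everything in it (the render_release function, the TAG_PLACEHOLDER string, the 1.2.3 tag, and the template itself) is hypothetical.

```shell
# Hypothetical stand-in for the define_production step: renders a release
# definition by swapping a placeholder with the image tag of the build.
render_release() {
  local template=$1 tag=$2
  sed -e "s@TAG_PLACEHOLDER@$tag@g" "$template"
}

# Throwaway template (not the real production.yaml).
TEMPLATE=$(mktemp)
printf 'image:\n  tag: TAG_PLACEHOLDER\n' > "$TEMPLATE"

# 1.2.3 is a made-up tag; a pipeline would pass the tag it just built.
RENDERED=$(render_release "$TEMPLATE" "1.2.3")
echo "$RENDERED"
```

In a pipeline, the rendered output would be written into the clone of the production repo on a new branch, which is then pushed and turned into a pull request.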

That’s it. The pipeline might look big at first, but it is a straightforward one.

The most important observation is that we are NOT interacting with the cluster in any form or way. For all we know, neither Codefresh nor anyone or anything else has access to the cluster. Instead, we interact only with Git repositories hoping that Argo CD running inside our cluster will pull the changes and do the right thing.

Let’s act as responsible adults and push those changes to Git before creating the pipeline.

Creating a pipeline is easy, so let’s just do it.

Next, let’s take a look at the pipeline from the Codefresh UI.

Please click the VARIABLES tab, and you’ll see that one was detected as non-existent. Expand the DETECTED VARIABLES section, and you’ll see that GITHUB_TOKEN is missing. I did not want to hard-code the token inside the pipeline definition (that would be irresponsible), so it’s defined as a variable.

Please click the GITHUB_TOKEN button and type the GitHub token (create it if you do not have it at hand). Since the token is confidential, we should encrypt it by clicking the Encrypt icon next to the value.

All that’s left is to click the SAVE button.

Exploring The Initial Deployment

We still need to explore the Argo Rollouts definition. We’ll go through it fast since the focus is not to go deep into any specific tool but to demonstrate how to combine it all into a unified process. If you need an overview, please watch Argo Rollouts – Canary Deployments Made Easy In Kubernetes.

Everything related to Argo Rollouts is in the helm/templates/rollout.yaml and helm/values.yaml files, so let’s take a quick look at the former one first.

I’ll let you explore the definitions yourself. Once you’re done, we’ll take a look at the values.yaml file since it contains the critical piece of information.

The output, limited to the relevant parts, is as follows.

The rollout process will have three steps, with pauses in between. It will set the weight to ten percent, wait for two minutes, change the weight to thirty percent, wait for thirty seconds, change the weight to fifty percent, wait for thirty seconds again, and, after all that, roll out fully.

The reason for having the first pause so long (compared to the others) lies in the frequency with which Prometheus pulls metrics. By default, it pulls metrics every two minutes, so we want to ensure that the metrics are available for the analysis. That’s why the spec.strategy.canary.analysis.startingStep entry in helm/templates/rollout.yaml is set to 2. Steps are counted from zero, so Argo Rollouts will start analyzing metrics only after it reaches the step with index 2 (setWeight: 30), or, to put it in other words, after a bit more than two minutes.
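The timing of those steps is easy to work out. As a small sketch, the loop that follows adds up the pauses described above and prints the earliest moment at which each weight is applied. It ignores the time Kubernetes needs to start Pods and the time Istio needs to apply new weights, so real rollouts will take somewhat longer.

```shell
# Steps as described above: weight in percent followed by the pause in seconds.
# A pause of 0 after the last entry marks the full rollout.
STEPS="10:120 30:30 50:30 100:0"

ELAPSED=0
TIMELINE=""
for STEP in $STEPS; do
  WEIGHT=${STEP%%:*}   # part before the colon
  PAUSE=${STEP##*:}    # part after the colon
  TIMELINE="${TIMELINE}${ELAPSED}s -> ${WEIGHT}%
"
  ELAPSED=$((ELAPSED + PAUSE))
done
printf '%s' "$TIMELINE"
```

The sketch prints 0s -> 10%, 120s -> 30%, 150s -> 50%, and 180s -> 100%, so a rollout that passes the analysis needs at least three minutes.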

The initial deployment of the app is already done. The moment we told Argo CD of the existence of the production repo, it converged the actual into the desired state. Since devops-toolkit is one of the Argo CD applications defined in that repo, it should already be running inside the cluster. We can confirm that by retrieving the devops-toolkit-devops-toolkit rollout from the production Namespace.

If that command did not work, you probably did not install the kubectl plugin for Argo Rollouts. If that’s the case, please visit the Kubectl Plugin Installation for instructions.
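With the plugin in place, retrieving a rollout is a single command. As a minimal sketch, the hypothetical get_rollout helper below wraps it, using the rollout name and the Namespace mentioned earlier, and passes any extra arguments through.

```shell
# Hypothetical helper wrapping the Argo Rollouts kubectl plugin.
# Extra arguments (for example --watch) are passed through.
get_rollout() {
  kubectl argo rollouts \
    --namespace production \
    get rollout devops-toolkit-devops-toolkit "$@"
}

# Usage (requires the cluster and the plugin):
#   get_rollout
#   get_rollout --watch
```

The --watch variant keeps refreshing the output until it is stopped, which is handy for following a rollout while it progresses.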

The output is as follows.

We can see that the first deployment (revision:1) was successful. The ActualWeight is set to 100, meaning that all the traffic is going to that release. Now, to be fair, that wasn’t progressive delivery. The first release of something cannot be deployed progressively since there is no previous release to progress from. We’ll change that soon when we deploy a second release of the demo app.

Rolling Back Releases

There are two typical outcomes of any deployment. It’s successful, or it failed. Actually, there is a third outcome that can be described as “we don’t know”, but I’ll leave that one aside.

In the case of canary deployments, success means that it is rolled out to all the users, while repeated failure results in a rollback to the previous release. We’ll explore the latter first.

Let’s see what happens when things go wrong. For that, we’ll be sending requests to a non-existent endpoint. As a result, that should generate responses with error codes in the 4xx range.

First, we’ll need the IP of the Istio Gateway.

Copy the output.

Next, open a second terminal session and (re)declare the ISTIO_HOST variable.

Please replace [...] in the command that follows with the IP copied earlier.

Now we can start sending “false” requests to the app, hoping they will generate issues that will compel Argo Rollouts to roll back the release we are about to deploy.
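Such a loop can be as simple as curl, sleep, repeat. The sketch that follows is a bounded variation of that idea, so that it cannot run forever by accident. The probe and send_requests functions are hypothetical helpers, and the address in the usage comment assumes the xip.io-based host and a made-up path.

```shell
# Hypothetical helper: prints the HTTP status code returned for a URL.
# curl reports 000 when it does not receive any response.
probe() {
  curl --silent --output /dev/null --max-time 5 \
    --write-out '%{http_code}\n' "$1" || true
}

# Sends COUNT requests to URL, pausing INTERVAL seconds (default 1) in between.
send_requests() {
  local url=$1 count=$2 i
  for ((i = 0; i < count; i++)); do
    probe "$url"
    sleep "${INTERVAL:-1}"
  done
}

# Usage (the host format and the path are assumptions):
#   send_requests "http://devops-toolkit.$ISTIO_HOST.xip.io/this-does-not-exist" 300
```

Printing only the status codes makes it obvious at a glance whether the requests produce the errors we are hoping for.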

Let’s leave the loop running and go back to the first terminal session.

Now we can “pretend” to create a new release of the application.

Typically, we’d write some code, create a pull request, deploy it to a temporary preview environment, let our colleagues review it, and do whatever else we usually do before merging PRs. But we’ll skip all that for brevity reasons. Progressive delivery (e.g., canary deployment) is generally used only in production anyway.

All in all, we’ll skip the PR part of the process and make some changes directly to the mainline.

We made a silly change to the local copy of the repository, and we pushed it to GitHub. As a result, a new pipeline build should be running (or it might have already finished). Let’s take a quick look at it.

Please select the latest build to see the progress.

Once the build is finished, we should have a new pull request with the suggested changes in the argocd-production repo. That means that we are practicing continuous delivery instead of continuous deployment. If the latter is your preference, all you would have to do is change the pipeline to push directly to the mainline of the production repo instead of making a pull request.

Let’s see the newly created PR.

Go inside the PR and explore the changes. Merge it once you’re done.

From here on, Argo CD should detect the modification to the definitions in the production repo and make the necessary changes to the cluster. Since the app definition is based on Argo Rollouts, the modification should start the process of canary deployments. We can watch the progress with the command that follows.

After a few minutes, when the process is finished, the output should be similar to the one that follows.

The process immediately changed the weight so that ten percent of the requests are going to the new release, while all the rest are redirected to the old. Two minutes later, it increased the weight of the new release to thirty percent and started analyzing metrics stored in Prometheus. After the analysis failed to reach the threshold three times, it started rolling back to the old release. At the end of the process, the situation is the same as if we never even tried to roll out a new release. Only the old release is running, and a hundred percent of the requests are redirected to it.

Please press ctrl+c to stop watching the rollout.

We will not need the loop sending requests to the non-existent endpoint, so let’s stop it as well.

Go to the second terminal session and cancel the loop by pressing ctrl+c.

Let’s see what a successful rollout looks like.

Rolling Forward Releases

Assuming that you are still in the second terminal session, please execute the commands that follow to start sending “good” requests to the app and, through them, generate “positive” metrics.

Next, we’ll make yet another silly release and see what happens.

Go to the first terminal session and execute the commands that follow to make just enough change to the repo to initiate another pipeline build.

Let’s wait until the build is finished. If you’re bored staring at a static screen, feel free to go back to Codefresh builds and watch it progress.

All that’s left to do, once the pipeline build is finished, is to go to the argocd-production repo and merge the newly created pull request.

Open the newly created PR and merge it.

Let’s see what this rollout will look like.

After a couple of minutes, the output should be similar to the one that follows.

All the requests should be going to the new release, and the old one should be scaled down to zero replicas. We were successful!

That’s it. You can stop watching the rollout and stop sending requests by pressing ctrl+c in both terminal sessions.

Cleaning Up The Mess

It’s always a good idea to clean up the mess after we’re finished “playing”, so let’s remove any trace of our activities.

Please make sure that you are in the first terminal session and execute the commands that follow.

I’ll use gh CLI to open the repos. Please install it if you do not have it already.

Finally, destroy the cluster if you created it only for the exercises.

Viktor Farcic


Principal DevOps Architect @ Codefresh
