How to Model Your GitOps Environments and Promote Releases between Them

Two of the most important questions that people ask themselves on day 2 after adopting GitOps are:

  1. How should I represent different environments on Git?
  2. How should I handle promoting releases between environments?

In the previous article of the series, I focused on what NOT to do and explained why using Git branches for different environments is a bad idea. I also hinted that the “environment-per-folder” approach is a better idea. That article proved hugely popular, and several people wanted to see all the details of the suggested structure for environments when folders are used.

In this article, I am going to explain how to model your GitOps environments using different folders on the same Git branch, and as an added bonus, how to handle environment promotion (both simple and complex) with simple file copy operations.

GitOps promotion

Hopefully this article will help with the endless stream of questions and discussions on this hot topic.

Note that this article talks about Kubernetes manifests only. If you want to work specifically with Argo CD please see our Application Set guide as well.

Learn your application first

Before creating your folder structure, you need to do some research first and understand the “settings” of your application. Even though application configuration is often discussed in a generic manner, in reality not all configuration settings are equally important.

In the context of a Kubernetes application, we have the following categories of “environment configuration”:

  1. The application version in the form of the container tag used. This is probably the most important setting in a Kubernetes manifest (as far as environment promotions are concerned). Depending on your use case, you might get away with simply changing the version of the container image. Often, however, a change in the source code also requires a matching change in the deployment environment.
  2. Kubernetes-specific settings for your application. This includes the replicas of the application and other Kubernetes-related information such as resource limits, health checks, persistent volumes, affinity rules, etc.
  3. Mostly static business settings. This is the set of settings that are unrelated to Kubernetes but have to do with the business of your application. It might be external URLs, internal queue sizes, UI defaults, authentication profiles, etc. By “mostly static,” I mean settings that are defined once for each environment and never change afterwards. For example, you always want your production environment to use production.paypal.com and your non-production environments to use staging.paypal.com. This is a setting that you never want to promote between environments.
  4. Non-static business settings. This is the same as the previous point, but it includes settings that you DO want to promote between environments. This could be a global VAT setting, your recommendation engine parameters, the available bitrate encodings, and any other setting that is specific to your business.

It is imperative that you understand what all the different settings are and, more importantly, which of them belong to category 4, as these are the ones that you want to promote along with your application version.

This way you can cover all possible promotion scenarios:

  1. Your application moves from version 1.34 to 1.35 in QA. This is a simple source code change. Therefore you only need to change the container image property in your QA environment.
  2. Your application moves from version 3.23 to 3.24 in Staging. This is not a simple source code change. You need to update the container image property and also bring the new setting “recommender.batch_size” from QA to staging.

I see too many teams that don’t understand the distinction between these kinds of configuration parameters and keep a single configuration file (or mechanism) that mixes values from different areas (e.g. both runtime settings and application business settings).

Once you have the list of your settings and which area they belong to, you are ready to create your environment structure and optimize the file copy operations for the settings that change a lot and need to be moved between environments.

Example with 5 GitOps environments and variations between them

Let’s see an actual example. I thought about doing the classic QA/Staging/Production trilogy, but that is rather boring, so let’s dive into a more realistic example.

We are going to model the environment situation mentioned in the first article of the series. The company that we will examine has 5 distinct environments:

  • Load Testing
  • Integration Testing
  • QA
  • Staging
  • Production

Then let’s assume that the last 2 environments (Staging and Production) are also deployed to EU, US, and Asia, while the first 2 also have GPU and non-GPU variations. This means that the company has a total of 11 environments.

You can find the suggested folder structure at https://github.com/kostis-codefresh/gitops-environment-promotion. All environments are different folders in the same branch. There are NO branches for the different environments. If you want to know what is deployed in an environment, you simply look at envs/ in the main branch of the repo.
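
Concretely, the envs/ folder contains one subfolder per environment, with the 11 folders named as they appear in the promotion examples later in this article:

envs/
  integration-gpu/
  integration-non-gpu/
  load-gpu/
  load-non-gpu/
  prod-asia/
  prod-eu/
  prod-us/
  qa/
  staging-asia/
  staging-eu/
  staging-us/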

Before we explain the structure, here are some disclaimers:

Disclaimer 1: Writing this article took a long time because I wasn’t sure if I should cover Kustomize or Helm or plain manifests. I chose Kustomize as it makes things a bit easier (and I also mention Helm at the end of the article). Note however that the Kustomize templates in the example repo are simply for illustration purposes. The present article is NOT a Kustomize tutorial. In a real application, you might have Configmap generators, custom patches and adopt a completely different “component” structure than the one I am showing here. If you are not familiar with Kustomize, spend some time understanding its capabilities first and then come back to this article.

Disclaimer 2: The application I use for the promotions is a complete dummy, and its configuration omits several best practices, mainly for brevity and simplicity. For example, some deployments are missing health checks, and all of them are missing resource limits. Again, this article is NOT about how to create Kubernetes deployments. You should already know how proper deployment manifests look. If you want to learn more about production-grade best practices, see my other article at https://codefresh.io/blog/kubernetes-antipatterns-1/

With the disclaimers out of the way, here is the repository structure:

GitOps folder structure

The base directory holds configuration which is common to all environments. It is not expected to change often. If you want to make changes to multiple environments at the same time, it is best to use the “variants” folder.

The variants folder (a.k.a. mixins, a.k.a. components) holds common characteristics shared between environments. It is up to you to define what exactly is “common” between your environments after researching your application as discussed in the previous section.

In the example application, we have variants for all prod and non-prod environments and also the regions. Here is an example of the prod variant that applies to ALL production environments.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-deployment
spec:
  template:
    spec:
      containers:
      - name: webserver-simple
        env:
        - name: ENV_TYPE
          value: "production"
        - name: PAYPAL_URL
          value: "production.paypal.com"   
        - name: DB_USER
          value: "prod_username"
        - name: DB_PASSWORD
          value: "prod_password"                     
        livenessProbe:
          httpGet:
            path: /health
            port: 8080

In the example above, we make sure that all production environments use the production DB credentials, the production payment gateway, and a liveness probe (this is a contrived example; please see disclaimer 2 at the start of this section). These settings belong to the set of configuration that we don’t expect to promote between environments; we assume they will stay static across the application lifecycle.
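
For reference, each variant folder also carries its own kustomization.yaml that declares it as a Kustomize Component, which is what allows an environment to include it via the components field (as you will see in the staging ASIA example below). A minimal sketch, assuming the patch above is saved as prod.yml inside variants/prod:

apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component

patchesStrategicMerge:
- prod.yml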

With the base and variants ready, we can now define every final environment with a combination of those properties.

Here is an example of the staging ASIA environment:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: staging
namePrefix: staging-asia-

resources:
- ../../base

components:
  - ../../variants/non-prod
  - ../../variants/asia

patchesStrategicMerge:
- deployment.yml
- version.yml
- replicas.yml
- settings.yml

First we define some common properties. We inherit all configuration from the base, the non-prod variant, and the Asia variant.

The key point here is the patches that we apply. Each patch file defines a single concern on its own and nothing else, and this is exactly what makes file-based promotions possible.

The version.yml file (which is the most important file to promote between environments) defines only the image of the application:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-deployment
spec:
  template:
    spec:
      containers:
      - name: webserver-simple
        image: docker.io/kostiscodefresh/simple-env-app:2.0
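
The replicas.yml patch follows the same single-concern pattern. Here is a minimal sketch (the replica count shown is illustrative):

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-deployment
spec:
  replicas: 2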

The settings associated with each release that we DO expect to promote between environments are defined in settings.yml:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-deployment
spec:
  template:
    spec:
      containers:
      - name: webserver-simple
        env:
        - name: UI_THEME
          value: "dark"
        - name: CACHE_SIZE
          value: "1024kb"
        - name: PAGE_LIMIT
          value: "25"
        - name: SORTING
          value: "ascending"    
        - name: N_BUCKETS
          value: "42"         

Feel free to look at the whole repository to understand the way all kustomizations are formed.

Performing the initial deployment via GitOps

To deploy an application to its associated environment, just point your GitOps controller to the respective “env” folder, and Kustomize will create the complete hierarchy of settings and values.
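
For example, assuming Argo CD as the GitOps controller (the manifest below is an illustrative sketch, not part of the example repository), an Application for the staging ASIA environment would simply reference the envs/staging-asia path:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: staging-asia
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/kostis-codefresh/gitops-environment-promotion
    targetRevision: main
    path: envs/staging-asia
  destination:
    server: https://kubernetes.default.svc
    namespace: staging
  syncPolicy:
    automated: {}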

Here is the example application as it runs in Staging/Asia

GitOps application example

You can also use Kustomize on the command line to preview what is going to be deployed for each environment. Examples:

kustomize build envs/staging-asia
kustomize build envs/qa
kustomize build envs/integration-gpu

You can of course pipe the output to kubectl to deploy each environment, but in the context of GitOps, you should always let your GitOps controller deploy your environments and avoid manual kubectl operations.

Comparing the configuration of two environments

A very common need for a software team is to understand what is different between two environments. I have seen several teams operate under the misconception that only with branches can you easily find differences between environments.

This could not be further from the truth. You can easily use mature file-diffing utilities to find what is different between environments just by comparing files and folders.

The simplest way is to diff only the settings that are critical to the app.

vimdiff envs/integration-gpu/settings.yml envs/integration-non-gpu/settings.yml
GitOps settings diff

And with the help of kustomize, you can compare any number of whole environments for the full picture:

kustomize build envs/qa/ > /tmp/qa.yml
kustomize build envs/staging-us/ > /tmp/staging-us.yml
kustomize build envs/prod-us/ > /tmp/prod-us.yml
vimdiff /tmp/staging-us.yml /tmp/qa.yml /tmp/prod-us.yml
GitOps environment diff
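
If you want to render every environment in one go before diffing, a small shell loop is enough (a sketch):

for env in envs/*/; do
  kustomize build "$env" > "/tmp/$(basename "$env").yml"
done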

I personally don’t see any disadvantage of this method compared to performing a “git diff” between environment branches.

How to perform promotions between GitOps environments

Now that the file structure is clear, we can finally answer the age-old question: “How do I promote releases with GitOps?”

Let’s see some promotion scenarios. If you have been paying attention to the file structure, you should already understand how all promotions resolve to simple file copy operations.

Scenario: Promote application version from QA to staging environment in the US:

  1. cp envs/qa/version.yml envs/staging-us/version.yml
  2. commit/push changes

Scenario: Promote the application version from integration testing (GPU) to load testing (GPU) and then to QA. This is a 2-step process:

  1. cp envs/integration-gpu/version.yml envs/load-gpu/version.yml
  2. commit/push changes
  3. cp envs/load-gpu/version.yml envs/qa/version.yml
  4. commit/push changes

Scenario: Promote an application from prod-eu to prod-us along with the extra configuration. Here we also copy our setting file(s).

  1. cp envs/prod-eu/version.yml envs/prod-us/version.yml
  2. cp envs/prod-eu/settings.yml envs/prod-us/settings.yml
  3. commit/push changes

Scenario: Make sure that QA has the same replica count as staging-asia

  1. cp envs/staging-asia/replicas.yml envs/qa/replicas.yml
  2. commit/push changes

Scenario: Backport all settings from qa to integration testing (non-gpu variant)

  1. cp envs/qa/settings.yml envs/integration-non-gpu/settings.yml
  2. commit/push changes

Scenario: Make a global change to all non-prod environments at once (but see also next section for some discussion on this operation)

  1. Make your change in variants/non-prod/non-prod.yml
  2. commit/push changes

Scenario: Add a new configuration file to all US environments (both production and staging).

  1. Add the new manifest in the variants/us folder
  2. Modify the variants/us/kustomization.yml file to include the new manifest
  3. commit/push changes

In general, all promotions are just copy operations. Unlike the environment-per-branch approach, you are free to promote anything from any environment to any other environment without any fear of taking the wrong changes. Especially when it comes to back-porting configuration, environment-per-folder really shines, as you can move configuration both “forwards” and “backwards” even between unrelated environments.

Note that I am using cp operations just for illustration purposes. In a real application, this operation would be performed automatically by your CI system or other orchestration tool. And depending on the environment, you might want to create a Pull Request first instead of directly editing the folder in the main branch.
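
As a sketch of what such automation might run under the hood (the script and its arguments are illustrative, not a prescribed tool):

#!/usr/bin/env bash
# Promote the application version from a source to a target environment
# by copying the version file and pushing the result.
set -euo pipefail

SRC="$1"   # e.g. qa
DST="$2"   # e.g. staging-us

cp "envs/$SRC/version.yml" "envs/$DST/version.yml"
git add "envs/$DST/version.yml"
git commit -m "Promote $SRC release to $DST"
git push origin main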

Making changes to multiple environments at once

Several people have asked in the comments of the first article about the use-case of changing multiple environments at once and how to achieve and/or prevent this scenario.

First of all, we need to define what exactly we mean by “multiple” environments. We can distinguish the following 2 cases:

  1. Changing multiple environments at once that are on the same “level.” As an example, you want to make a change that affects prod-us, prod-eu and prod-asia at the same time
  2. Changing multiple environments at once that are NOT on the same level. As an example, you want to make a change to “integration” and “staging-eu” at the same time

The first case is a valid scenario, and we will cover this below. However, I consider the second scenario an anti-pattern. The whole point of having different environments is to be able to release things in a gradual way and promote a change from one environment to the next. So if you find yourself deploying the same change in environments of different importance, ask yourself if this is really needed and why.

For the valid scenario of deploying a single change to multiple “similar” environments, there are two strategies:

  1. If you are absolutely certain that the change is “safe” and you want it to reach all environments at once, you can make that change in the appropriate variant (or respective folders). For example, if you commit/push a change in the variants/non-prod folder, then all non-production environments will get this change at the same time. I am personally against this approach because several changes look “safe” in theory but can be problematic in practice.
  2. The preferable approach is to apply the change to each individual folder and then move it to the “parent” variant when it is live on all environments.

Let’s take an example. We want to make a change that affects all EU environments (e.g. a GDPR-related feature). The naive way would be to commit/push the configuration change directly to the variants/eu folder. This would indeed affect all EU environments (prod-eu and staging-eu). However, this is risky, because if the deployment fails, you have just brought down a production environment.

The suggested approach is the following:

  1. Make the change to envs/staging-eu first
  2. Then make the same change to envs/prod-eu
  3. Finally, delete the change from both environments and add it to variants/eu (in a single commit/push action).
Gradual GitOps promotion
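
To make step 3 concrete, assume the GDPR feature is toggled by a hypothetical GDPR_ENABLED environment variable. The final transitional commit adds it to the eu variant while deleting the identical setting from both environment folders:

---
# variants/eu/eu.yml -- sketch only; the same env var is removed from
# envs/staging-eu and envs/prod-eu in this same commit
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-deployment
spec:
  template:
    spec:
      containers:
      - name: webserver-simple
        env:
        - name: GDPR_ENABLED
          value: "true"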

You might recognize this pattern from gradual database refactorings. The final commit is “transitional” in the sense that it doesn’t really affect any environment: Kustomize will create the exact same definitions before and after that commit, so your GitOps controller shouldn’t find any differences at all.

The advantage of this approach is, of course, how easy it is to rollback/revert the change as you move it through environments. The disadvantage is the increased effort (and number of commits) needed to promote the change to all environments, but I believe the added safety outweighs the extra effort.

If you adopt this approach, it means that you never apply new changes to the base folder directly. If you want a change to happen to all environments, you first apply the change to individual environments and/or variants and then backport it to the base folder while simultaneously removing it from all downstream folders.

The advantages of the “environment-per-folder” approach

Now that we have analyzed all the inner workings of the “environment-per-folder” approach, it is time to explain why it is better than the “environment-per-branch” approach. If you have been paying attention to the previous sections, you should already understand how the “environment-per-folder” approach directly avoids all the problems analyzed in the previous article.

The most glaring issues with environment branches are the significance of commit order and the danger of bringing unwanted changes when you merge from one environment to another. With the folder approach, these problems are completely eliminated:

  1. The order of commits on the repo is now irrelevant. When you copy a file from one folder to the next, you don’t care about its commit history, just its content
  2. By only copying files around, you only take exactly what you need and nothing else. When you copy envs/qa/version.yml to envs/staging-asia/version.yml you can be certain that you only promote the container image and nothing else. If somebody else has changed the replicas in the QA environment in the meantime, it doesn’t affect your promotion action.
  3. You don’t need to use git cherry-picks or any other advanced git method to promote releases. You only copy files around and have access to the mature ecosystem of utilities for file processing.
  4. You are free to take any change from any environment to either an upstream or downstream environment without any constraints about the correct “order” of environments. If, for example, you want to backport your settings from production US to staging US, you can do a simple copy of envs/prod-us/settings.yml to envs/staging-us/settings.yml without the fear of inadvertently taking unrelated hotfixes that were supposed to be only in production.
  5. You can easily use file diff operations to understand what is different between environments in all directions (both from source and target environments and vice versa)

I consider these advantages very important for any non-trivial application, and I bet that several “failed deployments” in big organizations could be directly or indirectly attributed to the problematic environment-per-branch model.

The second problem mentioned in the previous article was the presence of configuration drift when you merge a branch to the next environment. The reason for this is that when you do a “git merge,” git only notifies you about the changes it will bring, and it doesn’t say anything about what changes are already in the target branch.

Again this problem is completely eliminated with folders. As we said already, file diff operations have no concept of “direction.” You can copy any setting from any environment either upwards or downwards, and if you do a diff operation on the files, you will see all changes between environments regardless of their upstream/downstream position.

The last point about environment branches was the linear complexity of branches as the number of environments grows. With 5 environments, you need to juggle changes between 5 branches, and with 20 environments, you need to have 20 branches. Moving a release correctly between a large number of branches is a cumbersome process, and in the case of production environments, it is a recipe for disaster.

With the folder approach, the number of branches is not only static but exactly 1. If you have 5 environments, you manage them all with your “main” branch, and if you need more environments, you only add extra folders. If you have 20 environments, you still need a single Git branch. Getting a centralized view of what is deployed where is trivial when you have a single branch.

Using Helm with GitOps environments

If you don’t use Kustomize but prefer Helm instead, it is also possible to create a hierarchy of folders with “common” stuff for all environments, specific features/mixins/components, and final folders specific to each environment.

Here is how the folder structure would look:

chart/
  [...chart files here...]
common/
  values-common.yaml
variants/
  prod/
    values-prod.yaml
  non-prod/
    values-non-prod.yaml
  [...other variants...]
envs/
  prod-eu/
    values-env-default.yaml
    values-replicas.yaml
    values-version.yaml
    values-settings.yaml
  [...other environments...]

Again, you need to spend some time examining your application’s properties and deciding how to split them into different values files for optimal promotion speed.
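
For example, a values-version.yaml could contain nothing but the image coordinates (a sketch; the exact keys depend on what your chart templates expect):

# values-version.yaml -- illustrative keys only
image:
  repository: docker.io/kostiscodefresh/simple-env-app
  tag: "2.0"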

Other than this, most of the processes are the same when it comes to environment promotion.

Scenario: Promote application version from QA to staging environment in the US:

  1. cp envs/qa/values-version.yaml envs/staging-us/values-version.yaml
  2. commit/push changes

Scenario: Promote the application version from integration testing (GPU) to load testing (GPU) and then to QA. This is a 2-step process:

  1. cp envs/integration-gpu/values-version.yaml envs/load-gpu/values-version.yaml
  2. commit/push changes
  3. cp envs/load-gpu/values-version.yaml envs/qa/values-version.yaml
  4. commit/push changes

Scenario: Promote an application from prod-eu to prod-us along with the extra configuration. Here we also copy our setting file(s).

  1. cp envs/prod-eu/values-version.yaml envs/prod-us/values-version.yaml
  2. cp envs/prod-eu/values-settings.yaml envs/prod-us/values-settings.yaml
  3. commit/push changes

It is also critical to understand how Helm (or the GitOps agent that handles Helm for you) works with multiple values files and the order in which they override each other: values files passed later on the command line take precedence over earlier ones.

If you want to preview one of your environments, instead of “kustomize build” you can use the following command:

helm template chart/ \
  --values common/values-common.yaml \
  --values variants/prod/values-prod.yaml \
  --values envs/prod-eu/values-env-default.yaml \
  --values envs/prod-eu/values-replicas.yaml \
  --values envs/prod-eu/values-version.yaml \
  --values envs/prod-eu/values-settings.yaml

You can see that Helm is a bit more cumbersome than Kustomize if you have a large number of variants or files in each environment folder.

The “environment-per-git-repo” approach

When I talk with big organizations about the folder approach, one of the first objections I see is that people (especially security teams) don’t like to see a single branch in a single Git repository that contains both prod and non-prod environments.

This is an understandable objection and arguably can be the single weak point of the folder approach against the “environment-per-branch” paradigm. After all, it is much easier to secure individual branches in a Git repository instead of folders in a single branch.

This problem can be easily solved with automation, validation checks, or even manual approvals if you think it is critical for your organization. I want to stress again that the “cp” file operations shown for promoting releases are for illustration purposes only. It doesn’t mean that an actual human should run cp manually in an interactive terminal when a promotion happens.

Ideally you should have an automated system that copies files around and commits/pushes them. This can be your Continuous Integration (CI) system or other platform that deals with your software lifecycle. And if you still have humans that make the changes themselves, they should never commit to “main” directly. They should open a Pull Request instead. Then you should have a proper workflow that checks that Pull Request before merging.
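
Assuming the GitHub CLI is available, such a PR-based promotion could look like this sketch:

git checkout -b promote-qa-to-staging-us
cp envs/qa/version.yml envs/staging-us/version.yml
git add envs/staging-us/version.yml
git commit -m "Promote QA release to staging-us"
git push --set-upstream origin promote-qa-to-staging-us
gh pr create \
  --title "Promote QA release to staging-us" \
  --body "Copies envs/qa/version.yml to envs/staging-us/version.yml"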

I realize however that some organizations are particularly sensitive to security issues and they prefer a bulletproof approach when it comes to Git protection. For these organizations, you can employ 2 Git repositories. One has the base configuration, all prod variants, and all prod environments (and everything else related to production) while the second Git repository has all non-production stuff.

This approach makes promotions a bit harder, as you now need to check out 2 Git repositories before doing any promotion. On the other hand, it allows your security team to place extra security constraints on the “production” Git repository, and you still have a static number of Git repositories (exactly 2) regardless of the number of environments you deploy to.

I personally consider this approach overkill and, at least to me, it shows a lack of trust in developers and operators. The discussion on whether or not people should have direct access to production environments is a complex one and probably deserves a blog post of its own.

Embrace folders and forget branches

We hope that with this blog post we have addressed all the questions that arose from the “don’t use branches for environments” article, and that you now have a good understanding of the benefits of the folder approach and why you should use it.

If you are working with Argo CD please also check the Application Set guide.

If you have other interesting use cases or have extra questions on the subject of organizing your GitOps environments, please ask in the comments section.

Happy GitOps deployments!


80 thoughts on “How to Model Your GitOps Environments and Promote Releases between Them”

  1. Thanks for this article,

    I want to challenge you on the two points below:

    1. How do you plan for rollback? Especially since it’s a single branch: suppose we promote a change to production (“commitX”), and then we commit a change to staging (“commitY”). I want to roll back the production changes only.

    2. You know that in a real automated approach, everything is initiated by the CI system that builds the image, and you may have some new business configuration to be added as configmaps or env vars. How do you start the process after building the image from source code towards the first environment (“QA” as an example), introducing the new business configuration as well?

    1. Re: your first point: I guess you can just revert “commitX” via “git revert commitX” which creates another commit on top of the main branch. Or, if you have a UAT environment which replicates the settings from production, you could just copy the corresponding setting file from UAT to prod.

      Re: your second point: I’ve decoupled the CI pipeline from the CD pipeline. The former just creates the binaries, container images, and Helm chart and publishes them to the corresponding registries/repositories. At the end, it sends an event which is consumed by the CD pipeline. There, I clone the Git environment repository, generate all required files, and commit these to the repo (if you need some further adjustments to the settings, you can commit them to a feature branch and create a pull request). Then it is up to the GitOps controller to deploy/provision the new environment once the change is applied to the main branch.

    2. For rollback you can literally do “git revert commitX”. CommitX is reverted and commitY stays as is. It doesn’t get any simpler than this.

      That is a good question, but it has nothing to do with environment-per-folder. You still need to answer it if you are using environment-per-branch. You can use hooks in your CI system, Flux/Argo image updater, or something custom. There is no way around it in either case.

      1. You have to ensure each commit involves at most 1 env, otherwise when you roll back, the other env will also get changed. And you cannot roll back a commit that changes the variant or shared configuration, otherwise the same issue would happen.

        1. Ensuring that a commit only touches a single environment is a good practice that you should follow anyway (regardless of branches/folders). It is mentioned in the article as well. Not just for rollbacks but also for a sane Git history record.

  2. Really interesting reading – thank you for sharing and your efforts to write this article.

    At the moment I’m using the environment-per-branch pattern, but I’ve been thinking of migrating to a trunk-based pattern. Luckily, the configuration is already split into common values (which apply to all environments) and environment-specific settings (to avoid merge conflicts). But your recommendation to split this further makes a lot of sense to me. What I’m still struggling with is how the environment-per-folder approach works if you want to use different Helm chart versions in the environments. For example, you may want to use/test the Helm chart in version 2.0 in the QA environment, but still use version 1.8 in staging until 2.0 has been fully tested. Would it then make sense to move the Helm chart (Chart.yaml) into the corresponding envs folders? Unfortunately, ArgoCD doesn’t seem to support using the Helm chart from the Helm repository but the Helm values files from the Git repository – otherwise the Helm chart version could be part of the corresponding (environment-specific) Application resource file from ArgoCD. Instead you have to write another Helm chart in your Git repository and reference the original Helm chart as a dependency in the Chart.yaml 🙁

    1. Yes, it would make absolute sense to move the Helm chart into the envs folder.

      Note however that there is an open issue in Argo CD (I am not sure about Flux) to support getting Helm values from a different place than the chart itself, so when this is implemented you might need to re-evaluate your case.

      That being said (and I know you might not like this suggestion) I personally suggest using Helm for external applications (i.e. those you don’t develop) and instead adopt Kustomize for your own apps (the ones your development team produces).

  3. Where do you maintain the ArgoCD Application manifests (or Flux equivalents)?

    Are they also in git? Which repo? Where?

    1. Certainly in Git. Anywhere you want (same repo or other). I don’t think it matters. I mean how often do you need to change them? At least in the case of ArgoCD each application file says that this folder should be deployed to this cluster (e.g. QA folder goes to QA cluster). I think this is mostly static information. Is there another concern that I am missing?

  4. Again, huge thanks for this article and your work.

    I also want to challenge you on 2 main points:

    1. It’s exactly the same as Mohammed Abusaa’s; I think his point is very relevant. How do you handle rollback of your application for a specific environment, since with the folder approach all commits from all environments are mixed in the same branch?
    If you use a tool like ArgoCD to deploy your application, you will have to git revert in order to roll back your changes after some tests, for example. But if you mix your commits, I don’t see any solution to roll back properly, except maybe simulating a git revert by re-committing the state of the previous version of the application. But that becomes very hard to handle.

    2. My second point is: how do you handle different deployment approaches?
    For example, I want to do continuous deployment on DEV and QA but I want to stick with continuous delivery on PROD (with an MR/PR and a protected main branch). It seems like the only way is to create a separate repository only for prod, like you said just above in the “environment-per-git-repo” approach chapter.

    Anyway, thanks a lot, that’s a really instructive and good quality article.

    1. 1. I am not sure I understand the issue here. You promote from integration to QA with commitX. Then let’s say you promote from prod-eu to prod-us with commitY. The first promotion went wrong. You do “git revert commitX”. The second promotion stays as is. What is there to simulate? What is hard to handle? On a related note, I would use Argo Rollouts or Flagger for progressive delivery instead of manually reverting stuff.

      2. The “environment-per-repo” approach was suggested ONLY for security reasons. No need to adopt it if you don’t have this limitation. To answer your question you always do MR/PRs and the main branch is of course protected. In the case of Continuous deployment you simply auto-approve the PR if all checks are ok. Auto-approving/merging PRs is an essential capability of all good CI systems. It has nothing to do with GitOps or ArgoCD.

      1. Indeed you are right; I went through the whole thing too quickly, and obviously you can just git revert regardless of the commit order. Fine, thanks for the explanation.

        And thanks for explanation about PR/MR.

        Considering your usage of Argo Rollouts, I agree with you that this kind of tool is very useful, but, the way I tried it, I didn’t find any solution to avoid reverting my commit.
        Argo Rollouts will roll back the version if it doesn’t pass the test phase, but only from a pod perspective. It means that you will have your application in a degraded state with version A in the cluster but version B in the Git repository, and therefore OutOfSync in ArgoCD. How do you handle the rollback from Argo Rollouts matching the actual Git state of your application?

        1. Argo Rollouts doesn’t touch Git. It is not an alternative to Git reverts. Normally, if a deployment fails (and you only use ArgoCD), your environment is down and you are on the hook for making a quick git revert. With Argo Rollouts, the environment will still be up (of course with the previous version). This means that you can now git revert at your leisure, or even better, fix the problem and deploy again (roll forward). Basically, with Argo Rollouts you should have zero downtime even on failed deployments. It is a way to avoid hasty git reverts in the first place.

          In theory you could also use Argo Events/Rollout notification to also auto-revert in Git as well, but I think this is too complex. Rolling forward (e.g. fixing the issue) is much more realistic in my opinion.

  5. Wouldn’t the approach of copying the version from one environment to the other cause the release of a broken version in case of a race condition?

    imagine the following flow:
    1. both staging and production are in version v22
    2. staging gets new version v23
    3. CD tool starts deploying to staging (pipeline 1), and it takes a while
    4. while staging is deploying, the CI tool pushes a new version v24 to staging, and pipeline 2 starts
    5. pipeline 1 finishes deploy to staging, and triggers a copy of the staging version to production

    now since the current version in staging is v24, that’s the version that’s going to be applied to production, even though the second pipeline didn’t finish. In case v24 is broken, we just released a broken version to production

    1. First of all there are no “deployment pipelines” in GitOps. Only the GitOps controller is allowed to deploy applications.

      Your question is a valid one, but I don’t see why it is related to the folder approach. I can ask the exact same thing even if you have branches per environments.

      “Imagine the following flow:”
      1. Both staging and production BRANCHES are in version v22
      2. Staging BRANCH gets new version v23
      3. GitOps controller starts deployment of staging BRANCH to v23
      4. While staging BRANCH is deploying, the CI tool pushes a new version v24 to staging BRANCH
      5. Staging deployment is finished, and triggers a merge of the staging BRANCH to production.

      “If v24 is broken we just released a broken version to production.”

      Do you see my point?

      Anyway, to answer your question, you can either serialize your CI pipelines (allowing only a single instance of a specific pipeline to run is an essential function of all CI systems) or use a feature of the GitOps controller that solves this (sync windows come to mind for Argo CD).

      But I want to stress that your question has nothing to do with GitOps or why the “environment-per-folder” approach is the best recommendation for GitOps environments. Or do you mean something different and I misunderstood your question?

      1. Your imagined flow isn’t quite right.

        In the “branch-per-environment” approach, the flow that promotes from staging to production will be linked to a specific commit revision, and therefore when it gets to the “merge to production” stage, what lands in production is the state that was present at the start of the flow.

        Therefore the result will be “if v24 is broken we just released v23 to production while the broken v24 is on staging”.

        Which is exactly what you want.

        TL;DR: With branch-per-environment you can pin flows to a specific state (point in revision history) of the gitops repo.

        1. The article is written in a completely generic way and doesn’t assume any specific GitOps provider. However if we are talking about ArgoCD, I assume you mean that the application entry is pointed at a specific Git revision which itself is changed by CI or other system.

          However if that is the case, you can still use the same solution with folders as well. Have your CI system commit the ArgoCD app and change both the “path” and “targetRevision” fields to specific points in time.

          So this solution will work regardless of whether you use the branch-per-environment or the folder-per-environment approach, if you want that level of granularity.

      2. Hello Kostis,

        Thanks for your reply.

        In my comment I haven’t praised the branch per environment approach, so I am not sure how the discussion shifted to that. I only mentioned that copying the current state from one place to another may have race conditions, and it doesn’t matter if the current state is multiple branches or a single one.

        I think this is a question relevant to GitOps, yes. In a traditional way to deploy (where a CI tool would run a command to apply the new version), this version would be passed along a pipeline, and each step would apply the same new version without risk of a race condition. But in the proposal of this post there is this risk.

        To fix that you mentioned two approaches that I don’t think are ideal. The first one is in my opinion a hack that requires additional complexity and would delay the CD if you have multiple deploys a day. The second would just make it less likely to happen, but still make it possible.

        Please don’t think I am raging against GitOps or your approach. I think they are great, and I am using it in my current project. But it has this downside and I don’t think it should be just ignored.

        1. I think that in most CI tools, by default each pipeline can run multiple instances of itself. Unless you actively disable this or the CI tool has a feature for it, what happens if your “deployment to production” pipeline runs multiple times in the exact same time period? Isn’t this a problem in the traditional way to deploy?

          I know that in the context of Kubernetes maybe you will not have an issue but traditional deployment pipelines did other stuff such as db upgrades, load balancer changes, etc. So the danger of race conditions was always there and you always had to do something about it.

  6. Very interesting read.

    I am curious how you handle use cases where you deploy something from a 3rd party. Usually you would have something like “https://github.com/jetstack/cert-manager/releases/download/v1.6.1/cert-manager.yaml” in your base kustomization and then apply the necessary patches on top.

    In this example, upgrading the version of cert-manager in the base, would upgrade it on all environments. How do you get around this?

    Would you use an empty base and provide a specific cert-manager version in each environment, and once they are all equal, you again put it back into the base?
    Or just keep it into the base but have each argoCD applications refer to a specific tag or commit SHA, so each environment does not get upgraded automatically?

    Same problem would apply for a 3rd party chart, except you wouldn’t have the option of an empty base here.

    1. Hello. The article focuses on applications that your developers create, since these applications have a faster/more frequent lifecycle.

      Regarding external applications, the answer is: it depends 🙂 . I mean it depends on what you think about their lifecycle. If cert-manager is something that you update all the time, then yes, I would have each environment use its own version and, once they are all the same, move it back to the parent base. Essentially, treat it like your “own” application. However, if another external app is something that you only update from time to time, then maybe I would put it in the two bases (non-prod and prod) and update all environments at once (accepting the risk). Up to you.

      The point is that the “environment-per-folder” approach is much more flexible and has fewer limitations than the “environment-per-branch” approach.

  7. Hi, thanks for the informative article. I wanted to ask what the problem is with having the k8s manifest in the application repository instead of having 2 Git Repos. The GitOps Tools of choice can still monitor the path in the repo and changes that affect the manifest, as well as the source code, like introducing a new env variables can be done in one commit.

    1. Hello. People have already written about this here https://argo-cd.readthedocs.io/en/stable/user-guide/best_practices/ and here https://codefresh.io/argo-platform/argo-cd-best-practices/

      The biggest problem for me is reuse. If I have 10 similar microservices/applications, I don’t want to maintain 10 charts/manifests. I will split the code into 10 repos with just source code and one repo with a single chart (or kustomize stuff or whatever). If you keep everything along with the application, I can bet that you have duplication in your manifests. And then, if I want to make a global change to all my Kubernetes manifests, I need to go to 10 repositories one-by-one and make the exact same change.

      Can you share how many application repositories you have (that include the manifests in the same repo, I mean)?

      The second problem is waste of resources and extra complexity. If you just change a manifest, your CI system will still trigger a pipeline and try to re-build the source code. You need to instruct your CI system to ignore the manifest directory, and not all CI systems have this capability. And for those CI systems that support it, you need to remember to enable it for each and every pipeline that touches this repository. You are trying to solve a problem that should not exist in the first place.

      The last point is that by decoupling them you can follow different Git approaches for each one of them. For example you can follow Git-flow for source code and trunk-based-development for manifests.

      1. Hi Kostis, we are planning to move to Kubernetes in 1 month and I am currently building the PoC, so at the moment we have 0 :D. However, we want to move 8 stateless applications to Kubernetes, with more to come.

        I followed your ideas and set up separate app and infrastructure repos with different Git approaches. A CI job in the app repo commits the image tag change to the infra repo, where Argo CD watches each folder. I would love to hear your opinion on the folder structure; my current intention is to have one base dir and then all the applications in the env folder.

        ├── base
        ├── env
        │ ├── app1-production
        │ ├── app1-staging
        │ ├── app1-testing
        │ ├── app2-production
        │ ├── app2-staging
        │ └── app2-testing
        └── variants
            ├── non-prod
            └── prod

        1. Yes this looks good assuming that app1 and app2 are very very very similar and share the same configuration.

          You could also split per app the folders and do

          env/app1/production
          env/app1/staging
          env/app2/production
          env/app2/staging

          This way you can put common stuff of app1 and app2 in the middle folder.

          If app1 and app2 are completely different, I would just have multiple instances of the structure shown in the article.
          Remember also that Kustomize supports reading configuration from other Git repositories.

          So ask yourself, how “similar” app1 and app2 are. The structure you suggest is good only if they have very high coupling (as far as configuration is concerned).
          If they are unrelated or have different configuration there is no reason to mix them like this.

          1. Hi Kostis, I have another concern. My current plan is the following: I have one infrastructure Git repository, and 10 applications with 3 environments each are going to live there. Each application repository checks out the infra repo, changes the value of the image tag with a simple yq command, and commits the change to the infrastructure repository.

            run: yq -i '.image.tag="${{ inputs.image_tag }}"' ${{ inputs.values_file }}

            I don’t really have a security concern; rather, I am concerned that a developer might touch the CI job and pass the wrong path, like env/frontend-webapp/production/values.yml instead of env/broker/production/values.yml, and now suddenly a completely wrong image tag is deployed.

            Do you have any recommendation to make this workflow more secure to prevent human error of deploying a wrong tag?

          2. You didn’t mention which CI system you use, but in general I am against freetext fields in CI jobs (and this was true even before GitOps).

            You should either have a CI system that creates/fills predefined choices, or have simple hardcoded values (i.e. a pipeline that promotes whatever is in staging to production without any capacity to override the version).

            That being said, you can also add an extra layer of security by making everything a Pull Request. So the CI job doesn’t actually commit to an environment. It only creates a PR against it and then another human needs to confirm. You can start with this pattern at least in the production environments and see if you can automate it later.

  8. Thank you for this great article. I have a question regarding folder-based environment promotion. Is there a way to manage merge permissions for different folders somehow? For example, only the test system team can deploy and approve changes for the Test environment, and SREs can approve changes for Prod environments.

    1. There isn’t any standard mechanism for this. You need to check your Git provider. There are some approaches (using submodules or CODEOWNERS files). But I think simply catching these on CI or with pull requests is much easier.

      As explained in the article it is best to have this automated anyway (or have 2 git repos for prod and non-prod)

      1. We do need pull request approvers for different environments. Then it can become a hassle to have several repositories to be able to assign different permissions.

  9. Thank you for the great article. A couple of questions.

    1) If a developer needs to change files in chart/ or common/, won’t that automatically be picked up in all of the environments? I’m assuming that the Argo applications are looking at the root of this repository for changes rather than a specific environment (i.e. envs/prod/).

    2) Our development teams current process use features branches and on push, the app is deployed to a feature namespace that is automatically created. The environment-per-folder approach works well for our static environments but has some challenges for these dynamic environments. We are looking at using Kubernetes manifests to dynamically deploy Argo CD applications (https://argo-cd.readthedocs.io/en/stable/operator-manual/declarative-setup/). Any suggestions for this sort of a strategy?

    1. 1) Yes and no. The section “Making changes to multiple environments at once” explains exactly this scenario. Yes, in the sense that if you do make a change in common, it is indeed picked up by all environments. No, because you will already have “tested” this change in all environments before committing to common. Check the picture in that section: in step 3 the extra setting is committed to “eu common,” but it doesn’t really affect anything because the change was already present in both environments.

      2) This is actually the third article in the series that I am currently writing. The new article that I am going to write will use https://argocd-applicationset.readthedocs.io/en/stable/Generators-Pull-Request/. In general my advice is to look at application sets.

      1. Hi Kostis, thanks for the great article! I have been taking much inspiration from the approach you outline here. There are just a few more sticking points to solve before it all really clicks – especially when trying to map this to Helm.

        I would like to re-open discussion on Steve’s first point if possible. With Helm, we have a collection of parameterized manifests (templates) and a collection of environment specific values/configuration files (separated by folders not branches of course!). The configuration files contain the parameters that the templating engine will insert into our templates. It is clear that whenever a change is merged which updates the configuration file for a particular environment that the GitOps controller (eg: ArgoCD) will synchronize state for (ie: deploy to) that environment just as desired. For example, if you merge a change which modifies values-staging.yaml or envs/staging/values-settings.yaml as you describe, then ArgoCD will deploy the updated chart template + configuration values to the staging environment. If you update the collection of Helm chart template files themselves however, ArgoCD will deploy such a change to all environments simultaneously.

        Best I can tell this is a fact of life when dealing with local, “unpacked” Helm charts (ie: just a directory containing Chart.yaml, values-.yaml files and templates). The GitOps controller tracks the directory containing the chart and its templates which are environment agnostic and so all are deployed at once. I think your example with Kustomize works because Kustomize is bundling exclusively manifests which contain references to released (ie: in registry) container images. There is a clear separation between development and deployment. We only deploy to all environments at once, if we touch a “common” directory or “base” overlay. The example with unpacked Helm charts seems a bit different. The /templates directory for a chart can be seen as another /common, but one which is required to change as part of Helm chart development.

        This scenario and its solution may reduce to the classic problem of wanting to source a Helm chart from one location, namely the Helm chart repo (for which ArgoCD supports with chart version pinning so each environment to depend on/deploy a different chart version!), while sourcing our values/configuration files from another repo. I may be making an argument for separating development from deployment and using explicitly versioned and packaged (zipped) Helm chart releases.

        Wondering if this problem summary sounds reasonable to you or if there is any way to do this with local/unpacked charts?

        Your writing led to the realization that one can GitOps-ify their deployments w/o optimizing promotion between environments from the get go. Still it would be nice to know that the GitOps journey is starting off on the right foot! Thanks again!

        1. Hello

          Yes there are so many Helm setups (from Git/from Helm repo) and patterns (values in same git repo/values in other Git repo) that covering all variations would need a separate article.

          Your analysis is correct. My quick fix would be to put the Helm chart itself in the same folder as values and have different Helm charts per folder/environments. You obviously have a duplication here and you need to decide the tradeoffs. Some big companies abandon Helm altogether and use their own templating system either outside of ArgoCD or by implementing an Argo CD configuration plugin.

          If you haven’t seen this is already there is another important article in the series that explains how to avoid the problem of changing too many environments at once https://codefresh.io/blog/argo-cd-preview-diff/

          As a general rule I prefer using Helm only for external software (certmanager, nginx, sealed-secrets etc) and Kustomize for the internal apps (i.e. those created by devs that actually need promotion between environments)

          1. Hi Kostis and Calvin,

            Thank you for a great article and a poignant question!

            We’re currently looking at the same challenge of sharing Helm charts between environments and variants. At first, we thought we could use Kustomize to easily overlay a new version of the chart in, e.g., dev – but it seems as though this is a non-starter, as Kustomize won’t apply patches to the helmCharts statement in its base; it only allows you to patch the generated manifests of that base Helm chart.

            We are hosting a 3rd party application using the vendor-provided helm chart, so we’re tied to helm for managing our service, but for other helm charts such as cert-manager and nginx-ingress-controller, we also see the benefit of being able to promote both the “main” application and cert-manager etc between environments.

            Based on what I’ve found so far and this article, it seems that the only approach is to go with what Kostis suggested in this comment thread, moving the Chart.yaml files into the same folder as the environment value files and living with the duplication…?

  10. Hi Kostas,

    Thank you for the article.
    Couple questions about splitting application code and configuration (k8s manifests) to separate repos.
    Let’s say we have an app change (adding a new env var) which is a change in both application code and the k8s manifest. With a single-repo structure, developers can easily test locally (i.e. deploy to minikube and test that they didn’t make any mistakes in the k8s manifests) and create a new image version with the git tag corresponding to our app + k8s manifest change.

    If we split application code and configuration (k8s manifests) to separate repos, what is the recommended approach for local app development?
    In other words, what is the easiest way to quickly test a change locally in our app + k8s config? Just making a change in 2 repos and configuring local tooling to pull those changes? Or are there better ways of doing it?
    Also, doesn’t it become a bit tricky to propagate that change to environments? Just because the change is now in 2 separate repos which have 2 different git tags.

    Many thanks

    1. Local development for Kubernetes apps is a whole topic on its own.

      I would use a dedicated solution such as Telepresence, Tilt, Okteto, DevSpace, garden.io, etc.
      We already have some articles on these tools.

      Regarding propagation, remember that GitOps controllers know nothing about source code. So in theory, whenever you change the source code, no deployment happens. You need to build/push a new image AND update the manifest in order for the GitOps controller to actually do something. So even though the change spans two Git repos, the GitOps agent only works with the manifest repo (so only one repo).
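
      As an illustration, the manifest-repo side of such a CI step might look like this (a sketch; the repo URL, folder layout, and image name are hypothetical):

          # runs after CI has built and pushed registry.example.com/my-app:1.35
          git clone https://example.com/org/gitops-config.git
          cd gitops-config/envs/qa
          kustomize edit set image my-app=registry.example.com/my-app:1.35
          git commit -am "Promote my-app 1.35 to QA"
          git push   # only now does the GitOps controller see something to deploy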

  11. Hi, thanks for your very interesting article! We are already practicing a similar GitOps model; however, we are strongly bound to regulatory requirements separating developer changes (dev, test, preprod) from operational changes (prod). We would like to continue with the “one-repo approach” but we don’t know how to restrict access for developers only.

    In your article you say: “This problem can be easily solved with automation, validation checks, or even manual approvals …”. Can you give an example of how you would solve this? It would be of great help.

    1. Hello. Are you talking about source code changes or infra changes? Because all source code is in a separate Git repository anyway. Developers do not normally have access to the Git repo that holds the Kubernetes manifests (regardless of whether it uses folders or branches).

      Or do you mean something different?

  12. Hey,

    first of all I want to thank you for your great article. I am just curious how I should handle promotions of an application that depends on new static configuration. Manually, of course, I can just create an MR and put everything together, but in an automated pipeline the promotion of the app version will be done by the CI/CD system, whereas the configuration must be changed manually (as it is different from the previous env). How should I therefore put everything together into one single release?

    1. Hello
      It is impossible to answer this in a comment as it depends on the capabilities of your CI system. I have an example with GitHub Actions
      in the upcoming Level 2 GitOps certification.

      May I suggest you start a new discussion in the Argo CD GitHub repo, where you can describe your assumptions, limitations, current workflow,
      and what CI system you have?

      1. I have the exact same concern about having any CI or automated system performing the promotion; the CI system itself is irrelevant.
        For example, when you need a different value in each environment and also need the image tag bumped to a new version (which does need to be the same in all environments).
        This cannot be done without human intervention.

          1. Today I’m not handling it without human intervention – we do not have an automated promotion system in place.
            We open PRs to promote environments.
            I would love to automate the promotion of environments and have also tested a tool which does that (Telefonistka), but it also cannot handle the use case that I’ve described before.
            In addition, having automated promotions makes git blame useless since all commits are done by a bot user.

          2. Then I don’t think we are saying anything different. The first commit must always happen by a human, since they have the knowledge of what needs to be changed (image version vs. configuration vs. both).
            Subsequent promotions can be done either by the system or by humans, depending on complexity. This was true before GitOps/Argo CD and it is still true with GitOps/Argo CD.

            The article explains how to do promotions with GitOps vs the traditional way. Not how to make a system understand what a human would do.

            If your promotions require human intervention because a system cannot understand what you wish to promote, then you will still have human intervention with GitOps.

  13. Hi, thanks for your great and interesting article!

    I have some questions about the “environment-per-folder” approach. I only have two or three environments, and I don’t want to use more repositories.

    If one CI process pushes before another, the second process’s local copy becomes outdated, so it has to pull and then push again. A situation can easily arise where several CI processes end up writing to the same GitOps repository, which leads to conflicts.

    What is the suggested approach to avoid this situation? Maybe solving the problem by adding a retry mechanism to the CI steps?

    Many thanks

    1. While this question is valid, it was always a problem, even with just source code. What does your CI system do if 5 developers want to merge source code at the same time?
      Yes, a retry approach is a possible solution.

      However do you actually see this problem in practice? Do you want to deploy to different environments AT THE EXACT SAME TIME? The whole point of having environments is to promote stuff. So a commit happens to the qa env. Then 2 hours later a commit happens to the staging env and 1 hour later a commit happens to the prod env.

      What scenario do you have where the CI system commits to multiple folders at the exact same time?

      1. First of all I want to thank you for your reply.

        I use Argo CD in my environment, organized in units of applications. Two or more developers can trigger CI processes that build images and update different Kubernetes YAML files at the same time.

        However, I only have one repository to store manifests, using the “environment-per-folder” approach. A situation can easily arise where several CI processes end up writing to the same config repository, which leads to a conflict.

        Using a pull request mechanism might solve the problem, but I would like to avoid the situation more automatically, or at least reduce human intervention, especially in the development environment.

        1. If by “Git conflicts” you mean the situation where Git says you need to pull first, then this should already be handled by any CI system (probably with retries). It is the exact same problem as with source code.

          If by “Git conflicts” you mean the case where two files are changed at the same time at the same place, then with environments per folder you should never have this issue as it doesn’t make sense to deploy a different version to the same environment at the same time.

          Hope that helps.
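
          For the first case, a retry loop in the CI step is usually enough. A minimal sketch (assuming the step has already committed its change locally):

              # try to push; on failure, rebase on top of the concurrent commits and retry
              for i in 1 2 3 4 5; do
                git push && break
                git pull --rebase
                sleep $((RANDOM % 5 + 1))   # small random back-off between attempts
              done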

  14. Hi Kostis,
    An inspiring article, which helps me a lot with onboarding multiple environments based on Helm charts.

    I am trying to create a separation between NonProd and Prod environments, and I think it will eventually end up as a repo per env, which is not a good choice.

    Here are my 2 cents:

    1. The single repo with multiple folders still keeps the chart and values in the same repo, which is not a good practice. I understand the requirement for an easy one commit per change.

    2. Using Helm with Argo CD has a missing feature that was planned for Argo 2.4, then moved to 2.5, and is still open: using a chart from one repo and values from another (https://github.com/argoproj/argo-cd/issues/2789), i.e. the ability to keep values in a different Git repository.

    This means that separated Git repos (Prod vs. non-Prod) are not applicable, at least in the Argo CD + Helm model. Are you aware of any alternatives?

    3. Changing charts or K8s manifests per env.
    I understand the concept of variables and the classification into different kinds. I think that changing the charts themselves is a real challenge, and short-lived charts per env (see below) might be a solution. We can handle them as extra settings (in your drawing). However, the lifecycle (delete, copy to the base, etc.) can be error prone.

    For example:

        chart/
          [...chart files here...]
        common/
          values-common.yml
        variants/
          prod/
            values-prod.yml
          non-prod/
            values-non-prod.yml
          [...other variants...]
        charts-short-lived/    <-- once in base it will be removed
          prod-eu/
            charts/
              ms1/ ...
        envs/
          prod-eu/
            values-env-default.yaml
            values-replicas.yaml
            values-version.yaml
            values-settings.yaml
          [...other environments...]

    Are there any alternatives for chart change management?

    Again, thanks for the article. Looking forward to part 3,

    “Model your GitOps env with Helm charts”

    1. Hey Chen

      Thanks for the feedback. Regarding the separate Git repo for Helm and values, I think it makes sense to wait for Argo CD 2.5, 2.6 to be released and see the “multiple-sources” feature in action. Then I can revise this article or write a new one.

      Regarding your idea with short-lived charts: up to you. The Helm structure I show in the article is just a starting point. Some people also commented that they would put the whole chart in each folder and not just the values (allowing changes to the chart itself for each environment). It depends on your organization.

      Part 3 will be either “Preview environments with ArgoCD” or “Enforcing diffs with ArgoCD” 🙂

  15. Hi Kostis,

    Thank you for the article.
    One doubt I have.
    An example: the application source code has been updated to use some env variables defined in a config.yaml, and this config.yaml is part of the base/ folder.
    If we update the config.yaml in the base folder, all the Argo CD applications in all the environments will be out of sync, right? This is not expected, right? We need to promote changes from one environment to another.
    May I know how we handle this?

    1. If the config.yaml applies to a single environment it shouldn’t be in the base folder. Instead it should be placed in the overlay of the respective environment.
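
      A tiny sketch of that placement (paths are illustrative):

          base/
            deployment.yaml       # shared across all environments
          envs/
            qa/
              config.yaml         # environment-specific; only QA sees this
              kustomization.yaml
            prod/
              config.yaml         # promoted by copying from the QA overlay
              kustomization.yaml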

      If you want to apply something to all environments in a gradual way, there is a specific section in the article titled “Making changes to multiple environments at once” that covers exactly this scenario. Doesn’t this answer your question?

      1. “If you adopt this approach, it means that you never apply new changes to the base folder directly. If you want a change to happen to all environments, you first apply the change to individual environments and/or variants and then backport it to the base folder while simultaneously removing it from all downstream folders.”

        That seems to be really error prone …

  16. Hi, this was very insightful.
    However, I have a question. I have 4 environments – dev, qa, stage, and prod – and I am promoting builds (features) across environments. Developer1 is ready with feature1, commits their code to the main branch, and the feature is promoted from dev -> QA -> staging. In the staging environment, some bug fix/change is required. Meanwhile, developer2 has started working on feature2 (feature2 is branched out from the main branch, which also includes feature1). Now developer2 has to wait for developer1 to fix feature1 in order to release feature2, or else revert feature1 and push feature2 alone, which is an overhead.
    How do I prevent this race condition?

    1. Not sure I understand your question. The article talks about organizing Kubernetes manifests and not source code. The Git repository that developers use is outside the scope of this article.

      If by “feature” you mean manifest changes, then it depends on whether feature1 and feature2 are independent or not. If they are not independent then they should indeed be deployed in order, there is no way around it. If they are independent then dev2 should start their changes from whatever is in production (that doesn’t include feature 1)

      Let me know if you mean something else.

  17. Hi there, this article is very insightful. I’m also considering the same approach for doing CD in our environments, but we face a very challenging problem: we have thousands of environments… Basically, more than 1000 customers, and each customer needs test/stage/prod environments – in total, more than 3000 environments. I am thinking about the following folder structure:
        /repo
          /customer1
            /test
            /stage
            /prod
          /customer2
            /test
            /stage
            /prod

    Imagine thousands of such folders and sub-folders… So I feel we’re still not able to use GitOps for this. Do you have any thoughts on this particular case?

  18. Hi Kostis,

    Thanks for the article! In the article you say that promoting changes should be “performed automatically by your CI system or other orchestration tool”. Do you know of any articles or tools that show good ways of automating this?

    I have seen some companies implement whole toolsets around that problem, but I have not seen anything open source, or a vendor with a tool that solves the issue.

    1. Yes, there are several efforts on that front. Some are open source. I haven’t seen anything that is truly universal; most are tied to the needs of specific workflows. I might write another article to cover some of them. Codefresh is also working in that area right now 🙂

      1. Since it is now over one year later: do you have any more insights on such orchestration tools? Or anything from Codefresh?

        1. Hello

          We actually launched a commercial product specifically for environments and promotions.

          Finally, there is a new guide for Argo CD ApplicationSets that a lot of people requested (it works for Argo OSS and not just Codefresh):
          https://codefresh.io/blog/how-to-structure-your-argo-cd-repositories-using-application-sets/

          Any feedback welcome
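
          To give a taste of the pattern in that guide, here is a minimal sketch of a Git directory generator that creates one Application per environment folder (the repo URL and paths are hypothetical):

              apiVersion: argoproj.io/v1alpha1
              kind: ApplicationSet
              metadata:
                name: customer-envs
              spec:
                generators:
                  - git:
                      repoURL: https://example.com/org/gitops-config.git
                      revision: main
                      directories:
                        - path: '*/*'   # e.g. customer1/test, customer1/prod ...
                template:
                  metadata:
                    name: '{{path[0]}}-{{path[1]}}'   # e.g. customer1-test
                  spec:
                    project: default
                    source:
                      repoURL: https://example.com/org/gitops-config.git
                      targetRevision: main
                      path: '{{path}}'
                    destination:
                      server: https://kubernetes.default.svc
                      namespace: '{{path[0]}}'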

  19. Hi Kostis, thanks so much for a resourceful article!

    However, I’d like to challenge the glaring drawbacks you mentioned on env-per-branch:
    1) “The most glaring issue with environment branches is the order of commits.”
    This usually doesn’t happen a lot, and it is only an issue if the promotion rule must stick to everything-in-prod-has-to-be-merged-from-a-lower-env. But that’s not true in reality, as no one can stop you from simply copying a file from another branch and making a commit; it could even happen without another branch as base – e.g. an extra high-performance resource is added that is only available in prod.

    2) “…and the danger of bringing unwanted changes when you merge from one environment to another.”
    This is not true, as env-based params are usually templated and supplied by CD pipelines – e.g. the replica count of a Pod is always different between QA and Prod, so it isn’t hardcoded in the manifest anyway. Assuming the manifest repo is separated from the developer source repo, the typical changes between envs are most likely small, which shouldn’t be much different from per-folder copying. In case it’s a large set of changes, then a PR/merge is actually easier for tracking what has changed, without the additional cp. Of course, you can always ignore some changes in the merge, copy other files over, or even create new ones before pushing the changes as a whole new commit. Not much difference from the per-folder PR/merge.

    The major benefit I can see in the per-folder model is the way it can use a base to reduce repetitive components in the templating or scripting of each env, assuming a lot of the configuration is the same or alike for the same project. This is a valid benefit, especially as the number of envs increases, but per-branch can also work well by extracting this base (it’s a template anyway) into a global template repo. Even better, the operations team could create global base templates that work for similar types of projects and reduce repetition even further.

    1. “as no one can stop you from simply copy a file from another branch and make a commit, and actually it could happen even without another branch as base”

      But if you are manually copying files between branches, then what is the benefit of using branches in the first place? You can do that with folders as well.

      ” In case it’s a large set of changes, then PR/merge is actually easier to track what’s been changed without the addition cp”

      I don’t think this is the case, unless I am missing a tool. It is super trivial to diff between 5 folders/files (vimdiff). What tool exists that shows me a diff across 5 PRs?

      “Not much difference than the per-folder PR/merge.”

      That is actually reinforcing my point. If you use branches, but do the same things as using folders, why not use folders from scratch?

  20. Hi Kostis,

    Thank you for this amazing article!

    We are using this approach with Flux CD but we are not using the components (like you demonstrated with the variants folder). This is something that we should really consider, because separating common configurations from env-specific ones makes a lot of sense!

    We’re also not copying files around, and I really want to move to this method, but there are a couple of things I’m missing:
    1. What would trigger the automated CD pipeline that handles promotion from one env to the other? Is it triggered manually by a human selecting source and target envs?
    2. Considering the first question, how do you handle the promotion of dozens of microservices that make up your application? Assuming each microservice can be promoted separately, if promotion is manually triggered, that would mean a lot of hard work for people like QA.

    Again, I appreciate the effort you put into this article.

    1. Hey Roi

      Yes, the trigger can be a CI pipeline where users select the source and target env. Here is an example with GitHub Actions: https://github.com/codefresh-contrib/gitops-cert-level-2-examples/blob/main/.github/workflows/promote.yml#L11

      You could do this manually for prod or automatically for non-prod environments. For microservices, you could have one pipeline per microservice plus a single one that promotes everything.
      Alternatively, you could add drop-downs where the user can select which microservices to promote and which to leave behind.

      Of course you are free to use any custom solution (e.g. a developer portal) and not your CI system. Up to you.
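
      In the same spirit as the linked example, such a promotion pipeline might look roughly like this (a sketch; the env folder layout and the version.yaml file are illustrative):

          name: promote
          on:
            workflow_dispatch:
              inputs:
                source_env:
                  description: 'Environment to promote from'
                  required: true
                target_env:
                  description: 'Environment to promote to'
                  required: true
          jobs:
            promote:
              runs-on: ubuntu-latest
              steps:
                - uses: actions/checkout@v4
                - name: Copy promotable settings between env folders
                  run: |
                    cp envs/${{ github.event.inputs.source_env }}/version.yaml \
                       envs/${{ github.event.inputs.target_env }}/version.yaml
                - name: Commit and push
                  run: |
                    git config user.name "promotion-bot"     # illustrative bot identity
                    git config user.email "bot@example.com"
                    git commit -am "Promote ${{ github.event.inputs.source_env }} -> ${{ github.event.inputs.target_env }}"
                    git push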

  21. Brilliant blog! Could you share a real folder hierarchy for Helm? It would be very useful, thanks!

    1. I don’t have one right now, but it would be easy to create one given the description of how it would work with Helm at the end of the article. Is there any particular issue that you face with Helm?

      1. Hi Kostis,
        My main issue here is trying to understand how to manage different chart versions (not values) for different environments with one unique repo & branch (for example, chart v1 has only 1 web app in PRD, while chart v2 has 2 web apps in TEST). My guess is that you will propose folders 🙂

        TEST/
          charts/
            ...
          templates/
            ...
          values.yaml
        PRD/
          charts/
            ...
          templates/
            ...
          values.yaml

        What do you think?

        Kind Regards.
        Max.

        1. Yes the suggestion is to use different folders. I think that another person asked that as well and I have offered the same answer in the comments of the article.

          1. Is there any way to be more DRY here? In the same way that a values.yaml can merge/override key values with a values-dev.yaml, can there be some sort of Chart.dev.yaml in some way/form? Or is the only option to duplicate completely and move away from being DRY?

          2. If DRY is your goal, I think you need to move away from Helm completely and go to a proper templating system. Check jsonnet, ytt and kpt for some suggestions.
            Several companies of course use their own in-house system.

  22. Hi Kostis,

    Thank you for this article. My question relates to database migrations in a GitOps folder-per-environment setup, specifically when using Atlasgo and its CRD for providing a schema via a ConfigMap (I believe you recently covered Atlasgo migrations, so you are familiar with them?).

    My SQL schema is generated from Entity Framework code-first migrations and it requires an Atlasgo manifest to apply the required schema to a db. I would assume, then, that the CI merge to main that builds the app images should also publish the generated SQL schema script to some asset feed/folder. This follows the strategy that CI just produces assets for deployment and nothing else.

    A separate CI pipeline (or some tool like Kargo) triggers on completion of the build pipeline, updates the app image version in the dev stage deployment manifest, and also copies the SQL script into the dev stage overlay folder from the asset feed. A configMapGenerator in the kustomization.yaml ensures the relevant SQL file is used from whichever overlay.

    So in this regard, a promotion of the schema is a copy of the SQL script from one stage’s overlay to another. Would this sound about right to you?
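
    (A sketch of what I have in mind – placing the generator in each overlay so the file comes from that overlay’s folder; names are illustrative:)

        # envs/dev/kustomization.yaml
        apiVersion: kustomize.config.k8s.io/v1beta1
        kind: Kustomization
        resources:
          - ../../base
        configMapGenerator:
          - name: db-schema    # consumed by the Atlas schema resource
            files:
              - schema.sql     # the file copied between overlays on promotion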

    1. Hello

      Yes I covered Atlas here

      https://codefresh.io/blog/database-migrations-in-the-era-of-kubernetes-microservices/
      https://codefresh.io/blog/using-gitops-for-databases/

      While what you suggest (copying the SQL script from one overlay to the other) sounds good to me, as I explain in both articles I would
      treat database migrations separately, with their own lifecycle. So I would never mix database migrations with normal application promotions.
      I know that most companies do both at the same time, but in my experience this creates several issues.

      So essentially, by the time an application promotion happens, the underlying database has already been migrated in a completely separate process, so there are no dependent pipelines as you suggest.

      1. It’s true that db schema changes rarely have the same cadence as app code changes, so yes, it feels like they have their own lifecycle.

        However, from an EFCore point of view, the schema is inherently coupled to the app code when using code-first migrations that define that schema – you use a dotnet tool to generate the script from the code.

        Doesn’t this imply that the deployed schema should (and can only) change when the app that creates that database schema is also deployed, which would imply a coupled lifecycle? Or do you suggest that, rather than generating scripts, the schema should be managed separately? Maybe I’m missing a piece of the puzzle?

        Just to add a little context to my questions: I have used a schema-bundled-in-a-Docker-container approach, whereby the schema would be deployed as a Job that the app containers depended on. Simple version promotion.

        This container approach has been effective, but now I am using Crossplane compositions, so I wanted a solution that uses the Atlasgo operator via a sqlclaim that applies the schema via a ConfigMap.

        1. Yes, I suggest handling the schema separately, with its own lifecycle and its own tools. Each database migration should happen transparently, unrelated to application code.

          It is ok to use ORMs for accessing data stores, but I am personally AGAINST using them to manage db scripts.

  23. Hi Kostis,
    During major release deployments we deploy 50 services across 5 different prod environments, but during a minor release it can be any number of applications across all environments.

    How do we handle this variable number of applications in an ApplicationSet? Is there a way to provide a list of apps from the Argo CD UI or Git?

    I could not find anything except creating a different ApplicationSet for every major/minor release.
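
    (One direction I’m considering – a sketch using the ApplicationSet Git file generator, where a folder in Git lists the apps of the current release; the repo and paths are made up:)

        apiVersion: argoproj.io/v1alpha1
        kind: ApplicationSet
        metadata:
          name: current-release
        spec:
          generators:
            - git:
                repoURL: https://example.com/org/releases.git
                revision: main
                files:
                  - path: 'releases/current/*.json'   # one small file per app, e.g. {"app":"billing","version":"1.2"}
          template:
            metadata:
              name: '{{app}}'
            spec:
              project: default
              source:
                repoURL: https://example.com/org/gitops-config.git
                targetRevision: main
                path: 'apps/{{app}}'
              destination:
                server: https://kubernetes.default.svc
                namespace: '{{app}}'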
