A vision for GitOps 2.0 with Codefresh

In our previous article, we explained some of the issues we see with the current generation of GitOps tools (which we call GitOps 1.0). In this article, we will talk about the solution to those issues and what we expect from GitOps 2.0 – the next generation of GitOps tooling.

Visibility into the whole software lifecycle

The current generation of GitOps tools focuses only on the deployment of an application artifact. The full software lifecycle includes several earlier tasks, such as packaging the artifact, running unit tests, and security scanning. Even after deployment there are several actions (such as running smoke tests) that are not covered by today’s GitOps tools.

This means that if you decide to adopt GitOps, you already need to have an existing CI setup which works effectively and produces the artifacts that your GitOps tool will use. This approach works well for small setups, but having to look in two different places (your CI system for the artifact and your GitOps solution for the deployment) does not scale well for large organizations with multiple applications.

Ideally, you would want a single deployment platform that follows the full journey of each code change, from commit to deployment. This way you will have full traceability on each production change and can also easily track critical DevOps metrics such as lead time (the time it takes for a commit to reach production).
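To make the lead time metric concrete, here is a minimal sketch of how it could be computed once commit and deployment timestamps are available in one place. The function names and the input shape (pairs of commit time and production deployment time) are our own assumptions for illustration, not part of any existing tool.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time(commit_time: datetime, deploy_time: datetime) -> timedelta:
    """Time between a commit being created and it reaching production."""
    if deploy_time < commit_time:
        raise ValueError("a deployment cannot precede its commit")
    return deploy_time - commit_time


def median_lead_time(changes) -> timedelta:
    """Median lead time over (commit_time, deploy_time) pairs.

    The pair format is hypothetical; a real platform would read these
    timestamps from its own commit and deployment history.
    """
    seconds = [lead_time(c, d).total_seconds() for c, d in changes]
    return timedelta(seconds=median(seconds))
```

The calculation itself is trivial. The hard part in practice is having both timestamps recorded by the same system, which is exactly what a single platform provides.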

Observability and business metrics

As an extension to the previous point, we believe that GitOps is a process that affects not just developers/operators but also the business stakeholders (i.e. product owners and project managers). Gaining visibility into the full software lifecycle is vital for everyone involved in software delivery and not just those who handle source code.

As we explained in the GitOps 1.0 article, some of the most important questions a software team should consider are:

Does our production environment contain feature X?
Has feature X cleared our staging environment?
Are bugs X, Y only present in staging or also in production?
What percentage of deployments to environment X were successful and what had to be rolled back?
How many features exist in environment X but are not in environment Y yet?

These questions cannot be easily answered by GitOps tools today (in fact not even traditional non-GitOps deployment platforms can answer them). Having a GitOps solution in place that not only cares about Git hashes but also offers full visibility on the high-level metrics of each deployment will allow all stakeholders to monitor the deployment process and gain insights on what happens with each environment.

A very big gap today is the discrepancy between source code changes (i.e. Git commits) and business-level features (i.e. issues from a ticket system such as JIRA), as they are typically handled by different systems. In order to correlate them, developers often have to build additional cruft on top of the deployment platform and even resort to custom low-level scripts that tie them together.

It is always important to remember that while on the surface a software project might seem like a series of code changes/commits, the visible value comes from business features that are getting shipped. Current GitOps tools can easily answer what Git hash is now deployed into a cluster, but more useful information like what features are present in that environment (and what features are waiting in another environment) is often missing or considered secondary knowledge.

Additionally, some of the classic DevOps metrics are very hard to monitor with current GitOps tools. Lead time in particular (the time it takes for a commit to reach production) cannot be measured easily if your deployment is handled by multiple tools and custom glue code (a very common occurrence in companies of all sizes).

It should be clear that we need dedicated tools for all these questions. Ideally a GitOps solution should have native support for measuring and storing historical data for:

Business features implemented in each release
The journey of each software release from start to finish
Correlation between git hashes, features, rollbacks and environments
Several deployment metrics (e.g. lead time)

Right now it is very hard to get all these features in a single overview.
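As an illustration of the correlation between Git commits and business features, the sketch below answers the question "which features exist in environment X but are not in environment Y yet?". It assumes that commit messages mention ticket keys in the common PROJECT-123 style and that we already know which commit messages are deployed in each environment. Both assumptions, along with all names used, are hypothetical.

```python
import re

# Assumed convention: commit messages mention ticket keys such as "SHOP-15".
TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def features_in(commit_messages):
    """Collect the set of ticket keys referenced by a list of commit messages."""
    found = set()
    for message in commit_messages:
        found.update(TICKET_PATTERN.findall(message))
    return found


def features_missing_from(source_env, target_env, deployed):
    """Ticket keys present in source_env but not yet in target_env.

    `deployed` maps an environment name to the commit messages deployed
    there (a hypothetical data shape chosen for this example).
    """
    return features_in(deployed[source_env]) - features_in(deployed[target_env])
```

For example, if staging contains commits for SHOP-12 and SHOP-15 while production contains only SHOP-12, then features_missing_from("staging", "production", deployed) yields the set containing SHOP-15.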

Promotion among different environments

Promoting a released artifact between different environments is arguably the biggest challenge with GitOps tools right now. The model of using a Pull Request which triggers a deployment once it is merged works great for a single environment but breaks down when multiple environments are involved.

Promotion of a release can take many forms, such as a linear progression to increasingly critical environments (such as dev -> qa -> staging -> production) or a parallel deployment to equally significant but slightly different environments (e.g. deploying a release to different geolocations). Neither scenario is covered sufficiently by the current GitOps tools.

Promotion between environments should be considered an integral part of any deployment solution. In the case of GitOps tooling, we can handle the promotion mechanism in a number of different ways, such as introducing an automation mechanism for Pull Requests (more on this in the next section) or even creating a higher-level abstraction layer for specifying different environments.

The end result should be that for any GitOps project a project stakeholder:

clearly sees the releases present in each environment (including temporary dynamic environments)
can easily move a release to the next environment (e.g. staging to production)
can easily move a release to a previous environment (e.g. production to QA in order to examine a bug)
has a configuration mechanism in place to define multiple versions of the same environment (e.g. for different geographical regions)

Promotion of environments should always be possible to automate via a CLI (i.e. a graphical dashboard is not enough for all use cases).
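The promotion requirements above could be modeled with a small higher-level abstraction like the following. This is only a sketch under our own assumptions: the environment names come from the linear example earlier, while the mapping of environment to version and the function names are invented for illustration.

```python
# Linear promotion chain taken from the example in the text.
ENVIRONMENTS = ["dev", "qa", "staging", "production"]


def next_environment(env: str) -> str:
    """Return the environment that follows `env` in the linear chain."""
    index = ENVIRONMENTS.index(env)
    if index == len(ENVIRONMENTS) - 1:
        raise ValueError(f"{env} is the last environment in the chain")
    return ENVIRONMENTS[index + 1]


def promote(releases: dict, version: str, source: str, target: str) -> dict:
    """Return a new environment -> version mapping with `version` in `target`.

    The version must currently be deployed in `source`. The target may be
    a later environment (promotion) or an earlier one (e.g. moving the
    production release back to QA to examine a bug).
    """
    if releases.get(source) != version:
        raise ValueError(f"{version} is not deployed in {source}")
    updated = dict(releases)  # leave the caller's mapping untouched
    updated[target] = version
    return updated
```

Because promote is a plain function over data, the same operation can be exposed through a CLI or a dashboard without duplicating logic.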

Achieving Continuous deployment and full Git automation

As we explained in our CI/CD/CDP article, Continuous Delivery is the process where each developer commit results in a release candidate that is ready to be sent to production. GitOps and Continuous Delivery are a perfect match, as the Pull Request process dictated by GitOps can function as a quality gate that approves/rejects the release candidate.

Continuous Deployment is the practice where every commit (that complies with the organization requirements) is automatically sent to production without any human intervention. This is the ultimate form of a deployment pipeline, as it takes manual steps completely out of the equation.

Combining Continuous Deployment with GitOps is an open question right now, as a Pull Request is for all intents and purposes a manual process. There is no established best practice for implementing Continuous Deployment with GitOps.

Pull Requests (like all other Git actions) can be fully automated. At the time of writing, there is no GitOps tool that does that. To achieve continuous deployment with GitOps we would need a GitOps solution which not only passively handles the opening and closing of Pull Requests but also opens/closes Pull Requests on its own in an unattended manner.

Automating Pull Requests might sound like an easy task, but in order to give plenty of visibility on what is happening at any point in time, we need a deployment solution that treats Pull Requests as a first-class citizen, instead of simply responding to Pull Request events like most tools do right now.

Built-in handling for rollbacks and secrets

We explained in the first point of this article that current GitOps tools focus only on the deployment part of the release cycle and ignore all other associated tasks. You could argue that this is their primary focus and they don’t have to provide anything else regarding the software lifecycle.

Even if this were true, I consider secret management and automatic rollbacks an integral part of the deployment process. Unfortunately, both areas are currently left as an exercise to the reader if you try to adopt any of the existing GitOps tools.

We need a GitOps solution that handles secrets in its core offering. Secret management is also deeply connected with the problem of multi-environment installations that we already covered. A GitOps solution should not only offer a secure storage mechanism for all secrets, but also a comprehensive way to group them by environment and pass them to the respective cluster.

The underlying mechanism is not that critical. There is a common pattern that if you plan to use GitOps you must also store secrets in Git. I don’t see this as an essential requirement for handling secrets in GitOps if there is a better proposal.

On the topic of rollbacks, the ideal GitOps tool should be able to perform the following:

Deploy the latest Git hash to the cluster
Get information from metrics regarding the success of the deployment
If metrics are ok, mark the deployment as finished
If metrics show failures, then automatically roll back to the previous version
Update the Git repository with the new information so that the latest Git commit again matches what is deployed in the cluster

Today to achieve this workflow, you need to write custom glue code in addition to adopting a GitOps tool.
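The rollback workflow listed above can be sketched as a single function. The three callables (deploy, is_healthy, record_in_git) are placeholders we introduce for illustration. They stand in for whatever your cluster, metrics system, and Git repository actually expose, and do not correspond to any real tool's API.

```python
from typing import Callable


def deploy_with_rollback(
    new_hash: str,
    previous_hash: str,
    deploy: Callable[[str], None],         # hypothetical: applies a Git hash to the cluster
    is_healthy: Callable[[], bool],        # hypothetical: checks deployment metrics
    record_in_git: Callable[[str], None],  # hypothetical: makes Git point at a hash again
) -> str:
    """Deploy new_hash, roll back on failing metrics, return the hash left running."""
    deploy(new_hash)
    if is_healthy():
        # Metrics are ok: the deployment is finished and Git already matches.
        return new_hash
    # Metrics show failures: restore the previous version in the cluster...
    deploy(previous_hash)
    # ...and update Git so that the latest commit matches the cluster again.
    record_in_git(previous_hash)
    return previous_hash
```

In a real setup the health check would typically wait for metrics to settle before deciding, and the Git update would usually be a revert commit rather than a history rewrite. Both details are left out of this sketch.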

Running GitOps at scale

The last critical component of our vision for GitOps 2.0 is running GitOps at scale. Working with a small number of environments might seem easy, but in several real-world scenarios, the number of pipelines, projects, deployments, and environment configurations within a company can quickly get out of control.

Scalability is important not only with regard to technical capacity but also with regard to visibility and observability, as explained in the previous sections. Different environments will exist at different sync states, with multiple Pull Requests in flight and with multiple business features getting shipped.

We need a way to present all this information in both high-level and low-level representations. On one hand, we need a bird’s eye view of all deployments across the company or department for stakeholders who need an easy and understandable dashboard for monitoring deployments along with their associated metrics.

On the other hand, anybody should be able to drill down and look at a specific deployment and the business value it contains, along with the exact status of each business feature it expects.

Making the vision for GitOps 2.0 a reality

In this article, we described a vision for the next generation of GitOps tools. We have also described extensively the gaps of the existing GitOps tools in our previous article.

At Codefresh, we believe that this vision is the next big thing for deployments and this is why we think it deserves the name of GitOps 2.0.

As you might imagine, we have already started working on an exciting new platform that will fulfill all these requirements and will expand the capabilities and focus of GitOps tools.