Scaling Argo CD Securely in 2024

Last updated 3/15/2024

Argo CD is used by some of the largest and most security-conscious companies in the world to run sensitive, business-critical workloads. In 2024, it's all the more critical to make sure Argo CD is running securely within your organization. Through the CNCF graduation process, the Argo project has undergone additional security audits and improvements to project security. Over the coming months, additional security considerations will likely be discovered, and this post will be updated to reflect them. In this post, we'll look at some of the common ways DevOps teams scale Argo CD and the security challenges unique to each.

This post briefly covers a subset of strategies for deploying Argo CD, specifically as they relate to security. For a comprehensive overview of potential architectures, please see A Comprehensive Overview of Argo CD Architectures – 2024.

Scaling with Hub-and-Spoke vs. Stand-Alone Instances

These two models typically involve tradeoffs between security and ease of management. Let's look at what each model is and how it affects security.

Hub and Spoke Model for Deploying Applications with Argo CD

Argo CD can connect to and manage multiple Kubernetes clusters. This model is especially popular with DevOps teams who like the centralized management of a single Argo CD instance, and they often rely on RBAC and SSO to control permissions between teams. Teams A, B, C, and D can each operate on their own resources while an admin runs the single Argo CD instance.
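To make team scoping concrete, here is a simplified Python sketch. It borrows the shape of Argo CD's `policy.csv` lines (`p, <subject>, <resource>, <action>, <object>, <effect>` and `g, <group>, <role>`), but the evaluator is a hypothetical toy: Argo CD evaluates real policies with Casbin, its glob semantics may differ from Python's `fnmatchcase`, and the group, role, and project names below are made up. In this sketch an explicit deny overrides any allow.

```python
from fnmatch import fnmatchcase

# Hypothetical policy in the shape of Argo CD's policy.csv.
# Application objects are written as <project>/<application>.
POLICY = """
p, role:team-a, applications, *, team-a/*, allow
p, role:team-b, applications, *, team-b/*, allow
p, role:team-b, applications, delete, team-b/prod-*, deny
g, sso-group-team-a, role:team-a
g, sso-group-team-b, role:team-b
"""


def parse_policy(text):
    """Split policy text into permission rules and group-to-role bindings."""
    rules, groups = [], {}
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if fields[0] == "p":
            # (subject, resource, action, object pattern, effect)
            rules.append(tuple(fields[1:6]))
        elif fields[0] == "g":
            # An SSO group is bound to a role.
            groups.setdefault(fields[1], set()).add(fields[2])
    return rules, groups


def is_allowed(rules, groups, user_groups, resource, action, obj):
    """Return True if any role of the user allows the request and none denies it."""
    roles = set()
    for group in user_groups:
        roles |= groups.get(group, set())
    allowed = False
    for subject, res, act, pattern, effect in rules:
        if subject not in roles or res != resource:
            continue
        if not (fnmatchcase(action, act) and fnmatchcase(obj, pattern)):
            continue
        if effect == "deny":
            return False  # an explicit deny wins in this sketch
        allowed = True
    return allowed
```

For example, a member of `sso-group-team-b` can sync or delete `team-b/dev-api` but cannot delete `team-b/prod-api`, and cannot touch anything in `team-a/*`. Every team's boundary lives in one policy on one instance, which is exactly why a mistake in that policy has organization-wide reach.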

Multiple hub-and-spoke Argo CD instances are sometimes used for regional control or to handle cascading updates across a number of clusters.

Advantages

Easy to maintain (one instance of Argo CD to manage)
Centralized control
One instance to worry about
Better visibility across organization

Disadvantages

Single attack surface
RBAC and SSO are not perfect (more on that later)
Target cluster APIs must be accessible to the central instance
Single point of failure: if this instance goes down, none of the clusters can be updated
No separation between staging and production management
Argo CD performance may degrade as the number of applications and clusters grows

Stand-Alone Model

The stand-alone model means each cluster gets its own Argo CD instance. This often happens organically as different groups within an organization adopt Argo CD. It's also very useful for edge clusters or clusters separated into different VPCs. Edge clusters may be installed in storefronts, hospitals, 5G towers, or just about anywhere else, and they often have trickier networking to deal with. Running an Argo CD instance on each cluster means those clusters can keep operating from their Git source of truth without exposing cluster APIs externally.
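In the stand-alone model, each instance typically deploys only to the cluster it runs in. The sketch below assumes you generate manifests from Python. It builds an Argo CD `Application` whose destination is `https://kubernetes.default.svc`, the in-cluster API address Argo CD uses for its own cluster, so nothing outside the cluster needs to reach the API server. The repository URL, path, and names are hypothetical placeholders.

```python
import json

# In-cluster API address: the Application targets the cluster Argo CD runs in.
IN_CLUSTER_API = "https://kubernetes.default.svc"


def standalone_application(name, repo_url, path, namespace, revision="HEAD"):
    """Build an Argo CD Application manifest that deploys into the local cluster."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "targetRevision": revision, "path": path},
            "destination": {"server": IN_CLUSTER_API, "namespace": namespace},
            # Automated sync: the local controller pulls from Git and applies
            # changes itself; no inbound connection to this cluster's API is needed.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    # Hypothetical edge cluster in a single storefront.
    app = standalone_application(
        "storefront",
        "https://git.example.com/edge/storefront.git",
        "overlays/store-042",
        "storefront",
    )
    print(json.dumps(app, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed manifest can be applied directly. The only network dependency is outbound access from the edge cluster to Git.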

Likewise, the stand-alone model usually limits access to a small group of operators.

Advantages

Clusters operate independently without reliance on an external instance of Argo CD
Security surface is limited to each individual cluster (compromising one does not compromise the rest)
Low memory/CPU overhead for Argo CD, with a small impact on each cluster

Disadvantages

Difficult to manage
Running many instances often means many of them fall out of date, introducing a new class of security problem
Poor visibility across the organization
Difficult to implement policy across many instances

Safely using Helm, Kustomize, KSonnet, and other config management tools with multi-tenancy

Multi-tenancy is the practice of serving many classes of users with different permissions from a single instance. “Hub and spoke” describes deploying a single Argo CD instance to manage many clusters. Organizations taking the hub-and-spoke approach often rely on multi-tenancy to support many teams. While the visibility advantages are huge, multi-tenancy raises some additional security concerns.

Almost every team using Argo CD uses config management tools such as Helm, Kustomize, KSonnet, or others. Argo CD even allows the easy creation of custom config management tools via plugins. These tools are incredibly flexible and open-ended in how they generate resources. A tool like Jsonnet is a full programming language, so generating manifests means executing code, not just filling in templates. Helm and Kustomize allow referencing files by relative path (especially a concern in multi-tenant environments) and can reference external URIs. Relative paths can potentially lead to path traversal that exposes information from other charts or applications. That's a big issue if you store secrets there.
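A containment check makes the relative-path risk easier to see. The Python sketch below is not how Argo CD, Helm, or Kustomize validate paths. It is a hypothetical CI check you might run over a repository checkout to flag any referenced path that resolves outside the application's own directory. The directory layout is invented, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def escapes_app_dir(app_dir, reference):
    """Return True if `reference`, resolved relative to app_dir, lands outside app_dir."""
    base = Path(app_dir).resolve()
    # resolve() collapses ".." segments and follows any existing symlinks,
    # so both "../other-team" and symlink tricks are caught.
    target = (base / reference).resolve()
    return not target.is_relative_to(base)


def find_escapes(app_dir, references):
    """Return the references that point outside the application directory."""
    return [ref for ref in references if escapes_app_dir(app_dir, ref)]
```

For a hypothetical `/repo/apps/team-a`, a reference such as `charts/web/values.yaml` passes. `../team-b/values.yaml` and absolute paths like `/etc/passwd` are flagged. Rejecting anything that escapes the directory is deliberately strict, because in a multi-tenant repository, "outside my directory" usually means "someone else's configuration."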

Many DevOps teams store secrets and other variables in the values of their Helm charts. Storing secrets this way should be considered less secure. This is another good reason to use a dedicated secrets management solution such as HashiCorp Vault or Sealed Secrets to separate secrets from the configuration generated by config management tools.
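A simple heuristic scan can flag likely offenders while secrets are moved out of chart values. This sketch walks an already-parsed values structure (for example, the output of a YAML loader) and reports non-empty string values stored under keys whose names suggest credentials. The key-name pattern is an assumption, not a standard. It will produce false positives (an `existingSecret` name, for instance) and will miss secrets under innocuous keys, so treat it as a tripwire rather than a control.

```python
import re

# Hypothetical key-name fragments that often indicate credentials.
SUSPICIOUS_KEY = re.compile(
    r"(password|passwd|secret|token|api[_-]?key|private[_-]?key)", re.IGNORECASE
)


def find_suspicious_values(values, path=""):
    """Yield dotted paths of non-empty strings stored under credential-like keys."""
    if isinstance(values, dict):
        for key, value in values.items():
            child = f"{path}.{key}" if path else str(key)
            if isinstance(value, str) and value and SUSPICIOUS_KEY.search(str(key)):
                yield child
            else:
                # Recurse into nested maps and lists; plain scalars yield nothing.
                yield from find_suspicious_values(value, child)
    elif isinstance(values, list):
        for index, item in enumerate(values):
            yield from find_suspicious_values(item, f"{path}[{index}]")
```

For instance, `{"postgresql": {"auth": {"username": "app", "password": "hunter2"}}}` yields `postgresql.auth.password`. Empty placeholders such as `apiKey: ""` are ignored, because a blank value is usually the signal that the real secret is injected elsewhere.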

External URIs are another area of concern. Argo CD can, for example, limit which repositories may be added, but a tool like Kustomize may reference resources outside that scope. This makes it critical to carefully monitor and control access to both Argo CD and Git.
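Argo CD's repository restrictions (such as an AppProject's `sourceRepos`) apply to the repositories that Applications point at, not to URLs that a kustomization fetches on its own. The Python sketch below is a hypothetical CI check that closes part of that gap. It reads an already-parsed `kustomization.yaml` and reports remote `resources` (or legacy `bases`) entries that match none of the allowed glob patterns. The remote-detection prefixes are a heuristic assumption and do not cover every URL form Kustomize accepts.

```python
from fnmatch import fnmatchcase

# Heuristic prefixes for remote references; not an exhaustive list.
REMOTE_PREFIXES = ("http://", "https://", "ssh://", "git::", "git@", "github.com/")


def unapproved_remote_resources(kustomization, allowed_patterns):
    """Return remote resource entries that match none of the allowed glob patterns."""
    flagged = []
    for field in ("resources", "bases"):
        for entry in kustomization.get(field, []):
            if not entry.startswith(REMOTE_PREFIXES):
                continue  # treated as a local path
            if not any(fnmatchcase(entry, pattern) for pattern in allowed_patterns):
                flagged.append(entry)
    return flagged
```

With `allowed_patterns = ["https://github.com/my-org/*"]` (a hypothetical organization), `https://github.com/my-org/platform//base?ref=v1.2.0` passes. `https://github.com/unknown/tools//manifests` and `git@github.com:unknown/x.git` are flagged. Matching on the raw string is conservative: a trusted repository written in a different URL form is also flagged, which is the safer failure mode.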

For these reasons, and the disadvantages mentioned earlier, multi-tenancy should be carefully considered before adoption. We recommend using a control plane approach to limit the attack surface of Argo CD. This can be combined with hub-and-spoke where it makes sense.

Tackling Security Challenges at Scale using a Control Plane

Codefresh Software Delivery Platform has two components: 1) an enterprise distribution of Argo tools and 2) a control plane for managing runtimes at scale. The control plane addresses these and other security concerns at scale in a few ways.

First, the control plane tracks all Argo instances: not just Argo CD, but also Argo Workflows, Argo Events, and Argo Rollouts. The runtime management dashboard tracks individual versions and flags runtimes with known security issues, prompting you to update to secure versions. Regardless of whether you're using a hub-and-spoke or stand-alone model, tracking runtimes against security notices is an effective way to make sure you stay secure.
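The core of version tracking can be sketched in a few lines of Python. This is not how the Codefresh control plane is implemented; it only illustrates the idea. The inventory and the minimum patched versions are hypothetical placeholders, not real advisory data. In practice you would source them from your runtime inventory and the Argo project's published security advisories. The parser handles only plain `MAJOR.MINOR.PATCH` versions with an optional leading `v` (no pre-release suffixes).

```python
# Minimum patched release per (major, minor) line.
# Illustrative values only; do not use as real advisory data.
MIN_PATCHED = {
    (2, 8): (2, 8, 10),
    (2, 9): (2, 9, 6),
}


def parse_version(text):
    """Parse 'v2.9.3' or '2.9.3' into a comparable tuple (2, 9, 3)."""
    return tuple(int(part) for part in text.lstrip("v").split(".")[:3])


def audit(runtimes, min_patched):
    """Map runtime name to a finding for every runtime that needs attention."""
    findings = {}
    for name, version in runtimes.items():
        current = parse_version(version)
        floor = min_patched.get(current[:2])
        if floor is None:
            # No known-good release for this minor line: treat as unsupported.
            findings[name] = "unsupported minor line"
        elif current < floor:
            findings[name] = "below " + ".".join(map(str, floor))
    return findings
```

With a hypothetical inventory of `{"hub-prod": "v2.9.6", "edge-store-042": "v2.8.3", "legacy": "v2.4.0"}`, the audit reports `edge-store-042` as below 2.8.10 and `legacy` as an unsupported minor line. Treating unknown minor lines as findings, rather than silently passing them, is what catches the forgotten stand-alone instances.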

Second, the control plane tracks the status of all applications across all runtimes, and it supports SSO and RBAC across all instances. That provides visibility across all runtimes, which is a major advantage for teams using Argo CD to deploy at scale. Permissions can be scoped to individual runtimes, making it much easier to get the security advantages of the stand-alone model with the manageability of the hub-and-spoke approach.

For most teams taking the hub-and-spoke approach, the control plane provides a better way of keeping everyone on the same page.

Likewise, no runtime relies on the control plane for operation. If the control plane were to go down for any reason, individual runtimes would continue to function without any problems.

In the Codefresh Control Plane, we track security issues along with all Argo instances and make upgrading very simple.

Conclusion

When deciding how to use Argo CD to do GitOps at scale, consider the security challenges of each approach, and the advantages of a control plane for managing all instances. Not only will it keep your applications and clusters more secure, but it will also make it easier for teams to work together and ultimately deliver software faster.

Additional Reading

Common blindspots when securing Argo CD – Recommended read which will likely be added to the security handbook.
Argo CD Security Manual
Argo CD Threat Model and Hardening Guide by ControlPlane – Primarily covers issues related to initial setup