Better Kubernetes at the Edge with Argo CD and Codefresh

Kubernetes at the edge has become extremely popular, with retail companies like Chick-fil-A and Starbucks leading the way as famous examples, alongside more exotic cases such as the US Air Force deploying Kubernetes on F-16s. At Codefresh we’ve seen and helped implement every kind of edge deployment, from clusters in retail stores and mobile clusters in vehicles to air-gapped clusters for telecoms, and lots more. In this article, we’ll explore some of the patterns that have made Codefresh, GitOps, and Argo CD so successful at helping teams tackle the challenges that come with having clusters in diverse locations.

What is Edge Kubernetes

Edge Kubernetes is any kind of Kubernetes deployment happening on-location rather than in a traditional data center. Edge clusters often have subpar and unreliable networking, are difficult to access for maintenance, and need to be rock solid: walkaway safe and reliable.

Examples of Edge Kubernetes Deployments

Edge Kubernetes Clusters in Retail Locations

Retailers commonly run Kubernetes clusters in each location to provide services locally with high reliability. Their deployments often have these things in common:

  • Availability: High availability with multiple nodes and minimal memory/CPU constraints, though single-node clusters are still common.
  • Networking: Closed to ingress traffic, connected through municipal internet providers with dynamic IP addresses.
  • Grouping: Typically grouped by region so updates can be applied gradually.
  • Access: Poor physical security with many people having access.
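
The regional grouping mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea of rolling out one region at a time, not how Argo CD or Codefresh implements it; the cluster names, the `region` field, and the `rollout_waves` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory: each edge cluster carries a region label.
CLUSTERS = [
    {"name": "store-001", "region": "southeast"},
    {"name": "store-002", "region": "southeast"},
    {"name": "store-101", "region": "midwest"},
    {"name": "store-201", "region": "west"},
    {"name": "store-202", "region": "west"},
]


def rollout_waves(clusters, region_order):
    """Group clusters by region and return one wave per region, in order."""
    by_region = defaultdict(list)
    for cluster in clusters:
        by_region[cluster["region"]].append(cluster["name"])
    # Regions that have no clusters in the inventory produce no wave.
    return [sorted(by_region[r]) for r in region_order if r in by_region]


waves = rollout_waves(CLUSTERS, ["midwest", "southeast", "west"])
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Starting with the smallest region keeps the blast radius low if an update misbehaves; the ordering here is an arbitrary choice for the example.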

Edge Kubernetes Clusters in vehicles, planes, and ships

Kubernetes has been deployed in vehicles ranging from the F-16 and commercial airliners to container ships and even passenger and cargo vehicles. These use cases have some unique considerations:

  • Availability: High availability with multiple nodes, but often with tight memory/CPU constraints to save power and space. Single-node clusters are common for non-mission-critical functions.
  • Networking: Unreliable and intermittent internet, often only available while parked/docked/landed, with updates blocked while in use.
  • Grouping: Broken up into areas of concern or contract affiliation.
  • Access: Can be easier to gain physical access as vehicles come into common ports. Physical security is often quite good.

Edge Kubernetes Clusters for Telecoms

Telecoms have tens of thousands of locations to consider, from cell towers to relay stations. Security is a primary concern.

  • Availability: High availability, almost always mission critical, with multiple nodes and minimal memory/CPU constraints, though power may be limited in some locations.
  • Networking: Sometimes completely air-gapped for security, otherwise excellent private network connectivity.
  • Grouping: Typically broken up into regions and with overlapping coverage to support overall network and infrastructure performance.
  • Access: Remote locations with very few people accessing but part of critical infrastructure.

Edge Kubernetes Clusters for Healthcare and Hospitals

Similar to telecoms, hospitals and other healthcare-related edge clusters face some unique challenges and heavy regulation.

  • Availability: High availability, almost always mission critical, with multiple nodes and minimal memory/CPU constraints.
  • Networking: Sometimes completely air-gapped for security, otherwise high-speed, reliable internet with strict firewalls and no ingress. Often part of a private network.
  • Grouping: Typically broken up into regions, sometimes broken up by areas of concern within a specific building. Zero-downtime updates only.
  • Access: Generally strong physical security with IT staffing onsite.

Edge Kubernetes Clusters Are Growing in Popularity

Walkaway Safe

The challenges of edge clusters are exactly why Kubernetes has been so popular. When deploying hardware to the field, you want a system that is walkaway safe and will not need technicians to visit on site. Even a single-node Kubernetes cluster offers reliability by managing container restarts under normal operation. With multiple nodes and an HA control plane, a cluster can experience multiple hardware failures and continue to run without interruption.

Declarative Configuration

Without Kubernetes, edge servers typically run with imperative operations and setup. This kind of setup is very rigid: once the setup steps have run, no controller is left in place to notice changes and try to recover the intended state.
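
To make the contrast concrete, here is a minimal sketch of the reconcile idea behind declarative configuration: compare a desired state with the actual state and compute the actions that close the gap. It is a toy model and not Kubernetes' actual controller code; the workload names, image tags, and the `reconcile` and `apply_actions` helpers are assumptions made for illustration.

```python
def reconcile(desired, actual):
    """Return the actions needed to move `actual` toward `desired`.

    Both arguments map a workload name to its image tag.
    """
    actions = []
    for name, image in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name, image))
        elif actual[name] != image:
            actions.append(("update", name, image))
    # Anything running that is no longer declared gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name, actual[name]))
    return actions


def apply_actions(actions, actual):
    """Apply actions to a copy of the actual state and return the new state."""
    state = dict(actual)
    for verb, name, image in actions:
        if verb == "delete":
            state.pop(name)
        else:
            state[name] = image
    return state


desired = {"pos": "pos:2.1", "inventory": "inventory:1.4"}
actual = {"pos": "pos:2.0", "kiosk": "kiosk:0.9"}  # drifted state

actions = reconcile(desired, actual)
actual = apply_actions(actions, actual)
print(actions)
print(reconcile(desired, actual))  # nothing left to do once converged
```

Because the loop only looks at the difference between the two states, it can be run again at any time, which is what lets a controller recover after a restart or a manual change.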

Everything Stored in Git

With declarative configuration, all state can also be managed with the source control tools we’re already used to. Keeping configuration in source control enables change management and configuration tracking.

GitOps and Argo for Edge

By itself, Argo CD is extremely resilient, especially in small setups without tons of clusters and apps. By installing Argo CD on an edge cluster, you can easily set a source of truth in Git over the internet and manage updates without ever accessing Argo CD directly. Argo CD will automatically pick up and apply changes. If the network is down at the edge location or the connection is slow, it will simply pick up the updates as soon as a network is available and download the new configuration which is usually fairly small (though image updates may be larger afterwards). 
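
The pull behavior described above can be sketched as a small simulation. This is not Argo CD's code; `GitRemote`, `EdgeSyncer`, and the revision strings are hypothetical stand-ins that only show why a pull-based syncer tolerates an outage and catches up afterwards.

```python
class GitRemote:
    """Stand-in for a Git repository reached over an unreliable link."""

    def __init__(self):
        self.revision = "a1"
        self.online = True

    def fetch(self):
        if not self.online:
            raise ConnectionError("network unavailable")
        return self.revision


class EdgeSyncer:
    """Pull-based syncer: it only ever makes outbound requests."""

    def __init__(self, remote):
        self.remote = remote
        self.applied = None  # last revision applied to the cluster

    def tick(self):
        """Run one polling cycle and report what happened."""
        try:
            latest = self.remote.fetch()
        except ConnectionError:
            # Offline: keep running the last applied revision.
            return "offline"
        if latest == self.applied:
            return "in-sync"
        self.applied = latest
        return f"synced {latest}"


remote = GitRemote()
syncer = EdgeSyncer(remote)
history = [syncer.tick()]      # first sync

remote.online = False
remote.revision = "b2"         # a commit lands while the site is offline
history.append(syncer.tick())  # nothing to do but wait

remote.online = True
history.append(syncer.tick())  # catches up as soon as the link returns
history.append(syncer.tick())
print(history)
```

Nothing in the sketch requires the remote to reach the edge site, which is why the same pattern works behind closed ingress and dynamic IP addresses.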

Chick-fil-A has famously used Argo CD to manage its edge clusters for the last several years with great success.

Challenges with Argo CD on Edge Kubernetes Clusters

The biggest challenges of using Argo CD come from the nature of edge itself: networking. If you have a single edge cluster to manage, a simple Argo CD instance is easy to install and manage. But if you have thousands of edge clusters, you need a robust system to manage things. When it comes to edge there are basically four choices for deployment, all of which we cover in depth in our article on Argo CD architectures. For edge, we will summarize the options here:

Hub and Spoke Argo CD, that is, a single Argo CD instance connecting to many clusters, is the easiest from a management perspective because everything is on a single UI that can be managed from anywhere. But from an implementation standpoint it is one of the most difficult, because the central instance must connect into each cluster and edge clusters almost never accept ingress traffic. Some users have decided to use network tunneling to solve this problem, with Cloudflare being one of the easiest choices to set up.

A “split-instance” architecture is similar to Hub and Spoke in that there is a single Argo CD instance, but in addition to using a network tunnel, the Argo CD repo server is sharded across all connected clusters. The repo server essentially “thinks” it’s on the same cluster as the Argo CD install, but the arrangement reduces latency when applying manifests and collecting metrics.

The third option is to use standalone Argo CD instances on each cluster. This provides the best reliability, as there is no single point of failure for the entire system. It is also the most difficult to manage: every instance needs to be updated and maintained, and operating or collecting data requires connecting to each Argo CD instance individually. This is impractical when thousands of clusters are involved.

The trade-off between ease of management and reliability is the core issue with the first three choices. The fourth option solves both by using a control plane. A control plane manages networking with egress-only connections from clusters. Just as GitOps is always a “pull mechanism”, each cluster keeps the same style of “pull connection” to the control plane, requiring egress only. The Codefresh control plane additionally has three huge benefits:

  1. Monitor and manage all Argo CD instances from a single interface
  2. Get sync status and app health status from a single interface
  3. Have excellent reliability with no central point of failure: if the control plane goes down, each instance will continue to sync from Git without issue.
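
The egress-only relationship between clusters and a control plane can be illustrated with a small simulation. This is a sketch under stated assumptions and not the Codefresh implementation: `ControlPlane`, `EdgeInstance`, and their methods are hypothetical names, and a real system would use authenticated network connections rather than direct method calls.

```python
class ControlPlane:
    """Hypothetical central service; it never opens connections to clusters."""

    def __init__(self):
        self.up = True
        self.statuses = {}  # cluster name -> last reported revision

    def receive(self, cluster, revision):
        if not self.up:
            raise ConnectionError("control plane unreachable")
        self.statuses[cluster] = revision


class EdgeInstance:
    """Syncs from Git on its own and reports outbound when it can."""

    def __init__(self, name, control_plane):
        self.name = name
        self.control_plane = control_plane
        self.applied = None
        self.pending_report = None

    def sync(self, git_revision):
        # Syncing depends only on Git, not on the control plane.
        self.applied = git_revision
        self.pending_report = git_revision
        self.flush()

    def flush(self):
        """Try to deliver the latest status over an outbound connection."""
        if self.pending_report is None:
            return
        try:
            self.control_plane.receive(self.name, self.pending_report)
            self.pending_report = None
        except ConnectionError:
            pass  # keep the report and retry on the next flush


plane = ControlPlane()
store = EdgeInstance("store-001", plane)
store.sync("a1")

plane.up = False
store.sync("b2")  # control plane outage does not block the sync
during_outage = (store.applied, plane.statuses["store-001"])

plane.up = True
store.flush()     # status catches up once the control plane is back
print(during_outage, plane.statuses)
```

During the simulated outage the cluster is already on the new revision while the central view still shows the old one, which is the trade-off this design accepts: visibility can lag, but deployments do not.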

Solving Edge Challenges with Codefresh, GitOps and Argo CD

As Argo Maintainers, and the creators of the most popular enterprise Argo distribution, Codefresh has helped more teams in more diverse use cases and environments deploy their software at scale using Argo CD than anyone on the planet. 

The Codefresh platform includes a control plane for managing thousands of Argo CD instances, along with Argo Rollouts, Argo Workflows, and Argo Events. This provides substantial benefits:

  1. Manage all edge clusters from a single UI
  2. Easily bootstrap new clusters
  3. Model changes and quality gates to promote changes across fleets of clusters
  4. Automatically diff and preview diffs between environments
  5. Keep development in a single UI so devs always know where to go
  6. Provide in-depth metrics, telemetry, and reporting on application and environment status
  7. Track productivity with built-in DORA metrics

Fleet Management with Environments, Products, and Promotion

Managing fleets of edge clusters is easy using Codefresh’s Environments and Products. Environments group any number of clusters and namespaces, regardless of the Argo CD instances used to manage them. Each instance can have both unique and shared configuration components. This is accomplished with Products.

Products create a link between Argo CD applications no matter where they are deployed, and let you customize diffing to control how components are upgraded and which values stay unique in each environment. This means a simple YAML configuration can support thousands of edge clusters even when each cluster has some unique values.

Figure: Staging and development YAML in Argo CD

When Environments and Products are combined in a CRD called Promotion, you can model how changes are promoted from test environments or regions, validated, rolled back, and kept in sync.

With a few short YAML files, thousands of clusters across dozens of regions can be managed quickly, efficiently, and with incredible reliability.

GitOps Control Plane for Argo CD

Managing thousands of GitOps instances with Codefresh is incredibly powerful. By default, all configurations are stored in Git and tracked through the control plane. This includes auto-updates, security alerts, and debugging.

The control plane also provides a common interface for all instances. When required, engineers can quickly navigate to any application without having to know which cluster or which Argo CD instance the application is running on. This dramatically cuts down on the time to manage applications, reduces onboarding time, and abstracts 90% of the tedium of managing individual applications in a fleet of edge clusters.

Reliability with Independent Instances

The control plane provides insights and incredible usability and scalability improvements without introducing a single point of failure. Each instance of Argo CD can operate without a network connection to the control plane and continue to update, promote changes, and automatically roll back according to defined policy. These instances can also run different versions from the control plane without any issues.

If instances lose internet connectivity, they will simply resume synchronization and resume reporting telemetry and status as soon as the network is restored.

This setup also means if local technicians are operating onsite they can still interface directly with the GitOps runtime as needed.

Air-gapped Argo CD

This reliability also provides a robust platform for managing air-gapped installations. Codefresh has managed numerous air-gapped Argo CD installations. Because these configurations can vary dramatically based on requirements, contact sales to learn more.

Security

Codefresh is the leading security provider for Argo CD, with a secure enterprise distribution, an aggressive SLA, and a proven track record of closing CVEs and hardening all components, including Argo CD.

Insights

Codefresh GitOps goes far beyond what is tracked in Argo CD to answer questions on daily operations, and provide insight into what is happening not only at the Kubernetes level but also at the individual application level. 

Figure: Application insights within the Codefresh UI

What’s Next

To get started with better Kubernetes edge deployments, sign up now or schedule a demo.
